Is there a way to bump the number of old entries downloaded when first subscribed?

thebearinboulder · July 6, 2019, 5:14am

I noticed that the last 20 or so entries of an RSS feed are loaded when I subscribe to a feed. Is it possible to bump that number?

I know that most people think of RSS and TT-RSS as a way to see new articles and, if anything, want to limit the number of them, but I also see it as a way to index dozens of curated technical blogs that are only updated a few times a week at most. If I have a question, there are too many blogs to search individually, but it would be a huge win if I had them in one place to search.

I did a quick scan of the code but I’m not a Python person - I didn’t see where the limit is set, much less how it could be modified.

So far this has been mostly conceptual - I’ve been running TT-RSS intermittently for a long time, but the database schema changes frequently enough that it’s clear the best approach is either to feed the articles quickly to a separate process or to have a way to build the search index on demand.

fox · July 6, 2019, 5:35am

yes you need to use an invocation of Moloch and then missing entries suddenly appear in the XML, as if by magic.

have you tried thinking before you post?

so, no.

JustAMacUser · July 6, 2019, 7:19am

Well, for starters, TT-RSS is written in PHP, not Python; that might be why you can’t find what you’re looking for.

Also, no. How would it do this? Content providers set the limit for what appears in their site’s feeds and TT-RSS has no way to change that.

And TT-RSS isn’t designed to be a compendium of human knowledge… This is why search engines exist… Especially for tech stuff, which goes out of date rather quickly. If you want to keep a repository of information I’d suggest setting up a self-hosted wiki, that’s probably a better tool for the job.

thebearinboulder · July 6, 2019, 3:05pm

Okay, some scripting language that starts with a P… Python’s on my mind since we use it at work, and the few times I’ve needed to figure it out it’s always been an adventure. Why can’t everything be written in Java, or at least C?

I knew that RSS information is paged - I guess I assumed that the libraries TT-RSS called could follow the links but hit an arbitrary counter since most people will only be interested in the most recent entries. I guess it does make more sense that it just reads the first page of results.

Finally, of course I know that it isn’t designed as a search engine… but there’s so much cruft and outdated information on those compendiums that I prefer to start with the dozens of technical blogs written by people who work on these projects or are otherwise knowledgeable about what I’m searching for. I keep notes in the work and/or personal instances of Confluence as needed, but it would be a full-time job to update the documentation every time someone posted.

Hmm… maybe the solution is the one I hoped to avoid - create my own with an existing RSS reader and then populate the TT-RSS database.

m0zes · July 6, 2019, 3:34pm

You have a fundamental misunderstanding here. RSS is not paginated. The reader grabs all of the data in the feed every time the URL is checked. The only way to get data is if it is in the XML itself. If the reader doesn’t check the URL often enough, that information is lost forever.

If there are 20 items in the feed, it is because that is all the data the provider is sending and all you will be able to get until new items show up. Then the provider will often remove old ones.
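To make that concrete, here is a minimal Python sketch (not TT-RSS code, which is PHP) of what any reader is limited to: it can only collect the `<item>` elements present in the XML at the moment it polls, so older entries survive only if an earlier poll already stored them. The names `parse_items` and `merge_snapshot` are hypothetical, and the sketch assumes a plain RSS 2.0 document without namespaces.

```python
import xml.etree.ElementTree as ET


def parse_items(xml_text):
    """Return (guid, title) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # Fall back to the link, then the title, when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link") or title
        items.append((guid, title))
    return items


def merge_snapshot(archive, xml_text):
    """Add the items from one poll to the archive dict; return how many were new."""
    added = 0
    for guid, title in parse_items(xml_text):
        if guid not in archive:
            archive[guid] = title
            added += 1
    return added
```

Calling `merge_snapshot` once per poll grows the archive over time; anything the provider dropped from the feed between two polls never reaches it.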

thebearinboulder · July 6, 2019, 4:32pm

Ah, I had invoked Baal.

The fact that I didn’t see a limit doesn’t mean it doesn’t exist somewhere, e.g., as a value stored in the database as a per-feed value. The other answer made it clear that there’s not an artificial limitation like article count or date.

Athanasius · July 6, 2019, 4:40pm

Except that often is the limitation, just set by the site serving the RSS feed, rather than by any reader like tt-rss.

I run a game-related feed which includes the last 28 days’ worth of relevant posts, for instance, but that could as easily be “the last 100”.
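As a rough illustration of where that limit lives, a feed generator might pick its items along these lines. This is a hypothetical Python sketch, not the code behind any particular feed; `select_feed_items` and the `published` key are assumed names.

```python
from datetime import datetime, timedelta


def select_feed_items(posts, now, max_age_days=None, max_count=None):
    """Pick the posts to publish: newest first, limited by age and/or count."""
    newest_first = sorted(posts, key=lambda p: p["published"], reverse=True)
    if max_age_days is not None:
        # Drop everything older than the age window.
        cutoff = now - timedelta(days=max_age_days)
        newest_first = [p for p in newest_first if p["published"] >= cutoff]
    if max_count is not None:
        # Keep only the most recent max_count posts.
        newest_first = newest_first[:max_count]
    return newest_first
```

Either limit is applied on the server before the XML is written, so a reader never sees the posts that were cut.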

thebearinboulder · July 6, 2019, 4:42pm

Yeah, about 30 minutes after I posted that I realized that I was probably thinking about a different type of feed. I normally work with messaging systems (JMS, Kafka, etc.) that have both queues and topics and some (not all) offer a way to retrieve previously sent information.

Sigh. At least many (most?) of the sites I’m interested in use WordPress, so it may be straightforward to write a scraper. I could then do a combination of creating my own RSS feeds/populating the database directly for the UI side and writing it to a second database with better search functionality.
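One possible starting point for such a scraper, sketched in Python: WordPress ships a REST API whose `/wp-json/wp/v2/posts` route accepts `per_page` (up to 100) and `page` parameters, so older posts can be walked page by page. This assumes the site has not disabled the REST API, which some do. The names `posts_url`, `http_fetch`, and `fetch_all_posts` are hypothetical, and nothing here touches TT-RSS itself.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def posts_url(site, page, per_page=100):
    """Build the REST URL for one page of posts."""
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def http_fetch(url):
    """Fetch one page of posts as a list of dicts."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: a page number past the last page is answered with HTTP 400.
        if err.code == 400:
            return []
        raise


def fetch_all_posts(site, fetch=http_fetch, per_page=100):
    """Walk pages 1, 2, 3, ... until a short or empty page signals the end."""
    posts = []
    page = 1
    while True:
        batch = fetch(posts_url(site, page, per_page))
        posts.extend(batch)
        if len(batch) < per_page:
            return posts
        page += 1
```

The `fetch` argument is injectable so the paging logic can be tried against canned data before pointing it at a real site; a polite scraper would also add a delay between requests.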