fox
16
readability doesn’t do any caching. plugins are simply not run, at all, if the article is considered up to date. it’s just skipped during update process.
whatever issue OP is having is not related to normal update process nor your feed contents. like @JustAMacUser posted above it could be his aggressive purging settings.
if (if-modified-since request header == timestamp of latest article)
    return http 304 (and don't generate any content)
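For anyone implementing this on the feed side, the pseudocode above could look roughly like the following in Python. This is only a sketch: `feed_status` and its parameters are hypothetical names, and it assumes the server knows the timestamp of its newest article as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def feed_status(if_modified_since, latest_article_time):
    """Decide between 304 and 200 for a feed request.

    if_modified_since: raw If-Modified-Since header value, or None if absent.
    latest_article_time: timezone-aware datetime of the newest article.
    Returns (status_code, response_headers).
    """
    # HTTP dates carry whole seconds in GMT, so normalise before comparing.
    latest = latest_article_time.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(latest, usegmt=True)}

    client_time = None
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_time = None  # unparseable header: treat as unconditional
    if client_time is not None and client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)

    if client_time == latest:
        return 304, headers  # caller sends no body
    return 200, headers  # caller generates the feed as usual
```

The equality check mirrors the pseudocode. It only works if the client echoes back exactly the `Last-Modified` value the server sent earlier.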
Matthew
17
Sorry, I meant purging, not caching.
I’ve now implemented the 304 status based on the if-modified-since header. Thanks for the advice on that.
One other question on that though, how is that header sent in TT-RSS? Does it use the date it last checked or the date of the most recent article that it logged in the database?
fox
18
oh at first i thought i could just send the latter. and then i encountered all the broken servers.
which is why tt-rss now stores Last-Modified verbatim and sends it back to the server on the next request, which seems to work alright, for the most part. tt-rss also forces unconditional requests periodically, just in case the server is broken or misconfigured.
it’s really terrible when you think about it.
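To make the client side concrete, here is a rough sketch of the behaviour described above: keep the server's `Last-Modified` value exactly as received, echo it back as `If-Modified-Since`, and skip the header every so often. This is not tt-rss code. `FeedState`, `build_request_headers`, `record_response` and the interval of 10 are all made up for illustration, and the post does not say how often tt-rss forces an unconditional request.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical interval: every 10th fetch is sent without the conditional header.
FORCE_FULL_EVERY = 10


@dataclass
class FeedState:
    last_modified: Optional[str] = None  # header value exactly as the server sent it
    fetch_count: int = 0


def build_request_headers(state):
    """Return the request headers for the next fetch of this feed."""
    state.fetch_count += 1
    forced = state.fetch_count % FORCE_FULL_EVERY == 0
    if state.last_modified and not forced:
        # Echo the stored string back untouched; no parsing or reformatting.
        return {"If-Modified-Since": state.last_modified}
    return {}  # unconditional request


def record_response(state, status, response_headers):
    """Remember Last-Modified after a full (200) response.

    response_headers is assumed to be a plain dict with canonical header names.
    """
    if status == 200:
        state.last_modified = response_headers.get("Last-Modified")
```

Storing the raw string sidesteps servers that emit odd date formats: whatever they sent is what they get back.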
@JustAMacUser
My settings for purging are set at 30 in prefs, 0 in config. My thinking being that I’m reading daily, not tryna start a library. Should I be purging less frequently?
@fox Thanks for putting in the extra effort to get this sorted. I know you’re busy and I’m grateful for your time.
There’s something that’s causing TT-RSS to refetch all these articles. What other plugins are you using?
Here’s a full list of all active plugins:
auth_internal
af_fsckportal
af_fullpost
af_newspapers
af_readability
af_redditimgur
af_unburn
af_zz_noautoplay
af_zz_vidmute
auto_assign_labels
bookmarklets
cache_starred_images
close_button
entityclean
ff_xmllint
mail
no_url_hashes
note
share
vf_shared
fox
22
it would probably be more helpful if you posted feed debugger (f D) logs for this feed.
I was going to suggest this as well but @Matthew has blocked his IP so it would be difficult to get real-world results…
fox
24
oh duh
well maybe @Matthew would be kind enough to unblock op for diagnostic purposes
@fox Is this ok: https://i.imgur.com/zHZdnzG.png
@JustAMacUser Matthew very graciously unblocked me yesterday as a good-will gesture subsequent to this conversation.
fox
26
oh. well he did implement conditional requests so until his feed posts something new, i think you’re going to be stuck with http 304.
which should effectively largely solve this problem, i suppose…
I’m wondering if these two plugins are causing problems.
entityclean is a bit sloppy… it just does a regex replace on the whole feed (as a massive text string) instead of properly parsing the DOM.
ff_xmllint uses lint and tidy and those can definitely change the feed data depending on what they encounter.
Really entityclean shouldn’t exist because it’s too careless in how it works and ff_xmllint needs to be selectively applied to only feeds that are known to have invalid form.
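To illustrate the general risk of running a regex over the raw feed text, here is a small sketch. The pattern is a hypothetical cleanup rule, not entityclean's actual regex, and the feed is made up. The point is only that a text-level replace cannot tell markup from article content inside CDATA, so it rewrites content it was never meant to touch.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, well-formed feed: the entities live inside CDATA, so they are
# legitimate HTML article content rather than broken XML.
FEED = """<rss version="2.0"><channel><title>Example</title>
<item><guid>1</guid><description><![CDATA[Price&nbsp;10 &copy; Example]]></description></item>
</channel></rss>"""

# Hypothetical cleanup rule: drop named entities that XML itself does not define.
UNDEFINED_ENTITY = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);)[A-Za-z][A-Za-z0-9]*;")


def descriptions(feed_xml):
    """Parse the feed and return each item's description text."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("description") for item in root.iter("item")]


def clean_whole_feed(feed_xml):
    """Apply the regex to the entire feed as one big string."""
    return UNDEFINED_ENTITY.sub("", feed_xml)


original = descriptions(FEED)
after_regex = descriptions(clean_whole_feed(FEED))
print(original)
print(after_regex)
```

The second list differs from the first even though the feed was valid to begin with, which is the kind of silent change a feed-level hook can introduce.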
fox
28
ah right there are plugins which work on entire feed before tt-rss processes individual articles. i’ve completely forgotten those exist. yeah they can easily cause those kinds of problems.
e: maybe we should consider deprecating those hooks or hiding them behind a config.php knob with a bunch of warnings on top of it.
@JustAMacUser Thanks for the insight on entityclean, I’ll simply remove it. I installed it years ago when I was having trouble with a local newspaper’s feed being full of garbage that was throwing errors and I couldn’t get them to fix their shit. Ditto ff_xmllint. I didn’t even have lint or tidy enabled. I’ve deactivated them both.
af_comics uses some of those hooks.
fox
31
Matthew
32
There should be at least 2 new additions tomorrow that will then update the feed at 1pm GMT (although the modified date will be earlier than that as it uses the date it was detected and added to the database).
Matthew
33
Yep, after chatting with @Reader_Refugee it was clear that it wasn’t a scraping attempt and there was nothing nefarious going on, so I unblocked the IP address.
There were new items to fetch today, so I re-ran the feed debugger with forced refetch. Here’s the output (on Pastebin.com):
[15:43:24] start
[15:43:24] running HOOK_FETCH_FEED handlers...
[15:43:24] fee…
fox
35
[15:43:33] stored article seems up to date [IID: 1599313], updating timestamp only
well it looks like previously existing items are no longer being processed needlessly, so it’s an improvement.