Tiny Tiny RSS: Community

Readability got me banned from a server?

readability doesn’t do any caching. plugins are simply not run, at all, if the article is considered up to date. it’s just skipped during update process.
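A rough illustration of the "skipped if up to date" behaviour described above. This is only a sketch: the idea that "up to date" is decided by comparing a content hash, and the names `content_hash`, `update_article` and `stored_hashes`, are assumptions made for the example, not tt-rss's actual code.

```python
import hashlib


def content_hash(article):
    # Hypothetical fingerprint: any change to these fields changes the digest.
    payload = "\n".join([article["title"], article["link"], article["content"]])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def update_article(article, stored_hashes, plugins):
    """Run per-article plugins only when the article looks new or changed."""
    guid = article["guid"]
    digest = content_hash(article)
    if stored_hashes.get(guid) == digest:
        # Nothing changed since the last update: no plugin is called at all.
        return "skipped"
    for plugin in plugins:
        article = plugin(article)
    stored_hashes[guid] = digest
    return "processed"
```

With this shape, a plugin such as af_readability would only ever be invoked for articles whose stored fingerprint differs from the incoming one.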

whatever issue OP is having is not related to normal update process nor your feed contents. like @JustAMacUser posted above it could be his aggressive purging settings.

if (if-modified-since request header == timestamp of latest article)
     return http 304 (and don't generate any content)
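For anyone implementing this on the feed side, here is a minimal sketch of that check in Python. The function name `conditional_response` and its return shape are made up for the example. Note one deliberate difference from the pseudocode: it answers 304 when the latest article is not newer than the header value (rather than strictly equal), which is the usual reading of If-Modified-Since.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(if_modified_since, latest_article_time):
    """Return (status, headers) for a feed request.

    if_modified_since: raw request header value, or None if absent.
    latest_article_time: timezone-aware UTC datetime of the newest article.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = latest_article_time.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        # Only compare when the parsed date carries a timezone.
        if since is not None and since.tzinfo is not None:
            if last_modified <= since:
                return 304, headers  # caller sends no body
    return 200, headers  # caller generates and sends the feed
```

A request carrying `If-Modified-Since` equal to the newest article's timestamp gets 304; a missing, older, or malformed header gets the full feed.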

Sorry, I meant purging not caching :)

I’ve now implemented the 304 status based on the if-modified-since header. Thanks for the advice on that.

One other question on that though, how is that header sent in TT-RSS? Does it use the date it last checked or the date of the most recent article that it logged in the database?

oh at first i thought i could just send the latter. and then i encountered all the broken servers.

which is why tt-rss now stores Last-Modified verbatim and sends it back to the server on the next request. that seems to work alright, for the most part. tt-rss also forces unconditional requests periodically, just in case the server is broken or misconfigured.
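The client-side approach described above could look roughly like the following. This is a sketch, not tt-rss code: the `state` dict, the function names, and the 12-hour interval `FORCE_FULL_FETCH_EVERY` are all hypothetical (the thread does not say how often unconditional requests are forced).

```python
import time

# Hypothetical interval; the real value used by tt-rss is not stated here.
FORCE_FULL_FETCH_EVERY = 12 * 3600


def build_request_headers(state, now=None):
    """Decide whether the next fetch is conditional."""
    now = time.time() if now is None else now
    headers = {}
    stored = state.get("last_modified")
    last_full = state.get("last_unconditional", 0)
    # Send the stored value back untouched, unless a full fetch is due.
    if stored and now - last_full < FORCE_FULL_FETCH_EVERY:
        headers["If-Modified-Since"] = stored
    return headers


def record_response(state, request_headers, status, response_headers, now=None):
    """Update per-feed state after a fetch."""
    now = time.time() if now is None else now
    if "If-Modified-Since" not in request_headers:
        state["last_unconditional"] = now
    if status == 200 and "Last-Modified" in response_headers:
        # Stored verbatim: no parsing, so odd server date formats survive.
        state["last_modified"] = response_headers["Last-Modified"]
```

Because the header is echoed back byte for byte, the client never has to interpret the server's date format, and the periodic unconditional fetch limits the damage when a server answers 304 incorrectly.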

it’s really terrible when you think about it.

@JustAMacUser

My settings for purging are set at 30 in prefs, 0 in config. My thinking being that I’m reading daily, not tryna start a library. Should I be purging less frequently?

@fox Thanks for putting in the extra effort to get this sorted. I know you’re busy and I’m grateful for your time.

There’s something that’s causing TT-RSS to refetch all these articles. What other plugins are you using?

Here’s a full list of all active plugins:

auth_internal

af_fsckportal

af_fullpost

af_newspapers

af_readability

af_redditimgur

af_unburn

af_zz_noautoplay

af_zz_vidmute

auto_assign_labels

bookmarklets

cache_starred_images

close_button

entityclean

ff_xmllint

mail

no_url_hashes

note

share

vf_shared

it would probably be more helpful if you posted feed debugger (f D) logs for this feed.

I was going to suggest this as well but @Matthew has blocked his IP so it would be difficult to get real-world results…

oh duh

well maybe @Matthew would be kind enough to unblock op for diagnostic purposes

@fox Is this ok: https://i.imgur.com/zHZdnzG.png

@JustAMacUser Matthew very graciously unblocked me yesterday as a good-will gesture subsequent to this conversation.

oh. well he did implement conditional requests so until his feed posts something new, i think you’re going to be stuck with http 304.

which should largely solve this problem, i suppose…

I’m wondering if these two plugins are causing problems.

entityclean is a bit sloppy… it just does a regex replace on the whole feed (as a massive text string) instead of properly parsing the DOM.

ff_xmllint uses lint and tidy and those can definitely change the feed data depending on what they encounter.

Really entityclean shouldn’t exist because it’s too careless in how it works and ff_xmllint needs to be selectively applied to only feeds that are known to have invalid form.

ah right there are plugins which work on entire feed before tt-rss processes individual articles. i’ve completely forgotten those exist. yeah they can easily cause those kinds of problems.

e: maybe we should consider deprecating those hooks or hiding them behind a config.php knob with a bunch of warnings on top of it.

@JustAMacUser Thanks for the insight on entityclean, I’ll simply remove it. I installed it years ago when I was having trouble with a local newspaper’s feed being full of garbage that was throwing errors and I couldn’t get them to fix their shit. Ditto ff_xmllint. I didn’t even have lint or tidy enabled. I’ve deactivated them both.

af_comics uses some of those hooks.

let’s continue hook discussion here - https://community.tt-rss.org/t/troublesome-hooks-or-not/2890

There should be at least 2 new additions tomorrow that will then update the feed at 1pm GMT (although the modified date will be earlier than that as it uses the date it was detected and added to the database).

Yep, after chatting with @Reader_Refugee it was clear that it wasn’t a scraping attempt and there was nothing nefarious going on, so I unblocked the IP address :)

There were new items to fetch today, so I re-ran the feed debugger with forced refetch. Here’s the output: https://pastebin.com/AgFpFQfM

[15:43:33] stored article seems up to date [IID: 1599313], updating timestamp only

well it looks like previously existing items are no longer being reprocessed needlessly, so it’s an improvement.