@Matthew,

I’ve updated your account, so you should now be able to post links.

TT-RSS uses the guid tag to uniquely identify a feed item, or the link tag if guid is missing. It absolutely does check this each time the feed is fetched for the express purpose of knowing whether it has already added each article to its database.
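To make the guid-or-link rule concrete, here is a minimal Python sketch. It is not TT-RSS code: the function name `item_key`, the `owner_uid` prefix, and the SHA1 step are assumptions, loosely modelled on the `2,<link>` and `SHA1:` forms that show up in feed debugger output.

```python
import hashlib
from typing import Optional


def item_key(guid: Optional[str], link: Optional[str], owner_uid: int) -> str:
    """Build a per-user dedup key from guid, falling back to link.

    Hypothetical helper: the exact key format TT-RSS stores is not shown here.
    """
    base = (guid or "").strip() or (link or "").strip()
    if not base:
        raise ValueError("item has neither guid nor link")
    # Scope the key to one user so two users can hold the same article.
    scoped = f"{owner_uid},{base}"
    return "SHA1:" + hashlib.sha1(scoped.encode("utf-8")).hexdigest()


# An item without a guid is keyed by its link.
key_from_link = item_key(None, "https://example.com/info/1", owner_uid=2)
# The same link supplied as the guid yields the same key.
key_from_guid = item_key("https://example.com/info/1", None, owner_uid=2)
```

If the guid (or the link, when there is no guid) changes between fetches, the key changes and the item looks new.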

I took a look at the feed you posted above and everything seems in order. There should be no reason for TT-RSS to hit your site and fetch the same articles over and over again.

I agree that’s a little ridiculous for a single user.

@Reader_Refugee,

What do you have your purge articles setting at? Please check the individual feed, global setting, and your config.php file. If the purge article setting is too low TT-RSS will drop articles out of the database only to fetch them again later.
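A toy model of that failure mode, in Python. It assumes purging goes purely by article date and that the feed still lists an item older than the purge window. The real TT-RSS purge logic may well differ, and the names `purge` and `update` are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def purge(db: Dict[str, datetime], now: datetime, purge_days: int) -> Dict[str, datetime]:
    """Drop every stored article older than purge_days (toy rule, an assumption)."""
    cutoff = now - timedelta(days=purge_days)
    return {guid: ts for guid, ts in db.items() if ts >= cutoff}


def update(db: Dict[str, datetime], feed: Dict[str, datetime]) -> List[str]:
    """Add feed items that are missing from the database; return what was added."""
    added = [guid for guid in feed if guid not in db]
    for guid in added:
        db[guid] = feed[guid]
    return added


now = datetime(2019, 11, 13)
# The feed still lists a 10-day-old item next to a fresh one.
feed = {"a": now - timedelta(days=1), "b": now - timedelta(days=10)}

db: Dict[str, datetime] = {}
first = update(db, feed)            # both items are new
db = purge(db, now, purge_days=7)   # "b" is older than the window and is dropped
second = update(db, feed)           # "b" is fetched again as if it were new
```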

Also, do you have any other plugins enabled? If so, which ones?

only new items, of course. unless the previous articles also were modified somehow during this update OR their guid (in your feeds’ case, link) has changed.

so far i’m not seeing anything wrong with your feed.

one minor issue is not returning HTTP Last-Modified which prevents conditional requests (HTTP If-Modified-Since) from working, you might want to look into that if you want to minimize RSS reader traffic.

Do you mean the RSS feed or the article pages themselves?

RSS feed, of course.

Cheers - now added :slight_smile:

yep, last-modified is there now but you’re not returning HTTP 304 not modified on a conditional request:

[13:13:41/19351] last unconditional update request: 2019-11-13 13:13:36
[13:13:41/19351] not using CURL due to open_basedir restrictions
[13:13:41/19351] stored last modified for conditional request: Tue, 12 Nov 2019 22:07:09 GMT
[13:13:41/19351] fetching [https://usa.newonnetflix.info/feed] (force_refetch: )...
[13:13:41/19351] fetch done.
[13:13:41/19351] source last modified: Tue, 12 Nov 2019 22:07:09 GMT

the transcript should look like this instead:

[13:14:38/20146] start
[13:14:38/20146] local cache will not be used for this feed
[13:14:38/20146] last unconditional update request: 2019-11-13 07:57:56
[13:14:38/20146] not using CURL due to open_basedir restrictions
[13:14:38/20146] stored last modified for conditional request: Thu, 15 Aug 2019 11:07:30 GMT
[13:14:38/20146] fetching [https://fakecake.org/testfeeds/random.xml] (force_refetch: )...
[13:14:38/20146] fetch done.
[13:14:38/20146] source last modified: Thu, 15 Aug 2019 11:07:30 GMT
[13:14:38/20146] unable to fetch: HTTP/1.1 304 Not Modified [304]
[13:14:38/20146] source claims data not modified, nothing to do.

by the way, i’ve just run your feed manually again and there was one new (or updated) item:

[13:11:52/18280] start
[13:11:53/18280] local cache will not be used for this feed
...
[13:11:53/18280] processing articles...
[13:11:53/18280] guid 2,https://usa.newonnetflix.info/info/81034946 / SHA1:429215afba4703cb273af127ef31d3b8f95be216
[13:11:53/18280] orig date: 1573596429
[13:11:53/18280] title 13th Nov: Maradona in Mexico (2020), Limited Series [TV-MA] (6/10)
[13:11:54/18280] link https://usa.newonnetflix.info/info/81034946
[13:11:54/18280] language en
[13:11:54/18280] author 
[13:11:54/18280] looking for tags...
[13:11:54/18280] tags found: 
[13:11:54/18280] done collecting data.
[13:11:54/18280] article hash: 22c03ba0648e517b7778b90d1a4c035558a0fb30 [stored=]
[13:11:54/18280] hash differs, applying plugin filters:
[13:11:54/18280] ... Af_Comics
[13:11:54/18280] === 0.0000 (sec)
[13:11:54/18280] ... Af_GoodShowSir
[13:11:54/18280] === 0.0003 (sec)
[13:11:54/18280] ... Af_Psql_Trgm
[13:11:54/18280] === 0.0011 (sec)
[13:11:54/18280] ... Af_Readability
[13:11:54/18280] === 0.0000 (sec)
[13:11:54/18280] ... Af_RedditImgur
[13:11:54/18280] === 0.0000 (sec)
[13:11:54/18280] ... Af_Tumblr_1280
[13:11:54/18280] === 0.0000 (sec)
[13:11:54/18280] ... Auto_Assign_Labels
[13:11:54/18280] === 0.0016 (sec)
[13:11:54/18280] ... Af_Img_Phash
[13:11:54/18280] === 0.0000 (sec)
[13:11:54/18280] ... Af_Video_Fill_Poster
[13:11:54/18280] === 0.0002 (sec)
[13:11:54/18280] plugin data: af_comics,af_goodshowsir,af_psql_trgm,af_readability,af_redditimgur,af_tumblr_1280,auto_assign_labels,af_img_phash,af_video_fill_poster,
[13:11:54/18280] date 1573596429 [2019/11/12 22:07:09]
[13:11:54/18280] num_comments: 0
[13:11:54/18280] force catchup: 
[13:11:54/18280] base guid [2,https://usa.newonnetflix.info/info/81034946 or SHA1:429215afba4703cb273af127ef31d3b8f95be216] not found, creating...
[13:11:54/18280] base guid found, checking for user record
[13:11:54/18280] initial score: 0 [including plugin modifier: 0]
[13:11:54/18280] user record not found, creating...
[13:11:54/18280] resulting RID: 9952107, IID: 6596795
[13:11:54/18280] article updated, but we're forbidden to mark it unread.
[13:11:54/18280] assigning labels [other]...
[13:11:54/18280] assigning labels [filters]...
[13:11:54/18280] looking for enclosures...
[13:11:54/18280] article processed
[13:11:54/18280] guid 2,https://usa.newonnetflix.info/info/81078466 / SHA1:8c2c97526f4c5cd17b27b6ef9d1a1a5b398cdcd8
[13:11:54/18280] orig date: 1573520829
[13:11:54/18280] title 12th Nov: Jeff Garlin: Our Man In Chicago (2019), 58m [TV-MA] (6/10)
[13:11:54/18280] link https://usa.newonnetflix.info/info/81078466
[13:11:54/18280] language en
[13:11:54/18280] author 
[13:11:54/18280] looking for tags...
[13:11:54/18280] tags found: 
[13:11:54/18280] done collecting data.
[13:11:54/18280] article hash: ef23215fddee52f6045b878eab54ed7627c4c435 [stored=ef23215fddee52f6045b878eab54ed7627c4c435]
[13:11:54/18280] stored article seems up to date [IID: 9951104], updating timestamp only
[13:11:54/18280] guid 2,https://usa.newonnetflix.info/info/80178941 / SHA1:c307fd8089b0314282140a649d57caeed532ba2d
[13:11:54/18280] orig date: 1573506551
[13:11:54/18280] title 12th Nov: Harvey Girls Forever! (2019), 3 Seasons [TV-Y7] - New Episodes (6.35/10)
[13:11:54/18280] link https://usa.newonnetflix.info/info/80178941
[13:11:54/18280] language en
[13:11:54/18280] author 
[13:11:54/18280] looking for tags...
[13:11:54/18280] tags found: 
[13:11:54/18280] done collecting data.
[13:11:54/18280] article hash: db7a7edf7f33cc4b075356b9b05c3b70209ed633 [stored=db7a7edf7f33cc4b075356b9b05c3b70209ed633]
[13:11:54/18280] stored article seems up to date [IID: 9951105], updating timestamp only
[13:11:54/18280] guid 2,https://usa.newonnetflix.info/info/81070963 / SHA1:1ecac424811f3af42cba53aa2c62fe69f68b005f
[13:11:54/18280] orig date: 1573456151
[13:11:54/18280] title 11th Nov: Chief of Staff (2019), 2 Seasons [TV-14] - New Episodes (6.95/10)
[13:11:54/18280] link https://usa.newonnetflix.info/info/81070963
[13:11:54/18280] language en
[13:11:54/18280] author 
[13:11:54/18280] looking for tags...
[13:11:54/18280] tags found: 
[13:11:54/18280] done collecting data.
[13:11:54/18280] article hash: d64547edb5e3011415f788cc6b18d8aed9110ebb [stored=d64547edb5e3011415f788cc6b18d8aed9110ebb]
[13:11:54/18280] stored article seems up to date [IID: 9951106], updating timestamp only
...

as you can see, plugins like Readability are only applied to the new item, everything else is skipped. so it looks like your feed is working properly.
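a rough python sketch of the skip logic the log shows (hash matches stored hash, so plugin filters are not applied). this is not the tt-rss implementation: `article_hash`, `process` and the way the hash is built are assumptions for illustration only.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

Article = Dict[str, str]


def article_hash(article: Article) -> str:
    """Hash the article fields in a stable order (hypothetical scheme)."""
    payload = "\x1f".join(f"{key}={article[key]}" for key in sorted(article))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def process(
    article: Article,
    stored_hash: Optional[str],
    plugins: List[Callable[[Article], Article]],
) -> Tuple[str, bool]:
    """Return (new_hash, plugins_ran). Plugins only run when the hash differs."""
    new_hash = article_hash(article)
    if new_hash == stored_hash:
        # Stored article seems up to date: skip plugin filters entirely.
        return new_hash, False
    for plugin in plugins:
        article = plugin(article)
    return new_hash, True


calls = []


def fake_readability(article: Article) -> Article:
    """Stand-in plugin that only records that it was invoked."""
    calls.append(article["link"])
    return article


item = {"title": "Example", "link": "https://example.com/info/1"}
h1, ran_first = process(item, None, [fake_readability])   # new item, plugins run
h2, ran_second = process(item, h1, [fake_readability])    # unchanged item, skipped
```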

Hmm, I’ll have to look into that as I’m not sure, off-hand, how I would do that programmatically when the feed is generated.

Yes, one new addition today.

I’m guessing then it could be down to settings in Readability - the caching that was mentioned earlier in the thread.

Thanks for the info and checking things.

readability doesn’t do any caching. plugins are simply not run, at all, if the article is considered up to date. it’s just skipped during update process.

whatever issue OP is having is not related to normal update process nor your feed contents. like @JustAMacUser posted above it could be his aggressive purging settings.

if (if-modified-since request header == timestamp of latest article)
     return http 304 (and don't generate any content)
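the same idea as a runnable python sketch, in case it helps. it is framework-agnostic and everything in it is an assumption about your setup: `feed_response` is a made-up name, the latest article time is expected as a timezone-aware UTC datetime, and it uses "not newer than" instead of strict equality so a client sending a later date also gets a 304.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def feed_response(
    if_modified_since: Optional[str], latest_article: datetime
) -> Tuple[int, Dict[str, str]]:
    """Decide between 200 and 304 for a feed request (hypothetical helper).

    latest_article must be timezone-aware and in UTC.
    """
    latest = latest_article.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(latest, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if latest <= since:
                # Nothing newer than what the client has: skip generating the feed.
                return 304, headers
    return 200, headers


latest = datetime(2019, 11, 12, 22, 7, 9, tzinfo=timezone.utc)
status, headers = feed_response("Tue, 12 Nov 2019 22:07:09 GMT", latest)
```

on a 304 the body stays empty, which is where the traffic saving comes from.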

Sorry, I meant purging not caching :slight_smile:

I’ve now implemented the 304 status based on the if-modified-since header. Thanks for the advice on that.

One other question on that though, how is that header sent in TT-RSS? Does it use the date it last checked or the date of the most recent article that it logged in the database?

oh at first i thought i could just send the latter. and then i encountered all the broken servers.

which is why tt-rss now stores Last-Modified verbatim and sends it back to the server on the next request. that seems to work alright, for the most part. it's also why tt-rss forces unconditional requests periodically, just in case the server is broken or misconfigured.
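roughly, in python. this is only a sketch of the behaviour described above, not tt-rss code: the function name, its parameters and the 12 hour interval in the example are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def conditional_headers(
    stored_last_modified: Optional[str],
    last_unconditional: Optional[datetime],
    now: datetime,
    max_conditional_age: timedelta,
) -> Dict[str, str]:
    """Pick request headers for the next feed fetch (hypothetical helper)."""
    force_unconditional = (
        last_unconditional is None
        or now - last_unconditional >= max_conditional_age
    )
    if stored_last_modified and not force_unconditional:
        # Echo the server's own Last-Modified string back, byte for byte.
        return {"If-Modified-Since": stored_last_modified}
    # No stored value, or time for a periodic full fetch in case the server lies.
    return {}


now = datetime(2019, 11, 13, 13, 13, 41)
recent = conditional_headers(
    "Tue, 12 Nov 2019 22:07:09 GMT",
    last_unconditional=now - timedelta(minutes=5),
    now=now,
    max_conditional_age=timedelta(hours=12),  # interval is an assumption
)
stale = conditional_headers(
    "Tue, 12 Nov 2019 22:07:09 GMT",
    last_unconditional=now - timedelta(days=1),
    now=now,
    max_conditional_age=timedelta(hours=12),
)
```

sending the stored string back unchanged sidesteps date reformatting issues on servers that compare the header as plain text.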

it’s really terrible when you think about it.

@JustAMacUser

My settings for purging are set at 30 in prefs, 0 in config. My thinking being that I’m reading daily, not tryna start a library. Should I be purging less frequently?

@fox Thanks for putting in the extra effort to get this sorted. I know you’re busy and I’m grateful for your time.

There’s something that’s causing TT-RSS to refetch all these articles. What other plugins are you using?

Here’s a full list of all active plugins:

auth_internal

af_fsckportal

af_fullpost

af_newspapers

af_readability

af_redditimgur

af_unburn

af_zz_noautoplay

af_zz_vidmute

auto_assign_labels

bookmarklets

cache_starred_images

close_button

entityclean

ff_xmllint

mail

no_url_hashes

note

share

vf_shared

it would probably be more helpful if you posted feed debugger (f D) logs for this feed.

I was going to suggest this as well but @Matthew has blocked his IP so it would be difficult to get real-world results…

oh duh

well maybe @Matthew would be kind enough to unblock op for diagnostic purposes

@fox Is this ok: https://i.imgur.com/zHZdnzG.png

@JustAMacUser Matthew very graciously unblocked me yesterday as a good-will gesture subsequent to this conversation.

oh. well he did implement conditional requests so until his feed posts something new, i think you’re going to be stuck with http 304.

which should effectively largely solve this problem, i suppose…

I’m wondering if these two plugins are causing problems.

entityclean is a bit sloppy… it just does a regex replace on the whole feed (as a massive text string) instead of properly parsing the DOM.

ff_xmllint uses lint and tidy and those can definitely change the feed data depending on what they encounter.

Really entityclean shouldn’t exist because it’s too careless in how it works and ff_xmllint needs to be selectively applied to only feeds that are known to have invalid form.

ah right there are plugins which work on entire feed before tt-rss processes individual articles. i’ve completely forgotten those exist. yeah they can easily cause those kinds of problems.
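a small python illustration of why whole-feed text rewrites are risky. the regex here is invented for the example and is not what entityclean or ff_xmllint actually do; the point is only that a replace over the raw feed string can also touch guid/link, which changes article identity, while a per-element edit leaves them alone.

```python
import re
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"><channel><item>
<guid>https://example.com/?id=1&amp;ref=rss</guid>
<description>Tom &amp; Jerry</description>
</item></channel></rss>"""


def guid_of(xml_text: str) -> str:
    """Return the guid of the first item in the feed."""
    return ET.fromstring(xml_text).findtext("./channel/item/guid")


def whole_feed_rewrite(xml_text: str) -> str:
    """Hypothetical careless cleanup: regex over the entire feed string."""
    return re.sub(r"&amp;", " and ", xml_text)


def per_element_rewrite(xml_text: str) -> str:
    """Same cleanup, but applied only to description elements after parsing."""
    root = ET.fromstring(xml_text)
    for description in root.iter("description"):
        if description.text:
            description.text = description.text.replace("&", "and")
    return ET.tostring(root, encoding="unicode")


original_guid = guid_of(FEED)
guid_after_regex = guid_of(whole_feed_rewrite(FEED))
guid_after_dom = guid_of(per_element_rewrite(FEED))
```

with the regex version the guid no longer matches what was stored, so the article would look new on every update.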

e: maybe we should consider deprecating those hooks or hiding them behind a config.php knob with a bunch of warnings on top of it.