Af_psql_trgm similarity matches show in the UI, but don't auto-mark-as-read as far as I can tell

scumola · September 20, 2023, 4:54pm

[X] I’m using stock docker compose setup, unmodified.
[ ] I’m using docker compose setup, with modifications (modified .yml files, third party plugins/themes, etc.) - if so, describe your modifications in your post. Before reporting, see if your issue can be reproduced on the unmodified setup.
[ ] I’m not using docker on my primary instance, but my issue can be reproduced on the aforementioned docker setup and/or official demo.

According to the UI, the similarity ratings are showing up, so the PG plugin seems to be working.

I’m fairly sure my preferences for the similarity plugin are set up properly; see the first follow-up comment.

However, when I go through my articles, the “all articles” unread count never drops by more than one, even when I read articles whose similarity is above my “minimum similarity” and whose titles are longer than the “minimum title length”.

I looked through the plugin’s PHP code, and it’s supposed to log a message when a similar article is marked as read:
https://gitlab.tt-rss.org/tt-rss/tt-rss/-/blob/master/plugins/af_psql_trgm/init.php?ref_type=heads#L363

Unfortunately, I’m not sure how to enable debugging with the docker stack so that I can search for these log messages and diagnose this further on my own.

Refreshing the feed (F,R) or re-loading the browser page completely still keeps the total number of “all articles” the same, so I’m guessing that these similar articles aren’t getting marked as read.

Steve

p.s. Thanks for the great tool!

scumola · September 20, 2023, 4:55pm

The prefs UI

fox · September 20, 2023, 6:55pm

you should be able to see this log output in the feed debugger (f D). check the rehash checkbox so that all articles in the feed are processed again.

i’m not sure i understand this. why would they “drop more than one”? what do you mean?

scumola · September 20, 2023, 7:06pm

The plugin is described as: Marks similar articles as read (requires pg_trgm)

So when I mark an article as read, the total number of unread articles (in the box by “all articles”) drops by one (normal behavior). If the plugin is working and marking some of the other articles as read in the background (the ones that are similar to the current article), then I’m assuming that I should see the total number of unread articles drop by more than one. Does that make sense, or am I perhaps misunderstanding how the plugin works?

example:

I read an article that doesn’t have any similar articles: “all articles” unread count drops by one.
I read an article that has two similar articles: “all articles” unread count should drop by 3 (one for the original, two for the two duplicates being marked as read)

My understanding is: if I read an article about Apple’s new iPhone once, all of the other “Apple unveiled a new iPhone” articles that meet the similarity threshold should also be marked as read, and I won’t see them (I only want to read about the Apple iPhone once, not 20 times).

I’ll check the log output. Thanks for the help.

scumola · September 20, 2023, 7:15pm

Found the debug output which says that it’s marking an article as read, but it looks like it’s just marking the origin article, not the similar ones:

19:11:07/178 guid 2,https://yro.slashdot.org/story/23/09/20/1559201/john-grisham-george-rr-martin-other-top-us-authors-sue-openai-over-copyrights?utm_source=rss1.0mainlinkanon&utm_medium=feed (hash: {"ver":2,"uid":2,"hash":"SHA1:73d15d305ae7bc167f39c1b1a60e1e03e8d117ad"} compat: SHA1:a211b54fdf4646f1246b796e9317b12a772e377e)
19:11:07/178 orig date: 1695225600 (2023-09-20 16:00:00)
19:11:07/178 title John Grisham, George RR Martin, Other Top US Authors Sue OpenAI Over Copyrights
19:11:07/178 link https://yro.slashdot.org/story/23/09/20/1559201/john-grisham-george-rr-martin-other-top-us-authors-sue-openai-over-copyrights?utm_source=rss1.0mainlinkanon&utm_medium=feed
19:11:07/178 language en
19:11:07/178 author msmash
19:11:07/178 looking for tags...
19:11:07/178 tags found: ai
19:11:07/178 done collecting data.
19:11:07/178 looking for enclosures...
19:11:07/178 article hash: caea2e84832522d6c75f2da5731df0f4df894b29 [stored=36aa19cb3971a4d9995f203dc184fbc0422c3f78]
19:11:07/178 hash differs, running HOOK_ARTICLE_FILTER handlers...
19:11:07/178 af_psql_trgm: similarity result for John Grisham, George RR Martin, Other Top US Authors Sue OpenAI Over Copyrights: 0.5
19:11:07/178 af_psql_trgm: marking article as read (0.5 >= 0.25)
19:11:07/178 === 0.0104 (sec) Af_Psql_Trgm
19:11:07/178 === 0.0110 (sec) Auto_Assign_Labels
19:11:07/178 plugin data: af_psql_trgm,auto_assign_labels,
19:11:07/178 matched filters: 
19:11:07/178 matched filter rules: 
19:11:07/178 filter actions: 
19:11:07/178 date: 1695225600 (2023/09/20 16:00:00)
19:11:07/178 num_comments: 63
19:11:07/178 article labels:
19:11:07/178 force catchup: 1
19:11:07/178 base guid found, checking for user record
19:11:07/178 initial score: 0 [including plugin modifier: 0]
19:11:07/178 user record FOUND: RID: 5481, IID: 5481
19:11:07/178 resulting RID: 5481, IID: 5481
19:11:07/178 article updated, but we're forbidden to mark it unread.
19:11:07/178 assigning labels [other]...
19:11:07/178 assigning labels [filters]...
19:11:07/178 article enclosures:
Array
(
)
19:11:07/178 resulting article tags: ai
19:11:07/178 article processed.

Shouldn’t I be seeing:

af_psql_trgm: Article: Different title from the one that’s being logged now - similarity value > minimum threshold, marking as read
af_psql_trgm: Article: Different title from the one that’s being logged now - similarity value > minimum threshold, marking as read
etc…
?
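
For anyone else digging through the feed debugger output, here is a small Python sketch for pulling out just the af_psql_trgm lines. The two regular expressions are based only on the two message shapes visible in the log above; the function name `summarize_trgm_log` and the dictionary keys are made up for this example, and other plugin versions may log differently.

```python
import re

# Patterns modeled on the two af_psql_trgm lines shown in the pasted log.
SIMILARITY_RE = re.compile(
    r"af_psql_trgm: similarity result for (?P<title>.*): (?P<score>\d+(?:\.\d+)?)$"
)
MARKED_RE = re.compile(
    r"af_psql_trgm: marking article as read "
    r"\((?P<score>\d+(?:\.\d+)?) >= (?P<minimum>\d+(?:\.\d+)?)\)"
)


def summarize_trgm_log(log_text):
    """Return one dict per 'similarity result' line found in the debugger output."""
    results = []
    current = None
    for line in log_text.splitlines():
        line = line.rstrip()
        match = SIMILARITY_RE.search(line)
        if match:
            current = {
                "title": match["title"],
                "score": float(match["score"]),
                "marked_read": False,
                "minimum": None,
            }
            results.append(current)
            continue
        match = MARKED_RE.search(line)
        if match and current is not None:
            # The "marking article as read" line follows its similarity line.
            current["marked_read"] = True
            current["minimum"] = float(match["minimum"])
    return results


SAMPLE_LOG = """
19:11:07/178 hash differs, running HOOK_ARTICLE_FILTER handlers...
19:11:07/178 af_psql_trgm: similarity result for John Grisham, George RR Martin, Other Top US Authors Sue OpenAI Over Copyrights: 0.5
19:11:07/178 af_psql_trgm: marking article as read (0.5 >= 0.25)
19:11:07/178 === 0.0104 (sec) Af_Psql_Trgm
"""

for row in summarize_trgm_log(SAMPLE_LOG):
    print(row)
```

Pasting the full debugger output into `SAMPLE_LOG` should give one row per article the plugin scored, which makes it easier to see which articles were marked as read and at what threshold.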

fox · September 20, 2023, 8:02pm

as far as i remember, the plugin has no concept of ‘origin’ articles: it works on the feed it’s enabled on and marks articles as read if they have the necessary similarity score to other articles in the database.

the idea is that you don’t enable it for the feed which is the authoritative source on whatever you’re interested in.
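
to give a rough idea of what a “similarity score” means here: the sketch below approximates in Python how PostgreSQL’s pg_trgm scores two strings (lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the number of distinct trigrams). it is an approximation, not the plugin’s code: the plugin runs the comparison inside the database, this version only handles ASCII letters and digits, and the names `trigrams`, `similarity` and `meets_threshold` are invented for the example. the `>=` comparison mirrors the “(0.5 >= 0.25)” line in the debug log.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction (ASCII-only simplification)."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after each word
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, in the range 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def meets_threshold(score, minimum_similarity):
    """Same comparison the debug log prints: score >= minimum similarity."""
    return score >= minimum_similarity


score = similarity("Apple unveils new iPhone", "Apple unveiled a new iPhone")
print(round(score, 2), meets_threshold(score, 0.25))
```

identical titles score 1.0 and unrelated ones score 0.0; near-duplicate headlines land somewhere in between, which is why the minimum similarity preference matters.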

scumola · September 20, 2023, 10:18pm

Does the function that marks similar articles as read happen at read-time, or when polling the RSS feed, or when? For example, the feed debugging log - is that happening when the user is reading an article, or when the feed gets updated?

fox · September 21, 2023, 4:17am

it happens on feed update.
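
a toy model of that timing, in Python, may help explain why the unread counter only ever drops by one. everything here is an assumption made for illustration: `word_overlap` is a crude stand-in for the real database similarity score, `unread_after_update` is an invented name, and the real plugin may differ in which stored articles it compares against. the point is only that the decision is made when an article arrives, so a near-duplicate is already marked as read before it ever reaches the unread count.

```python
def word_overlap(a, b):
    """Crude stand-in for a similarity score: Jaccard overlap of lowercase words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def unread_after_update(stored_titles, incoming_titles, min_similarity, score=word_overlap):
    """Return the incoming titles that stay unread after a simulated feed update."""
    known = list(stored_titles)  # copy so the caller's list is not modified
    unread = []
    for title in incoming_titles:
        best = max((score(title, old) for old in known), default=0.0)
        if best < min_similarity:
            unread.append(title)  # nothing similar enough: stays unread
        # Otherwise the new article is marked as read at update time.
        known.append(title)  # later arrivals are compared against it too
    return unread


stored = ["Apple unveils new iPhone"]
incoming = [
    "Apple unveils new iPhone today",
    "Authors sue OpenAI over copyrights",
]
print(unread_after_update(stored, incoming, min_similarity=0.5))
```

in this model, reading the original iPhone article later changes the counter by exactly one, because the near-duplicate was never counted as unread in the first place.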