[Plugin suggestion] Similar articles as read, but for MySQL

Hi Fox,

I know that there is af_psql_trgm, which marks similar articles as read, but it works only on PostgreSQL.

I was wondering if would it ever be possible to create the same for the 1% who are bound to MySQL and not tech savvy enough to move.

Thanks in advance.

Regards

i welcome someone else to do it if mysql has trigram support of some kind.

however, my policy at this point is no new tt-rss-related code specific to mysql. it’s in maintenance mode, everyone should switch to postgres, if they want new stuff.

Thanks Fox. I appreciate your feedback.

make sure the articles appear when the timeout elapses, this is the part i haven’t checked at all.

it’s more of a you-specifically-friendly mode, that is i’ve been going through my backlog and noticed that you’ve asked for this.

anyway, it seems that there should be some kind of trigram search in mariadb: text - Is there a Trigram functionality like pg_trgm (PostgreSQL) for MySQL? - Stack Overflow so it might be possible.

I marked as read all the articles in the feeds on which the plugin is enabled.
The delay is set up at 72 hours. If something pop up before the next 3 days I will update this post.

thank you very much for looking into it.

something is working at least.

main problem is that there’s no defined 0 … 1 scale on the returned result, score value is all over the place. sometimes it’s 2.something for an exact match, and sometimes it’s something else.

e: it seems to be predictable enough though so setting a threshold shouldn’t be impossible.

update: i have everything working, but the score is really all over the place. if there’s no way to get it to some kind of known range, i suspect actually using this plugin would be impossible.

note returned score values:

the whole thing does a good enough job - most of the time - to find out stuff that’s similar to each other, so the popup dialog part of the plugin is functional, but setting a hard cutoff value here which would mean “okay, mark this as read” seems impossible at first glance - an exact string match may return a score 32 or 2 or 8 or whatever, a partial match is also unpredictable.

maybe there’s a way to normalize the score but i was unsuccessful in googling it.

well, if nothing else, a fast fulltext search could now be added for mysql.

e: it seems to be working properly ootb on non-english strings though, which is nice:


https://dev.mysql.com/doc/refman/5.7/en/fulltext-natural-language.html

Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word.

relevant commits:

https://dev.tt-rss.org/fox/tt-rss/commit/50f014e52d0f90c71a95576c2414804eeaef131d
https://dev.tt-rss.org/fox/tt-rss/commit/9428e2c571a38aed36537d072858d245907341e7
https://dev.tt-rss.org/fox/tt-rss/commit/0bb72fbb266c8cd5c6ae570c1a2cdb7f8fb79c68

MySQL doesn’t appear to support the if not exists syntax so upgrading on mysql as opposed to mariadb fails

Error MySQL Server v8.0.28:

Performing updates to version 147

[10:36:18/139415] Starting migration to 147... [10:36:18/139415] Failed on line: create fulltext index if not exists ttrss_entries_title_search_idx on ttrss_entries(title) [10:36:18/139415] Migration failed: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'if not exists ttrss_entries_title_search_idx on ttrss_entries(title)' at line 1

One of migrations failed. Either retry the process or perform updates manually.

migrations are applied once so luckily “if not exists” is not needed. can you just edit this out and see if the rest applies properly? then i’ll try to fix this in trunk later today.

p.s. so now there’s two different incompatible variants of this shitty database server, amazing.

Appears to be working fine now after removing “if not exists”, although the process did time out on my cheap hosting the index was created, applied the update manually and used dummy schema file to bump the version, so something to be aware of…

thanks for testing this.

I upgraded yesterday and so far everything seems fine.

Plugin is working on 5 feeds that usually had some very similar content.
What I noticed is that the number of “surviving” articles is now very low. My feeling is that it is working too aggressivly, so I increased the threshold (to 1,5 right now) and will see what happens.

Thank you very very much

i’m glad it worked for you, although i still have my concerns over scoring being so inconsistent, even with the ad-hoc normalization i’ve added. we’ll see how this works out, i guess. :thinking:

btw the way this works you don’t enable this on all feeds with similar content, one should be left as a primary source. otherwise they’d just keep marking each other. i vaguely remember explaining this at some point in the original trgm thread.

fix: schema: don't use 'create index if not exists' syntax because mysql doesn't support it · b148d2f515 - tt-rss - Tiny Tiny RSS

I have the same issue when upgrading from 146 to 147.

– Edited after fox’s recommendation —

It is a pity there is no if not exist on mysql.

you’ve potentially broke search for yourself. please don’t post bad advice other clueless people are likely to follow.

what you should’ve done is waited for a fix, which took what, an hour?