i welcome someone else to do it if mysql has trigram support of some kind.
however, my policy at this point is no new tt-rss-related code specific to mysql. it’s in maintenance mode, everyone should switch to postgres, if they want new stuff.
I marked as read all the articles in the feeds on which the plugin is enabled.
The delay is set up at 72 hours. If something pop up before the next 3 days I will update this post.
main problem is that there’s no defined 0 … 1 scale on the returned result, score value is all over the place. sometimes it’s 2.something for an exact match, and sometimes it’s something else.
e: it seems to be predictable enough though so setting a threshold shouldn’t be impossible.
update: i have everything working, but the score is really all over the place. if there’s no way to get it to some kind of known range, i suspect actually using this plugin would be impossible.
the whole thing does a good enough job - most of the time - to find out stuff that’s similar to each other, so the popup dialog part of the plugin is functional, but setting a hard cutoff value here which would mean “okay, mark this as read” seems impossible at first glance - an exact string match may return a score 32 or 2 or 8 or whatever, a partial match is also unpredictable.
maybe there’s a way to normalize the score but i was unsuccessful in googling it.
well, if nothing else, a fast fulltext search could now be added for mysql.
e: it seems to be working properly ootb on non-english strings though, which is nice:
Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word.
MySQL doesn’t appear to support the if not exists syntax so upgrading on mysql as opposed to mariadb fails
Error MySQL Server v8.0.28:
Performing updates to version 147
[10:36:18/139415] Starting migration to 147... [10:36:18/139415] Failed on line: create fulltext index if not exists ttrss_entries_title_search_idx on ttrss_entries(title) [10:36:18/139415] Migration failed: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'if not exists ttrss_entries_title_search_idx on ttrss_entries(title)' at line 1
One of migrations failed. Either retry the process or perform updates manually.
migrations are applied once so luckily “if not exists” is not needed. can you just edit this out and see if the rest applies properly? then i’ll try to fix this in trunk later today.
p.s. so now there’s two different incompatible variants of this shitty database server, amazing.
Appears to be working fine now after removing “if not exists”, although the process did time out on my cheap hosting the index was created, applied the update manually and used dummy schema file to bump the version, so something to be aware of…
I upgraded yesterday and so far everything seems fine.
Plugin is working on 5 feeds that usually had some very similar content.
What I noticed is that the number of “surviving” articles is now very low. My feeling is that it is working too aggressivly, so I increased the threshold (to 1,5 right now) and will see what happens.
i’m glad it worked for you, although i still have my concerns over scoring being so inconsistent, even with the ad-hoc normalization i’ve added. we’ll see how this works out, i guess.
btw the way this works you don’t enable this on all feeds with similar content, one should be left as a primary source. otherwise they’d just keep marking each other. i vaguely remember explaining this at some point in the original trgm thread.