But for that “all articles” queries the DISTINCT across all those columns really killed performance. Execution time drops from ~2000ms to 1ms for LIMIT 30 and to a few ms for LIMIT 1000.

yeah it makes total sense but well i guess nobody noticed.

after some extensive reading of many boring articles i’m not really seeing any issues with DISTINCT disabled so i’m going to push its removal to trunk.

i think in this particular case performance gained is worth the change regardless of possible issues which are unlikely, i think, given that user entries and entries tables are in a 1-to-1 relationship now. if something is discovered we’ll see how to work around it.

i’m not going to change anything with catchup_feed() because of what is said above and because catching up is not really a big deal performance-wise.

I have seen dupes when viewing labels. I added “test” to 2 articles, each in different categories. Then I clicked the “test” link to view all articles w/ the label.

I added the label to articles:

  • Is Microsoft’s ‘Find Your Joy’ Holiday Ad Sad? - Slashdot; and
  • Tommy Kraft - Visual Effect Artist

The “Friday January 31, 2014” Lio comic already had a “test” label from its rss feed. You can see in the screenshot below that Tommy Kraft’s video is duplicated. I have other labels and I see some dupes there too; this is how I reproduced it.

labels don’t come from the feed itself, unless you have the auto assign plugin enabled. maybe you mean tags?

also, i’m not sure if its possible to label something twice, at least i couldn’t. :slight_smile:

i’d like to replicate this but at the moment i’m not sure how.

btw there’s definitely duplicates now in the feed which has all labels combined, for obvious reasons.

I started noticing duplicate headlines appearing and this morning went and searched and found this. My seeing duplicates started when this change was made. This isn’t on “all articles” just on a category. Here is an example and you ca see the duplicates:

yeah, i might have been a bit too hasty in removing DISTINCT:

  1. filtering strictly on frontend quickly desyncs LIMIT/OFFSET calculation thus breaking lazy load

  2. there’s a SELECT DISTINCT ON (columns) syntax (that mysql naturally doesn’t support) which might be somewhat faster because there’s less fields to process but i haven’t really noticed:

distinct on (...) query
EXPLAIN ANALYZE SELECT DISTINCT ON (id, yyiw, ttrss_feeds.title, score , date_entered , updated )
						ttrss_entries.id AS id,
						date_entered,
						to_char(date_entered, 'IYYY-IW') AS yyiw,
						guid,
						ttrss_entries.title,
						updated,
						label_cache,
						tag_cache,
						always_display_enclosures,
						site_url,
						note,
						num_comments,
						comments,
						int_id,
						uuid,
						lang,
						hide_images,
						unread,feed_id,marked,published,link,last_read,orig_feed_id,
						last_marked, last_published,
						ttrss_feeds.title AS feed_title,favicon_avg_color,
						content, 
						author,score
					FROM
						ttrss_entries LEFT JOIN ttrss_user_entries ON (ref_id = ttrss_entries.id)
						LEFT JOIN ttrss_feeds ON (feed_id = ttrss_feeds.id)
					WHERE
					ttrss_user_entries.owner_uid = '2' AND
					true ORDER BY yyiw desc,  ttrss_feeds.title, score DESC, date_entered DESC, updated DESC
					LIMIT 30 OFFSET 0;

i guess it’s either duplicates or using the above with fallback DISTINCT for mysql.

e: i’m honestly not sure which one of left joins or w/e else causes these duplicates, i sadly couldn’t find any on my tt-rss database. this might be one of those situations where i’m too dumb to figure something out. :man_shrugging:

https://git.tt-rss.org/fox/tt-rss/commit/d06cc8267bb7dfadd3693260af8788aa990a304e

a shameful changeset, i guess.

e: from people affected i would appreciate a database dump (postgres).

You’re correct, I did mean tags. I manually added “test” to a couple articles and when I cliked the “test” anchor in the article the resulting list had dupes.

This PR doesn’t fix the duplicate articles when viewing the “test” tag. I am running on gentoo, pgsql 12.4, apache 2.4.46, and php 7.4.11.

e. I’ll try to create a minimal reproducible database and dump it when I get some free time.

Got time quicker than I thought. The dump has this discuss and soylentNews subscribed. I added “test” tags to 2 articles (this one “Slow ‘all articles’ queries, options to improve?” and one in soylent “Laptop Vendors Will Pair AMD’s Cezanne (Zen 3) with High-End GPUs”). The dupes show up when I click the “test” anchor. It shows 3 articles 1 for this discuss and the soylent article is duped. HTH.

BTW, this is dumped from my dev machine which is Win10, psql 12.1, apache 2.4.39, and php 7.3.7; but it shows the same behavior as my gentoo server.

Attacheddb.tt-rss.20201207.dump.gz (156.2 KB)

e. the articles I tagged are read; mark all unread to see the 2 tagged articles.

I tagged test to some more articles and removed it from the soylent article and the duplication disappeared. Uhoh, I thought it might be a plugin or something; so I disabled all plugins. I added test tags to more articles and randomly on some it caused dupes and others it did not, it might take adding test to 10+ articles b/f I saw the dupes again. This is a wily one : /

e. I don’t think the plugins effect this though.

this could be a tag-related bug somewhere which surfaced because of above changes.

browsing by tag is a separate query and, to me, a less interesting one, so don’t worry about the dumps and stuff.

it could as well use a full DISTINCT like it did before which would probably deal with this oddity, honestly i would prefer to not dig into tag-related tables and stuff unless absolutely necessary. :slight_smile:

The dupes don’t bother me. Sometimes removing/replacing them fixes it. The ones I do use don’t vary that much. This is a very low priority “feature”, IMO.

The dupes come from the ttrss_tags table, see screenshot. Somehow the tag/post_int_id is getting duplicated via the webapp.

ttrss-dupe-sql..PNG

yeah that’s what i figured, i guess the bug is in the tag editor somewhere. i’ll make a note, etc.

I see the same, please undo the DISTINCT stuff as a properly joined query and data model cannot result in duplicates.
All it does is hide issues like this, which appears to be a double insert.

Okay, but undoing (reverting) means that other issues would go unnoticed. You know, like this one:

Your comment makes literally no sense. Wouldn’t you want the software to operate at its best? Identifying issues and resolving them works toward that end.

that’s what he wants, revert the latest change by removing DISTINCT, because it masks the underlying issue somewhere.

the takeaway here is that tone is important.

e: also, i think we shouldn’t do any further changes on trunk without a clear way to produce the duplicate entries sans-DISTINCT for every view perspective - tags, categories, etc. either a database dump or whatever else.

Ahh… Got it… My apologies for misunderstanding; I hadn’t checked today’s commits.

It’s hard to find good help these days. :man_shrugging:

well, at least everyone has an opinion on how to do things.