I started noticing duplicate headlines appearing and this morning went and searched and found this. My seeing duplicates started when this change was made. This isn’t on “all articles” just on a category. Here is an example and you ca see the duplicates:

yeah, i might have been a bit too hasty in removing DISTINCT:

  1. filtering strictly on frontend quickly desyncs LIMIT/OFFSET calculation thus breaking lazy load

  2. there’s a SELECT DISTINCT ON (columns) syntax (that mysql naturally doesn’t support) which might be somewhat faster because there’s less fields to process but i haven’t really noticed:

distinct on (...) query
EXPLAIN ANALYZE SELECT DISTINCT ON (id, yyiw, ttrss_feeds.title, score , date_entered , updated )
						ttrss_entries.id AS id,
						date_entered,
						to_char(date_entered, 'IYYY-IW') AS yyiw,
						guid,
						ttrss_entries.title,
						updated,
						label_cache,
						tag_cache,
						always_display_enclosures,
						site_url,
						note,
						num_comments,
						comments,
						int_id,
						uuid,
						lang,
						hide_images,
						unread,feed_id,marked,published,link,last_read,orig_feed_id,
						last_marked, last_published,
						ttrss_feeds.title AS feed_title,favicon_avg_color,
						content, 
						author,score
					FROM
						ttrss_entries LEFT JOIN ttrss_user_entries ON (ref_id = ttrss_entries.id)
						LEFT JOIN ttrss_feeds ON (feed_id = ttrss_feeds.id)
					WHERE
					ttrss_user_entries.owner_uid = '2' AND
					true ORDER BY yyiw desc,  ttrss_feeds.title, score DESC, date_entered DESC, updated DESC
					LIMIT 30 OFFSET 0;

i guess it’s either duplicates or using the above with fallback DISTINCT for mysql.

e: i’m honestly not sure which one of left joins or w/e else causes these duplicates, i sadly couldn’t find any on my tt-rss database. this might be one of those situations where i’m too dumb to figure something out. :man_shrugging:

https://git.tt-rss.org/fox/tt-rss/commit/d06cc8267bb7dfadd3693260af8788aa990a304e

a shameful changeset, i guess.

e: from people affected i would appreciate a database dump (postgres).

You’re correct, I did mean tags. I manually added “test” to a couple articles and when I cliked the “test” anchor in the article the resulting list had dupes.

This PR doesn’t fix the duplicate articles when viewing the “test” tag. I am running on gentoo, pgsql 12.4, apache 2.4.46, and php 7.4.11.

e. I’ll try to create a minimal reproducible database and dump it when I get some free time.

Got time quicker than I thought. The dump has this discuss and soylentNews subscribed. I added “test” tags to 2 articles (this one “Slow ‘all articles’ queries, options to improve?” and one in soylent “Laptop Vendors Will Pair AMD’s Cezanne (Zen 3) with High-End GPUs”). The dupes show up when I click the “test” anchor. It shows 3 articles 1 for this discuss and the soylent article is duped. HTH.

BTW, this is dumped from my dev machine which is Win10, psql 12.1, apache 2.4.39, and php 7.3.7; but it shows the same behavior as my gentoo server.

Attacheddb.tt-rss.20201207.dump.gz (156.2 KB)

e. the articles I tagged are read; mark all unread to see the 2 tagged articles.

I tagged test to some more articles and removed it from the soylent article and the duplication disappeared. Uhoh, I thought it might be a plugin or something; so I disabled all plugins. I added test tags to more articles and randomly on some it caused dupes and others it did not, it might take adding test to 10+ articles b/f I saw the dupes again. This is a wily one : /

e. I don’t think the plugins effect this though.

this could be a tag-related bug somewhere which surfaced because of above changes.

browsing by tag is a separate query and, to me, a less interesting one, so don’t worry about the dumps and stuff.

it could as well use a full DISTINCT like it did before which would probably deal with this oddity, honestly i would prefer to not dig into tag-related tables and stuff unless absolutely necessary. :slight_smile:

The dupes don’t bother me. Sometimes removing/replacing them fixes it. The ones I do use don’t vary that much. This is a very low priority “feature”, IMO.

The dupes come from the ttrss_tags table, see screenshot. Somehow the tag/post_int_id is getting duplicated via the webapp.

ttrss-dupe-sql..PNG

yeah that’s what i figured, i guess the bug is in the tag editor somewhere. i’ll make a note, etc.

I see the same, please undo the DISTINCT stuff as a properly joined query and data model cannot result in duplicates.
All it does is hide issues like this, which appears to be a double insert.

Okay, but undoing (reverting) means that other issues would go unnoticed. You know, like this one:

Your comment makes literally no sense. Wouldn’t you want the software to operate at its best? Identifying issues and resolving them works toward that end.

that’s what he wants, revert the latest change by removing DISTINCT, because it masks the underlying issue somewhere.

the takeaway here is that tone is important.

e: also, i think we shouldn’t do any further changes on trunk without a clear way to produce the duplicate entries sans-DISTINCT for every view perspective - tags, categories, etc. either a database dump or whatever else.

Ahh… Got it… My apologies for misunderstanding; I hadn’t checked today’s commits.

It’s hard to find good help these days. :man_shrugging:

well, at least everyone has an opinion on how to do things.

people affected, try running this query:

SELECT ref_id, int_id FROM ttrss_user_entries u1 WHERE 
    (SELECT COUNT(*) FROM ttrss_user_entries u2 
          WHERE u2.ref_id = u1.ref_id) > 1;

if there are results, this means that you actually do have multiple user entries linking to a base entry, which is one possible explanation for category/feed duplicates.

alternatively, you can try adding a constraint:

ALTER TABLE ttrss_user_entries ADD CONSTRAINT 
    enforce_unique_posts UNIQUE (ref_id);

and here’s how to drop it afterwards:

ALTER TABLE ttrss_user_entries DROP CONSTRAINT enforce_unique_posts;

e: it is unlikely to be the reason though because an all-column DISTINCT, like we had previously, would show these duplicates because they would have a different primary key (int_id).

i’d really prefer a database dump tbh.

e: this should deal with duplicate tags https://git.tt-rss.org/fox/tt-rss/commit/85b788709a9f7a93645f7aab1675b99d57874d0b

The tags editor ui fix seems to have solved my dupes. thx.

i’ve added a per-user preference to skip DISTINCT checking for headlines (except for Labels category):

https://git.tt-rss.org/fox/tt-rss/commit/f3d4bae32eae802c717ff36fcaad1f7ee56d2a39

if you don’t mind an occasional duplicate, enable it and report any performance changes. if you actually get duplicates, i’d appreciate a reproducible test-case or a database dump.

on my low-power celeron home server, this reduces times spent in headlines query (shown in f G) by half.

e: i noticed a bunch of what i thought were duplicates but apparently those were just some idiot posting again and again:

image

(article id is different)

tried it, it is noticeable faster :slight_smile: