Dealing with duplicate posts after upgrading from an old version

Hey folks,

I recently upgraded my Tiny Tiny RSS setup from the (fairly old) Debian 11 package to the latest git version on Debian 12. This also meant a jump from PHP 7.4 to PHP 8.2.

Unfortunately, after the migration, all (really, all, not just a subset) posts got duplicated. I searched and searched for solutions unsuccessfully, and also read about how this just happens sometimes and one way to deal with it is to just start over. Well, starting over was a non-starter for me, so I dug in my heels and solved my problem.

I’d like to share what I discovered in the hopes that it might help other TT-RSS users with similar upgrade situations.

For context, what one needs to understand is that when TT-RSS inserts a new post into its database, it generates a unique identifier that takes into account a number of things including (but not limited to) the title, URL, post content, etc. When a feed is updated, it calculates an identifier for every post it comes across, and if that identifier doesn’t already exist in the database, it considers the post to be new and adds it.
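
To make the mechanism concrete, here is a minimal Python sketch of identifier-based deduplication. It is not TT-RSS's actual code: the field list, the use of SHA-1, and the names `post_identifier` and `import_posts` are assumptions chosen purely for illustration.

```python
import hashlib


def post_identifier(post: dict) -> str:
    """Derive an identifier from the post's fields (hypothetical field list)."""
    material = "\n".join([post["title"], post["url"], post["content"]])
    return hashlib.sha1(material.encode("utf-8")).hexdigest()


def import_posts(known_ids: set, fetched: list) -> list:
    """Return the posts whose identifier is not yet known, and record them."""
    added = []
    for post in fetched:
        ident = post_identifier(post)
        if ident not in known_ids:
            known_ids.add(ident)
            added.append(post)
    return added


known = set()
post = {"title": "Hello", "url": "https://example.org/1", "content": "First post"}

first_run = import_posts(known, [post])    # identifier unseen: post is added
second_run = import_posts(known, [post])   # identifier known: nothing is added

# If anything feeding the identifier changes (here: the content), the same
# article looks brand new and is stored a second time.
changed = dict(post, content="First post ")
third_run = import_posts(known, [changed])
```

The third run is the upgrade scenario in miniature: the article is the same, but its computed identifier is not, so it gets stored again.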

Code change

One of the first things I discovered is that there has been a code change in classes/rssutils.php that caused identifiers to change. I reverted the change on my local instance and this helped fix one of the ways the identifier was changing in comparison with the previous version of TT-RSS.

Plugins

Secondly, I didn’t notice it at first but identifiers seem to also depend on plugins. In the new version I deployed, some plugins were moved out of the core and are now distributed via external repositories (link missing, can’t add it). Reinstalling the plugins that were previously enabled in my old setup also helped with the identifiers issue.

JSON

Lastly, even when the identifiers were identical, TT-RSS kept on duplicating entries. This time I had a look at the database side of things. Sure enough, in the new installation, entries in the guid column of the ttrss_entries table had a subtle, yet significant difference: the old installation had JSON bits like "uid":"<number>" whereas the new version created the same entries but with different quoting: "uid":<number>. The most likely culprit is a difference in the PHP json module between 7.4 and 8.2.
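
Whatever the root cause, this is the quoting difference you get when the same value is serialized once as a string and once as an integer. A small Python sketch of the effect follows; only the "uid" key comes from the description above, while the "hash" key and all values are placeholders.

```python
import json

# Hypothetical guid payloads: only the "uid" key comes from the post above;
# "hash" and its value are placeholders.
old_guid = json.dumps({"uid": "2", "hash": "abc"}, separators=(",", ":"))
new_guid = json.dumps({"uid": 2, "hash": "abc"}, separators=(",", ":"))

print(old_guid)  # {"uid":"2","hash":"abc"}
print(new_guid)  # {"uid":2,"hash":"abc"}

# Same data as far as a human is concerned, but a plain string comparison
# (which is what a lookup on the guid column amounts to) sees two values.
guids_match = old_guid == new_guid
```

Since `guids_match` is false, a lookup for the new-style value never finds the old-style row, and the entry is inserted again.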

The fix for this was to run a database query to convert the preexisting entries to the new format:

update ttrss_entries set guid = REGEXP_REPLACE(guid, '"uid":"([0-9]+)"', '"uid":\\1');
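
If you want to check what the pattern does before touching the database, the same substitution can be tried offline. This is a rough Python equivalent of the query above; the helper name `to_new_format` and the sample values are made up for the example.

```python
import re

# Same pattern as the SQL statement; Python writes the backreference as \1.
UID_QUOTED = re.compile(r'"uid":"([0-9]+)"')


def to_new_format(guid: str) -> str:
    """Drop the quotes around a numeric uid, leaving everything else alone."""
    return UID_QUOTED.sub(r'"uid":\1', guid)


# Hypothetical sample values; "hash" is a placeholder key.
samples = [
    '{"uid":"2","hash":"abc"}',    # old format: should be rewritten
    '{"uid":2,"hash":"abc"}',      # already new format: should be unchanged
    'https://example.org/post/1',  # no uid at all: should be unchanged
]
converted = [to_new_format(s) for s in samples]
```

Running it against a handful of real guid values copied from your own table is a cheap way to confirm that only the old-format rows would change. Note that backreference syntax differs between database engines, so the replacement string in the query may need adjusting for yours.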

Conclusion

After these three adjustments, the upgrade was finally a success and the updater stopped duplicating all the existing posts. Such a massive relief! So if you’re upgrading from a very old TT-RSS and are facing a similar problem, try the above and let us know what you find out!

i wouldn’t recommend maintaining an unsupported fork (which is effectively what your code changes amount to) just because some articles still present in the feed XML got reimported after an upgrade due to GUID changes and whatnot, but it’s your call and i’m not gonna stop you :slight_smile:

e: ah i didn’t notice that you seem to be using some kind of host install, so it’s unsupported anyway. well, fork away then!

For sure, maintaining a patch like that is less than ideal, but in my case worth it since the change is quite small and doesn’t seem to affect the functionality of the application at all. According to the commit message this change was part of a php 8.2 compatibility effort, but as far as I can tell, array_keys() is not being deprecated…

Certainly a much better option than having to deal with the fallout from thousands upon thousands of duplicated feed items…

each RSS feed XML data normally has less than 50 articles. this would mean an impressive amount of feeds you’re subscribed to then.

I maintain an instance for a number of users besides myself, so yes. Impressive amount of feeds is what we have over here :slight_smile: