Issue since Readability update

Since php-readability was updated recently I’ve been noticing that occasionally my feeds stop updating. If I roll back past that commit then all of the feeds successfully update again. Going back to the new version will then work OK for a while before eventually failing again. I assume there must be some specific posts which are triggering a bug. I’m using PHP 7.3, but as I said, the older version works fine with PHP 7.3 if I git reset back to 487d06a20dc471fba487da579e69cf0cac291cc0.

[14:57:08/43903] Scheduled 37 feeds to update…
[14:57:09/43903] Base feed: ISPreview UK
[14:57:09/43903] => 2019-02-18 12:13:52.147662, 56 2
PHP Fatal error: Uncaught TypeError: Argument 1 passed to iterator_to_array() must implement interface Traversable, null given in /usr/www/ttrss/vendor/andreskrey/Readability/Nodes/NodeTrait.php:324
Stack trace:
#0 /usr/www/ttrss/vendor/andreskrey/Readability/Nodes/NodeTrait.php(324): iterator_to_array(NULL)
#1 /usr/www/ttrss/vendor/andreskrey/Readability/Nodes/NodeTrait.php(421): andreskrey\Readability\Nodes\DOM\DOMText->getChildren(true)
#2 /usr/www/ttrss/vendor/andreskrey/Readability/Readability.php(1270): andreskrey\Readability\Nodes\DOM\DOMText->hasSingleTagInsideElement('tr')
#3 /usr/www/ttrss/vendor/andreskrey/Readability/Readability.php(1166): andreskrey\Readability\Readability->prepArticle(Object(andreskrey\Readability\Nodes\DOM\DOMDocument))
#4 /usr/www/ttrss/vendor/andreskrey/Readability/Readability.php(155): andreskrey\Readability\Readability->rateNodes(Array)
#5 /usr/www/ttrss/plugins/af_readability/init.php(178): andreskrey\Readability\Readabi in /usr/www/ttrss/vendor/andreskrey/Readability/Nodes/NodeTrait.php on line 324
[14:57:11/43243] removing lockfile (43243)…
[14:57:11/41915] [reap_children] child 43243 reaped.
[14:57:11/41915] [SIGCHLD] jobs left: 1
[14:57:14/44892] Scheduled 0 feeds to update…
[14:57:14/44892] Sending digests, batch of max 15 users, headline limit = 1000
[14:57:14/44892] All done.

maybe i could wrap this into try-catch or something; i’d revert it but older version had some layout issues which are seemingly fixed in the newer one

can you report this to the upstream developer? the library is andreskrey/readability.php on GitHub (PHP port of Mozilla's Readability.js)

Describe the problem you’re having:

Since February 17th I’m getting an uncaught TypeError when updating feeds.

If possible include steps to reproduce the problem:

tt-rss version (including git commit id):

Version v18.12 (9e7bbf6)

Platform (i.e. Linux distro, PHP, PostgreSQL, etc) versions:

Arch Linux, PHP 7.3.2, MySQL

Please provide any additional information below:

Strangely 1 feed is still updating sporadically. I rebooted the system after a kernel update yesterday which refreshed my feeds and I got about 25 news stories, but now it’s not refreshing properly. I use cron to update the feeds, but I also tried the update_daemon2.php file and get the same result.
CLI errors:
PHP Fatal error: Uncaught TypeError: Argument 1 passed to iterator_to_array() must implement interface Traversable, null given in /tt-rss/vendor/andreskrey/Readability/Nodes/NodeTrait.php:324
Stack trace:
#0 /tt-rss/vendor/andreskrey/Readability/Nodes/NodeTrait.php(324): iterator_to_array(NULL)
#1 /tt-rss/vendor/andreskrey/Readability/Nodes/NodeTrait.php(421): andreskrey\Readability\Nodes\DOM\DOMText->getChildren(true)
#2 /tt-rss/vendor/andreskrey/Readability/Readability.php(1272): andreskrey\Readability\Nodes\DOM\DOMText->hasSingleTagInsideElement('td')
#3 /tt-rss/vendor/andreskrey/Readability/Readability.php(1166): andreskrey\Readability\Readability->prepArticle(Object(andreskrey\Readability\Nodes\DOM\DOMDocument))
#4 /tt-rss/vendor/andreskrey/Readability/Readability.php(155): andreskrey\Readability\Readability->rateNodes(Array)
#5 /tt-rss/vendor/andreskrey/Readability/Nodes/NodeTrait.php on line 324

Merging this with the existing thread as the error is the same.

Ahh, I wasn’t sure if this was internal or external. Thanks.

Ahhh so it’s not a ttrss issue, but an upstream issue with that library. OK. I don’t personally have a github account. Does anybody else who is seeing the same issue have one who could report it to save me having to register an account etc.?

I filed an issue on his GitHub repo. I’ll monitor the thread and if I see anything pertinent I’ll post here for the Non-Git among us :sunglasses:

Thank you!

The author doesn’t seem very interested in fixing the problem, and I’m not much help in the coding department. Maybe someone can read through the few comments and help with what he’s asking for. Otherwise we’ll have to revert and hope for another type of fix.

So I rolled back to a previous commit. For those that would like help with that: I ran git log --oneline and went back 8 commits to find 13e7e775a, then ran git reset --hard 13e7e775a, and now my cron is updating the feeds again. I use cron nightly to git pull the repo, so I commented out that line in my crontab so the rollback sticks.
Some people also issue git clean -f after a git reset, YMMV.
Hope this helps! :beers:
EDIT: Please make a copy of the original repo before issuing the above commands, in case of catastrophic errors. cp -r tt-rss tt-rss.bak

Why? It’s git, just move back to HEAD when you’re ready.
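For anyone unsure what that looks like in practice, here is a minimal sketch in a throwaway repository (not a real tt-rss checkout). The commit messages, the file name and the make_commit helper are made up for the demo; only the git commands themselves are the point.

```shell
#!/bin/sh
set -eu

# Throwaway repository, so nothing here touches a real tt-rss checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Demo-only helper (hypothetical): write a file and commit it.
make_commit() {
    echo "$1" > file.txt
    git add file.txt
    git -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false commit -q -m "$1"
}

make_commit "known good"
make_commit "readability update"

tip=$(git rev-parse HEAD)       # where the branch points right now
good=$(git rev-parse HEAD~1)    # the older commit to roll back to

git reset -q --hard "$good"     # roll back; tracked files are rewritten to match
rolled_back=$(cat file.txt)

git reset -q --hard "$tip"      # move forward again, no backup copy needed
restored=$(cat file.txt)

echo "after rollback: $rolled_back"
echo "after restore:  $restored"
```

In a real checkout you usually don't need to save the tip hash yourself: as long as you haven't committed anything locally, a later git pull should fast-forward you back to the current upstream commit.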

Ah, yes, thanks. Wasn’t thinking about that.

i guess i’ll make a VM or something with php 7.3 and take a closer look at this :face_with_raised_eyebrow:

UPD: i’ve subscribed to the feed in the OP but so far no errors. maybe the data is not in the feed anymore. op can you post more feeds / specific posts where this happens?

e: i’m using an ubuntu 18.04 test vm, php 7.3.2-3 from the ppa

post actual feed url ideally with affected post titles

This is the list of the feeds that I have the af_readability plugin enabled on… The issue is sporadic for me though. It will work fine for a day or two and then I’ll start seeing that error in the update process. So I guess if you subscribe to these and monitor it for a while you should see the same at some point.

I’m not sure about specific post titles. All my feeds are up to date at the moment, and if I git pull back up to HEAD then it will work fine. I guess if I purge the database of post entries it might trigger it though?

http://www.theregister.co.uk/headlines.atom
http://www.ispreview.co.uk/index.php/feed
http://feed.theregister.co.uk/rss?a=Simon%20Travaglia
https://www.thinkbroadband.com/news.rss
http://www.daemonology.net/blog/index.rss
https://dan.langille.org/feed/
https://www.gazetteseries.co.uk/news/yateandsodburynews/rss/

one of your feeds (the gazette) fails to open with a connection timeout, i guess it’s geoblocking or something. the rest seemingly updated without any errors

yeah, i’m not going to keep a separate vm running and updating a bunch of random feeds because it might trigger a readability error at some point, maybe. this sounds like too much effort for a third party library + bleeding edge php combination. instead i’m going to wrap this into try-catch.

next time this happens make it trigger reliably on specific post urls (use force rehash in feed debugger) at least and post those here.

alternatively be a normal person like the rest of us and use a server distro for your server stuff.

e: maybe support should be limited to stable distros like centos and debian (+ubuntu) to begin with, it’s not like i’m going to investigate any issues with meme-tier garbage like arch or gentoo or whatever

update: readability parsing is already inside a try-catch block, which means it crashes in constructor? strange. i’ll move it inside the block, i guess.

https://git.tt-rss.org/fox/tt-rss/commit/fd8f8c7b3e612ffe394dbb62fc1036ac2277473e

let me know if that changes anything.

Would it be beneficial to call debug to log some info (article URL, feed, etc.) inside the catch() before returning false?

it could be a good idea to dump entire article XML somewhere (i.e. with file_put_contents) so that we could train readability on it later and see if it crashes


sorry about the 525. i’ve updated docker-ce and discourse, uh, didn’t take it well. i had to rebuild the container and since it’s such an overbloated monstrosity it always takes forever.

Definitely, but I’ve used a lot of forum software and like Discourse the best. I also see other sites moving to it.