Mercury Full Text Extraction

Full of cold, so apologies if this is not coherent…

Either I’m being an idiot (the most likely option, especially with a cold), or a recent update to ttrss (updated to v22.12-fb4bc26 yesterday) has borked integration with the Mercury Full Text plugin and/or parser (less likely), or a recent update on the VM (Ubuntu 20.04) that runs Docker for ttrss has borked it (I’m fairly sure this wouldn’t be the cause).

While I have some “fun” restoring the VM backup (from before yesterday’s updates) so I can take a checkpoint before updating (see above about being an idiot), if anybody else has experienced this, I’d be very grateful for any pointers.

I’m using the plugin https://github.com/HenryQW/mercury_fulltext with the parser https://hub.docker.com/r/wangqiru/mercury-parser-api

If there is a “better” (as in “simpler” for idiots like myself) way of doing full text extraction for sites like theregister.com I’m very open to suggestions.

Repeated in the logs, I see entries such as the ones below (the sequence is reversed: the first one below is the lowest in the log).

Uncaught TypeError: property_exists(): Argument #1 ($object_or_class) must be of type object|string, null given in /var/www/html/tt-rss/plugins.local/mercury_fulltext/init.php:279
Stack trace:
#0 /var/www/html/tt-rss/plugins.local/mercury_fulltext/init.php(279): property_exists()
#1 /var/www/html/tt-rss/classes/pluginhandler.php(15): mercury_fulltext->extract()
#2 /var/www/html/tt-rss/backend.php(144): PluginHandler->catchall()
#3 {main}
  thrown


Forwarded For: <public IPv6 removed>
Forwarded Protocol: https
Remote IP: 10.201.253.81
Request URI: /tt-rss/backend.php
User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.54

Followed by:

TypeError: property_exists(): Argument #1 ($object_or_class) must be of type object|string, null given in /var/www/html/tt-rss/plugins.local/mercury_fulltext/init.php:217
Stack trace:
#0 /var/www/html/tt-rss/plugins.local/mercury_fulltext/init.php(217): property_exists()
#1 /var/www/html/tt-rss/plugins.local/mercury_fulltext/init.php(240): mercury_fulltext->process_article()
#2 /var/www/html/tt-rss/classes/pluginhost.php(347): mercury_fulltext->hook_article_filter()
#3 /var/www/html/tt-rss/classes/rssutils.php(845): PluginHost->chain_hooks_callback()
#4 /var/www/html/tt-rss/update.php(238): RSSUtils::update_rss_feed()
#5 {main}
1. classes/pluginhost.php(352): user_error(TypeError: property_exists(): Argument #1 ($object_or_class) must be of type object|string, null given in /var/www/html/tt-rss/plugins.local/mercury_fulltext/init.php:217
Stack trace:
#0 /var/www/html/tt-rss/plugins.local/mercury_fulltext/init.php(217): pr...)
2. classes/rssutils.php(845): chain_hooks_callback(hook_article_filter, {Closure}, [{"owner_uid":3,"guid":"3,298235 at http:\/\/road.cc","guid_hashed":"{\"ver\":2,\"uid\":3,\"hash\":\"SHA1:09e0ce0d0e7443ab4debc205d2d9a1428733b18c\"}","title":"Canadian cyclist clears cycle lanes with homemade cargo bike snowplow; Geraint&#039;s giving out ...)
3. update.php(238): update_rss_feed()

Thanks!

Looking at the lines mentioned in mercury_fulltext/init.php (HenryQW/mercury_fulltext on GitHub, master branch), you might want to take a look at the Mercury API side: is it available, can you send API requests to it directly, is the API URL set correctly in the plugin settings, and what’s in its logs/output? To me it seems likely that the API isn’t returning valid JSON.
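If you want to check the API outside of tt-rss, here is a minimal Python sketch of that test. It assumes the wangqiru/mercury-parser-api container answers `GET /parser?url=<article URL>` and returns a JSON object with a `content` field; the `MERCURY_API` address and the helper names `fetch_raw` and `check_response` are hypothetical, so substitute the URL from your plugin settings. It also assumes (not verified against init.php) that the plugin decodes the response and then calls `property_exists()` on the result, in which case a non-JSON body would decode to null and match the "null given" error.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address: replace with the Mercury API URL from the plugin settings.
# Assumption: the parser container exposes GET /parser?url=<article URL>.
MERCURY_API = "http://localhost:3000"


def fetch_raw(article_url: str, api_base: str = MERCURY_API, timeout: float = 10.0) -> str:
    """Return the raw response body from the parser API, without parsing it."""
    query = urllib.parse.urlencode({"url": article_url})
    with urllib.request.urlopen(f"{api_base}/parser?{query}", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_response(body: str) -> str:
    """Classify a response body: "ok" if it is a JSON object with "content",
    otherwise a short reason string."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # If the plugin json_decode()s this body, PHP would get null here
        # (assumption), consistent with the property_exists() "null given" error.
        return "not valid JSON"
    if not isinstance(data, dict):
        # e.g. a literal "null" or a JSON array
        return "JSON but not an object"
    if "content" not in data:
        return "JSON object without a 'content' field"
    return "ok"
```

Run it from a machine that can reach the container, e.g. `print(check_response(fetch_raw(url)))` with `url` set to an article from one of the failing feeds. Anything other than `ok` (an HTML error page from a proxy, an empty body, an error object) points at the API rather than at tt-rss.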

Since this is a third-party plugin, asking its developer for help directly might be more productive.

Thanks for the responses.

Feeling more stupid, but less cold today

So after a long, slow restore from backup (upload is slower than download) and then taking checkpoints before making any changes, it looks as if the Mercury Full Text plugin was a red herring, as the same log entries show up there too… (hence feeling more stupid). Something else had been pulling the full text.

I found https://community.tt-rss.org/t/af-readability-call-to-undefined-function-masterminds-html5-parser-ctype-alpha/5607/8 and wondered if I’d missed something in how I update, but my update process already includes the git pull, so that was not the issue.

Restoring to the original checkpoint again, disabling auto updates, then running an update and checking for af_readability showed that it now needed to be installed separately (presumably as a result of https://git.tt-rss.org/fox/tt-rss.git/commit/?id=8ea537123d1cef38f25f9fbe92e3a9c0f89de55a).

Installed the af_readability plugin, ran a feed debug with refetch, and all was good again (-:

tl;dr: idiot (that’s me) didn’t pay close enough attention to the git commits.

Apologies for wasting people’s time.