HTML Escape in XML breaks feed

  • [ ] I’m using stock docker compose setup, unmodified.
  • [ X ] I’m using docker compose setup, with modifications (modified .yml files, third party plugins/themes, etc.) - if so, describe your modifications in your post. Before reporting, see if your issue can be reproduced on the unmodified setup.
  • [ ] I’m not using docker on my primary instance, but my issue can be reproduced on the aforementioned docker setup and/or official demo.

I have a feed that worked for a long time but recently stopped updating. The feed in question is:
https://yaleclimateconnections.org/section/eye-on-the-storm/feed/

Debugging the feed produces the output below (running it through the tester gives similar results):

[02:28:30/276] start
[02:28:30/276] running HOOK_FETCH_FEED handlers...
[02:28:30/276] feed data has not been modified by a plugin.
[02:28:30/276] local cache will not be used for this feed
[02:28:30/276] last unconditional update request: 2023-09-05 18:42:59
[02:28:30/276] stored last modified for conditional request: Tue, 05 Sep 2023 18:37:32 GMT
[02:28:30/276] fetching https://yaleclimateconnections.org/section/eye-on-the-storm/feed/ (force_refetch: 1)...
[02:28:30/276] fetch done.
[02:28:30/276] effective URL (after redirects): https://yaleclimateconnections.org/topic/eye-on-the-storm/feed/ (IP: 192.0.78.231) 
[02:28:30/276] server last modified: Tue, 05 Sep 2023 18:37:32 GMT
[02:28:30/276] saving to local cache: 0834e1adc81b55fbb105a824e37d854b8cf5dde6.xml
[02:28:30/276] running HOOK_FEED_FETCHED handlers...
[02:28:30/276] feed data has not been modified by a plugin.
[02:28:30/276] fetch error: LibXML error 26 at line 14 (column 42): Entity 'raquo' not defined

[02:28:30/276] + LibXML error 26 at line 14 (column 42): Entity 'raquo' not defined

[02:28:30/276] + LibXML error 26 at line 28 (column 42): Entity 'raquo' not defined

[02:28:30/276] update failed.

Indeed, looking at lines 14 and 28, the XML document has HTML escapes in it: <title>Eye on the Storm Archives &raquo; Yale Climate Connections</title>.

That was a new one to me; apparently it is a Right-Pointing Double Angle Quotation Mark. With a bit of googling, it sounds like LibXML isn’t set up to deal with these HTML escapes?
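For what it's worth, the behaviour can be reproduced outside TT-RSS. XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so a parser rejects &raquo; unless a DTD declares it. Below is a minimal sketch in Python. It uses the standard-library parser rather than LibXML, so the error text differs, and the helper name `parses_as_xml` is made up for illustration:

```python
import xml.etree.ElementTree as ET


def parses_as_xml(document: str) -> bool:
    """Return True if the string is well-formed XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The title from the feed, with the HTML named entity as served.
named = "<title>Eye on the Storm Archives &raquo; Yale Climate Connections</title>"
# The same title with the numeric character reference for U+00BB.
numeric = "<title>Eye on the Storm Archives &#187; Yale Climate Connections</title>"

print(parses_as_xml(named))    # False: 'raquo' is not defined in XML
print(parses_as_xml(numeric))  # True: numeric references need no declaration
```

So the numeric form &#187; would be accepted where &raquo; is not.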

I’m guessing it isn’t up to spec to have these in an XML tag without a CDATA wrapper, but I’m not an expert on the spec. I wanted to report this in case there is an easy solution that could be implemented on the TTRSS side to filter these.
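To illustrate the kind of filter I have in mind, here is a rough sketch in Python (not TT-RSS code, and not a plugin). It rewrites HTML named entities as numeric character references before the text reaches an XML parser, and leaves the five XML-predefined entities alone. The function name `html_entities_to_numeric` and the sample document are assumptions for illustration. The debug log above suggests plugins can modify feed data at HOOK_FEED_FETCHED, so something equivalent could presumably live there, but I have not tried that.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities that XML defines itself; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(feed_text: str) -> str:
    """Replace HTML named entities (e.g. &raquo;) with numeric references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML entities and unknown names as-is
        return f"&#{name2codepoint[name]};"

    return ENTITY_RE.sub(replace, feed_text)


# Hypothetical sample: the title is from the feed, the description is made up
# to show that &amp; survives unchanged.
raw = (
    "<channel>"
    "<title>Eye on the Storm Archives &raquo; Yale Climate Connections</title>"
    "<description>Storms &amp; climate</description>"
    "</channel>"
)

fixed = html_entities_to_numeric(raw)
root = ET.fromstring(fixed)  # parses now that &raquo; has become &#187;
print(root.find("title").text)
```

One caveat with a plain regex like this: it also rewrites entity-like text inside CDATA sections, where &raquo; is literal text rather than an entity, so a real fix would need to be more careful.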

Version Info

  • Tiny Tiny RSS version (including git commit id): v23.06-dc25a9cf
  • Platform (i.e. Linux distro, Docker, PHP, PostgreSQL, etc) versions: Docker 24.0.2

My docker-compose setup looks very similar to what is in git; I’m guessing it matches what was there a couple of years ago. I can update to the current version if that matters.

The same goes for the TT-RSS version: I updated a couple of months ago and can update to a newer version if it matters.

https://gitlab.tt-rss.org/tt-rss/tt-rss/-/wikis/FAQ#i-want-to-check-how-tt-rss-renders-my-feed-the-feed-im-trying-to-use-is-parsed-incorrectly

Thanks for pointing that out. That is a sensible policy to have.

Interestingly, I read the FAQ here, which is linked from the home page, and it doesn’t have that question/response.

The static wiki updates every few hours, so if it doesn’t show up by tomorrow, it means I broke something. :slight_smile: