Are you using the stock Docker Compose setup?
no

If not, either reproduce this issue on the official demo or switch to Docker and see if the issue is resolved.
Can’t see the error log on the demo, but the backtrace indicates it’s a bug in the code itself (a missing check).


Describe the problem you’re having:

I’m getting a bunch of errors on this feed, including this one:

Undefined index: host
1. classes/urlhelper.php(45): ttrss_error_handler(8, Undefined index: host, classes/urlhelper.php, 45, [{"base_url":"entries\/2018-02-26-1.xml","rel_url":"mailto:CakeFest@cakephp.org","rel_parts":{"scheme":"mailto","path":"CakeFest@cakephp.org"},"base_parts":{"path":"entries\/2018-02-26-1.xml"}})
2. classes/feeditem/atom.php(77): rewrite_relative(entries/2018-02-26-1.xml, mailto:CakeFest@cakephp.org)
3. classes/feeditem/atom.php(103): rewrite_content_to_base(entries/2018-02-26-1.xml, <div xmlns="http://www.w3.org/1999/xhtml">
      <p>
        <a href="https://cakefest.org">CakeFest</a> is organized for developers,
        managers and interested newcomers alike. Bringing a world of unique skill
        and talent together in a celebra...)
4. classes/rssutils.php(659): get_content()
5. update.php(235): update_rss_feed(623, 1)

In the resulting feed content, the link points to “PHP: Manual Quick Reference” instead (tested on the demo).

There are also a bunch of other errors on that feed, so maybe add more checks so it doesn’t fail on nonsensical feeds like this?

Include steps to reproduce the problem:

  1. Subscribe to this feed: PHP.net news & announcements
  2. Wait.
  3. Check the error log.
  4. Also see the entry from 2018-02-26: the link to CakeFest@cakephp.org is wrong.

tt-rss version (including git commit id):
92c78beb909d8955657564127c2e953ca25113e3

Platform (i.e. Linux distro, PHP, PostgreSQL, etc) versions:
FreeBSD/PHP 7.4/PostgreSQL 13.2

Please provide any additional information below:
n/a

xml:base being a relative URL (relative to what? ideally, a multitude of ancestor :base elements, amirite) is where i’m personally drawing the line with atom and its idiotic quirks.

if it’s actually allowed (i’m not going to bother checking the spec), whoever wrote this trainwreck of a standard can go fuck himself with a cactus. feel free to quote me on this.

As far as I can tell, relative URIs are not allowed in xml:base, so this feed is probably just broken. The issue with mailto links specifically seems to be a bug in Tiny Tiny RSS. I have filed a pull request that should fix the issue.
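The mailto part comes down to this: `mailto:CakeFest@cakephp.org` already has a scheme (the backtrace shows `parse_url()` returning `scheme` and `path` but no `host`), and RFC 3986 section 5.2.2 says a reference with its own scheme is used as-is, so the base URI is never needed. A minimal sketch of such a check; `has_own_scheme()` is a hypothetical helper, not the code from the pull request:

```php
<?php
// Hypothetical helper, not taken from tt-rss or the pull request.
// RFC 3986 section 5.2.2: a reference that has its own scheme is already
// the target URI, so no base URI (and no base host) is needed to resolve it.
function has_own_scheme(string $ref): bool
{
    // RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":"
    return preg_match('/^[A-Za-z][A-Za-z0-9+.-]*:/', $ref) === 1;
}

// A caller would run this check before reading $base_parts['host'],
// so absolute references such as mailto: links pass through untouched.
var_dump(has_own_scheme('mailto:CakeFest@cakephp.org')); // bool(true)
var_dump(has_own_scheme('entries/2018-02-26-1.xml'));    // bool(false)
```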

Regarding all those error messages, I am unsure what TT-RSS should do when the base URI is invalid. The simplest way would be to just return the relative URI when the base URI does not have a scheme or host. But I don’t know if fox would want to add such a change because, in theory, it could break odd feeds with base URIs like //example.com/, where a relative link lorem_ipsum.html combined with the TT-RSS scheme (e.g. https) could result in a working URI, https://example.com/lorem_ipsum.html, despite the broken base URI.
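A minimal sketch of that fallback, assuming a hypothetical `rewrite_relative_fallback()` (not the actual `UrlHelper::rewrite_relative()`) and a deliberately simplified merge that ignores dot segments, queries and network-path references. It also makes the trade-off visible: with a base of `//example.com/` the link comes back unchanged, not as `https://example.com/lorem_ipsum.html`.

```php
<?php
// Hypothetical sketch of the fallback, not tt-rss code.
// Assumes $rel_url is a path-relative or root-relative reference.
function rewrite_relative_fallback(string $base_url, string $rel_url): string
{
    $base_parts = parse_url($base_url);

    // parse_url() returns false for badly malformed URLs and omits absent
    // components, so check before indexing instead of assuming they exist.
    if (!is_array($base_parts) || empty($base_parts['scheme']) || empty($base_parts['host'])) {
        return $rel_url; // base is not absolute: leave the link as-is
    }

    $origin = $base_parts['scheme'] . '://' . $base_parts['host']
        . (isset($base_parts['port']) ? ':' . $base_parts['port'] : '');

    if (strpos($rel_url, '/') === 0) {
        return $origin . $rel_url; // root-relative path replaces the base path
    }

    // Simplified RFC 3986 section 5.2.3 merge: keep the base path up to and
    // including its last "/" and append the reference.
    $base_path = $base_parts['path'] ?? '/';
    $dir = substr($base_path, 0, strrpos($base_path, '/') + 1);

    return $origin . $dir . $rel_url;
}

echo rewrite_relative_fallback('//example.com/', 'lorem_ipsum.html'), "\n";        // lorem_ipsum.html
echo rewrite_relative_fallback('https://example.com/a/b.xml', 'c.html'), "\n";     // https://example.com/a/c.html
```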

[1] RFC 5023 2.1.3. Use of “xml:base” and “xml:lang”
[2] RFC 3986 5.1. Establishing a Base URI
[3] RFC 3986 4.3. Absolute URI

the proper way to deal with stuff like this is to refuse to parse the feed. trying to guess your way from garbage data to something that works is never going to work properly. i’d rather not implement hacks to maybe fix some subset of broken feeds while breaking other broken feeds.
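For comparison, the strict approach could look roughly like this: scan every xml:base attribute and treat the feed as invalid if any value lacks a scheme (RFC 3986 section 4.3 requires an absolute URI to start with one). `find_relative_xml_bases()` is a hypothetical name, and how the updater should react to a non-empty result is left open.

```php
<?php
// Hypothetical sketch of the "refuse the feed" approach, not tt-rss code.
// Requires ext-dom. Returns the offending xml:base values; [] means the check passed.
function find_relative_xml_bases(string $xml): array
{
    $xmlNs = 'http://www.w3.org/XML/1998/namespace'; // namespace of the xml: prefix

    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('feed is not well-formed XML');
    }

    $bad = [];
    // '*' matches every element in document order, whatever its namespace.
    foreach ($doc->getElementsByTagName('*') as $el) {
        if (!$el->hasAttributeNS($xmlNs, 'base')) {
            continue;
        }
        $base = $el->getAttributeNS($xmlNs, 'base');
        // RFC 3986 section 4.3: absolute-URI = scheme ":" hier-part [ "?" query ]
        if (preg_match('/^[A-Za-z][A-Za-z0-9+.-]*:/', $base) !== 1) {
            $bad[] = $base;
        }
    }

    return $bad;
}
```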

thanks for the pr.

That would certainly be easiest, but at least according to libxml the feed is technically not “broken.” Since the feed parses but its xml:base is invalid, the following change would get rid of the errors while ensuring that all relative link hrefs are blank if the base URI is relative.

diff --git a/classes/urlhelper.php b/classes/urlhelper.php
index 03f0c474d..6e74565a5 100644
--- a/classes/urlhelper.php
+++ b/classes/urlhelper.php
@@ -42,6 +42,10 @@ class UrlHelper {
 		} else {
 			$base_parts = parse_url($base_url);

+			if (empty($base_parts["scheme"]) || empty($base_parts["host"])) {
+				return false;
+			}
+
 			$rel_parts['host'] = $base_parts['host'];
 			$rel_parts['scheme'] = $base_parts['scheme'];


it’s not broken as an XML document, but it is broken as an Atom feed, which is no less important.

i may be repeating myself below, but it seems pointless to ignore xml:base (or any other feed-provided content) “sometimes” when we feel it is invalid. (edit: and i’m saying this fully aware of how we deal with timestamps 🤨)

relative urls below :base could still have been generated with it in mind, so stripping it would not achieve much of anything while adding unnecessary hacks to tt-rss.

i.e.

stripping the base would give you ‘example.com/article1.xml’, which would still be incorrect.

in general tt-rss expects feeds to provide valid data. we don’t try to fix broken XML documents (with a few very rare exceptions), therefore we shouldn’t try to fix broken feed content either, even if the document is well-formed, if only for consistency.

the endgame of the opposite approach is adding a per-feed compatibility shim for every broken blog under the sun; it’s a road to nowhere.


i wonder what would happen if we validated feeds against DTDs (if those exist). probably an ocean of tears.

In my opinion, the “correct” (if it could be called that) way to deal with a feed like this would be to leverage the plugin system to manually change things for just the feed in question. Altering core code for feeds that are out of spec or otherwise provide incoherent information is only going to break other things; that kind of practice is what led to problems like the browser incompatibilities and quirks of the old days (which still exist, to some extent).
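A rough sketch of what such a per-feed fix might do with the raw XML before tt-rss parses it. It assumes the feed URL is `https://www.php.net/feed.atom` (not stated in the thread), assumes the publisher meant relative xml:base values to be read against that URL, and resolves nested xml:base scopes against the feed URL rather than their ancestors. `absolutize_xml_base()` is hypothetical; wiring it into an actual plugin hook is not shown.

```php
<?php
// Hypothetical per-feed shim, not tt-rss code. Requires ext-dom.
// Rewrites relative xml:base values to absolute URLs using $feed_url as the base.
function absolutize_xml_base(string $xml, string $feed_url): string
{
    $xmlNs = 'http://www.w3.org/XML/1998/namespace';

    $parts = parse_url($feed_url);
    if (!is_array($parts) || empty($parts['scheme']) || empty($parts['host'])) {
        return $xml; // no usable base: leave the document untouched
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1); // directory of the feed URL

    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return $xml; // not well-formed: nothing sensible to do here
    }

    foreach ($doc->getElementsByTagName('*') as $el) {
        if (!$el->hasAttributeNS($xmlNs, 'base')) {
            continue;
        }
        $base = $el->getAttributeNS($xmlNs, 'base');
        if (preg_match('/^[A-Za-z][A-Za-z0-9+.-]*:/', $base) === 1) {
            continue; // already absolute
        }
        // Simplified merge: root-relative or path-relative only, no dot segments.
        $abs = (strpos($base, '/') === 0) ? $origin . $base : $origin . $dir . $base;
        $el->setAttributeNS($xmlNs, 'xml:base', $abs);
    }

    return $doc->saveXML();
}
```

Keeping this in a plugin and applying it only to the one feed means core behaviour for every other feed stays as it is.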