I no longer appear to be able to subscribe to Mastodon user feeds. TTRSS reports "Couldn't download the specified URL: HTTP/1.1 404 Not Found" when I attempt to add the feed, even though the URL exists and is fetchable via curl and wget.

MyFeedSucks diagnostic output:

Fetch error: HTTP/1.1 404 Not Found; Undefined index: port [404]
Effective URL: https://podcastindex.social/%40cisene.rss (IP: 178.33.220.142)
Used curl: NO
Content type: application/rss+xml; charset=utf-8

Example feeds:

  • https://cmpwn.com/@sir.rss
  • https://podcastindex.social/@cisene.rss

tt-rss version (including git commit id): master at 7c8bed05243156a4dc6290c6ac411401d773a03a

Platform (i.e. Linux distro, PHP, PostgreSQL, etc) versions:

  • Application is run in a Docker container running Alpine Linux edge with PHP 7.4.10
  • Database is MariaDB 10.4 (run using official MariaDB docker image)

Please provide any additional information below:

I was previously able to subscribe to Mastodon user feeds. I suspect this may be related to some of the changes to port handling, but I haven’t run a bisect to identify where this broke. I am able to curl/wget the feeds without issue, but both my TTRSS instance and MyFeedSucks are unable to fetch the feed.

this isn’t port handling but urlencoding. it looks like mastodon requires @ to not be urlencoded.

nginx accepts both @test and %40test for my example file so i’m gonna go with mastodon being in the wrong here.

e:

[16:16:22/27826] fetching [http://debian-wsl.local/@test.xml] (force_refetch: 1)...
[16:16:25/27826] fetch done.
[16:16:25/27826] effective URL (after redirects): http://debian-wsl.local/%40test.xml (IP: 172.27.142.138)
[16:16:25/27826] source last modified: Mon, 28 Sep 2020 16:15:23 GMT
[16:17:32/27826] fetching [http://debian-wsl.local/%40test.xml] (force_refetch: 1)...
[16:17:35/27826] fetch done.
[16:17:35/27826] effective URL (after redirects): http://debian-wsl.local/%40test.xml (IP: 172.27.142.138)
[16:17:35/27826] source last modified: Mon, 28 Sep 2020 16:15:23 GMT

subscribing also works for me with ‘@’ in the URL. still, it’s just an example XML file sitting on nginx.

That does appear to be the case, but a recent change in TTRSS seems to have surfaced the issue.

I was previously able to subscribe to multiple Mastodon user feeds, and I remain subscribed (TTRSS appears to continue to fetch those feeds without issue). It appears I am not able to add any additional Mastodon user feeds because of the change to URL-encode characters prior to fetching.

Reading through RFC 3986 on URI syntax, it specifically calls out that behavior may differ if reserved characters are normalized (percent-encoded or decoded), and that they should therefore not be normalized:

Percent-encoding a reserved character, or decoding a percent-encoded octet that corresponds to a reserved character, will change how the URI is interpreted by most applications. Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.

The reserved characters mentioned:

reserved   = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
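To illustrate the difference, here is a minimal Python sketch (not TTRSS code, which is PHP). It assumes the standard library's urllib.parse.quote as a stand-in for a generic URL-encoding step, and the names GEN_DELIMS, SUB_DELIMS and RESERVED are just labels for the sets listed above.

```python
from urllib.parse import quote

# RFC 3986 reserved characters, copied from the grammar above.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS

path = "/@cisene.rss"

# Default quoting treats only "/" as safe, so "@" becomes "%40".
naive = quote(path)

# Marking the reserved set as safe leaves "@" alone while still
# percent-encoding non-ASCII characters as UTF-8 bytes.
preserved = quote(path, safe=RESERVED)
non_latin = quote("/@пример.rss", safe=RESERVED)

print(naive)
print(preserved)
print(non_latin)
```

The first form is what Mastodon answers with a 404 in the report above; the second is the URL as originally entered.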

that’s a good point, i guess those reserved characters should be left alone. unfortunately, urlencode() is apparently based on an older RFC which doesn’t take those into account.

anyway, the whole thing is only there so FILTER_VALIDATE_URL wouldn’t complain about non-latin characters. maybe we should keep urlencoded variant for filter_var() and actually work on the original URL.

https://git.tt-rss.org/fox/tt-rss/commit/c70e26db31d520c554b867325ace95cbee6687e3
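A rough sketch of that first idea, written in Python only for illustration: validate an encoded throwaway copy, then fetch the original string. looks_like_http_url is a hypothetical, much cruder stand-in for FILTER_VALIDATE_URL, and url_to_fetch is likewise an invented name, not anything in the TTRSS code base.

```python
from typing import Optional
from urllib.parse import quote, urlsplit

# Reserved characters plus "%" so existing escapes are not double-encoded.
SAFE = ":/?#[]@!$&'()*+,;=%"


def looks_like_http_url(candidate: str) -> bool:
    """Hypothetical strict validator: ASCII only, http(s) scheme, has a host."""
    if not candidate.isascii():
        return False
    parts = urlsplit(candidate)
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def url_to_fetch(original: str) -> Optional[str]:
    # Encode a copy purely so the validator accepts non-Latin input.
    encoded_copy = quote(original, safe=SAFE)
    if not looks_like_http_url(encoded_copy):
        return None
    # The fetcher gets the URL exactly as the user entered it.
    return original
```

With this shape, https://podcastindex.social/@cisene.rss passes validation and is returned unchanged, so the server never sees %40.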

alternatively we can de-urlencode reserved characters but it feels like piling hacks on top of hacks.
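For comparison, the de-urlencode variant could look roughly like the Python sketch below. restore_reserved and the RESTORE set are hypothetical; "/", "?", "#" and the brackets are deliberately left out because decoding them would change how the URL is split into components.

```python
import re
from urllib.parse import quote

# Assumed subset of reserved characters that is safe to turn back into literals.
RESTORE = "@:!$&'()*+,;="

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def restore_reserved(encoded: str) -> str:
    """Replace %XX escapes with their literal character when it is in RESTORE."""
    def swap(match: "re.Match[str]") -> str:
        char = chr(int(match.group(1), 16))
        return char if char in RESTORE else match.group(0)

    return _ESCAPE.sub(swap, encoded)


# Encode everything first, then undo the reserved characters.
print(restore_reserved(quote("/@cisene.rss")))
```

One drawback is visible right away: an escape that was already present in the original URL (say, a deliberate %40) gets decoded too, which is the "hacks on top of hacks" problem.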

filter_var has some sanitizing filters that can encode a variety of character ranges, but I have no first-hand experience with how this affects non-latin characters, etc. I’m just mentioning it for discussion.