jreiff
1
-
[ ] I’m using stock docker compose setup, unmodified.
-
[ ] I’m using docker compose setup, with modifications (modified .yml files, third party plugins/themes, etc.) - if so, describe your modifications in your post. Before reporting, see if your issue can be reproduced on the unmodified setup.
-
[x] I’m not using docker on my primary instance, but my issue can be reproduced on the aforementioned docker setup and/or official demo.
Description
The af_readabilty plugin does not handle relative URLs correctly. The HTML spec requires that the last element be removed from the base URL before appending the relative URL. af_readability, however, simply appends the relative URL to the article base.
(I know, relative URLs suck…)
Spec: URL Standard, relative state, item 5.4.2.
af_readability code: https://git.tt-rss.org/fox/tt-rss.git/tree/plugins/af_readability/init.php#n228
Example
The article https://apod.nasa.gov/apod/ap220315.html links to the image image/2203/Road2Stars_EsoHoralek_1080.jpg. The af_readability rewrites this to https://apod.nasa.gov/apod/ap220315.html/image/2203/Road2Stars_EsoHoralek_4000.jpg, i.e. it appends the relative URL without removing the last element ap220315.html.
Steps to reproduce
- Add the APOD RSS feed
https://apod.nasa.gov/apod.rss.
- Enable af_readability plugin.
- Enable
Inline article content.
- Fetch feed contents. The images are not displayed, and the links are broken.
Environment
- Tiny Tiny RSS version (including git commit id): v22.03-1703850
- Platform (i.e. Linux distro, Docker, PHP, PostgreSQL, etc) versions: Official demo
jreiff
3
Thanks for the fast reply!
Just to be clear: This is not about relative URLs in the feed, but on the website that af_readability fetches. And while I totally agree that feeds should always use absolute URLs, relative URLs are unfortunately still very common on regular websites.
If you do not want to change this behavior, then it would at least be good to document that af_readability does not support websites with relative links.
fox
4
ah, right, readability. i don’t think readability canonicalizes URLs at all, so tt-rss processes resulting HTML data rewriting URLs as if they were in the feed.
readability is, by design, a hack. it works, kinda, until it doesn’t. that’s just how it is. i think there’s a README warning on the plugin to this effect although maybe i’m misremembering things.
we used to have a more complicated URL rewriting before which was more clever with relative URL parts but the contributed implementation was so opaque I had trouble understanding what it did so I just cut it away and replaced with something much simpler. dealing with relative URLs is of critical importance so it makes sense to have it as bulletproof as possible even if it breaks edge cases like readability.
jreiff
5
I just checked, there is no README or similar in plugins/af_readability. Also, I did not find anything in the wiki or the FAQ.
I agree that the URL spec is too complicated and that it would be too complex to cover all cases. To fix the majority of cases, however, it would suffice to remove everything after the base URL’s last slash if the relative URL’s path is nonempty (see RFC 3986).
Again, if you do not want to make this change, then please consider documenting this behavior somewhere.
jreiff
6
Oh, maybe even better: PHP Readability has an option to fix relative URLs: GitHub - fivefilters/readability.php: PHP port of Mozilla's Readability.js
fox
7
that looks good, i’ll make a note to poke at this.
adjusting core rewriting behavior could also be useful, it’ll help to have a bunch of test urls i.e. original base A, relative part B, should rewrite to C.
as usual, no eta.
jreiff
8
Thank you! And yes, this is not a critical issue, so take your time.
fox
9
this should deal with readability-php configuration:
https://git-gitea.tt-rss.org/fox/tt-rss/commit/711662948768492e8d05b778a7d80eacaec368d2
i’ll take a look at core URL rewriting next. readability has a fairly simple implementation (at Readability.php:571) maybe I could lift something there.
jreiff
10
I just tried your changes, and they fix the APOD images. Thank you!
fox
11
https://git-gitea.tt-rss.org/fox/tt-rss/commit/1c4f7ab3b838b23afb2ee4dab14acbf75956e952
more examples of various relative-to-absolute URLs would be very helpful to add to the phpunit test i’ve added.
jreiff
12
Mozilla has an extensive set of test cases:
https://searchfox.org/mozilla-central/source/netwerk/test/gtest/urltestdata.json
It would be total overkill in its entirety, but maybe you can find some inspiration there. Off the top of my head, I would add the following cases:
-
http://example.com/test/url + (empty path) → http://example.com/test/url (empty relative path does not remove last element)
-
http://example.com/test/url + / → http://example.com/ (root path)
-
http://www.example2.com/test + http://www.example.com/test2 → http://www.example.com/test2 (URLs with scheme are interpreted as being absolute)
fox
13
thanks!
that won’t work, second URL is not relative so it would just return “http://www.example.com/test”. i don’t see why it should too, that would be arbitrarily moving paths between domains.
jreiff
14
Sorry, my mistake, I fixed my post above.
fox
15
Hi
Just updated to the latest and getting these 2 types of errors constantly
fox
17
yeah, that should be already fixed in trunk.
Ah sorry, looks like I jumped the gun.
Edit:
I’m still seeing the same issue, have to see whether other people report the same thing
edit: latest update fixed this