Wrong link on URL containing Japanese text

Are you using stock Docker compose setup?
no

If not, either reproduce this issue on the official demo or switch to Docker and see if the issue is resolved.
yes same problem on official demo


Describe the problem you’re having:
wrong link on #テスト

Include steps to reproduce the problem:

  1. subscribe to edogawa_test (edogawa_test)
  2. open feed
  3. latest entry’s #テスト link is missing href attribute
    • it should link to https://twitter.com/search?q=%23テスト (or https://twitter.com/search?q=%23%E3%83%86%E3%82%B9%E3%83%88)
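
As a quick check that the two URLs above are the same link, here is a small Python sketch (illustrative only, not tt-rss code) that percent-encodes the hashtag as UTF-8:

```python
from urllib.parse import quote

hashtag = "#テスト"

# safe="" makes quote() escape every reserved character, including "#"
encoded = "https://twitter.com/search?q=" + quote(hashtag, safe="")
print(encoded)
```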

tt-rss version (including git commit id):
92c78beb909d8955657564127c2e953ca25113e3

Platform (i.e. Linux distro, PHP, PostgreSQL, etc) versions:
FreeBSD 13.0/PHP 7.4/PostgreSQL 13.2

Please provide any additional information below:
n/a

i’m not going to subscribe to your feed but i have doubts that naked unicode characters are going to pass tt-rss url validation

Q: why are urls validated?
A: search the forums, it has been discussed before, in excruciating detail

I’m not sure I found the right thread. Is it this one?

I don’t see the justification for doing full URL validation on href links though (beyond javascript:), as those shouldn’t do anything until the user actually clicks on them. I also noticed that custom protocols like git: or osu:, which are valid in an href attribute, are broken as well.

thanks for sharing your opinion.

No, really, what is it? Where can I read about it?

I’ve seen advice about stripping out other attributes (onclick etc.) or removing javascript: in hrefs, but I’ve never seen validation being needed for any other type of link.

Here is some reading material for you:

The above thread covers a lot of the changes that took place last year to address a myriad of security implications.

Most of the stuff mentioned here seems to be about sanitizing the image src attribute though (at least the parts that relate to rendering feed content).

Unlike image src, anchor href doesn’t seem to be affected as nothing should need to do anything with it (unless ttrss somewhere in the process tries to download everything in href? that would be weird).

Just to be clear: I’m not suggesting relaxing the existing URL validation that’s being used by a lot of things. It’s just that validation for the anchor href attribute in feed content should be different/minimal, as those links are inert until clicked by the user themselves.

There’s a linked PDF from the security researchers in that thread that goes into great detail about the vectors of attack, etc. There are also several other threads on this forum that I was able to find with a few keyword searches: filter, validate, url, etc.

Don’t you think URLs should be validated before being served to the user? In a properly designed application, wouldn’t you expect that? Can you imagine any possible problem with TT-RSS just blindly passing along whatever data it comes across to the user?

For the anchor href attribute? Beyond removing javascript: links and making sure rel=noreferrer is set when target is _blank, I don’t think so. Even the rel thing isn’t needed anymore in recent browsers. Browsers generally don’t do anything with those links until the user clicks on them.
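
For reference, the minimal handling proposed in this post could look roughly like the Python sketch below. This is only an illustration of the poster's suggestion, not what tt-rss does; the `sanitize_anchor` helper and its dict-of-attributes input are assumptions made for the example.

```python
import re

# URL parsers drop tabs/newlines and leading control characters, so strip
# every control character and space before comparing (deliberately over-strict).
_IGNORED = re.compile(r"[\x00-\x20]+")

def sanitize_anchor(attrs):
    """Return a copy of an <a> tag's attributes with the minimal rules applied."""
    cleaned = dict(attrs)
    href = cleaned.get("href", "")
    scheme = _IGNORED.sub("", href).split(":", 1)[0].lower()

    # Rule 1: drop javascript: links, keep every other scheme (git:, osu:, ...)
    if ":" in href and scheme == "javascript":
        del cleaned["href"]

    # Rule 2: links opening in a new tab get rel=noreferrer
    if cleaned.get("target") == "_blank":
        rel = set(cleaned.get("rel", "").split())
        rel.add("noreferrer")
        cleaned["rel"] = " ".join(sorted(rel))
    return cleaned
```

With this sketch, `{"href": "git://example.com/repo.git"}` passes through unchanged, while `{"href": " JavaScript:alert(1)"}` loses its href.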

Unless I’m missing something here? I’m interested if you know any danger I’m unaware of.

Vulnerabilities are often layered together. It can start with a trusted web site whose editor credentials become compromised, leading to a malicious actor changing URLs. Those URLs are fetched through a feed reader and then clicked by an end user whose browser has a zero-day exploit triggered by improperly crafted HREF attributes. This leads to the browser breaking out of the operating system sandbox, which then exploits the OS kernel and results in full access to the computer.

The idea is that everyone at every stage does everything correctly, so if there’s a failure at some point the effects are mitigated. Ensuring URLs are valid is one of those steps and it’s so frickin’ easy to do, why not do it?

some imbecile with an anime userpic clicking on something, owning himself, and coming here crying because “tt-rss did it” is already bad enough, the rest has been explained by @JustAMacUser.

Actually, I noticed that non-ASCII characters in the path are fine (for example https://google.com/テスト/) because the path gets a separate rawurlencode/rawurldecode pass before it is passed to filter_var, but there’s no equivalent for the query string.

So the validator really only “validates” query string…
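
To illustrate the difference described above, here is a rough Python model. It is not tt-rss's PHP code: it assumes the validator simply rejects any URL containing non-ASCII characters (as PHP documents for FILTER_VALIDATE_URL), and all function names are made up for this sketch.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def ascii_only_validator(url):
    """Stand-in for a validator that rejects URLs with non-ASCII characters."""
    return url.isascii()

def encode_path_only(url):
    """Percent-encode the path, leave the query untouched (the reported behaviour)."""
    parts = urlsplit(url)
    # "%" stays in the safe set so existing escapes are not double-encoded
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))

def encode_path_and_query(url):
    """Hypothetical alternative: also percent-encode the query string."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(
        path=quote(parts.path, safe="/%"),
        query=quote(parts.query, safe="=&%"),
    ))

path_url = "https://google.com/テスト/"
query_url = "https://twitter.com/search?q=%23テスト"

print(ascii_only_validator(encode_path_only(path_url)))        # path case passes
print(ascii_only_validator(encode_path_only(query_url)))       # query case fails
print(ascii_only_validator(encode_path_and_query(query_url)))  # passes once encoded
```

Whether encoding the query string this way is safe enough for tt-rss is exactly the point under dispute in this thread; the sketch only shows why the two example URLs behave differently.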

it’s an exception we made - Empty links due to validate_url/filter_var

wrapping/unwrapping url parameters is too much effort and has more security implications, in my opinion.

that’s literally not true.

I see. Good thing this valid path http://google.com/%%30%30 doesn’t crash browsers anymore.