I don’t see the justification for doing full URL validation on anchor hrefs, though (beyond stripping javascript:), as those shouldn’t do anything until the user actually clicks on them. I also noticed that custom protocols like git: or osu:, which are valid href values, are broken as well.
No, really, what is it? Where can I read about it?
I’ve seen advice about stripping out other attributes (onclick etc.) or removing javascript: in hrefs, but I’ve never seen anything about needing validation for other types of links.
Most of the stuff mentioned here seems to be about sanitizing the image src attribute, though (at least the parts that relate to rendering feed content).
Unlike an image src, an anchor href doesn’t seem to be affected, since nothing should need to do anything with it (unless tt-rss somewhere in the process tries to download everything in href? That would be weird).
Just to be clear: I’m not suggesting we relax the existing URL validation that’s used by a lot of things. It’s just that validation for anchor href attributes in feed content should be different/minimal, since those are inert until the user clicks them.
There’s a PDF linked by the security researchers in that thread that goes into great detail about the attack vectors, etc. There are also several other threads on this forum that I was able to find with a few keyword searches: filter, validate, url, etc.
Don’t you think URLs should be validated before being served to the user? In a properly designed application, wouldn’t you expect that? Can you imagine any possible problem with TT-RSS just blindly passing along whatever data it comes across to the user?
For the anchor href attribute? Beyond removing javascript: links and making sure rel=noreferrer is set when target is _blank, I don’t think so. Even the rel part isn’t strictly needed anymore in recent browsers, which imply noopener for target=_blank. Browsers generally don’t do anything with an href until the user clicks it.
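To illustrate, a minimal sanitizer along those lines could be as small as a scheme denylist. This is my own sketch, not tt-rss’s actual code; the function name and the choice of blocked schemes are assumptions:

```php
<?php
// Hypothetical minimal sanitizer for feed-content anchor hrefs.
// It only blocks schemes a browser would execute on click
// (javascript:, data:, vbscript:); everything else — including
// custom protocols like git: or osu: — is left alone, since an
// href is inert until the user clicks it.
function sanitize_href(string $href): string {
    $blocked = ['javascript', 'data', 'vbscript'];

    // A scheme is an initial run of [a-z0-9+.-] followed by ':'.
    if (preg_match('/^\s*([a-z][a-z0-9+.\-]*)\s*:/i', $href, $m)
            && in_array(strtolower($m[1]), $blocked, true)) {
        return '#'; // neutralize the link
    }
    return $href;
}
```

With this approach `javascript:alert(1)` becomes `#`, while `git://example.com/repo.git` and relative links pass through untouched.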
Unless I’m missing something here? I’d be interested to hear of any danger I’m unaware of.
Vulnerabilities are often layered together. It can start with a trusted web site whose editor credentials become compromised, leading to a malicious actor changing URLs. Those URLs are fetched through a feed reader and then clicked by an end user whose browser has a zero-day vulnerability triggered by improperly crafted href attributes. This leads to the browser breaking out of its sandbox, which then exploits the OS kernel and results in full access to the computer.
The idea is that everyone at every stage does everything correctly, so that if there’s a failure at some point the effects are mitigated. Ensuring URLs are valid is one of those steps, and it’s so frickin’ easy to do, why not do it?
some imbecile with an anime userpic clicking on something, owning himself, and coming here crying because “tt-rss did it” is already bad enough, the rest has been explained by @JustAMacUser.
Actually, I noticed that non-ASCII characters in the path are fine (for example https://google.com/テスト/) because there’s a separate rawurlencode/rawurldecode pass for the path before it’s handed to filter_var, but there’s no such pass for the query string.
So in practice the validator only really “validates” the query string…
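The asymmetry is easy to reproduce with stock PHP. The percent-encoding step below stands in for the rawurlencode/rawurldecode round-trip described above; it’s my own reconstruction, not the tt-rss code itself:

```php
<?php
// FILTER_VALIDATE_URL rejects raw multibyte characters outright:
$raw = 'https://google.com/テスト/';
var_dump(filter_var($raw, FILTER_VALIDATE_URL)); // bool(false)

// Percent-encoding just the path segments (roughly what the
// rawurlencode pass does before validation) makes the same URL pass:
$parts   = parse_url($raw);
$encoded = $parts['scheme'] . '://' . $parts['host']
    . implode('/', array_map('rawurlencode', explode('/', $parts['path'])));
var_dump(filter_var($encoded, FILTER_VALIDATE_URL) !== false); // bool(true)

// The query string gets no such treatment, so the same character
// there causes the whole URL to be rejected:
var_dump(filter_var('https://google.com/?q=テスト', FILTER_VALIDATE_URL)); // bool(false)
```

So a feed URL that is perfectly clickable in any browser can still be thrown out solely because of its query string.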