"data:image" in src breaks the image

As far as I can see the sanitizer tries to parse the url via “parsel_url” and attached the base uri to be beginning. Meaning at the end it looks something like this:

https://feedhost.com/image/jpg;base64, abcdef

I fixed this problem for me this way:

Index: classes/urlhelper.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/classes/urlhelper.php b/classes/urlhelper.php
--- a/classes/urlhelper.php	(revision 92c78beb909d8955657564127c2e953ca25113e3)
+++ b/classes/urlhelper.php	(date 1623002135396)
@@ -29,6 +29,9 @@
 	 */
 	public static function rewrite_relative($base_url, $rel_url) {
 
+		if (preg_match('%^data:image/(webp|gif|jpg|png|svg);base64,%', $rel_url) === 1) {
+			return $rel_url;
+		}
 		$rel_parts = parse_url($rel_url);
 
 		if (!empty($rel_parts['host']) && !empty($rel_parts['scheme'])) {

This is however seems to me to be a common method for fixing URLs and not specifically used for images. So I would not recommend fixing it this way midterm.

https://git.tt-rss.org/fox/tt-rss/pulls/38

there are some upcoming changes discussed here

copying from the aforementioned PR: allowing base64 media from the internet seems like a bad idea for obvious reasons (i wonder if svg+js can be embedded like this) so support for this specifically is unlikely.

JavaScript in SVG files is a known security concern. I was able to find several articles about it simply by Googling “SVG JavaScript Security”. In addition, it’s one of the changes that was made in TT-RSS several months back when a security researcher reported potential vulnerabilities in TT-RSS:

this is certainly good news, data urls inheriting tt-rss origin was a major concern for me

url helper functions are not aware of the context for which the URL is going to be used, so enabling data: scheme for URLs while restricting them to <img> tags (what about embedded audio or videos :face_with_raised_eyebrow:, are those a thing?) could be tricky.

simply allowing data: might still be a bad idea, i think.

I agree here 100% (also mentioned this in my initial post).

But in case you want to support different url schemes: The sanitization should be done in the context the url is used. Like ‘mailto:’ could be allowed for a:href, but not really for img:src. URL schemes should be whitelisted.
Also I would specifically whitelist content types provided with data. This way you don’t have to worry about video/audio at the moment.

this would require two different url validation / absolutization paths - one for internal rendering and another for outbound network requests.

for internal rendering, invalid urls should also be rewritten into something easier to understand than blank string, i.e. some kind of “this url was invalid and was removed” in-ttrss landing page.

alternatively, url validator could receive a sender dom node tag name (if any) and proceed accordingly, with rules becoming less strict if its an “img” etc.

e: this mirrors the PR discussion a bit.

e2: i must say that i don’t like all this potential additional complexity

I feel you in regards to the complexity. Never a good idea to stuff ever more features into a project. Initially I wrote that post only to report that links are broken. So for me it works either way :slight_smile:

I’m already very grateful that you guys care about this project because I use it daily. (Also I use the mobile client for which data-images are not displayed at all at the moment. Did not have time to check why that happens yet.)

https://git.tt-rss.org/fox/tt-rss/commit/e9c062a189cfad71922fc576d636610da18006d4

very nice, thank you, will try this out this weekend :slight_smile: