Tiny Tiny RSS: Community

Temporarily remove all iframe stripping

I know… I know… “iframes bad…”

I’m trying to troubleshoot videos (including whitelisted sites like youtube) and images not showing in feeds. (They show in the browser but when I grab them via the API, they’re stripped). I’ve turned off and on various plugins (e.g., no_iframes, videoframes, af_youtube_embed) but none seem to make any difference. (I was unable to figure out from https://tt-rss.org/wiki/ApiReference how to set has_sandbox always true in config so I even hardcoded it temporarily to true which also didn’t make a difference.)

I’d like to temporarily turn off all iframe stripping to confirm that’s the problem in various feeds. Would someone please point me to the best way to do that? (I don’t mind changing PHP code temporarily if that’s the easiest way.) Thanks.

unless you have plugins enabled which remove iframes, enabling sandbox should be enough to keep all iframes in articles.

(functions.php)

      if ($_SESSION['hasSandbox']) $allowed_elements[] = 'iframe';

if all else fails you can remove the if (), but it should work without hacks.
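
To illustrate the idea of an element allow-list that only admits iframes when the sandbox flag is set, here is a minimal sketch. It is not the tt-rss sanitize() code: the function name keep_allowed_elements() and its parameters are made up for this example, and it leans on PHP's built-in strip_tags(), which drops tags but does not clean attributes, so it is no substitute for a real sanitizer.

```php
<?php
// Illustration only: this is NOT how tt-rss's sanitize() is implemented.
// keep_allowed_elements() is a hypothetical helper name.
function keep_allowed_elements(string $html, array $allowed_elements, bool $has_sandbox): string {
    // Mirror the quoted line: iframes are only allowed when the sandbox flag is set.
    if ($has_sandbox && !in_array('iframe', $allowed_elements, true)) {
        $allowed_elements[] = 'iframe';
    }

    // strip_tags() accepts its allow-list as a string such as "<p><a>".
    $allowed_tags = '<' . implode('><', $allowed_elements) . '>';

    return strip_tags($html, $allowed_tags);
}

$html = '<p>Intro</p><iframe src="https://www.youtube.com/embed/p6qXM_N34TI"></iframe>';

$without = keep_allowed_elements($html, ['p', 'a'], false); // iframe tags are dropped
$with    = keep_allowed_elements($html, ['p', 'a'], true);  // iframe survives
```

With the flag off the iframe disappears from the output, which matches the symptom described in the first post.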

Thanks. I added a hard-coded:

$_SESSION['hasSandbox']=true;

…right before the if-statement and the problem went away. Not sure which plugin is causing the problem but this is a reasonable (albeit risky) temporary solution.

If you need a more permanent solution, you can use HOOK_SANITIZE in a plugin. Depending on what you’re doing you might be able to force iframes to be included dynamically (i.e only when needed in the API).

Appreciate that as I do need a more permanent solution. But something still doesn’t seem right and it appears even my hard-coded option isn’t always working.

For example, I have this feed setup: https://www.youtube.com/feeds/videos.xml?channel_id=UCNSMdQtn1SuFzCZjfK2C7dQ. The following plugins are enabled:

  • auth_internal
  • af_comics
  • af_proxy_http
  • af_readability
  • af_redditimgur
  • af_youtube_embed
  • note

All others, including the no_iframes plugin, are disabled.

On the feed itself, only “Include in e-mail digest” and “Always display image attachments” are checked; thus, “Do not embed media” and “Cache media” are both disabled (as is readability). But even with my $_SESSION['hasSandbox']=true; hard-coded, I still don’t see the embedded video.

Here’s what the first entry of the youtube feed shows:

 <entry>
  <id>yt:video:p6qXM_N34TI</id>
  <yt:videoId>p6qXM_N34TI</yt:videoId>
  <yt:channelId>UCNSMdQtn1SuFzCZjfK2C7dQ</yt:channelId>
  <title>Royal Enfield Continental GT 650 Review</title>
  <link rel="alternate" href="https://www.youtube.com/watch?v=p6qXM_N34TI"/>
  <author>
   <name>FortNine</name>
   <uri>https://www.youtube.com/channel/UCNSMdQtn1SuFzCZjfK2C7dQ</uri>
  </author>
  <published>2019-11-15T16:30:05+00:00</published>
  <updated>2019-11-16T00:42:50+00:00</updated>
  <media:group>
   <media:title>Royal Enfield Continental GT 650 Review</media:title>
   <media:content url="https://www.youtube.com/v/p6qXM_N34TI?version=3" type="application/x-shockwave-flash" width="640" height="390"/>
   <media:thumbnail url="https://i1.ytimg.com/vi/p6qXM_N34TI/hqdefault.jpg" width="480" height="360"/>
   <media:description>Royal Enfield, the world’s oldest motorcycle brand, is back with a modern twist on a very 60s idea of a British twin. Strangely enough, it attracts as much attention outside a retirement home as it does nestled amongst hipsters downtown. What's the deal here? [Details below]

Olympia Long Beach Jacket : https://frt9.co/xjh41e
Revit Orlando H2O Jeans : https://frt9.co/psf0et
Klim Powerxross Gloves : https://frt9.co/0gna40
TCX Street Ace Waterproof Shoes : https://frt9.co/hg5bgh
Klim Krios Pro Arsenal Helmet : https://frt9.co/kik7fm
100 Percent Accuri Goggles : https://frt9.co/hw1asi

-
Cinematographer &amp; Editor : Aneesh Shivanekar

-
Gear up for your next adventure at fortnine.ca:
https://frt9.co/98au9e

Connect with us:
http://facebook.com/fortnine
http://instagram.com/fortnine
http://twitter.com/fortninecanada</media:description>
   <media:community>
    <media:starRating count="8907" average="4.95" min="1" max="5"/>
    <media:statistics views="98571"/>
   </media:community>
  </media:group>
 </entry>

And this displays properly as such:


Yet here is what my tt-rss feed https://rss.mydomain.com/public.php?op=rss&id=246&view-mode=all_articles&key=sskrut5dd0694a8cadd shows:

<entry>
	<id>tag:rss.mydomain.com,2019-11-15:/2026228</id>
	<link href="https://www.youtube.com/watch?v=p6qXM_N34TI" rel="alternate" type="text/html"/>
	<title type="html">Royal Enfield Continental GT 650 Review</title>
	<summary type="html"><![CDATA[]]></summary>
	<content type="html"><![CDATA[]]></content>
	<updated>2019-11-16T00:42:50+00:00</updated>
	<author><name>FortNine</name></author>
	<source>
		<id>https://www.youtube.com/channel/UCNSMdQtn1SuFzCZjfK2C7dQ</id>
		<link rel="self" href="https://www.youtube.com/channel/UCNSMdQtn1SuFzCZjfK2C7dQ"/>
		<updated>2019-11-16T00:42:50+00:00</updated>
		<title>FortNine</title></source>


	<link rel="enclosure" 
		type="application/x-shockwave-flash" 
		length="1"
		href="https://www.youtube.com/v/p6qXM_N34TI?version=3"/>

</entry>

So clearly something is hosed (or I have a messed-up understanding of how af_youtube_embed works) as I would expect my tt-rss feed to pretty much have the same entry or, at minimum, show the video embedded. Yet it’s almost all been stripped. And, for the life of me, I can’t figure out what tt-rss code is stripping the youtube entry of all the details before transforming it into:


I’m using v19.8 (762ff9b) if it matters.

Ah, you’re not actually using the API proper, you’re accessing this through a published feed. Different thing and it will remove iframes.

You can still fix it with a plugin by simply hooking the sanitize function and adding iframes back in. However, this will add iframes for everyone. You’ll need to come up with some program logic to decide whether to add iframes back in. Checking if the request is coming in to public.php with op=rss as a parameter should be sufficient, I would think.
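
A rough sketch of that check, assuming the published feed is always served by public.php with op=rss as in the URL quoted earlier. The helper name is_published_feed_request() is hypothetical, and passing the superglobals in as arguments is only a choice to make it easy to try outside tt-rss.

```php
<?php
// Hypothetical helper: decides whether the current request is for a published feed.
// Assumption: published feeds are requested as public.php?op=rss&...
function is_published_feed_request(array $server, array $request): bool {
    // SCRIPT_NAME may include a path prefix such as /tt-rss/public.php.
    $script = basename((string) ($server['SCRIPT_NAME'] ?? ''));
    $op     = (string) ($request['op'] ?? '');

    return $script === 'public.php' && $op === 'rss';
}

// Example inputs shaped like $_SERVER and $_REQUEST.
$feed_request = is_published_feed_request(
    ['SCRIPT_NAME' => '/tt-rss/public.php'],
    ['op' => 'rss', 'id' => '246', 'view-mode' => 'all_articles']
);
$ui_request = is_published_feed_request(
    ['SCRIPT_NAME' => '/tt-rss/backend.php'],
    ['op' => 'feeds']
);
```

Inside a sanitize hook you would then only append 'iframe' to the allowed elements when is_published_feed_request($_SERVER, $_REQUEST) returns true.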

I knew I had to be doing something wrong. Thanks for helping me understand this better. But, at the risk of looking even more ignorant, what should the af_youtube_embed plugin actually do, if not show the embedded video for youtube feeds? I’ll go look at the code but it seemed from the description that was the whole purpose of the plugin.

plugins that do things within tt-rss UI don’t necessarily work on generated feeds.

And there I have it. My misunderstanding. I’ll figure out some code to bypass the stripping for generated feeds. Thanks again to both of you.

you should be able to write a system (i.e. loaded in config.php) plugin which sits on HOOK_SANITIZE and whitelists iframes if HOOK_ARTICLE_EXPORT_FEED was called before. something like that should work, i think.

I just threw this together quickly, but it should do the job.

/path/to/tt-rss/plugins.local/iframes_on_export/init.php
<?php

class iframes_on_export extends Plugin {

	private $host;

	public function about() {
		return array(
			1.0,
			'Force iframes to be included in content for published feeds',
			'JustAMacUser',
			true,
			'https://community.tt-rss.org/t/temporarily-remove-all-iframe-stripping/2895/'
		);
	}

	public function api_version() {
		return 2;
	}

	public function init( $host ) {
		$this->host = $host;

		$host->add_hook( $host::HOOK_ARTICLE_EXPORT_FEED, $this );
	}

	public function hook_article_export_feed( $line, $feed, $is_cat ) {
		if ( ! in_array( $this, $this->host->get_hooks( $this->host::HOOK_SANITIZE ) ) )
			$this->host->add_hook( $this->host::HOOK_SANITIZE, $this );

		return $line;
	}

	public function hook_sanitize( $doc, $site_url, $allowed_elements, $disallowed_attributes, $article_id ) {
		if ( ! in_array( 'iframe', $allowed_elements ) )
			$allowed_elements[] = 'iframe';

		return [ $doc, $allowed_elements, $disallowed_attributes ];
	}

}

It will need to be enabled in config.php.

Were you just looking over my shoulder and saying to yourself “He’s taking too damn long to figure out how the codebase works while I can do this in 20 seconds…”?

Thanks for putting this together. However, this isn’t going to work on youtube feeds, correct? I’d also have to throw in:

	public function hook_render_enclosure($entry, $hide_images) {

		$matches = array();

		// '?' and '.' are escaped so they match literally instead of acting as regex metacharacters
		if (preg_match("/\/\/www\.youtube\.com\/v\/([\w-]+)/", $entry["url"], $matches) ||
			preg_match("/\/\/www\.youtube\.com\/watch\?v=([\w-]+)/", $entry["url"], $matches) ||
			preg_match("/\/\/youtu\.be\/([\w-]+)/", $entry["url"], $matches)) {

			$vid_id = $matches[1];

			return "<iframe class=\"youtube-player\"
				type=\"text/html\" width=\"640\" height=\"385\"
				src=\"https://www.youtube.com/embed/$vid_id\"
				allowfullscreen frameborder=\"0\"></iframe>";

		}
	}

(of course, with:

$host->add_hook($host::HOOK_RENDER_ENCLOSURE, $this);

…added to init, correct?)

Yeah, you would need something to get the enclosure into the content. I was just focused on the part of making it work for the sanitize_hook.
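
As a standalone sketch of the enclosure-to-iframe step, the URL matching can be pulled into plain functions that are easy to try outside tt-rss. The names youtube_video_id() and youtube_iframe() are hypothetical, and the three patterns only cover the URL shapes mentioned in this thread.

```php
<?php
// Hypothetical helpers; only the three URL shapes from this thread are covered.
function youtube_video_id(string $url): ?string {
    $patterns = [
        '~//www\.youtube\.com/v/([\w-]+)~',          // enclosure form: /v/ID?version=3
        '~//www\.youtube\.com/watch\?v=([\w-]+)~',   // article link form: /watch?v=ID
        '~//youtu\.be/([\w-]+)~',                    // short link form
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $url, $matches)) {
            return $matches[1];
        }
    }

    return null; // not a recognised YouTube URL
}

function youtube_iframe(string $url): ?string {
    $id = youtube_video_id($url);
    if ($id === null) {
        return null;
    }

    return '<iframe class="youtube-player" type="text/html" width="640" height="385" '
        . 'src="https://www.youtube.com/embed/' . $id . '" '
        . 'allowfullscreen frameborder="0"></iframe>';
}

$iframe = youtube_iframe('https://www.youtube.com/v/p6qXM_N34TI?version=3');
```

A hook such as hook_render_enclosure() could then return youtube_iframe($entry["url"]) and fall through when the result is null.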

e:

I’ve been proudly breaking fox’s code for years now. :smirk:

ooh i like that, it’s quite clever

I wasn’t able to easily figure out how to get the iframes back into the published feed content. Actually, to cut down on processing and server hits (for both my own server and those hosting the feeds), my preference would have been to store the full content on the initial grab and then strip it later. Otherwise, it seems you have plugins that have to go back often and re-fetch the stripped content. (EDIT: It appears function update_rss_feed in classes/rss_utils.php might be where the data gets initially stored but I haven’t yet figured out the iframe stripping completely. I believe it might be as simple as returning true as the first step in function iframe_whitelisted of /include/functions.php.)

So, call it taking the easy way out, I added a local filter to Af_ComicFilter. Perhaps this will be useful to others to get the iframes back into re-published feeds:

<?php
class Af_videos extends Af_ComicFilter {

	function supported() {
		return array("Video Sites");
	}

	function process(&$article) {

		// quickly discover if we have a whitelisted site and avoid further processing if we don't
		$find_video_sites = array('youtu.be', 'youtube.com', 'vimeo.com', 'facebook.com', 'video.valme.io');
		$video_site = $this->strpos_arr($article["link"], $find_video_sites);

		// strpos_arr() returns a position (possibly 0) or false, so compare strictly
		if ($video_site !== false) {

			// add your own regex for additional sites but don't forget to whitelist above in $find_video_sites
			// youtube regex from http://stackoverflow.com/questions/5830387/how-to-find-all-youtube-video-ids-in-a-string-using-a-regex/5831191#5831191
			$site_regex = array(
				'youtube' => array(
					'~
					# Match non-linked youtube URL in the wild. (Rev:20130823)
					https?://         # Required scheme. Either http or https.
					(?:[0-9A-Z-]+\.)? # Optional subdomain.
					(?:               # Group host alternatives.
					  youtu\.be/      # Either youtu.be,
					| youtube         # or youtube.com or
					  (?:-nocookie)?  # youtube-nocookie.com
					  \.com           # followed by
					  \S*             # Allow anything up to VIDEO_ID,
					  [^\w\s-]       # but char before ID is non-ID char.
					)                 # End host alternatives.
					([\w-]{11})      # $1: VIDEO_ID is exactly 11 chars.
					(?=[^\w-]|$)     # Assert next char is non-ID or EOS.
					(?!               # Assert URL is not pre-linked.
					  [?=&+%\w.-]*    # Allow URL (query) remainder.
					  (?:             # Group pre-linked alternatives.
						[\'"][^<>]*>  # Either inside a start tag,
					  | </a>          # or inside <a> element text contents.
					  )               # End recognized pre-linked alts.
					)                 # End negative lookahead assertion.
					[?=&+%\w.-]*        # Consume any URL (query) remainder.
					~ix'
					),
				'vimeo' => array(
					"/(https?:\/\/)?(www\.)?(player\.)?vimeo\.com\/([a-z]*\/)*([0-9]{6,11})[?]?.*/"
					),
				'facebook' => array(
					"~^(https?://www\.facebook\.com/)(?:video\.php\?v=(\d+)|.*?/videos/(\d+)/?)$~m"
					),
				'valme' => array(
					"~^(https?://video\.valme\.io\/videos\/watch\/)(.*)$~m"
					)
				);

			$video_id = false;

			foreach ($site_regex as $site_name => $regex_array) {

				foreach ($regex_array as $regex) {

					// grab video_id from regex
					if (preg_match($regex, $article["link"], $id)) {

						switch ($site_name)
						{
							case 'youtube':
								$video_id = (isset($id[1]) && $id[1]) ? $id[1] : 0;
								$src = "https://www.youtube.com/embed/" . $video_id . "?wmode=transparent";
								break 3;
							case 'vimeo':
								$video_id = (isset($id[5]) && $id[5]) ? $id[5] : 0;
								$src = "https://player.vimeo.com/video/" . $video_id;
								break 3;
							case 'facebook':
								if (isset($id[2]) && $id[2]) { $video_id = $id[2]; }
								elseif (isset($id[3]) && $id[3]) { $video_id = $id[3]; }
								else {  $video_id = 0; }

								$src = "https://www.facebook.com/video/embed?video_id=" . $video_id;
								break 3;
							case 'valme':
								$video_id = (isset($id[2]) && $id[2]) ? $id[2] : 0;
								$src = "https://video.valme.io/videos/embed/" . $video_id;
								break 3;
							default:
								break;
						}
					}
				}
			}

			// add iframe with combined src/video_id to existing content
			if ($video_id) {
				$iframe = '<div><iframe frameborder="0" height="385" scrolling="no" src="' . $src . '" width="640" allowfullscreen type="text/html"></iframe></div>';
				$article["content"] .= $iframe;

				return true;
			}
		}

		return false;
	}

	// https://stackoverflow.com/questions/6284553/using-an-array-as-needles-in-strpos
	function strpos_arr($haystack, $needle) {
		if (!is_array($needle)) { $needle = array($needle); }

		foreach($needle as $what) {
			if (($pos = strpos($haystack, $what))!==false) { return $pos; }
		}

		return false;
	}
}

Please let me know if you see any problems with it (or if there are other video sites you think I should add).
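
One possible refinement, offered only as a sketch: a substring search also matches a site name that merely appears somewhere in the path or query string of an unrelated URL. Comparing against the parsed host avoids that. The helper name link_host_matches() is hypothetical and not part of Af_ComicFilter.

```php
<?php
// Hypothetical helper: match on the parsed host instead of a substring of the whole URL.
function link_host_matches(string $url, array $hosts): bool {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host component or an unparseable URL
    }
    $host = strtolower($host);

    foreach ($hosts as $candidate) {
        $candidate = strtolower($candidate);
        $suffix = '.' . $candidate;

        // Accept the host itself or any subdomain of it (www.youtube.com, player.vimeo.com, ...).
        if ($host === $candidate || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }

    return false;
}

$video_hosts = ['youtu.be', 'youtube.com', 'vimeo.com', 'facebook.com', 'video.valme.io'];
$is_video = link_host_matches('https://www.youtube.com/watch?v=p6qXM_N34TI', $video_hosts);
```

It could replace the strpos_arr() pre-check if the stricter matching is wanted; the per-site regexes would stay as they are.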

As an aside, is there a quick way to purge a feed and then re-get it (for testing)? I often deleted and then re-added feeds through the UI (and then waited for them to fetch) to test code and it would have been much quicker if there was a button (or CLI command). Very old but I saw https://srv.tt-rss.org/oldforum/viewtopic.php?f=8&t=248 from 2007 and @fox referred to there being something called “emergency action (clear feed)” and then an “option to purge feed will be there” after an updated “feed prefs toolbar” but didn’t see either in the current UI. I did find the “Feed Debugger” option when right-clicking on a feed but the refetch didn’t seem to ever call the Af_ComicFilter plugin so I suspected the feed entries weren’t getting cleared.

I don’t have time to address your full post, but… Don’t modify the core code, it will make updates a pain. Use HOOK_ARTICLE_FILTER to modify the article after TT-RSS gets the article but before it inserts it into the database. This is what I do for YouTube feeds and it saves a bunch of headaches later when you want to get the article through the API or other means.

iframes are stripped in the sanitize() function in include/functions.php.

(Also, don’t @ tag people unless you need their specific attention.)

My bad.

I hear ya’ on both not modifying the core code and iframes being stripped in sanitize(). I’m just playing around with temporary modifications to core code so I can figure out how it works. But even if I comment out pretty much the entire code in sanitize(), the fully retrieved HTML doesn’t seem to be stored in the database. What I see is the same stripped/sanitized content. So not trying to be difficult - I just don’t understand yet what else is doing the stripping.

Right… Modifying just the sanitizing functions isn’t going to do much because plugins like the YouTube one modify the content before it’s rendered to the user in the normal UI. If you’re generating a feed those plugin hooks are never called so there’s no content to remove (or not remove, in your case) because that plugin hasn’t even been called to insert the iframe.

If you use the hook I mentioned above that’s when it goes into the database and if you add the iframes there it will be stored in the database and served all the time no matter which method you use to call the content. You can, alternatively, modify the YouTube plugin to hook even more places; that would also work. (Either way, for published feeds you’ll need the code I provided earlier in this thread to ensure iframes are not removed during sanitizing.)
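
A minimal sketch of that import-time approach, assuming the article array carries "link" and "content" keys as in the Af_ComicFilter code earlier in the thread. The helper name add_youtube_iframe_to_article() is hypothetical; only the watch?v= link shape from the YouTube feed above is handled, and the plugin wiring is shown as comments because it cannot run outside tt-rss.

```php
<?php
// Hypothetical helper; in a plugin it would be called from hook_article_filter().
function add_youtube_iframe_to_article(array $article): array {
    $link    = (string) ($article['link'] ?? '');
    $content = (string) ($article['content'] ?? '');

    // Only the /watch?v=ID link shape used by YouTube channel feeds is handled here.
    if (!preg_match('~//www\.youtube\.com/watch\?v=([\w-]{11})~', $link, $m)) {
        return $article;
    }

    $src = 'https://www.youtube.com/embed/' . $m[1];

    // Skip articles that already contain the embed so re-processing does not duplicate it.
    if (strpos($content, $src) !== false) {
        return $article;
    }

    $article['content'] = $content
        . '<iframe class="youtube-player" type="text/html" width="640" height="385" '
        . 'src="' . $src . '" allowfullscreen frameborder="0"></iframe>';

    return $article;
}

// Wiring inside a tt-rss plugin class (not runnable on its own):
//   in init():  $host->add_hook($host::HOOK_ARTICLE_FILTER, $this);
//   function hook_article_filter($article) { return add_youtube_iframe_to_article($article); }

$article = add_youtube_iframe_to_article([
    'link'    => 'https://www.youtube.com/watch?v=p6qXM_N34TI',
    'content' => '',
]);
```

Because the iframe is stored with the article, the sanitize-hook plugin from earlier in the thread is still needed for published feeds, as noted above.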

running feed debugger with force rehash will process all feed articles again.

Thanks. I had tried that but it didn’t seem to work. Perhaps my expectations were wrong - I expected the existing articles/entries in the database to be deleted, then refetched, then shown as unread. However, when I do:

SELECT DISTINCT date_entered, to_char(date_entered, 'IYYY-IW') AS yyiw, guid,
       ttrss_entries.id, ttrss_entries.title, updated, label_cache, tag_cache,
       always_display_enclosures, site_url, note, num_comments, comments, int_id,
       uuid, lang, hide_images, unread, feed_id, marked, published, link, last_read,
       orig_feed_id, last_marked, last_published, content, author, score
FROM ttrss_entries
     LEFT JOIN ttrss_user_entries ON (ref_id = ttrss_entries.id),
     ttrss_feeds
WHERE ttrss_user_entries.feed_id = ttrss_feeds.id
  AND ttrss_user_entries.owner_uid = '2'
  AND feed_id = '405'
ORDER BY date_entered DESC, updated DESC
LIMIT 60 OFFSET 0

…it still shows the articles/entries from 2 days ago. So it appeared the only way to actually purge the articles/entries was to delete and re-add the feed.