Some error with RSS feeds

wn_name · April 21, 2024, 3:05pm

I just tested that and it worked fine. If you’re using Docker Compose, make sure you’re doing docker compose up -d and not a restart.

expert1 · April 21, 2024, 3:35pm

The old and new tt-rss VM servers are on the same network.
With the old version it works fine; with the new one it does not.
I am talking about this RSS feed: Izvestia (Известия)

The User-Agent is the same in the old and the new version:
TTRSS_HTTP_USER_AGENT=Mozilla/5.0 (X11; Linux i686; rv:113.0) Gecko/20111914 Firefox/113.0

fox · April 21, 2024, 4:19pm

dunno what to tell you, if user agent is the same. second victim of guzzlehttp rework?

i’m getting 403 without user agent workarounds so maybe it didn’t actually apply for you?

p.s. you could also try passing the feed through feedburner, maybe they’ll allow it.

expert1 · April 21, 2024, 4:32pm

The problem is probably somewhere at the PHP library level.
Thanks for the test and tips!

P.S. If I move the old version of tt-rss to PostgreSQL 15, will that cause problems for the tt-rss application? I can easily upgrade the database.

wn_name · April 21, 2024, 4:33pm

Looks like they have “DDoS-Guard” in place. I’m getting HTTP 403 Forbidden from a couple places. Might just need to try later.

{
	"result": {
		"code": 5,
		"message": "Client error: `GET https://iz.ru/xml/rss/all.xml` resulted in a `403 Forbidden` response:\n<!doctype html><html><head><title>DDoS-Guard</title><meta charset=\"utf-8\"/><meta name=\"viewport\" content=\"width=device-w (truncated...)\n"
	}
}

expert1 · April 21, 2024, 4:38pm

If I open it in a regular browser, I sometimes see a “Checking browser” message first.
In that case, wget with --user-agent always works, just like the old version of tt-rss.

wn_name · April 21, 2024, 4:53pm

Looks like the trigger might be use of HTTP/1.1 vs HTTP/2.

# DDoS-Guard page
curl --http1.1 -A 'Tiny Tiny RSS/24.04-d83290712 (https://tt-rss.org/)' https://iz.ru/xml/rss/all.xml
# Normal content
curl --http2 -A 'Tiny Tiny RSS/24.04-d83290712 (https://tt-rss.org/)' https://iz.ru/xml/rss/all.xml
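
A rough PHP equivalent of the two commands above, as a sketch only. It assumes PHP's curl extension is available; the helper names build_curl_options() and fetch_status() are made up for illustration and are not part of tt-rss. As noted later in this thread, requests made from PHP may not behave the same as CLI curl.

```php
<?php

// Hypothetical helper: builds curl options that mirror the CLI flags
// (--http1.1 / --http2 and -A) used in the commands above.
function build_curl_options(string $url, int $http_version): array {
	return [
		\CURLOPT_URL => $url,
		\CURLOPT_HTTP_VERSION => $http_version,
		\CURLOPT_USERAGENT => 'Tiny Tiny RSS/24.04-d83290712 (https://tt-rss.org/)',
		// Return the body from curl_exec() instead of printing it.
		\CURLOPT_RETURNTRANSFER => true,
	];
}

// Hypothetical helper: performs the request and returns the HTTP status code
// (0 if the request failed before a response was received).
function fetch_status(string $url, int $http_version): int {
	$ch = curl_init();
	curl_setopt_array($ch, build_curl_options($url, $http_version));
	curl_exec($ch);
	return (int) curl_getinfo($ch, \CURLINFO_RESPONSE_CODE);
}

// Usage (performs real network requests, so it is left commented out):
// echo fetch_status('https://iz.ru/xml/rss/all.xml', \CURL_HTTP_VERSION_1_1), PHP_EOL;
// echo fetch_status('https://iz.ru/xml/rss/all.xml', \CURL_HTTP_VERSION_2_0), PHP_EOL;
```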

fox · April 21, 2024, 5:01pm

we could make a simple on-fetch plugin for izvestia which would use raw curl instead for those feed urls. if there’s a combination of curl options that works.

then again it might not help reliably against ddos screen they’re using.

re postgres 15: shouldn’t be a problem but, as usual, make a backup (and verify that you can restore it).

fox · April 21, 2024, 5:03pm

can guzzle prefer http2?

wn_name · April 21, 2024, 5:13pm

Yeah, just need to set 'version' => 2 (or GuzzleHttp\RequestOptions::VERSION => 2) in the request options; curl will fall back to 1.1 if needed.

I just tested it out and hit DDoS-Guard again. Even with HTTP/2 being used, it seems there’s still some discernible difference between CLI curl and what’s happening in PHP land.

edit: It looks like ALPN was being used to tell the difference. Adding \CURLOPT_SSL_ENABLE_ALPN => false to the curl request options got things working in PHP with both HTTP/1.1 and HTTP/2.
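
For reference, a minimal sketch of what those request options could look like together. The helper name build_feed_request_options() is hypothetical, and the commented usage assumes guzzlehttp/guzzle is installed via Composer; this is not necessarily how tt-rss builds its requests.

```php
<?php

// Hypothetical helper: builds Guzzle request options that prefer HTTP/2
// and, optionally, turn off ALPN through Guzzle's 'curl' request option.
function build_feed_request_options(string $user_agent, bool $disable_alpn = true): array {
	$options = [
		// Same as GuzzleHttp\RequestOptions::VERSION => 2.
		'version' => 2,
		'headers' => ['User-Agent' => $user_agent],
	];

	if ($disable_alpn) {
		// Raw curl options are passed through to the curl handler.
		$options['curl'] = [\CURLOPT_SSL_ENABLE_ALPN => false];
	}

	return $options;
}

// Usage (needs Guzzle and network access, so it is left commented out):
// $client = new \GuzzleHttp\Client();
// $response = $client->request('GET', 'https://iz.ru/xml/rss/all.xml',
// 	build_feed_request_options('Tiny Tiny RSS/24.04-d83290712 (https://tt-rss.org/)'));
// echo $response->getStatusCode(), PHP_EOL;
```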

fox · April 21, 2024, 6:08pm

i wonder if that’s a sane enough configuration that we could use as a default.

enabling ALPN shouldn’t be a bad thing either.

wn_name · April 21, 2024, 7:22pm

Disabling ALPN feels slightly hacky (or maybe just “limited benefit”) to me, but I’m guessing things would keep working, just not as smoothly as they could. Since ALPN seems like a widespread and generally useful feature, I’d probably lean towards the plugin (or configuration) option unless disabling ALPN would also help with Cloudflare, etc.

fox · April 22, 2024, 4:25am

oh, i misread it as force-enabling alpn instead of disabling it. yeah, disabling is hacky.

expert1 · April 22, 2024, 7:59am

I looked at the git history of the working version.
The old version that works is at commit
3b4e12ff (Andrew Dolgov, 02.04.2023 at 20:07).

wn_name · April 22, 2024, 2:07pm

Are there any other differences between your old and new system (e.g. switching from host installation to the Docker image, OpenSSL and/or PHP version change, etc.)?
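
One way to compare the two systems is to run a small script inside each container and diff the output. This is only a sketch, assuming PHP's curl extension is loaded; describe_curl_environment() is a made-up helper name.

```php
<?php

// Hypothetical helper: summarizes the PHP, libcurl and TLS library versions
// that PHP is actually using, plus whether libcurl was built with HTTP/2.
function describe_curl_environment(): array {
	$info = curl_version();

	return [
		'php_version' => PHP_VERSION,
		'libcurl_version' => $info['version'],
		'ssl_library' => $info['ssl_version'],
		'http2_supported' => (bool) ($info['features'] & \CURL_VERSION_HTTP2),
	];
}

// Print one "name: value" line per item so the output is easy to diff.
foreach (describe_curl_environment() as $name => $value) {
	echo $name, ': ', var_export($value, true), PHP_EOL;
}
```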

expert1 · April 22, 2024, 4:25pm

In both cases it is the default Docker Compose file (old and new) from the instructions.
The old version has PHP 8.2 (default); the new version has PHP 8.3 (default).

wn_name · April 22, 2024, 7:46pm

Below is a very basic and brittle plugin you could try (place in plugins.local/ddos_guard_workaround/init.php). No real error handling, best practices, etc.

<?php

class Ddos_Guard_Workaround extends Plugin {
	const SITES_TO_HANDLE = [
		'https://iz.ru/',
	];

	public function about() {
		return [
			null, // version
			'Workaround for DDoS Guard on certain sites', // description
			'', // author
			false, // is system
			'', // more info URL
		];
	}

	public function api_version() {
		return 2;
	}

	public function init($host): void {
		$host->add_hook($host::HOOK_SUBSCRIBE_FEED, $this);
		$host->add_hook($host::HOOK_FEED_BASIC_INFO, $this);
		$host->add_hook($host::HOOK_FETCH_FEED, $this);
	}

	public function hook_subscribe_feed($contents, $url, $auth_login, $auth_pass) {
		return self::should_handle($url) ? self::fetch($url) : $contents;
	}

	public function hook_feed_basic_info($basic_info, $fetch_url, $owner_uid, $feed_id, $auth_login, $auth_pass) {
		return self::should_handle($fetch_url) ? ['site_url' => $fetch_url, 'title' => $fetch_url] : $basic_info;
	}

	public function hook_fetch_feed($feed_data, $fetch_url, $owner_uid, $feed, $last_article_timestamp, $auth_login, $auth_pass) {
		if (!self::should_handle($fetch_url)) {
			return $feed_data;
		}
		$content = self::fetch($fetch_url);
		return $content ?: $feed_data;
	}

	private static function should_handle(string $url): bool {
		foreach (self::SITES_TO_HANDLE as $site_prefix) {
			if (str_starts_with($url, $site_prefix)) {
				return true;
			}
		}
		return false;
	}

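	// Fetch the URL with raw curl instead of the default HTTP client.
	private static function fetch(string $url): string {
		$ch = curl_init();
		curl_setopt($ch, \CURLOPT_URL, $url);
		// Skip ALPN during the TLS handshake; this is what currently avoids the DDoS-Guard page.
		curl_setopt($ch, \CURLOPT_SSL_ENABLE_ALPN, false);
		curl_setopt($ch, \CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux i686; rv:113.0) Gecko/20111914 Firefox/113.0');
		// Return the response body from curl_exec() instead of printing it.
		curl_setopt($ch, \CURLOPT_RETURNTRANSFER, true);
		$result = curl_exec($ch);
		// curl_exec() returns false on failure; normalize that to an empty string.
		return $result ?: '';
	}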
}

expert1 · April 23, 2024, 8:09am

Cool! It works. Thank you!
What does the CURLOPT_SSL_ENABLE_ALPN parameter mean? As I understand it, this won’t work without it? And why does “disabling ALPN feel slightly hacky”?

wn_name · April 23, 2024, 10:39am

ALPN (Application-Layer Protocol Negotiation) is a TLS extension that lets a client and server agree on which protocol to use (e.g. HTTP/1.1 or HTTP/2) during the TLS handshake, very early in a new connection. That makes things more efficient, which is a good thing.

DDoS Guard (which is in front of that website) is apparently using information sent during ALPN, among other factors, to profile clients. For some unknown reason DDoS Guard doesn’t like what gets sent by libcurl on the new container image, and triggers the “checking browser” process you mentioned.

Setting CURLOPT_SSL_ENABLE_ALPN to false means ALPN won’t be used, so its information won’t be available to DDoS Guard. This currently allows the request to go through without triggering DDoS protection; however, that might change. This workaround should not be necessary, since non-browser clients can be expected to request Atom/RSS feeds.

expert1 · April 23, 2024, 12:41pm

Thanks for clarifying