Tiny Tiny RSS: Community

Wrong encoding in content and summary tags (HTML entities instead of UTF-8)

Describe the problem you’re having:

The bug is the following:

If I generate an RSS feed from tt-rss (a link like /public.php?op=rss&id=-3&key=KEY), the content and summary tags in the feed contain HTML entities instead of UTF-8 text.

It looks like this:

<title type="html">
Президент РФ поддержал предложение о субсидиях аграриям в 4,5 миллиарда рублей
</title>
<summary type="html">
<![CDATA[
<p>&#1070;&#1083;&#1080;&#1103; &#1054;&#1075;&#1083;&#1086;&#1073;&#1083;&#1080;&#1085;&#1072;: &laquo;&#1044;&#1072;&#1085;&#1085;&#1099;&#1077; &#1084;&#1077;&#1088;&#1099; &#1087;&#1086;&#1076;&#1076;&#1077;&#1088;&#1078;&#1082;&#1080; &#1086;&#1095;&#1077;&#1085;&#1100; &#1089;&#1074;&#1086;&#1077;&#1074;&#1088;&#1077;&#1084;&#1077;&#1085;&#1085;&#1099; &#1080; &#1074;&#1086;&#1089;&#1090;&#1088;&#1077;&#1073;&#1086;&#1074;&#1072;&#1085;&#1099;&raquo;</p>
]]>
</summary>
<content type="html">
<![CDATA[
<p>&#1070;&#1083;&#1080;&#1103; &#1054;&#1075;&#1083;&#1086;&#1073;&#1083;&#1080;&#1085;&#1072;: &laquo;&#1044;&#1072;&#1085;&#1085;&#1099;&#1077; &#1084;&#1077;&#1088;&#1099; &#1087;&#1086;&#1076;&#1076;&#1077;&#1088;&#1078;&#1082;&#1080; &#1086;&#1095;&#1077;&#1085;&#1100; &#1089;&#1074;&#1086;&#1077;&#1074;&#1088;&#1077;&#1084;&#1077;&#1085;&#1085;&#1099; &#1080; &#1074;&#1086;&#1089;&#1090;&#1088;&#1077;&#1073;&#1086;&#1074;&#1072;&#1085;&#1099;&raquo;</p>
]]>
</content>
<updated>2020-05-20T14:42:00+00:00</updated>
<author>
<name/>
</author>

How can I change it to show the text as plain UTF-8?

tt-rss version (including git commit id):

Freshly installed tt-rss from git, master branch, as of 21.05.2020 (I can’t see the build because the label reads "[Tiny Tiny RSS] vUNKNOWN (Unsupported) © 2005-2020 [Andrew Dolgov]").

I just copied a fresh master and updated (right before posting this), and the issue is the same.

Platform (i.e. Linux distro, PHP, PostgreSQL, etc) versions:

PHP version: 5.6.40
DB: PostgreSQL

Please provide any additional information below:

This is something that DOMDocument does on save, I think. I’ve never bothered to investigate whether it is possible to disable it. If someone does provide a fix (one which doesn’t involve mb_convert_encoding afterwards), I’ll review the PR.

Nice! It would be cool to see an update that fixes this issue.

I did a quick search and I don’t think this is resolvable using DOMDocument alone, although mb_convert_encoding($doc->saveHTML(), 'UTF-8', 'HTML-ENTITIES') worked every time.

Edit: My instinct tells me mb_convert_encoding would probably break other things at some point.

Maybe so. Also, it’s a really, really ugly hack.

Where should I put this code to apply the “ugly” fix?

Probably when adding the content to the feed generator.
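
For anyone who wants to experiment anyway, here is a minimal sketch of a narrower post-processing step than a blanket conversion. It assumes the entity-encoded HTML is available as a string before it is handed to the feed generator; the function name decode_non_ascii_entities is hypothetical and not part of tt-rss, and where exactly it would be called in the tt-rss code is not shown here. The idea is to decode only character references that stand for non-ASCII characters and to leave references such as &amp; and &lt; untouched, since decoding those would turn escaped text into live markup. It avoids the 'HTML-ENTITIES' pseudo-encoding of mbstring, which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper, NOT part of tt-rss: decodes character references that
// represent non-ASCII characters and keeps every other reference as written.
function decode_non_ascii_entities($html)
{
    return preg_replace_callback(
        // Matches decimal (&#1070;), hexadecimal (&#x42E;) and named (&laquo;) references.
        '/&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);/',
        function ($m) {
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // A pure-ASCII result means either a markup-significant character
            // (&amp;, &lt;, &#60;, ...) or a reference that html_entity_decode()
            // did not recognise and returned unchanged. Keep the original text.
            if (preg_match('/^[\x00-\x7F]*$/', $decoded)) {
                return $m[0];
            }
            return $decoded;
        },
        $html
    );
}

// Usage with a shortened fragment in the style of the feed above.
$fragment = '<p>&#1070;&#1083;&#1080;&#1103;: &laquo;&#1084;&#1077;&#1088;&#1099;&raquo; &amp; &lt;b&gt;</p>';
echo decode_non_ascii_entities($fragment), "\n";
// prints: <p>Юлия: «меры» &amp; &lt;b&gt;</p>
```

This is still string post-processing after DOMDocument has serialized the markup, so the objections above apply to it as well.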

I recommend against doing this.