Tiny Tiny RSS: Community

Regular expressions containing the symbol "<" do not work in filters

When I try to add a regular expression containing the symbol “<” (which is necessary for e.g. lookbehind assertions) as a filtering condition, this is the result that I get:

I am running the latest Git version of tt-rss on macOS 10.14.5, PHP 7.3.7, MySQL 8.0.16.

this is because ‘<’ opens html tags which are stripped out
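To make the effect concrete, here is a minimal sketch of what a tag stripper does to a lookbehind. The function `naive_strip_tags` is a hypothetical stand-in written for illustration; it is not tt-rss's code and only approximates the general idea of "everything from `<` onward is a tag".

```python
import re

def naive_strip_tags(text: str) -> str:
    """Hypothetical stand-in for a tag stripper (not tt-rss's actual code)."""
    # Treat '<' as the start of a tag and drop everything up to the next '>',
    # or up to the end of the string if the "tag" is never closed.
    return re.sub(r"<[^>]*(?:>|$)", "", text)

pattern = "(?<!peter )(parker)"
stripped = naive_strip_tags(pattern)
print(repr(stripped))

# What survives is no longer a valid regular expression.
try:
    re.compile(stripped)
    still_valid = True
except re.error:
    still_valid = False
print(still_valid)
```

Under these assumptions the lookbehind loses everything after the `<`, so the stored filter can never match what was intended.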

So how do I use lookbehind assertions in regular expressions? Is there a way to escape them?

i guess you’ll have to find another way to solve your actual problem, whatever it might be.

no.

html not being allowed in filters has been reported before, but there are issues with enabling it, so it’s unlikely to happen.

What is your actual filter?
Are you putting it inside ()?
I used to use pos & neg lookbehinds all the time and had no issues.
Fox fixed the issue I was having with HTML being stripped.

really? as far as i remember html is still being stripped from filters.

The regexp I’m trying to add is this: (?<!не )ищу отношени(я|й) (roughly: “looking for a relationship”, unless preceded by “not ”)
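For readers unfamiliar with negative lookbehind, this sketch shows what the pattern is meant to do. It uses Python's `re` module as a stand-in for the PCRE engine tt-rss runs under PHP, and the case-insensitive flag and the helper name `matches` are assumptions made for the example.

```python
import re

# The pattern from the post: match the phrase unless it is directly
# preceded by "не " ("not ").
pattern = re.compile(r"(?<!не )ищу отношени(я|й)", re.IGNORECASE)

def matches(title: str) -> bool:
    """Return True if the (hypothetical) filter would match this title."""
    return pattern.search(title) is not None

print(matches("ищу отношения"))      # phrase on its own
print(matches("не ищу отношений"))   # phrase negated by a preceding "не "
```

The first title matches and the second does not, which is exactly the distinction that cannot be expressed once the `<` is removed.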

It’s been a few years, but I remember having an issue and you tweaking something, after which I no longer had any problems with my filters using lookbehinds and lookaheads.

Now I no longer use look (ahead|behind), so I wasn’t aware of any issues.

I was trying to find the issue I submitted, but it was before the switch to Discourse. I’ll try to find it. Maybe I kept something locally on my box.

I found it on the “OLD FORUM”. It was from 2015.

I was using filter: (?<!peter )(parker) and the < was causing an issue.

See exchange here:
Posts from OLD FORUM

yeah i’m afraid this will get filtered currently, it was changed sometime after the PDO overhaul i think.

btw as a terrible workaround you can add (or update) whatever regular expression directly in the database, stripping only happens in the actual editor UI. as long as you don’t edit the filter afterwards it’ll work.
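A rough sketch of that workaround follows. It runs against an in-memory SQLite table so it is runnable anywhere; a real tt-rss install uses MySQL or PostgreSQL, and the table name `ttrss_filters2_rules`, the column `reg_exp`, and the rule id are assumptions that should be checked against your own schema before touching anything.

```python
import sqlite3

# In-memory stand-in for the tt-rss database. Table and column names are
# assumptions for illustration; verify them against your actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ttrss_filters2_rules (id INTEGER PRIMARY KEY, reg_exp TEXT)"
)
conn.execute(
    "INSERT INTO ttrss_filters2_rules (id, reg_exp) VALUES (1, 'placeholder')"
)

rule_id = 1  # hypothetical id of a rule first created through the editor UI
regex = "(?<!peter )(parker)"

# Bound parameters pass the string to the database unchanged, so the '<'
# is stored as-is and no quoting of the regex is needed.
conn.execute(
    "UPDATE ttrss_filters2_rules SET reg_exp = ? WHERE id = ?",
    (regex, rule_id),
)
conn.commit()

stored = conn.execute(
    "SELECT reg_exp FROM ttrss_filters2_rules WHERE id = ?", (rule_id,)
).fetchone()[0]
print(stored)
```

As noted above, opening and saving the filter in the editor afterwards would strip the expression again.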

Not the only place it gets stripped out it seems…

[Screenshot from 2019-07-15 11-29-28]

Feed:

      <item>
        <title>Regular expressions containing the symbol &quot;&lt;&quot; do not work in filters</title>
        <dc:creator><![CDATA[@Avoozl]]></dc:creator>
        <description><![CDATA[ <p>So how do I use lookbehind assertions in regular expressions? Is there a way to escape them?</p> ]]></description>
        <link>https://discourse.tt-rss.org/t/regular-expressions-containing-the-symbol-do-not-work-in-filters/2609/3</link>
        <pubDate>Sun, 14 Jul 2019 17:17:50 +0000</pubDate>
        <guid isPermaLink="false">discourse.tt-rss.org-post-9248</guid>
      </item>
      <item>
        <title>Regular expressions containing the symbol &quot;&lt;&quot; do not work in filters</title>
        <dc:creator><![CDATA[@fox]]></dc:creator>
        <description><![CDATA[ <p>this is because ‘&lt;’ opens html tags which are stripped out</p> ]]></description>
        <link>https://discourse.tt-rss.org/t/regular-expressions-containing-the-symbol-do-not-work-in-filters/2609/2</link>
        <pubDate>Sun, 14 Jul 2019 17:17:09 +0000</pubDate>
        <guid isPermaLink="false">discourse.tt-rss.org-post-9247</guid>
      </item>
      <item>
        <title>Regular expressions containing the symbol &quot;&lt;&quot; do not work in filters</title>

that’s strange, if source properly escapes it to &lt; then it shouldn’t get removed by tt-rss element filter.

in any case, in these kinds of situations i think it’s better to remove too much from time to time than to let something through.

e: looks like this is exclusive to title, where tt-rss uses php native strip_tags() instead of DOM filter, i think, maybe it’s a bit too aggressive.
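A small sketch of how a properly escaped title can still lose text: once the XML parser decodes `&lt;` to a literal `<`, a tag stripper applied to the decoded string sees the start of a tag. `naive_strip_tags` is a hypothetical stand-in and does not reproduce PHP's `strip_tags()` exactly; the point is only the order of decoding and stripping.

```python
import html
import re

def naive_strip_tags(text: str) -> str:
    """Hypothetical stand-in for a tag stripper (not PHP's strip_tags())."""
    # Drop everything from '<' to the next '>' or to the end of the string.
    return re.sub(r"<[^>]*(?:>|$)", "", text)

# Title exactly as it appears in the feed XML above.
raw_title = (
    "Regular expressions containing the symbol &quot;&lt;&quot; "
    "do not work in filters"
)

# An XML parser hands the application the decoded text.
decoded = html.unescape(raw_title)

# Stripping tags from the decoded text removes the '<' and what follows it.
stripped = naive_strip_tags(decoded)
print(stripped)

# Re-escaping the decoded text for display keeps the title intact instead.
escaped = html.escape(decoded)
print(escaped)
```

Here the stripped title ends right after the opening quotation mark, while the re-escaped version round-trips to the original feed text.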