Af_readability issue on nytimes.com

Hi Fox,

This plugin mostly works well, but in about 50% of the articles I get from nytimes.com I get this weird artifact:

Usually the article follows it regardless, and it works for the rest of the articles. I did notice that the JavaScript snippet that is included was the same every time, so I looked for where it came from, and I found it in the source of the original URL:

Since you may need a subscription to access this site, I’ve downloaded a raw dump of the source, and can give it to you if you need it, but I did see snippets like this in the html (when I search for the start of that JS):

I have the plugin configured for this feed:

I have also updated to the latest Git version, updated my Linux server, and tested it both in the Android RSS reader and in the browser on a PC.

I suspect it is a problem with how af_readability matches quotes nested inside quoted strings…

Regards,
Marius.

readability is not going to work for everything, especially if it’s a javascript mess. if you think it can be improved, report it to its authors? github -> php-readability, i think.

Got it, will do, thought it was all written by you.

For future reference I’ve reported it here:

In the bug report I linked above, the Readability author gave some instructions on flags to set when the library is used in other projects (this seems likely to fix the issue).

Is the advice he gave of any use?

Try enabling the summonCthulhu flag.

let’s see what this is about:

There’s a workaround for this: using the summonCthulhu option. This will remove all script tags via regex,

we’re not going to be parsing XML with regular expressions. readability itself is enough of a hack already, i think.

which is not ideal because you may end up summoning the lord of darkness.
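
For anyone wondering what "remove all script tags via regex" amounts to, here is a rough sketch in Python. It is not the Readability code; the pattern and the `strip_scripts` name are made up for illustration, and the sample markup is invented rather than taken from nytimes.com.

```python
import re

# Hypothetical pattern, NOT taken from the Readability source:
# matches an opening <script ...> tag, everything after it (lazily),
# and the first closing </script> tag, across newlines.
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts(html: str) -> str:
    """Drop every <script>...</script> block before the HTML is parsed."""
    return SCRIPT_RE.sub("", html)


# Invented sample: an inline script whose string literal contains markup,
# which is the kind of thing that can leak into extracted article text.
sample = (
    '<p>Intro</p>'
    '<script type="text/javascript">var s = "<div class=\'ad\'>";</script>'
    '<p>Body</p>'
)

cleaned = strip_scripts(sample)
print(cleaned)
```

The pattern works on raw text and knows nothing about the document structure (comments, CDATA, unterminated tags), which is the usual objection to handling markup with regular expressions.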

image

soyboy.jpg is how i would imagine a person who writes like that to look, yeah.

For the record, tt-rss is a terrible project in terms of dependencies, as they literally copied and pasted the readability (instead of using a sane solution, like composer). I don’t know in which version they are stuck, so the summonCthulhu flag may or may not exist there.

i’m not really interested in discussing his opinions but maybe if his library had versioning information available anywhere other than on a single third-party website (related to a packaging system he thinks should be exclusively used by everyone) this would be easier to tell.

then again maybe it’s too much to ask for from people who genuinely enjoy being locked into an ecosystem of some kind.

I enabled summonCthulhu. Indeed it removed the js artifacts.

However about half of nytimes.com articles still don’t load. Does this article’s full text load for you?