Feed Retrieval Problems: 403 Errors

Here’s what you can email to publishers:

Cloudflare > Security > Web Application Firewall (WAF) > Custom Rules > Create Rule, then whitelist “NewsBlur” in the User-Agent “contains” field.
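For publishers who prefer the expression editor over the dropdowns, the same rule can be written directly as a custom-rule expression. This is a sketch of Cloudflare’s Rules language; the field name and operator below are standard, but verify them against your own dashboard, and set the rule action to “Skip” so matching requests bypass the WAF:

```
(http.user_agent contains "NewsBlur")
```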

Note that custom WAF rules are limited per account. So this won’t always work.

This would also allow malicious bots that spoof the NewsBlur user agent to get through.

It’s clear that NewsBlur is being detected as a malicious bot by Cloudflare’s heuristics across websites. Would it be better to find out from Cloudflare why the requests are being blocked in the first place, and then adjust the requests so NewsBlur’s IPs won’t be marked as malicious or blocked?

NewsBlur is in the Verified Bot program with Cloudflare, and I’ve reached out to them multiple times and they won’t give me any information on where NewsBlur stands.

My hope is that if enough of NewsBlur’s users make noise over on the Cloudflare forums, then we can finally reach somebody high up enough at Cloudflare to allow NewsBlur to fetch those feeds.

I spoke with a publisher this morning and he said that Cloudflare recently increased their “protection” on all of his sites as of a month ago, and that led to a large number of bots being blocked. He’s been whitelisting them individually because that’s the only way to get them in.

Is there any update on this?

I have an idea that I’m trying to find time to work on. It’ll move all of these 403 feeds to a new server in a different queue and allow them to bypass the Cloudflare denial. It’s going to take a bit of time to build, but I see what needs to happen now.


Awesome! Good luck!

https://www.ivpn.net/en/blog/index.xml

keeps timing out even though there doesn’t seem to be a problem with it

It may be unrelated, but you can use Open RSS for feeds that time out:

https://openrss.org/www.ivpn.net/en/blog/index.xml


I’m seeing the same problem with Stereogum, though a quick check of their IPs shows Amazon AWS stuff, not Cloudflare. I’ve reached out to them via their technical support contact.

Forgot to mention the site id: NewsBlur

Something must’ve gotten through on some level, as I see Daily Beast is working again.

How’s your project on this coming, Sam?

I just launched Related Stories/Sites about an hour ago, and that was a huge lift: Discover Feeds by samuelclay · Pull Request #1832 · samuelclay/NewsBlur · GitHub. I’ll be blogging about it soon, probably late next week, once it’s working. Now that that’s launched, I can work on a new feature.

I found a new project that will allow NewsBlur to act more like a browser to get around some problem feeds, so I’m hopeful that it will work. I’ve noticed that even in my local environment, Cloudflare will block NewsBlur, so they’re not even looking exclusively at IP address, which is disappointing.
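As an illustration of the “act more like a browser” idea (this is a sketch, not NewsBlur’s actual implementation; the header values are assumptions), a feed fetcher can send the same headers a desktop browser would, since bot heuristics weigh headers as well as IP address:

```python
import urllib.request

def browser_like_headers(feed_url: str) -> dict:
    """Build request headers resembling a desktop Chrome browser.

    The values are illustrative assumptions, not NewsBlur's real headers.
    """
    return {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": "application/rss+xml, application/atom+xml, "
                  "text/xml;q=0.9, */*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": feed_url,
    }

if __name__ == "__main__":
    url = "https://example.com/feed.xml"
    # Build the request; urllib.request.urlopen(req) would then fetch it
    # with these browser-like headers attached.
    req = urllib.request.Request(url, headers=browser_like_headers(url))
```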


Sounds complicated! Good work, thanks for the update!

Any new updates on this project?

I just pushed out an update that uses a new service called ScrapeNinja to get around the 403s. It should automatically re-fetch any forbidden feeds, but it may take a couple of days to get around to all of the broken feeds. It doesn’t work 100% of the time, but it mostly works. Let me know!
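The fallback logic described above can be sketched like this. It’s a simplification assuming a ScrapeNinja-style proxy as the second fetcher; the function names are hypothetical, not NewsBlur’s code:

```python
from typing import Callable, Tuple

# (HTTP status code, response body)
Response = Tuple[int, str]

def fetch_feed(url: str,
               fetch_direct: Callable[[str], Response],
               fetch_via_proxy: Callable[[str], Response]) -> Response:
    """Try a normal fetch first; on a 403, retry through the scraping proxy."""
    status, body = fetch_direct(url)
    if status == 403:
        # Cloudflare (or similar) refused the direct request; fall back to
        # the proxy service, which makes the request look like a real browser.
        status, body = fetch_via_proxy(url)
    return status, body

# Example with stub fetchers standing in for real HTTP calls:
blocked = lambda url: (403, "")
proxied = lambda url: (200, "<rss>...</rss>")
status, body = fetch_feed("https://example.com/feed.xml", blocked, proxied)
# status is 200 here because the proxy fallback succeeded.
```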


Nice…! So far I’ve seen 3 of the feeds that were having issues now working! Very nice!

Yeah, glad to hear it! I have no doubt many will come back to life; the real test is what happens in the next few days/weeks, since they’re prone to break again. But I’m now paying for the privilege of proxying those sites. I was all set to build this myself, but while researching it I came across a GitHub repo that does exactly what I need, and the README mentioned a company that hosts it, so I was sold.


This is the repo, fyi:

Still so far so good…very nice. Thank you!


Still seems to be working well…