Feed Retrieval Problems: 403 Errors

We increasingly see feeds that have a bunch of 403 errors. When you refresh the feed manually, it gets the latest articles and shows a 200 OK, but the whole point of NewsBlur is automatic retrieval. Unfortunately, NewsBlur doesn’t let you know which feeds have a problem. The feed just doesn’t show anything new. You have to go into the feed’s settings, and only then do you see all the 403 errors. At the very least, NewsBlur should flag these feeds. I can send Sam a screenshot of an example.


I just changed the fetcher so that there is no longer any difference between a force refresh and a background fetch, so if stories are only coming in when you refresh manually, then something is wrong. Can you post the newsblur.com/site/<feed_id> URL of a site that’s giving you an issue?

Sure. Here it is:
https://newsblur.com/site/9326617/social-media-butterfly

It’s generating 403 errors again today. I didn’t click Parse this time so that you can take a look.

I had to parse because we need the new articles from this feed. Is this problem fixable?

This issue remains unresolved. On a related topic, NewsBlur should offer private tech support for paying customers. This forum is far from ideal: there’s no privacy, and tech support issues get lost among feature requests.

I am aware of another feed exhibiting the same behavior:
https://newsblur.com/site/9260033/green-party-news

Hopefully this will help you fix the underlying cause and thus fix any feeds experiencing this behavior.

On another note, I would really appreciate some more diagnostic tooling to know when a feed is reporting errors. This feed doesn’t show the exclamation mark.
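For illustration, here is a minimal sketch of the kind of check that could flag such a feed. It assumes a fetch history is available as a list of (feed ID, status code) records ordered oldest to newest. The names `FetchAttempt` and `feeds_to_flag` and the threshold of three are hypothetical and are not taken from NewsBlur's code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FetchAttempt:
    """One fetch of one feed (hypothetical record shape)."""
    feed_id: int
    status_code: int


def feeds_to_flag(attempts: Iterable[FetchAttempt], threshold: int = 3) -> list[int]:
    """Return IDs of feeds whose last `threshold` fetches were all HTTP 403.

    `attempts` is assumed to be ordered from oldest to newest.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")

    recent: dict[int, list[int]] = {}
    for attempt in attempts:
        history = recent.setdefault(attempt.feed_id, [])
        history.append(attempt.status_code)
        # Keep only the most recent `threshold` status codes per feed.
        del history[:-threshold]

    return sorted(
        feed_id
        for feed_id, codes in recent.items()
        if len(codes) == threshold and all(code == 403 for code in codes)
    )
```

A feed that returned 403 once and then recovered would not be flagged, which avoids showing a warning for a one-off failure.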

+1 for letting us know about these 403 errors.

Interesting that both of these feeds show the same issue, which is that NewsBlur feed fetchers are being forbidden (403 error) by IP address. Can you reach out to these two sites and let them know they are banning NewsBlur’s feed fetchers? Let the publishers know that the IP addresses they should expect to see are listed at https://www.newsblur.com/api/ip_addresses/.
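As a rough sketch of what a publisher could do with that list, the check below tests whether a client address appears in an allowlist before a bot rule is applied. The function name `is_allowed` is hypothetical, the addresses are reserved documentation ranges rather than real NewsBlur IPs, and the format returned by the ip_addresses endpoint is not assumed here: the list is simply passed in as strings.

```python
import ipaddress


def is_allowed(client_ip: str, allowed: list[str]) -> bool:
    """Return True if client_ip matches any address or CIDR range in allowed."""
    addr = ipaddress.ip_address(client_ip)
    # strict=False lets single addresses and ranges share one code path:
    # a bare address is treated as a one-host network.
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in allowed
    )


# Placeholder allowlist built from documentation-only address ranges.
example_allowlist = ["203.0.113.10", "198.51.100.0/24"]
```

With this placeholder list, `is_allowed("198.51.100.7", example_allowlist)` is true and `is_allowed("192.0.2.1", example_allowlist)` is false.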

I also reached out to both publishers to ask that they allow NewsBlur to fetch their feeds.

Here’s what I sent:

Hi, I run NewsBlur, an RSS news reader, and my users have complained that this site’s RSS feed is preventing NewsBlur’s feed fetchers from fetching it. NewsBlur publishes the IP addresses of its feed fetchers at: https://www.newsblur.com/api/ip_addresses/. May I ask that you open up your RSS feed to NewsBlur? And it would be super helpful if you could let me know why it was banned in the first place, as there are a number of other feeds that have this same issue. Are you using Cloudflare or some other CMS that bans bots? Thanks for your help!

Did anything come of this? I still have feeds that are showing up as 403, more than when it started. I came back and checked a feed I had switched to as a replacement for another that was returning 403, and I see that it’s doing it now too.

In our case, the website gave us a special feed, which we did not test or use. Instead, we created a custom feed using RSS.app that works perfectly. There’s a war on RSS and RSS.app is the solution in most cases.

The issue has been resolved for the example feed I provided.

I reported a similar issue in “403s since 8/19 for valid Atom feed”.

As noted in the issue above, sites seem to be blacklisting only NewsBlur and not, e.g., Feedly or Inoreader. What might explain such a discrepancy?

These errors are a result of the websites being hosted on Cloudflare. Cloudflare gives website owners a way to block “scrapers” and “bots” (many Cloudflare sites actually do this by default now), and it’s likely NewsBlur is being detected as a malicious bot. It may work for other RSS readers because they’re Verified Bots.

But Cloudflare has already been treating NewsBlur as a Verified Bot for a long time, at least since the spring: https://radar.cloudflare.com/traffic/verified-bots

Could the 403s be triggered by a misbehaving fetcher?

Or could it be that Hetzner suffers from ongoing reputational issues so great that they take precedence over Verified Bot status?

Having established that the 403s affect only NewsBlur should make it easier for @samuelclay and the team to find the root cause.
