505 on feeds that work fine in other RSS services

I have several feeds, all from the same website, that give a 505 error, like this one:

https://www.trouw.nl/achterpagina/rss.xml

It works fine in BazQux and Feedbin.

There are more with problems:

https://www.volkskrant.nl/voorpagina/rss.xml
https://www.ed.nl/psv/rss.xml

These also work fine in other RSS services.

What’s strange to me is that the feed loads fine on my local dev instance. I believe they might be using some Cloudflare-like product that automatically slows down clients that ping too often. Try reaching out to them to see if they could let NewsBlur crawl their RSS feeds.

I’ve tested in BazQux, Feedbin, The Old Reader, Feedly and Inoreader. None of them has any problem with these feeds. Only NewsBlur does.

I can ask the publishers, but unfortunately there’s almost no chance that they will make any effort, because they will say that it works fine.

Are you sure it isn’t a problem on your side?

Yep, I’m sure it isn’t on NewsBlur’s side, since my local dev instance fetches the feed fine. The publisher is purposefully dropping NewsBlur requests.

Shouldn’t you get a different error then, instead of "505 HTTP Version Not Supported"?

FYI:
I got an answer from the webmaster of trouw.nl saying that they don’t block traffic from newsblur.com and that, according to him, the RSS feed is loading fine.

Oh, that 505 is an internal error code, not the HTTP status. And they are blocking NewsBlur; here’s the proof:

>>> u = "https://www.trouw.nl/achterpagina/rss.xml"
>>> import requests
>>> requests.get(u, headers={'Accept': 'application/atom+xml, application/rss+xml, application/xml;q=0.8, text/xml;q=0.6, */*;q=0.2', 'Accept-Encoding': 'gzip, deflate'}, timeout=2)
<Response [200]>
>>> requests.get(u, headers={'User-Agent': 'NewsBlur Feed Fetcher - 3 subscribers - https://www.newsblur.com/site/8909290/trouw-achterpagina ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.15")', 'Accept': 'application/atom+xml, application/rss+xml, application/xml;q=0.8, text/xml;q=0.6, */*;q=0.2'}, timeout=2)
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/stdlib.py", line 107, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.9/ssl.py", line 1242, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.9/ssl.py", line 1100, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 531, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.9/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.trouw.nl', port=443): Read timed out. (read timeout=2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.trouw.nl', port=443): Read timed out. (read timeout=2)

Notice that when I remove the User-Agent, it works. Now, we should have had a mode where it auto-retries without the User-Agent, but that didn’t trigger because this was a timeout and not an immediate error. Let them know and they should be able to figure it out.
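
For anyone curious what an auto-retry that also covers timeouts could look like, here is a rough sketch. It is not NewsBlur's actual fetcher code: the names `fetch_with_ua_fallback` and `urllib_fetch` are made up for illustration, and it uses the standard library instead of requests so it runs without extra packages. The idea is simply to treat a timeout the same as any other network error and retry once without the custom User-Agent.

```python
import urllib.request

ACCEPT = ("application/atom+xml, application/rss+xml, "
          "application/xml;q=0.8, text/xml;q=0.6, */*;q=0.2")


def urllib_fetch(url, headers, timeout):
    """Hypothetical default fetcher: returns the HTTP status code."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def fetch_with_ua_fallback(url, user_agent, fetch=urllib_fetch, timeout=2):
    """Try with the custom User-Agent; on any network error, including a
    read timeout, retry once without it.

    Returns (status, used_custom_ua).
    """
    headers = {"Accept": ACCEPT, "User-Agent": user_agent}
    try:
        return fetch(url, headers, timeout), True
    except OSError:
        # socket.timeout, urllib.error.URLError and the requests
        # exceptions all derive from OSError, so a timeout lands here too.
        pass
    # Drop the custom header; the HTTP library then sends its own default
    # User-Agent (for example "Python-urllib/3.x").
    headers.pop("User-Agent")
    return fetch(url, headers, timeout), False
```

With a fetcher that times out whenever the custom User-Agent is present, the second attempt goes out without it and the call returns `(200, False)`. The trade-off is that a blocked feed costs one full timeout before the fallback kicks in.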

Even more curious, I’ve narrowed it down to the inclusion of Chrome/112.0.0.0:

>>> requests.get(u, headers={'User-Agent': 'NewsBlur Feed Fetcher - 3 subscribers - https://www.newsblur.com/site/8909290/trouw-achterpagina ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/537.36 Edg/112.0.1722.48")', 'Accept': 'application/atom+xml, application/rss+xml, application/xml;q=0.8, text/xml;q=0.6, */*;q=0.2'}, timeout=2)
<Response [200]>
>>> requests.get(u, headers={'User-Agent': 'NewsBlur Feed Fetcher - 3 subscribers - https://www.newsblur.com/site/8909290/trouw-achterpagina ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/112.0.0.0")', 'Accept': 'application/atom+xml, application/rss+xml, application/xml;q=0.8, text/xml;q=0.6, */*;q=0.2'}, timeout=2)
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/stdlib.py", line 107, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.9/ssl.py", line 1242, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.9/ssl.py", line 1100, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 531, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.9/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.trouw.nl', port=443): Read timed out. (read timeout=2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.trouw.nl', port=443): Read timed out. (read timeout=2)

Here it is failing with just that Chrome tag:

>>> requests.get(u, headers={'User-Agent': 'Chrome/112.0.0.0'}, timeout=2)
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.9/site-packages/sentry_sdk/integrations/stdlib.py", line 107, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.9/ssl.py", line 1242, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.9/ssl.py", line 1100, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 531, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.9/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.trouw.nl', port=443): Read timed out. (read timeout=2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.trouw.nl', port=443): Read timed out. (read timeout=2)
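
If anyone wants to repeat this narrowing-down against another feed, the manual steps above can be scripted roughly like this. It is only a sketch: `ua_tokens`, `failing_tokens` and `requests_probe` are hypothetical helper names, and it assumes a token that fails on its own is the one the server objects to, which may not hold if the filter looks at combinations of tokens.

```python
import re


def ua_tokens(user_agent):
    """Split a User-Agent into product tokens and parenthesised comments."""
    return re.findall(r"\([^)]*\)|\S+", user_agent)


def failing_tokens(url, user_agent, probe):
    """Return the tokens that fail when sent alone as the User-Agent.

    probe(url, ua) must return True when the request succeeds.
    """
    return [tok for tok in ua_tokens(user_agent) if not probe(url, tok)]


def requests_probe(url, ua, timeout=2):
    """Probe using requests, treating a timeout or any other error as failure."""
    import requests  # third-party; imported lazily so the helpers above work without it
    try:
        return requests.get(url, headers={"User-Agent": ua}, timeout=timeout).ok
    except requests.exceptions.RequestException:
        return False
```

Calling `failing_tokens(u, full_user_agent, requests_probe)` sends one request per token, so with a 2 second timeout a blocked token costs about 2 seconds each.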

I was afraid of that: the publisher isn’t making any effort.

I only see a read timeout. I don’t know whether that counts as proof of blocking?

I really hope you’re able to find a solution, as multiple major Dutch newspapers are using the same server configuration; hence the same errors on multiple newspaper feeds.

Send that last example to them; it’s clearly breaking for that Chrome user agent.

I will try …

FYI: I tested it myself and didn’t have any problem using that agent string.

Method: GET, RequestUri: 'https://www.trouw.nl/achterpagina/rss.xml', Version: 2.0, Content: <null>, Headers:
{
  User-Agent: Chrome/112.0.0.0
}

StatusCode: 200, ReasonPhrase: '', Version: 2.0, Content: System.Net.Http.StreamContent, Headers:
{
  Strict-Transport-Security: max-age=31536000 ; includeSubDomains ; preload
  permissions-policy: ch-ua-model=*,ch-ua-platform-version=*
  x-correlation-id: e7c6a691a8f244a28f0433f46aec11f3
  Cache-Control: public, must-revalidate, max-age=33
  Date: Tue, 09 May 2023 14:17:02 GMT
  X-Frame-Options: SAMEORIGIN
  accept-ch: sec-ch-ua-model,sec-ch-ua-platform-version
  X-Content-Type-Options: nosniff
  x-xss-protection: 1; mode=block
  Content-Length: 15523
  Content-Language: en
  Last-Modified: Tue, 09 May 2023 14:16:35 GMT
  Content-Type: application/rss+xml; charset=UTF-8
  Expires: Tue, 09 May 2023 14:17:35 GMT
}