Cannot add feed from http://blog.ploeh.dk

I cannot add a feed from http://blog.ploeh.dk/ (either atom.xml or rss.xml). NewsBlur just times out. I can view both rss.xml and atom.xml in my browser without any issues.

The atom.xml (which is the default and has other subscribers) is reporting the following:

2013-09-10 21:04:34 Timeout (505)


Timeout is right: the feed takes over 20 seconds to fetch and process. The average feed takes less than a second, so NewsBlur gives up on it after a while.
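If you want to reproduce the measurement yourself, something along these lines works. This is only a rough sketch in Python with the requests library, not NewsBlur's actual fetcher, and the 20-second cutoff is just an example:

import time
import requests

FEED_URL = "http://blog.ploeh.dk/atom.xml"
TIMEOUT_SECONDS = 20  # example cutoff only; NewsBlur's real limit may differ

start = time.time()
try:
    response = requests.get(FEED_URL, timeout=TIMEOUT_SECONDS)
    elapsed = time.time() - start
    print("HTTP %d, %d bytes in %.1f s"
          % (response.status_code, len(response.content), elapsed))
except requests.exceptions.Timeout:
    print("Gave up after roughly %d seconds" % TIMEOUT_SECONDS)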

Thanks. Is there anything I can do? Is it worth contacting the author? It still only takes a few seconds to fetch in the browser.

Cheers,
Dave

It's a problem with the blog itself. The main page loads in half a second, but the Atom feed takes 3-6 seconds for me to fetch and the RSS feed takes 3-10 seconds. You could hack around it if you're a supergeek, but it comes down to how the author implemented the blog, made worse by Schlemiel the Painter's algorithm (the feed includes every blog entry ever written, so it gets worse as time goes on).

Schlemiel the Painter’s algorithm. Great story.

As the owner of this blog, I’ve been asked to chip in.

First of all: I do agree that as the content of my blog grows over time, this might actually turn into a problem at some point, so I might consider adding paging to the Atom feed (I don't know if this is possible for RSS feeds). If anyone has a good Jekyll template for generating paged Atom feeds, I'd be happy to receive pointers.
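Failing a proper template, a cruder stopgap would be to trim the generated feed in a post-build step rather than in Jekyll itself. A rough sketch in Python, assuming the standard Atom namespace and a hypothetical _site/atom.xml output path and entry count:

import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
FEED_PATH = "_site/atom.xml"  # hypothetical Jekyll output location
KEEP = 10                     # keep only the newest N entries

ET.register_namespace("", ATOM_NS)
tree = ET.parse(FEED_PATH)
feed = tree.getroot()

# Assuming entries are written newest-first, as is typical for a Jekyll feed
# template, drop everything after the first KEEP entries.
for entry in feed.findall("{%s}entry" % ATOM_NS)[KEEP:]:
    feed.remove(entry)

tree.write(FEED_PATH, xml_declaration=True, encoding="utf-8")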

That said, it strikes me as odd that this is a problem right now. At the moment, I’m travelling, and on a flaky mobile broadband connection, I’m able to pull both Atom and RSS feeds down in a matter of seconds.

Compressed, these pages take up less than a MB. Uncompressed, they're about 3.5 MB. Thus, the download times sound sane to me (i.e. I don't think I'm just seeing a cached copy).

Other feed aggregators (personally, I use goread.io, a single-man project) don’t have any problems with these feed files, so I’m left wondering why you are seeing this.
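For what it's worth, the compressed versus uncompressed sizes are easy to check from the response headers. A small sketch (Python with the requests library, nothing specific to any aggregator):

import requests

response = requests.get("http://blog.ploeh.dk/atom.xml")

# requests decompresses gzip transparently, so len(response.content) is the
# uncompressed size, while Content-Length is what actually went over the wire.
print("Content-Encoding:", response.headers.get("Content-Encoding"))
print("Bytes on the wire:", response.headers.get("Content-Length"))
print("Uncompressed size:", len(response.content))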

Mark, you might consider using a caching layer for the RSS: either put it on a cron or use CloudFront or similar with a minimum TTL of 60 minutes. It's pretty easy to do.
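If the cron route appeals, the idea is simply to fetch the feed on a schedule and let readers hit the saved copy. A bare-bones sketch (Python, with a made-up cache path):

import requests

FEED_URL = "http://blog.ploeh.dk/rss.xml"
CACHE_PATH = "/var/www/cache/rss.xml"  # hypothetical path the web server serves

# Run from cron (e.g. hourly); readers then get the static cached copy
# instead of waiting for the feed to be produced on every request.
response = requests.get(FEED_URL, timeout=30)
response.raise_for_status()
with open(CACHE_PATH, "wb") as cache_file:
    cache_file.write(response.content)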

It’s taking me 6.6-8.0 seconds to download with a very fast connection.

And I do understand what it's like to be a single-engineer project. It's always a little fun.

Caching is already enabled. These pages are hosted by GitHub Pages. Here's an example of the HTTP response headers:

HTTP/1.1 200 OK
Server: GitHub.com
Content-Type: text/xml
Last-Modified: Fri, 11 Oct 2013 12:57:29 GMT
Expires: Tue, 15 Oct 2013 05:24:04 GMT
Cache-Control: max-age=600
Content-Encoding: gzip
Content-Length: 783407
Accept-Ranges: bytes
Date: Tue, 15 Oct 2013 06:34:25 GMT
Via: 1.1 varnish
Age: 4821
Connection: keep-alive
X-Served-By: cache-am76-AMS
X-Cache: MISS
X-Cache-Hits: 0
Vary: Accept-Encoding

These are the headers from an unconditional GET request. If I do a conditional GET request (with If-Modified-Since: Fri, 11 Oct 2013 12:57:29 GMT), I get back only the HTTP headers and no body.

As you can see, max-age is set to 10 minutes (600 seconds), and the Via and X- headers show that there's a reverse proxy or CDN cache in front of the web server.
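To make the conditional GET concrete, this is all a client has to do. A sketch in Python with the requests library; the 304 behaviour is plain HTTP, nothing specific to this blog or to NewsBlur:

import requests

FEED_URL = "http://blog.ploeh.dk/atom.xml"

# First fetch: remember the Last-Modified value the server sends back.
first = requests.get(FEED_URL)
last_modified = first.headers.get("Last-Modified")

# Later fetches send it back as If-Modified-Since; an unchanged feed
# comes back as 304 Not Modified with no body at all.
second = requests.get(FEED_URL, headers={"If-Modified-Since": last_modified})
print(second.status_code)   # 304 if nothing has changed
print(len(second.content))  # 0 bytes in that case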

I've finally managed to solve this thanks to Yahoo Pipes; details are on my blog here: http://taeguk.co.uk/blog/dealing-with…

Even better, I managed to submit a PR for the ploeh.dk feed, so the RSS feed now returns only a smaller number of posts.

Awesome, thanks for following up, Dave!