Cannot add feed from http://blog.ploeh.dk

I cannot add a feed from http://blog.ploeh.dk/ (either atom.xml or rss.xml). NewsBlur just times out. I can view both rss.xml and atom.xml in my browser without any issues.

The atom.xml (which is the default and has other subscribers) is reporting the following:

2013-09-10 21:04:34 Timeout (505)


Timeout is right: the feed takes over 20 seconds to fetch and process. The average feed takes less than a second, so NewsBlur gives up on it after a while.
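If you want to reproduce the measurement yourself, something along these lines works. This is only a rough sketch in Python with the requests library, not NewsBlur's actual fetcher, and the 20-second cutoff is just an example:

import time
import requests

FEED_URL = "http://blog.ploeh.dk/atom.xml"
TIMEOUT_SECONDS = 20  # example cutoff only; NewsBlur's real limit may differ

start = time.time()
try:
    response = requests.get(FEED_URL, timeout=TIMEOUT_SECONDS)
    elapsed = time.time() - start
    print("HTTP %d, %d bytes in %.1f s"
          % (response.status_code, len(response.content), elapsed))
except requests.exceptions.Timeout:
    print("Gave up after roughly %d seconds" % TIMEOUT_SECONDS)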

Thanks. Is there anything I can do? Is it worth contacting the author? It still only takes a few seconds to fetch in the browser.

Cheers,
Dave

It's a problem with the blog itself. The main page loads in half a second, but the Atom feed takes 3-6 seconds for me to fetch and the RSS feed takes 3-10 seconds. You could hack around it if you're a supergeek, but it comes down to how the author implemented the blog, made worse by Schlemiel the Painter's algorithm (the feed includes every blog entry ever written, so it gets worse as time goes on).

Schlemiel the Painter’s algorithm. Great story.

As the owner of this blog, I’ve been asked to chip in.

First of all: I do agree that as the content of my blog grows over time, this might actually turn into a problem at some point, so I might consider adding paging to the Atom feed (I don't know if this is possible for RSS feeds). If anyone has a good Jekyll template for generating paged Atom feeds, I'd be happy to receive pointers.
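Failing a proper template, a cruder stopgap would be to trim the generated feed in a post-build step rather than in Jekyll itself. A rough sketch in Python, assuming the standard Atom namespace and a hypothetical _site/atom.xml output path and entry count:

import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
FEED_PATH = "_site/atom.xml"  # hypothetical Jekyll output location
KEEP = 10                     # keep only the newest N entries

ET.register_namespace("", ATOM_NS)
tree = ET.parse(FEED_PATH)
feed = tree.getroot()

# Assuming entries are written newest-first, as is typical for a Jekyll feed
# template, drop everything after the first KEEP entries.
for entry in feed.findall("{%s}entry" % ATOM_NS)[KEEP:]:
    feed.remove(entry)

tree.write(FEED_PATH, xml_declaration=True, encoding="utf-8")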

That said, it strikes me as odd that this is a problem right now. At the moment, I’m travelling, and on a flaky mobile broadband connection, I’m able to pull both Atom and RSS feeds down in a matter of seconds.

Compressed, these pages take up less than a MB. Uncompressed, they're about 3.5 MB. Thus, the download times sound sane to me (i.e. I don't think I'm just seeing a cached copy).

Other feed aggregators (personally, I use goread.io, a single-man project) don’t have any problems with these feed files, so I’m left wondering why you are seeing this.
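For what it's worth, the compressed versus uncompressed sizes are easy to check from the response headers. A small sketch (Python with the requests library, nothing specific to any aggregator):

import requests

response = requests.get("http://blog.ploeh.dk/atom.xml")

# requests decompresses gzip transparently, so len(response.content) is the
# uncompressed size, while Content-Length is what actually went over the wire.
print("Content-Encoding:", response.headers.get("Content-Encoding"))
print("Bytes on the wire:", response.headers.get("Content-Length"))
print("Uncompressed size:", len(response.content))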

Mark, you might consider using a caching layer for the RSS: either put it on a cron or use CloudFront or similar with a minimum TTL of 60 minutes. It's pretty easy to do.
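If the cron route appeals, the idea is simply to fetch the feed on a schedule and let readers hit the saved copy. A bare-bones sketch (Python, with a made-up cache path):

import requests

FEED_URL = "http://blog.ploeh.dk/rss.xml"
CACHE_PATH = "/var/www/cache/rss.xml"  # hypothetical path the web server serves

# Run from cron (e.g. hourly); readers then get the static cached copy
# instead of waiting for the feed to be produced on every request.
response = requests.get(FEED_URL, timeout=30)
response.raise_for_status()
with open(CACHE_PATH, "wb") as cache_file:
    cache_file.write(response.content)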

It’s taking me 6.6-8.0 seconds to download with a very fast connection.

And I do understand what it's like to be a single-engineer project. It's always a little fun.

Caching is already enabled. These pages are hosted by GitHub Pages. Here's an example of the HTTP response headers:

HTTP/1.1 200 OK
Server: GitHub.com
Content-Type: text/xml
Last-Modified: Fri, 11 Oct 2013 12:57:29 GMT
Expires: Tue, 15 Oct 2013 05:24:04 GMT
Cache-Control: max-age=600
Content-Encoding: gzip
Content-Length: 783407
Accept-Ranges: bytes
Date: Tue, 15 Oct 2013 06:34:25 GMT
Via: 1.1 varnish
Age: 4821
Connection: keep-alive
X-Served-By: cache-am76-AMS
X-Cache: MISS
X-Cache-Hits: 0
Vary: Accept-Encoding

These are the headers from an unconditional GET request. If I do a conditional GET request (with If-Modified-Since: Fri, 11 Oct 2013 12:57:29 GMT), I get back only the HTTP headers and no body.

As you can see, max-age is set to 10 minutes (600 seconds), and the Via and X- headers show that there's a reverse proxy or CDN cache in front of the web server.
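To make the conditional GET concrete, this is all a client has to do. A sketch in Python with the requests library; the 304 behaviour is plain HTTP, nothing specific to this blog or to NewsBlur:

import requests

FEED_URL = "http://blog.ploeh.dk/atom.xml"

# First fetch: remember the Last-Modified value the server sends back.
first = requests.get(FEED_URL)
last_modified = first.headers.get("Last-Modified")

# Later fetches send it back as If-Modified-Since; an unchanged feed
# comes back as 304 Not Modified with no body at all.
second = requests.get(FEED_URL, headers={"If-Modified-Since": last_modified})
print(second.status_code)   # 304 if nothing has changed
print(len(second.content))  # 0 bytes in that case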

I've finally managed to solve this thanks to Yahoo Pipes; details are on my blog here: http://taeguk.co.uk/blog/dealing-with…

Even better, I managed to submit a PR for the ploeh.dk feed, so the RSS feed now returns only a smaller number of posts.

Awesome, thanks for following up, Dave!