Spurious entries in the feed for "In the Pipeline"

The feed at http://pipeline.corante.com/index.xml has been showing a lot of spurious entries lately. Each entry has a title taken from a recent post, but no body and no URL. The title is still a link, but the URL is just a formatted date like “2014-09-08 16:37:06.910087”. A quick look at the feed shows that it is truncated but otherwise looks correct. The truncation would obviously make it hard to parse normally, so you must have a fallback parser of some kind, which is cool. Or perhaps your parser just doesn’t rely on the XML being valid in the first place. Either way, this feed is confusing it.

Incidentally, the broken links actually crash the NewsBlur Android app.


It’s because the publisher is changing the URLs in the feed, and since the entries have little to no content, NewsBlur has nothing to de-dupe on. I used to de-dupe solely by title, but there are more feeds where that’s a poor strategy than not.
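To illustrate the failure mode, here is a rough sketch in Python. It is not NewsBlur's actual code: the `is_duplicate` function and the dict fields `link`, `content`, and `title` are hypothetical stand-ins for whatever the real de-dupe logic uses. It assumes two entries match when their links are equal or their non-empty bodies are equal, and it deliberately ignores the title.

```python
def is_duplicate(a, b):
    """Hypothetical de-dupe check: match on link, else on non-empty content."""
    link_a = (a.get("link") or "").strip()
    link_b = (b.get("link") or "").strip()
    if link_a and link_a == link_b:
        return True
    content_a = (a.get("content") or "").strip()
    content_b = (b.get("content") or "").strip()
    # Two empty bodies carry no information, so they never count as a match.
    return bool(content_a) and content_a == content_b


# Same (made-up) title, a different "link" on every fetch, and no body.
first = {"title": "Some post", "link": "2014-09-08 16:37:06.910087", "content": ""}
second = {"title": "Some post", "link": "2014-09-08 17:02:41.123456", "content": ""}
print(is_duplicate(first, second))
```

With a link that changes on every fetch and an empty body, neither test can match, so each fetch produces what looks like a new entry.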

But you’re not parsing URLs out of the feed here; you’re accidentally picking up timestamps instead.
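For what it's worth, a sanity check along these lines would tell the two apart. This is only a sketch using Python's standard `urllib.parse`, and the `looks_like_url` helper is a made-up name, not anything from NewsBlur. It assumes a usable link must have an http or https scheme and a host.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Hypothetical check: accept only http(s) links that include a host."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_url("http://pipeline.corante.com/index.xml"))
print(looks_like_url("2014-09-08 16:37:06.910087"))
```

The timestamp has no scheme and no host, so it fails the check, and an entry whose link fails could be skipped or shown without a link rather than being handed to the app as a broken URL.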