Duplicate stories in feed for Mother Jones

The main feed for Mother Jones (http://feeds.feedburner.com/motherjon…) results in tons of duplicate stories. Nearly every story has 3-6 copies.

This feed is perfect! Ok, here’s the insanity going on with this feed and how it relates to many other duplicate stories.

For some reason, they keep changing the ‘id’ on each story. Here are the ids of 5 copies of the exact same story:

> _.map(_.range(62,67), function(i) { return NEWSBLUR.reader.model.stories[i-1].id; })

["http://motherjones.com/rss/blogs_and_… at http://motherjones.com",
"http://motherjones.com/rss/blogs_and_… at http://motherjones.com",
"http://motherjones.com/rss/blogs_and_… at http://motherjones.com",
"http://motherjones.com/rss/blogs_and_… at http://motherjones.com",
"http://motherjones.com/rss/166286 at http://motherjones.com"]

It looks like each story’s id is rotating randomly through those weird values. But this gives me a chance to really figure out what’s wrong with the de-dupe detection system I have in place.

Here’s the system in the feed fetcher: https://github.com/samuelclay/NewsBlu…

For what it’s worth, Kevin Drum’s MoJo feed is similarly duped.


Oh this feed is such a mess. I’ve been wrestling with this one for the past three days. It’s got its own unit tests now.

The problem is that the feed sets this insane xml:base attribute that is linked to their favicon, and its value keeps changing. So they are effectively saying that the stories are new when they’re not, and for some reason NewsBlur isn’t picking up on it. I’m going to try to fix that, but it’s not so easy.
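
To make the idea concrete, here is a rough sketch of de-duping on what a story says rather than on its id. This is not the actual NewsBlur fetcher code. The function names (`normalize`, `storyKey`, `dedupeStories`) and the `title`/`content` fields are hypothetical, and it assumes the copies have identical titles and bodies once case and whitespace are ignored.

```javascript
// Hypothetical sketch, not NewsBlur's real de-dupe logic.
// Assumes each story is a plain object with optional `title` and `content` strings.

// Lowercase, trim, and collapse runs of whitespace so trivial
// formatting differences do not defeat the comparison.
function normalize(text) {
  return String(text || '').trim().replace(/\s+/g, ' ').toLowerCase();
}

// Build a comparison key from title and content, ignoring the unstable id.
function storyKey(story) {
  return normalize(story.title) + '\n' + normalize(story.content);
}

// Keep the first story seen for each key and drop later copies.
function dedupeStories(stories) {
  var seen = {};
  return stories.filter(function (story) {
    var key = storyKey(story);
    if (key === '\n') {
      return true; // no title or content to compare on, so keep it
    }
    if (Object.prototype.hasOwnProperty.call(seen, key)) {
      return false; // same title and content as an earlier story
    }
    seen[key] = true;
    return true;
  });
}
```

Calling `dedupeStories(stories)` on a list where several entries differ only by `id` would return one entry per distinct title and content pair. The tradeoff is that a story the publisher genuinely edits would show up as new, so a real system would probably need fuzzier matching than this.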

The *real* solution is to email them (which I will be doing) and have them fix it. Stay tuned.