Official Google Blog duplicates and republishing?

http://googleblog.blogspot.com/

The blog seems to post everything in triplicate. I read most of the articles last night, and this morning they are “new” again. It only happens with this blog.

I’ve also been seeing this for several days now. Only happens with the Google blog.

Still getting 3 copies of every story.

Yeah - also having this problem. It started after Google “combined” their product blogs into this one. The feedreader (http://feeds.feedburner.com/blogspot/MKuf) doesn’t seem to show duplicate stories.

Now becoming very annoying. Seeing repeats of last week’s and this week’s posts, plus a sprinkle of dupes.

Yet no response or fix

Now p****** me off. Today I got ALL of this week’s blog entries again.

I responded to another thread that was similar to this one. Anyway, I’m not sure why the de-duper is not picking up on these stories being the same but having different IDs. Actually, I think it is working just fine, but there may be a merge problem. Because this is such a central blog, lots of other blogspot blogs redirect to this one, and when they get merged, their stories also get merged. And because of a quirk in how Blogspot handles permalinks, they look like different stories. 

When the stories get duped, how often does it happen? And how many stories come in duped at a time?

Actually, looking in the code it seems that dupe stories from dupe feeds do get deleted, so I’m not sure what’s going on here.
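For anyone following along, here is a minimal sketch of the general idea of catching the same story arriving under different IDs or permalinks. This is not NewsBlur's actual de-duper; the function names and the `title`/`content` dictionary keys are assumptions made up for illustration. It fingerprints each story by its normalized text rather than by its ID.

```python
import hashlib


def story_fingerprint(title: str, content: str) -> str:
    """Hash the normalized title and body, ignoring the story ID and permalink."""
    # Collapse whitespace and lowercase so trivial formatting changes do not matter.
    normalized = " ".join((title + "\n" + content).split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(stories):
    """Keep the first story seen for each fingerprint (hypothetical helper)."""
    seen = set()
    unique = []
    for story in stories:
        fingerprint = story_fingerprint(story["title"], story["content"])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(story)
    return unique
```

A check like this only catches copies whose text is identical after normalization; stories that are edited between fetches would still need a fuzzier comparison.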

I’m getting at least 3 copies of every story and, like others, I occasionally get a repeat of a whole week’s worth.

Ok, I just deployed a change that will let me better watch this happen. Next time it does, please let me know on this thread. I can then take a look and see what actually happened on the backend. I would love to get to the bottom of this and fix it once and for all.

Happened again - Just checked my feed today and had a load of posts from last week + multiple duplications of those too!
TBH it happens all the time so you could probably check anytime!

It hasn’t happened since I posted, so I’ll keep watching. I’m subscribed to it now and regularly checking it.

Looks like it has happened again. Two stories from yesterday were unread this morning after I had read them yesterday. When I change the view to show all entries, each post is in triplicate.

I’m watching it. It’s going to take a couple weeks but I’m checking it constantly and will get to the bottom of it soon enough.

Ok, I’ve got my new database raw feed data collector installed. That will help me figure out why this feed is bypassing the story de-dupe checker. NewsBlur can handle incorrect story guid changes, so I need the raw feeds to see what’s wrong in between fetches.

Just updating this - this weekend this Google blog has done it at least twice (showing articles previously marked as read)

So I’ve been tracking it and found the smoking gun. But I still don’t have a solution yet. They have some weird caching issue that changes the feed URL from http://google.blog to http://www.google.blog, which are two different addresses that are not de-duped. That’s how the stories are getting reinserted, but NewsBlur should be eliminating those. I have yet to write the test case and fix it up, but I’ll try to do that this week.
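To make the www/non-www mismatch concrete, here is a rough sketch of normalizing addresses before comparing them. It is an illustration only, not the fix that went into NewsBlur; `normalize_feed_url` is a hypothetical helper, and treating the `www.` prefix as insignificant is an assumption that will not hold for every site.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_feed_url(url: str) -> str:
    """Return a canonical form so www and non-www variants compare equal."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    # Assumption: "www.example.com" and "example.com" serve the same feed.
    if host.startswith("www."):
        host = host[4:]
    # Keep an explicit port if one was given; any user:password part is dropped.
    netloc = f"{host}:{parts.port}" if parts.port else host
    # Ignore a trailing slash and any fragment when comparing.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


same = normalize_feed_url("http://google.blog") == normalize_feed_url("http://www.google.blog")
```

With a normalization like this, the two addresses from the post above map to the same key, so stories fetched under either one could be matched against each other.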

Ok, I think I finally cracked this one. I wrote up an extensive test case against the real data and managed to fix it once and for all. After this point, no story from Google or The Verge should duplicate itself and you should not see any new unreads after having read them.

It’s a simple fix, so I’m hoping it’ll take. If it doesn’t, know that I now have test cases that mimic the old behavior and they are now passing, so I hope that will be the end of this nefarious bug.

For those interested, here’s the test case: https://github.com/samuelclay/NewsBlur/blob/master/apps/rss_feeds/tests.py#L179-L225

Just a note on this, I had to roll back one of the changes because it was hammering the system, which means it is now slightly possible that it duplicates stories. If that happens, I’ll see it since I’m actively focused on this issue. But when it does happen, I’ll have more data that I can then use in my test cases.

I just found this thread because I’m getting the same problem. However, I’m not subscribed to the URL at the top of this thread (http://googleblog.blogspot.com/, which NewsBlur says only has 96 subscribers), but to the alternative posted by Matt8: http://feeds.feedburner.com/blogspot/MKuf, which has almost 12k subscribers.

I’ve added http://googleblog.blogspot.com/ to my list of feeds, and can see far fewer duplicates in there, but there are still loads in http://feeds.feedburner.com/blogspot/MKuf

Perhaps it’s worth making sure the fixes (not sure if still rolled back since your last update) work on that one too, as it looks like that’s going to affect far more people.