De-duplicate Like News Items

I subscribe to a ton of feeds. A lot of them are different section feeds from the same source. For example:

New York Times > New York
New York Times > Home Page
New York Times > Environment

The Guardian > US Edition
The Guardian > Science
The Guardian > Business

The feeds feature different content, but often feature the same news stories several times a day. For example, the New York, Home Page, and Environment feeds from the New York Times will all have the same article on fracking with the same headline and content.

It would be nice if NewsBlur could recognize when an item in one feed has already appeared in another feed and hide it, leaving only one instance of the news item across all the feeds (or under “All Site Stories”). For users who (for whatever reason) want to access the duplicates, there could be a section under Read Items when this hypothetical option is toggled on in settings.

For this to work, either (A) the “duplicated” feed items must be complete, exact matches, or (B) there would have to be a fully general way for each user to specify what counts as “duplicated” items, and across which feeds. There would also need to be a way to specify which feeds are to be counted as one ‘group’.
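To make option A concrete, here is a minimal sketch of exact-match de-duplication within user-defined feed groups. Everything in it is an assumption for illustration: the story dictionaries with `feed`, `title`, and `content` keys, the `feed_groups` mapping, and the function names are hypothetical and say nothing about how NewsBlur actually stores stories.

```python
import hashlib


def fingerprint(story):
    """Hash the title and content so only exact matches share a fingerprint."""
    text = story["title"].strip() + "\n" + story["content"].strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hide_exact_duplicates(stories, feed_groups):
    """Split stories into (visible, hidden), comparing only within a group.

    feed_groups is a hypothetical user setting: group name -> list of feeds.
    Feeds that belong to no group are never de-duplicated.
    """
    feed_to_group = {
        feed: name for name, feeds in feed_groups.items() for feed in feeds
    }
    seen = set()
    visible, hidden = [], []
    for story in stories:
        group = feed_to_group.get(story["feed"])
        if group is None:
            visible.append(story)
            continue
        key = (group, fingerprint(story))
        if key in seen:
            hidden.append(story)  # an identical item was already shown
        else:
            seen.add(key)
            visible.append(story)
    return visible, hidden
```

The first copy encountered stays visible and later identical copies go to the hidden list, which could back the proposed section under Read Items. A single changed character in the title or content defeats the match, which is the limitation of option A.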

I’d rather see IFTTT, Yahoo Pipes or similar doing the work; then it’s up to the user to remove the “duplicated” (syndicated?) items.

Yeah, this is a tough problem. I could use some statistical models, but that’s a very expensive feature and is certainly not easy to build. Also, the UI doesn’t really support this yet. I’m not sure which is the harder problem to solve, but both are why this hasn’t been solved yet.
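For a sense of what fuzzy matching involves, here is a rough sketch of one cheap stand-in: Jaccard similarity over word shingles. It is not the statistical model mentioned above, and the function names, the shingle size of 3, and the 0.8 threshold are all assumptions chosen for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.8):
    """Treat two texts as duplicates when their shingle sets mostly overlap."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```

Comparing every new story against every recent story this way grows quadratically with the number of stories, which hints at why this kind of feature gets expensive.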

Couldn’t two articles with the same link just be treated as matches? This probably wouldn’t cover all cases of duplicates, but it would cover a lot of them.
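A minimal sketch of the same-link idea, assuming each story is a dictionary with a `link` key (a hypothetical shape, not NewsBlur's data model). The normalization step is also an assumption: it lowercases the scheme and host, drops the fragment and a trailing slash, and strips `utm_` tracking parameters so that trivially different URLs for one article still compare equal.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: query parameters with these prefixes are tracking noise.
TRACKING_PREFIXES = ("utm_",)


def normalize_link(url):
    """Reduce a URL to a canonical form for comparison."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def dedupe_by_link(stories):
    """Return (unique, duplicates), keeping the first story seen per link."""
    seen = set()
    unique, duplicates = [], []
    for story in stories:
        key = normalize_link(story["link"])
        if key in seen:
            duplicates.append(story)
        else:
            seen.add(key)
            unique.append(story)
    return unique, duplicates
```

This would miss syndicated copies that live at different URLs, so it covers only part of the problem, but it needs no content comparison at all.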

Any news on this? Will this be implemented some day?