De-duplicate Like News Items

I subscribe to a ton of feeds. A lot of them are different feeds from the same source. For example:

New York Times > New York
New York Times > Home Page
New York Times > Environment

The Guardian > US Edition
The Guardian > Science
The Guardian > Business

The feeds feature different content, but often feature the same news stories several times a day. For example, the New York, Home Page, and Environment feeds from the New York Times will all have the same article on fracking with the same headline and content.

It would be nice to have a way for NewsBlur to recognize if an item in a feed was previously featured in another feed and hide it, leaving only one instance of the news item in all the feeds (or under “All Site Stories”). For users who (for whatever reason) want to access the duplicates, there could be a section under Read Items if this hypothetical option is toggled in settings.

For this to work, either (A) the “duplicated” feed items must be complete, exact matches, or (B) there would have to be a fully general way for each user to specify what counts as “duplicated” items, and across which feeds. There would also need to be a way to specify which feeds are to be counted as one ‘group’.
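
For what it's worth, option A combined with user-defined groups could be sketched roughly like this. It is only an illustration, not how NewsBlur works: the item fields (`feed`, `title`, `content`), the `groups` mapping, and both function names are made up for the example, and "exact match" here means identical title and content after collapsing whitespace.

```python
import hashlib


def fingerprint(title, content):
    """Hash of title and content; equal only when both match exactly
    (apart from runs of whitespace, which are collapsed)."""
    normalized = " ".join(title.split()) + "\n" + " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hide_duplicates(items, groups):
    """Split items into (visible, hidden).

    items:  list of dicts with 'feed', 'title', 'content', in arrival order.
    groups: hypothetical user setting mapping feed name -> group name.
            Feeds that are not in any group are never de-duplicated.
    """
    seen = set()
    visible, hidden = [], []
    for item in items:
        group = groups.get(item["feed"])
        if group is None:
            visible.append(item)
            continue
        key = (group, fingerprint(item["title"], item["content"]))
        if key in seen:
            hidden.append(item)  # already shown via another feed in the group
        else:
            seen.add(key)
            visible.append(item)
    return visible, hidden


groups = {
    "New York Times > New York": "NYT",
    "New York Times > Home Page": "NYT",
    "New York Times > Environment": "NYT",
}
items = [
    {"feed": "New York Times > New York", "title": "Fracking", "content": "Same text."},
    {"feed": "New York Times > Home Page", "title": "Fracking", "content": "Same  text."},
    {"feed": "The Guardian > Science", "title": "Fracking", "content": "Same text."},
]
visible, hidden = hide_duplicates(items, groups)
```

The first item seen in a group stays visible and later exact matches go to the hidden list, which could back the "section under Read Items" idea from the original post. The Guardian item stays visible because that feed is not in a group.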

I’d rather see IFTTT, Yahoo Pipes or similar doing the work; then it’s up to the user to remove the “duplicated” (syndicated?) items.

Yeah, this is a tough problem. I could use some statistical models, but that’s a very expensive feature and is certainly not easy to build. Also, there’s the fact that the UI doesn’t really support this yet. I’m not sure which is the harder problem to solve, but both are why this hasn’t been solved yet.

Couldn’t two articles with the same link simply be treated as matches? This probably wouldn’t cover all cases of duplicates, but it would catch a lot of them.
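
A minimal sketch of the same-link idea, assuming each story is a dict with a `link` field (a made-up shape, not NewsBlur's actual data model). Since feeds often append tracking parameters, the hypothetical `canonical_link` helper drops `utm_*` query parameters, the fragment, and a trailing slash before comparing; whether that normalization is safe for every publisher is an open question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_link(url):
    """Normalize a story URL so trivially different links compare equal."""
    parts = urlsplit(url.strip())
    # Drop utm_* tracking parameters; keep everything else in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    path = parts.path.rstrip("/") or "/"
    # An empty last element discards the #fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def dedupe_by_link(stories):
    """Keep the first story for each canonical link, preserving order."""
    seen = set()
    unique = []
    for story in stories:
        key = canonical_link(story["link"])
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return unique


stories = [
    {"feed": "Home Page", "link": "https://example.com/fracking/?utm_source=rss"},
    {"feed": "Environment", "link": "https://Example.com/fracking#comments"},
    {"feed": "Environment", "link": "https://example.com/other?id=7"},
]
unique = dedupe_by_link(stories)
```

This only catches stories that point at the same URL. Syndicated copies hosted under a different link would still slip through.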

Any news on this? Will this be implemented some day?