Filter out duplicate articles that have a similar title or description

I like NewsBlur so much. I switched to it from Google Reader, this is my first time using it, and I'm going to sign up for premium for sure.

Everything looks perfect, but I keep running into duplicate articles in some of my favorite news feeds.

Some news sites publish their own content and also syndicate content from other sites, so the content is duplicated. Only the link and parts of the title are different.
For example, both NYT and ABC carry the same article about one event, with only small differences in the title and description.

The other case is when the same site has two feeds:
one is for all news updates, the other is for a single topic only.
So they both sometimes have the same articles. When I go to the all-news feed, many articles that I already opened from the single-topic feed are still not marked as read.

The URLs and titles of those articles in the two feeds are slightly different (they put a tracking number at the end of the URL and add some words to the title), about 20% different.
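
To make the "about 20% different" case concrete, here is a minimal Python sketch of how two such entries could be compared. It is only an illustration, not how NewsBlur works: the function names, the `url`/`title` dictionary keys, the `?track=` parameter, and the 0.8 threshold are all assumptions, and `difflib.SequenceMatcher` is just one of several possible similarity measures.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Drop the query string and fragment, where tracking IDs often live (assumption)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))


def title_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two titles, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def looks_like_same_article(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Treat two entries as duplicates if the cleaned URLs match or the titles are close."""
    if strip_tracking(a["url"]) == strip_tracking(b["url"]):
        return True
    return title_similarity(a["title"], b["title"]) >= threshold


# Hypothetical entries: the same story from an "all news" feed and a topic feed.
all_news = {"url": "https://example.com/news/new-phone-launch?track=12345",
            "title": "New phone launch announced"}
topic = {"url": "https://example.com/news/new-phone-launch?track=67890",
         "title": "Tech: New phone launch announced"}
is_duplicate = looks_like_same_article(all_news, topic)
```

With a check like this, opening the story in the topic feed could also mark the matching entry in the all-news feed as read.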

Conclusion

Could NewsBlur filter out articles that have a similar title and description, based on keywords?

Then when looking at the news list, I could see articles whose content and title are similar to another one in a different color, maybe blue, while the newest one keeps the normal color.

Or, besides the three options of the intelligence trainer (red, green, yellow), we could have another switch that hides similar articles (showing only the first one).

Or maybe just group them into one: clicking the link opens the first one, and clicking a button next to the link drops down the list of all similar articles.
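
The grouping idea could be sketched roughly like this in Python. This is only a rough illustration under my own assumptions: `group_similar` and the 0.8 threshold are made-up names and values, and the greedy "join the first matching group" rule is just the simplest approach, not necessarily the best one.

```python
from difflib import SequenceMatcher
from typing import List


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """True if two titles are at least `threshold` similar, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_similar(titles: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping: each title joins the first group whose lead title it resembles."""
    groups: List[List[str]] = []
    for title in titles:
        for group in groups:
            if similar(group[0], title, threshold):
                group.append(title)
                break
        else:
            # No existing group matched, so this title starts a new group.
            groups.append([title])
    return groups


groups = group_similar([
    "Apple unveils new iPhone",
    "Apple unveils the new iPhone",
    "Stock markets fall sharply",
])
```

In each group, the first title would be the one shown in the list, and the rest would go into the drop-down.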

thanks


Ars Technica has lots of duplicates today too.

Yeah, this is an ideal goal, but it’s quite hard. If I ever ramp up NewsBlur to a full-time development shop with multiple engineers, this problem will get attacked.

Hmm, the way I think about it: just treat it as an extension of the current “intelligence trainer” feature, but develop it on the client side in JavaScript only, because it does not need to touch the database. JavaScript is powerful enough to filter out or group similar articles in the GUI.

I have been using a premium account for a year now and
I'm still looking for this feature, to remove similar and duplicated articles.

If you are using PHP or MySQL it is very easy: just loop over the last 200 articles, then use the similar_text function to get the % similarity of the titles, then group them, or maybe just mark them gray, or delete them (with a setting for the user).
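
The same loop could look roughly like this in Python. It is a sketch only: `difflib.SequenceMatcher` stands in for PHP's `similar_text` (it uses a different algorithm, so the percentages will not be identical), and the names `mark_duplicates`, `min_percent`, `action`, and `window` are made up for this example.

```python
from difflib import SequenceMatcher


def percent_similar(a: str, b: str) -> float:
    """Similarity as a percentage (0-100), in the role of similar_text's percent."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mark_duplicates(titles, min_percent=80.0, action="gray", window=200):
    """Check the newest `window` titles (ordered oldest to newest) for near-duplicates.

    Returns (title, status) pairs. The first title of each kind is "normal";
    later look-alikes get `action` as their status, or are dropped entirely
    when action == "delete".
    """
    kept = []      # titles already accepted as non-duplicates
    result = []
    for title in titles[-window:]:
        if any(percent_similar(title, seen) >= min_percent for seen in kept):
            if action != "delete":
                result.append((title, action))
        else:
            kept.append(title)
            result.append((title, "normal"))
    return result


marked = mark_duplicates([
    "Apple unveils new iPhone",
    "Stock markets fall sharply",
    "Apple unveils the new iPhone",
])
```

With 200 articles this is at most a few tens of thousands of comparisons, and the `action` argument is the per-user setting: gray out, or delete.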

Just imagine that I have 10 RSS feeds, all about tech news. Every day I see around 60% of them talking about the same topics. Browsing all of them or marking them one by one is very hard work and a waste of time.