Improve feed and website sections merging

It looks like the detection of where feed entries are located on the original website is based on titles only.

The problem is that some sites use the same title multiple times, e.g. “Weekly links” or some other date-related column. It also looks like NewsBlur has a problem with pinned posts on the website, which don’t follow the chronological order of the feed. Titles can also differ slightly between the feed and the website because of typographic plugins etc.
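To illustrate the typographic differences mentioned above, here is a minimal sketch of normalizing titles before comparing them. The function name `normalize_title` and the character table are assumptions made for illustration; this is not how NewsBlur actually matches titles, and it does nothing about sites that reuse the same title.

```python
import unicodedata

# Hypothetical mapping of common "smart" typography back to plain ASCII.
_TYPOGRAPHIC = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
    "\u00a0": " ",                  # non-breaking space
})

def normalize_title(title: str) -> str:
    """Return a comparison key that ignores typographic variations."""
    text = unicodedata.normalize("NFKC", title).translate(_TYPOGRAPHIC)
    # Collapse runs of whitespace and ignore case.
    return " ".join(text.split()).casefold()

feed_title = 'Weekly Links - "Best of"...'
site_title = "Weekly  Links \u2013 \u201cBest of\u201d\u2026"
print(normalize_title(feed_title) == normalize_title(site_title))
```

Two titles that differ only in quotes, dashes, spacing or case then compare as equal.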

My idea is to compare the hyperlinks instead of the titles, since they should be more unique, at least for SEO reasons. In this case we would have to handle the Google Analytics tags added by Feedburner etc. and resolve redirects, but this shouldn’t be a big problem.
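A rough sketch of the parameter cleanup part of this idea, using only the Python standard library. The function name `canonical_link` and the `utm_` prefix list are assumptions for illustration, and resolving redirects is deliberately left out, since that would require a network request per link.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: campaign tags use the usual "utm_" prefix
# (utm_source, utm_medium, utm_campaign, ...).
TRACKING_PREFIXES = ("utm_",)

def canonical_link(url: str) -> str:
    """Return a comparison key for a link, without tracking parameters."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Ignore a trailing slash and drop the fragment (e.g. "#comments").
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )

feed_link = "http://example.com/post/?utm_source=feedburner&utm_medium=feed&id=3"
site_link = "http://example.com/post?id=3#comments"
print(canonical_link(feed_link) == canonical_link(site_link))
```

Links that still differ after this step (for example a Feedburner proxy URL versus the real article URL) would need the redirect to be followed before they can be compared.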

What do you think about this?

Hyperlinks are closer to unique, but they sometimes don’t match what the RSS feed has for its links. Titles are the best way I’ve found. As for stories not being in chronological order, it’s supposed to route around that to a certain extent, but the question is really where does one story end and another begin? If they are not expected to be in chronological order, you can’t be sure that the Original view is correctly linked to the right stories.

I may take a closer look at the heuristics for finding stories. But I’ve spent more than a few weeks on those things and they’re already at a point where it’s diminishing returns for lots of effort spent.

I have to say that the title detection works pretty well in most cases.

Hyperlink comparison was just an idea, but it now looks even more complex, since you somehow have to resolve redirect links and clean up parameters, as in the case of Feedburner feeds. It looks like this would need an additional database and/or might generate a lot of false statistics for the site owner.

Great to see that you’ve already thought about this, so the current implementation is likely the best. :slight_smile: