System for keyword-based auto-tagging

Hi Samuel,

I had an idea and was wondering how plausible it might be.

A system where individual RSS items would be run through a keyword-matching filter. The defined filters would apply user-defined tags to any RSS items containing matches for the user-defined keywords.

The net effect would be that we could set up any number of keyword and tag combinations that would then run against new RSS items entering our feeds. Then, if we were so inclined, we could simply browse to a specific tag in NewsBlur to see all the unread items from our feeds that were given that tag. Based on the way the tagging system currently works, we could have a list of RSS items that were collected from across all our feeds and relate to a specific topic.
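To make the rule-to-tag flow concrete, here is a minimal Python sketch, assuming each rule is just a keyword plus a tag and that each story has a feed, a title, and content. `Story`, `KeywordRule`, and `tag_new_stories` are hypothetical names for illustration, not NewsBlur code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Story:
    feed: str
    title: str
    content: str = ""


@dataclass
class KeywordRule:
    keyword: str  # text to look for in the story (hypothetical rule format)
    tag: str      # tag to apply when the keyword is found


def tag_new_stories(stories, rules):
    """Group incoming stories by every tag whose keyword they contain."""
    by_tag = defaultdict(list)
    for story in stories:
        text = f"{story.title} {story.content}".lower()
        # Collect tags in a set so two rules sharing a tag don't list a story twice.
        tags = {rule.tag for rule in rules if rule.keyword.lower() in text}
        for tag in sorted(tags):
            by_tag[tag].append(story)
    return dict(by_tag)


rules = [KeywordRule("raspberry pi", "maker"), KeywordRule("rust", "programming")]
incoming = [
    Story("Hackaday", "A Raspberry Pi weather station"),
    Story("Ars Technica", "Rust 1.0 released", "The Rust team announced..."),
    Story("Slashdot", "Ask Slashdot: best text editor?"),
]
for tag, stories in tag_new_stories(incoming, rules).items():
    print(tag, [s.title for s in stories])
```

Plain case-insensitive substring matching like this is the simplest option, but it would also tag a story about “trust” as programming; that kind of false positive is why the choice of matching function matters.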

I’m sure there is a tremendous amount of work in that suggestion, but I figured I’d throw out the idea to see what you thought.

Thanks for all the hard work,


There are two parts to this system: the frontend and the backend. The backend work is something I can think about, but if you draw some preliminary screenshots of the frontend, I bet you can address some of the remaining questions. Things like: how do you input new tags, and how do you choose which feeds they apply to (if more than a single feed)? How would you read by tag? Would it be a list of tags in the sidebar? How do we prevent additional clutter? And how likely is it that a tag finds the right stories without false negatives or false positives?


Ahh, good points. Now, I’ve never been accused of being a good designer, but I think I could mock up some frontend visuals to give this idea a shot at success. I’ll post them here when they are ready.


Sounds like a good idea… especially to help with training on feeds that don’t have tags or titles with keywords you can train on.

There are many times when I would love to have something like this. Like taking anything from Slashdot that starts with “Bennett Haselton writes” and adding the tag “NOPE”, which is set to thumbs down. Or finding the embeds for Daily Show and/or Colbert Report videos, then adding the tag “ALREADY SEEN IT” and setting it to thumbs down.

Hmm, I think I should clear up a bit of confusion about my idea. The tags that you can train on are part of the RSS item. For those to be edited (added or removed), I believe you would need to edit the RSS item itself, and I don’t think any of NewsBlur’s functions do that. I’m not 100% sure on this, but I think an RSS item is global and all users subscribed to a feed share that one RSS item. That means that if you edited the tags you can train on, you would be editing someone else’s RSS item as well. I certainly wouldn’t want that.

The tags I was thinking of adding were the tags users can set when saving a story. Those tags can’t be trained on and are functionally quite different from the RSS story tags (those are referred to as categories in the RSS spec).

Not that I don’t like the thought of more advanced training, but knowing what little I do about the under-the-hood operations, I wasn’t aiming to enhance that part.

Hope that helps clarify my idea a bit. When I do the screen mock ups I think it will be more clear as well.


OK Samuel, I’ve got the mock-ups ready. Let’s see if I have addressed most of the points you brought up.

To start with, I’ve tried to reuse as much functionality and UI as possible to ease the development impact. My idea would work like this.

The system would be accessible via a right click on any one subscription.

The options for the subsystem could then be displayed as such:

The filter type I imagined would direct the filter to use either simple text matching, maybe a more complex substring function, or, my personal favorite, regular expressions. I figure most folks don’t use regex, so the type of matching function should be selectable per filter to give the subsystem versatility.
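One way the three filter types could be dispatched, sketched in Python. The interpretation is an assumption: “simple text” is treated here as a whole-word, case-insensitive match, “substring” as a plain contains check, and “regex” as a pattern passed straight to Python’s `re.search`. `FilterType` and `matches` are hypothetical names.

```python
import re
from enum import Enum


class FilterType(Enum):
    TEXT = "text"            # assumed: whole-word, case-insensitive keyword match
    SUBSTRING = "substring"  # assumed: case-insensitive "contains" check
    REGEX = "regex"          # user-supplied regular expression, used as-is


def matches(filter_type: FilterType, filter_text: str, haystack: str) -> bool:
    """Return True if haystack matches filter_text under the chosen filter type."""
    if filter_type is FilterType.TEXT:
        # Lookarounds stop "rust" matching inside "trust" and still work for "C++".
        pattern = r"(?<!\w)" + re.escape(filter_text) + r"(?!\w)"
        return re.search(pattern, haystack, re.IGNORECASE) is not None
    if filter_type is FilterType.SUBSTRING:
        return filter_text.lower() in haystack.lower()
    if filter_type is FilterType.REGEX:
        # Case sensitivity is left to the pattern, e.g. a leading (?i).
        return re.search(filter_text, haystack) is not None
    raise ValueError(f"unknown filter type: {filter_type}")


print(matches(FilterType.TEXT, "rust", "Don't trust anyone: Rust 1.0 released"))  # True
print(matches(FilterType.TEXT, "rust", "In God we trust"))                        # False
print(matches(FilterType.SUBSTRING, "rust", "In God we trust"))                   # True
print(matches(FilterType.REGEX, r"^Bennett Haselton writes", "Bennett Haselton writes: ..."))  # True
```

Keeping the filter type as an explicit field means casual users never have to see regex syntax, while the regex option still gives full control over the pattern.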

The filter text is easy: it’s just the text you want to use in your matching function.

The tag to apply would reuse the saved story tagging subsystem. You enter the tag you would like to apply when your filter matches, and the end result is that the RSS item is saved and tagged just as if you had used the same features in the current UI.
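A sketch of the “save and tag” step, assuming saved stories are keyed by a story ID and carry a list of user tags. `SavedStory`, `apply_tag`, and `run_filters` are hypothetical stand-ins for the existing saved-story feature, not its real API; matching is a plain substring check to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class SavedStory:
    story_id: str
    user_tags: list[str] = field(default_factory=list)  # hypothetical saved-story tags


def apply_tag(saved: dict[str, SavedStory], story_id: str, tag: str) -> SavedStory:
    """Save the story if it isn't already saved, then add the tag once."""
    entry = saved.setdefault(story_id, SavedStory(story_id))
    if tag not in entry.user_tags:
        entry.user_tags.append(tag)
    return entry


def run_filters(saved: dict[str, SavedStory], story_id: str, text: str,
                filters: list[tuple[str, str]]) -> None:
    """Apply every (keyword, tag) filter whose keyword appears in the story text."""
    for keyword, tag in filters:
        if keyword.lower() in text.lower():
            apply_tag(saved, story_id, tag)


saved: dict[str, SavedStory] = {}
filters = [("raspberry pi", "maker"), ("soldering", "maker"), ("arduino", "hardware")]
run_filters(saved, "story-123", "Soldering an Arduino to a Raspberry Pi", filters)
run_filters(saved, "story-456", "Tax season tips", filters)
print(saved["story-123"].user_tags)  # ['maker', 'hardware']
print("story-456" in saved)          # False
```

Tagging is idempotent here: a story matched by two filters that share a tag is saved once and gets that tag once, which mirrors what would happen if a reader saved and tagged it by hand.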

Now for browsing, we can again reuse the current UI to keep development to a minimum.

The system for viewing the tags applied by the filters already exists. Maybe its features could be expanded, but as a bare-bones idea I think it’s enough to make the subsystem functional.

The only point you brought up that I haven’t figured out how to address is false positives and negatives. My only answer to that one would be some sort of preview of what the filters would match, but that seems like a large cost for what might be a minimal benefit.
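A preview does not need to be elaborate to be useful. This hypothetical `preview` helper, sketched in Python under the assumption that the filter is a regex and that a list of recent story titles is at hand, dry-runs a candidate filter and reports how many stories it would have tagged along with a few examples.

```python
import re


def preview(pattern: str, recent_titles: list[str], limit: int = 5) -> tuple[int, list[str]]:
    """Dry-run a candidate filter: count matches and return the first few titles."""
    regex = re.compile(pattern)
    hits = [title for title in recent_titles if regex.search(title)]
    return len(hits), hits[:limit]


recent = ["Rust 1.0 released", "In God we trust", "Rusty's guide to bikes", "Python 3 news"]
print(preview(r"(?i)\brust\b", recent))  # (1, ['Rust 1.0 released'])
# The looser pattern also catches "trust" and "Rusty's": likely false positives.
print(preview(r"(?i)rust", recent)[0])   # 3
```

Running it over only a recent window of stories would be one way to keep the cost bounded while still exposing obvious false positives before a filter goes live.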

In summary, if this subsystem could be applied to both the title and contents of each RSS item in each RSS subscription, I could create some very specific reading lists without having to run the feed through an RSS manipulation service before it reaches NewsBlur. I could simply go to a saved story tag and browse the stories that were tagged across multiple subscriptions. I hope I did OK explaining all that. I’ve had no formal training or experience in this type of design work. If you’d like anything more, just let me know.

Thanks much,

Hi D, this is an incredible feature request with quite the spec. I have a pretty clear picture of what you want, but what about adding auto-tagging to the intelligence trainer that exists now?

Ah ha! I hadn’t realized it, but there is a system that already acts on all the RSS items, isn’t there? I can see why you would look to it as a home for this idea. If this idea were attached to the trainer, though, I think it might lose two significant features of my design.

First, the trainer only acts on the RSS title and category elements. I was hoping to have pattern matching against the description element as well.
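To illustrate the difference, here is a Python sketch that checks which fields of an item a pattern hits. The field lists mirror the description in this post (trainer: title and categories; proposed filter: those plus description); `RssItem` and `matching_fields` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RssItem:
    title: str
    description: str = ""
    categories: list[str] = field(default_factory=list)


# Fields the trainer looks at, per this post, versus the fields the proposed filter would scan.
TRAINER_FIELDS = ("title", "categories")
FILTER_FIELDS = ("title", "categories", "description")


def field_text(item: RssItem, name: str) -> str:
    value = getattr(item, name)
    # Categories are a list; join them so one pattern can search them all.
    return " ".join(value) if isinstance(value, list) else value


def matching_fields(item: RssItem, pattern: str, fields: tuple[str, ...]) -> list[str]:
    """Return the names of the fields in which the pattern matches (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in fields if regex.search(field_text(item, name))]


item = RssItem("Weekly links", "This week: a new Raspberry Pi board", ["news"])
print(matching_fields(item, r"raspberry pi", TRAINER_FIELDS))  # []
print(matching_fields(item, r"raspberry pi", FILTER_FIELDS))   # ['description']
```

A link-roundup post like this one is exactly the case where the interesting keyword lives only in the description, so a title-and-category check never sees it.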

Second, and maybe more importantly, complete control over the pattern used for matching. Unless it were modified, the trainer system’s way of defining match criteria wouldn’t allow users to be totally specific about what they want matched. (Did I mention I like regex?)

When it comes to the trainer system, I think you have struck the right balance between ease of use and functionality. If this idea lived within the confines of the trainer, I think it would get me about halfway to my ideal reading list. Let’s face it, halfway is plenty of progress. I’m certainly someone who loves powerful features even if they have a steep learning curve, which is probably evident from the features I cooked up. I also realize I’m probably in the minority on that, so to benefit the service as a whole, the scope of the feature might have to be dialed back.

There is another way of looking at the end result I was shooting for: a collection of RSS items from any of my subscribed feeds that I have decided to include, on a per-item basis, based on information in the RSS item. I suppose that is a good description of the end result I came up with. Maybe it can be done with less complexity, but I do like powerful and flexible features.

In any case, NewsBlur is already my feed reader of choice, so anything extra is just icing on the cake.

Thanks again,