Train intelligence with a priori arbitrary words

nelas · May 8, 2012, 8:23pm

Is it possible to train NewsBlur intelligence using arbitrary words (i.e., words that are not present in the feed contents)?

Use case: I have a folder with a bunch of scientific journal feeds, and I would like to quickly filter the articles that interest me. Right now I am steadily marking title words and tags interactively (during my normal browsing), but being able to give the trainer an a priori list of words of interest (e.g., species, genes, keywords) would be an even more efficient way to filter content.

Filtering scientific literature is a hot topic, and NewsBlur’s approach is the most promising I have used so far.

Thanks!

samuelclay · May 9, 2012, 7:33pm

Heh, this is a funny request, because this is how most other “intelligent” RSS feed readers do it. You supply it with a list of keywords and it diligently applies those keywords to scores. Unfortunately, that’s not in scope for NewsBlur because I basically don’t want to encourage this kind of behavior. It may work, but it’s needlessly complicated and makes it harder to understand how the intelligence works.

Just having the ability to do this is something that I see as a negative, since it raises the ceiling for how complicated I’m allowing NewsBlur to become.

samuelclay · May 9, 2012, 7:34pm

Great idea nonetheless; it’s just way out of scope. If you can come up with a reasonable interface (better yet, submit a pull request), I would consider it much more strongly. But simplicity is key, and I just don’t have time to figure out how to make this simple and not resemble the current interface word-for-word. In fact, what you see is how I simplified the training process.

nelas · May 10, 2012, 11:47pm

Hey, thanks for the response! I agree that the way the intelligence works is quite elegant and this might over-complicate it. I am having a look at the code to see if I can suggest something more concrete.

Meanwhile, the interface I had in mind was simple, just an “add custom” button for each type of classifier:

It would add the word/phrase to the feed’s intelligence, and then you could vote it up/down as usual. Almost like the story title highlight form. In fact, I just realized that I can already select custom words by erasing the title, typing my own text, and highlighting it.
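To make the “add custom” idea a bit more concrete, here is a minimal Python sketch of what I mean. It is not taken from the NewsBlur code: the `FeedClassifiers` class, its method names, the substring matching, and the rule that a liked phrase outranks a disliked one are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeedClassifiers:
    """Hypothetical per-feed store of custom title phrases and their votes."""

    # Maps a lowercased phrase to +1 (voted up) or -1 (voted down).
    title_phrases: dict[str, int] = field(default_factory=dict)

    def add_custom(self, phrase: str, vote: int) -> None:
        """Register an arbitrary phrase, even one never seen in the feed."""
        if vote not in (1, -1):
            raise ValueError("vote must be +1 (up) or -1 (down)")
        self.title_phrases[phrase.lower()] = vote

    def score_title(self, title: str) -> int:
        """Return 1, -1, or 0 for a story title.

        Assumption: matching is a case-insensitive substring test, and any
        liked phrase wins over disliked ones.
        """
        lowered = title.lower()
        votes = [v for p, v in self.title_phrases.items() if p in lowered]
        if 1 in votes:
            return 1
        if -1 in votes:
            return -1
        return 0


clf = FeedClassifiers()
clf.add_custom("Drosophila", 1)   # a species of interest
clf.add_custom("erratum", -1)     # notices I want to hide
print(clf.score_title("Wing development in Drosophila melanogaster"))
```

The a priori list would then just be a series of `add_custom` calls made before any story containing those words has shown up.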

Do you plan to include a “body” classifier model in the analyzer, or would this cause performance problems?