I abused and broke training (self-hosted)

First, big thank you for making this available for self-hosting :person_bowing:

Second, this is self-hosted, so I understand it might be of low priority for you.

Also, I’m probably holding it wrong :joy: But here it is anyway, since this use case might be of interest to you and others, and I may have stumbled upon genuine bugs you might want to fix or adjust.

First, what I was trying to do, and why. I have a bunch of feeds (currently a bit over 200, says the dashboard): various news sites, game releases, GitHub commits and releases, some YouTube publishers, blogs. They cover various topics and are mostly organized into directories, some with sub-directories. At different times I am interested in seeing different things. Sometimes it might be what is being said about one of the currently ongoing wars; at other times it may be advances in a specific area of medical technology. Sure, there are feeds dedicated to those topics, but such items usually show up in more generic news feeds. And yes, you can configure the trainer to surface the stuff you are interested in… but that surfaces everything you are interested in, not one particular category of it. So trainer plus “focused” is not quite a solution.

But I’ve noticed that the same feed can be subscribed to multiple times, in different folders. So I created a folder named _filtering_test (to keep it at the top of the list while I’m playing around), added directories in there (let’s call them topic1 and topic2), and started configuring the recently added feature, per-folder training rules. And things misbehaved, in interesting ways:

  1. Turns out I already had a sub-directory Topic1 elsewhere (starting with a capital letter, as opposed to the lowercase name in my tests). The rules I configured for topic1 also applied to Topic1, and feeds I added to topic1 were also listed in Topic1.
  2. Organize sites → select the sites under Topic1 that I did not want there → Delete Sites removed them from everywhere, not just from that folder.
  3. Trying to add new training rules for terms in topic2 apparently didn’t save them, even though I made sure to click Save when adding them. Viewing the global training rules didn’t list them, and listing the folder-specific ones shows only _filtering_test - topic1.
  4. How long does it take for training to take effect? Does it take effect on existing articles, or only on new ones?
  5. And something I couldn’t find out because none of the above worked: will per-folder training be reflected only in the view for that folder, or will it also affect articles from the same feed when they’re viewed via other folders?

As I said, this may not be at all how you envision those features being used, but that’s what I came up with for a “now show me this topic” view that’s easier to reach than scrolling down to the saved searches (also, saved searches cannot be edited or named).

I haven’t tried yet whether adding a saved search as a dashboard column works well enough for me as “I want to look at this topic now”.

Hey viq, thanks for the detailed report. I tracked down the root causes and have fixes for everything:

1. Case-sensitive folder creation bug - You’re right that creating topic1 when Topic1 already existed caused problems. There was an inconsistency where folder creation used a case-sensitive check (allowing both to exist), but adding feeds to folders used a case-insensitive check (so feeds targeted at topic1 silently went into Topic1). Fixed so folder creation is also case-insensitive, preventing duplicate folders that differ only in case.

2. Deleting feeds affected wrong folders - The folder matching in delete_feed (and rename/delete folder operations) used Python’s substring matching instead of exact matching. This meant a folder named “News” could incorrectly match “Tech News”. Fixed to use exact folder name matching. Since your feeds were merged into Topic1 by the case bug, when you deleted them there, the subscription was removed entirely (the feed only existed in one folder despite appearing to be in two).

3. Training rules not saving - Folder-scoped and global-scoped training require the Premium Archive tier. On a self-hosted instance, you may need to set is_archive=True on your user profile. Feed-scoped training (the default) should work regardless. You can set it via the Django shell:

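```python
# Run inside the Django shell (e.g. `python manage.py shell`)
from apps.profile.models import Profile

p = Profile.objects.get(user__username='your_username')  # replace with your username
p.is_archive = True  # unlocks folder- and global-scoped training
p.save()
```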
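Fixes 1 and 2 both come down to how folder names are compared. Below is a minimal sketch of the two comparisons involved, using a plain list of folder names as a stand-in for the real folder storage. `folder_exists`, `create_folder`, `buggy_match` and `exact_match` are hypothetical helpers for illustration, not functions from the NewsBlur codebase.

```python
def folder_exists(folders, name):
    """Case-insensitive check: 'topic1' and 'Topic1' count as the same folder."""
    wanted = name.casefold()
    return any(f.casefold() == wanted for f in folders)


def create_folder(folders, name):
    """Refuse a folder whose name differs from an existing one only in case."""
    if folder_exists(folders, name):
        raise ValueError(f"folder {name!r} already exists (case-insensitive match)")
    folders.append(name)


def buggy_match(folder_name, target):
    # Substring test: "News" in "Tech News" is True, so the wrong folder matches.
    return target in folder_name


def exact_match(folder_name, target):
    # Exact comparison: only the folder with exactly this name matches.
    return folder_name == target


folders = ["Topic1", "News", "Tech News"]
create_folder(folders, "topic2")          # accepted: no case-insensitive clash
try:
    create_folder(folders, "topic1")      # rejected: clashes with "Topic1"
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

substring_hits = [f for f in folders if buggy_match(f, "News")]  # "News" and "Tech News"
exact_hits = [f for f in folders if exact_match(f, "News")]      # only "News"
```

`str.casefold()` is used instead of `lower()` because it also folds case distinctions outside ASCII. Once creation rejects case-only duplicates, exact matching on delete or rename can no longer hit a near-duplicate folder.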

To answer your other questions: training applies to new stories as they come in, and per-folder training only affects the river view for that specific folder.
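To make the scoping answer concrete, here is a minimal sketch of how rule scopes could be resolved for a given river view. The `Rule` class, its `scope` values and `applicable_rules` are hypothetical, not NewsBlur’s actual models. It assumes feed-scoped rules follow the feed into every folder and global rules apply everywhere, and it models only which rules apply in a view, not when stories get scored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical training rule (illustration only, not NewsBlur's model)."""
    term: str
    score: int                     # +1 = like, -1 = dislike
    scope: str                     # "feed", "folder" or "global"
    feed_id: Optional[int] = None  # only used by feed-scoped rules
    folder: Optional[str] = None   # only used by folder-scoped rules


def applicable_rules(rules: List[Rule], feed_id: int, viewing_folder: str) -> List[Rule]:
    """Rules that apply to a story from feed_id while viewing viewing_folder's river."""
    selected = []
    for rule in rules:
        if rule.scope == "global":
            selected.append(rule)  # assumed: applies in every view
        elif rule.scope == "feed" and rule.feed_id == feed_id:
            selected.append(rule)  # assumed: follows the feed into any folder
        elif rule.scope == "folder" and rule.folder == viewing_folder:
            selected.append(rule)  # only in this folder's river view
    return selected


# Usage: feed 42 is subscribed in both topic1 and topic2.
rules = [
    Rule("ceasefire", 1, "folder", folder="topic1"),
    Rule("sponsored", -1, "feed", feed_id=42),
]
in_topic1 = [r.term for r in applicable_rules(rules, 42, "topic1")]  # both rules
in_topic2 = [r.term for r in applicable_rules(rules, 42, "topic2")]  # feed rule only
```

The key point is that a folder-scoped rule is keyed on the folder being viewed rather than on the feed, so the same feed can be scored differently in topic1 and topic2.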

These fixes are in ac1cfb2b9b7c98c81878be3c60405814d8634801, which I just pushed to main.

Thank you!

Ah, “marked as they come in” means the testing will take a while…

I updated the code but did not re-create the folders, in case that makes any difference. I have the folder _filtering_test, and under it topic1_filters and Topic2, each containing a bunch of feeds. While in topic1_filters, I added a couple of folder rules, and in the Intelligence Trainer they do show up under _filtering_test - topic1_filters. Then I went to Topic2 (which has some of the same feeds) and tried adding some rules there - and they currently show up in the Intelligence Trainer as belonging to _filtering_test - topic1_filters. So saving now works, but there seems to be an issue with assigning rules to the correct folder, especially when the same feeds appear in more than one folder.