Recently (past few weeks?) I’ve noticed many twitter feeds that go from zero unread posts to exactly 20 new/unread posts. I suspected something new/weird because this (exactly 20 items) happens so often, but couldn’t put my finger on it.
Today, one twitter feed with exactly 20 posts came from a twitter user who hasn’t posted in a while but today posted a numbered 30-tweet thread. My newsblur feed for that twitter account contains exactly 20 items, which are the “latest/last” tweets numbered 11 through 30. Somehow newsblur completely missed tweets 1 through 10 in that feed/batch/update.
This is as close to “conclusive” evidence as I can get that there is some kind of rate limit or batch limit or … something? I can’t tell if it’s a newsblur polling issue or a twitter backend/API issue, or ?
Newsblur’s site settings show no polling errors or rate-limit errors, so either newsblur is batching weirdly or twitter is silently delivering “not the whole updated feed” on any given polling cycle, or ???
I suspect this may be a per-poll limitation because many other twitter feeds routinely populate with many more than 20 items; my guess is that those higher-than-20 unread items accumulate over more than one polling cycle (less than 20 items/poll?).
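To make that guess concrete, here is a toy simulation. It is only a sketch of my hypothesis, not newsblur’s actual fetch code: the names `poll` and `simulate` and the cap of 20 are assumptions of mine, chosen to match what I’m seeing.

```python
def poll(timeline, per_poll_limit=20):
    """Return only the newest `per_poll_limit` tweets (hypothetical cap)."""
    return timeline[-per_poll_limit:]


def simulate(posts_per_cycle, per_poll_limit=20):
    """Post a batch of tweets before each poll; report what was seen and missed."""
    timeline = []   # every tweet id ever posted, oldest first
    seen = set()    # tweet ids the feed reader managed to store
    next_id = 1
    for n in posts_per_cycle:
        # The account posts n tweets between two polls.
        timeline.extend(range(next_id, next_id + n))
        next_id += n
        # The reader polls once and keeps whatever the capped fetch returns.
        seen.update(poll(timeline, per_poll_limit))
    missed = [t for t in timeline if t not in seen]
    return sorted(seen), missed


# One poll after a 30-tweet thread: the 10 oldest tweets are never seen.
seen, missed = simulate([30])
print(len(seen), missed)

# The same 30 tweets spread over six polls: nothing is lost.
seen_slow, missed_slow = simulate([5] * 6)
print(len(seen_slow), missed_slow)
```

If the model is right, a feed only loses tweets when more than 20 arrive between two polls, which would explain why slower-posting feeds can still pile up more than 20 unread items.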
Basically if a feed hasn’t published in the past 30 days, it follows different rules. In this case, these twitter accounts are dormant until they’re not. Once they start posting again, all of their posts should show up. Part of the algorithm also takes into account how many of these stories are actually read, so if you read a story in a feed that has been dormant, it’ll start accumulating more stories.
Hmm. The inactive feed explanation doesn’t match my observations.
The feeds seeing exactly 20 items coming in at a time are active on a daily basis. And I read them regularly.
I wonder if increased tweet volume and/or long-ish polling intervals have revealed a limit that was always there.
It looks to me like the twitter API that tweepy uses to fetch tweets has an optional “count” parameter that (as far as I can see) is not set to any particular value. I also don’t see mention of any default limit in the twitter API docs, but clearly there must be some limit to how many “recent” tweets are returned. I wish I had a better understanding of the interplay between polling intervals and the definition of “recent tweets” in the twitter API.
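For what it’s worth, here is a sketch of how a fetcher could page backwards until it reaches the last tweet it already has, instead of stopping after one capped request. I have not checked whether newsblur does anything like this. The parameter names `count`, `since_id` and `max_id` mirror the twitter timeline parameters as I understand them (`since_id` exclusive, `max_id` inclusive); `make_fake_timeline` and `fetch_all_new` are hypothetical helpers, and the fake timeline stands in for the real API so the example runs without credentials.

```python
def make_fake_timeline(tweet_ids):
    """Build a stand-in for a timeline endpoint (not the real API)."""
    ids = sorted(tweet_ids, reverse=True)  # newest first, like a timeline

    def fetch_page(since_id=None, max_id=None, count=20):
        page = [t for t in ids
                if (since_id is None or t > since_id)    # newer than since_id
                and (max_id is None or t <= max_id)]     # no newer than max_id
        return page[:count]                              # capped per request

    return fetch_page


def fetch_all_new(fetch_page, since_id, count=20):
    """Keep requesting older pages until nothing newer than since_id is left."""
    collected = []
    max_id = None
    while True:
        page = fetch_page(since_id=since_id, max_id=max_id, count=count)
        if not page:
            break
        collected.extend(page)
        # Next request starts just below the oldest tweet fetched so far.
        max_id = page[-1] - 1
    return sorted(collected)


fetch_page = make_fake_timeline(range(1, 31))   # a 30-tweet thread
single = fetch_page(since_id=0)                 # one capped request
everything = fetch_all_new(fetch_page, since_id=0)
print(len(single), len(everything))
```

With the fake 30-tweet thread, the single request returns only the newest 20, while the paging loop recovers all 30. Against the real API each extra page would cost an extra request, so rate limits could be a reason not to do this.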
Having now investigated a total of three high-volume “newsblur 20 count” twitter feeds, it now seems inescapable: presumably through a combination of polling intervals and tweet-fetch limits inherent in newsblur’s use of the twitter API, newsblur loses many tweets when an account posts more than 20 tweets per newsblur polling cycle. This renders newsblur ineffective as a twitter feed consolidator: too much information is lost.
Farewell, fair newsblur twitter feeds, you were fantastic while the twitter volumes were sufficiently low.