Just changed my WordPress installs to 403 with unrecognized URL parameters. Let’s see how many terribly-behaved RSS readers this breaks.
When I checked site settings, I noticed that Newsblur was indeed fetching https://cdn.jwz.org/blog/feed/?_=3781 for the site.
If I go to my browser and view that ?_=3781 thing, I get the feed-that’s-basically-an-error-message.
If I strip off the parameter and just view https://cdn.jwz.org/blog/feed/, that looks like blog posts, yay.
When I change my Newsblur site settings to fetch https://cdn.jwz.org/blog/feed/ without the weird URL parameter, I get the error messages. I was reeeeally hoping I’d get blog posts.
Uhm, I’m out of ideas.
This isn’t too urgent; the guy relays his blog posts to Tumblr, Mastodon, and probably other follow-able things. So, not urgent. But weird.
I do see all the posts appearing in my feed, but I also still see an extra entry with the unknown parameter problem. The error includes more now, though: “If you are seeing this, your feed reader is badly behaved. Use a different one.”
I don’t know if this is something that will go away going forward, but this entry is marked from 35 minutes ago.
Here’s a short description of what happened on this feed:
Jwz didn’t like how NewsBlur performed cache busting. I was adding a _=12345 parameter to the end of the query string when users force-refreshed the feed in order to bust the cache, but sometimes a feed would take that extra query string and return a feed whose own address included the cache-busting parameter. So NewsBlur would diligently change the URL to match, and that led to the cache buster being permanently stored as part of jwz’s feed address.
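One way to keep a cache buster from leaking into the stored address is to strip it before saving whatever URL the feed advertises for itself. This is only a rough sketch, not NewsBlur's actual code: the function names are made up for illustration, and the parameter name `_` is assumed from the _=12345 example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter name, taken from the "_=12345" example in the post.
CACHE_BUSTER_KEY = "_"


def strip_cache_buster(url: str) -> str:
    """Return the URL with any cache-buster parameter removed."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != CACHE_BUSTER_KEY
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def add_cache_buster(url: str, token: str) -> str:
    """Return the URL with exactly one cache-buster parameter appended."""
    parts = urlsplit(strip_cache_buster(url))
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((CACHE_BUSTER_KEY, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


def address_to_store(advertised_address: str) -> str:
    """Hypothetical hook: clean the feed's self-reported address before saving it."""
    return strip_cache_buster(advertised_address)
```

With something like this, a forced refresh could still fetch the cache-busted URL, while the address that gets saved never carries the extra parameter.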
Because the feed is both popular and prolific, it hit a rare condition where the feed would re-sync its entire archive (6,000+ stories and counting) when it had premium archive subscribers. That took longer than the 10 seconds the add-subscription request had to complete, so the request failed even though it technically succeeded. If you were to refresh NewsBlur, you would see the feed successfully added, but that’s not a very good user experience. I fixed this issue by only re-syncing the archive when the feed switches from not having any premium archive subscribers to having its first premium archive subscriber, and that means the number of stories is much lower. The background feed fetchers will take care of periodically re-syncing those stories anyhow.
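The fix described here boils down to checking for a zero-to-nonzero transition in the count of premium archive subscribers. A minimal sketch of that check, with a hypothetical function name and plain integer counts standing in for however NewsBlur really tracks subscribers:

```python
def should_resync_archive(previous_count: int, current_count: int) -> bool:
    """Re-sync the full archive only when a feed gains its FIRST premium
    archive subscriber, i.e. the count moves from zero to something positive.

    Both arguments are hypothetical counts of premium archive subscribers
    before and after the new subscription is added.
    """
    return previous_count == 0 and current_count > 0
```

Under this rule, a feed that already has archive subscribers skips the expensive re-sync during the add-subscription request and leaves the periodic catch-up to the background fetchers.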
I’m not thrilled with jamie’s approach to this, but I’m curious: how necessary is cache-busting these days? And what’s the relationship between cache-busting and syncing for premium archive subscribers?
I was adding a _=12345 parameter to the end of the query string when users force-refreshed the feed in order to bust the cache
@samuelclay I love NewsBlur, but please don’t do this.
As a website owner who provides RSS feeds, I can confirm that this tactic will lead to Newsblur being blocked. Newsblur should adhere to the cache headers in responses.
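For reference, adhering to cache headers usually means making conditional requests: the reader remembers the ETag and Last-Modified values from the last response and sends them back as If-None-Match and If-Modified-Since. A rough sketch of just the header handling, with no network calls; the class and function names are illustrative and not taken from any particular reader:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FeedValidators:
    """Validators remembered from the previous successful fetch of a feed."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(validators: FeedValidators) -> Dict[str, str]:
    """Build request headers that let the server answer 304 Not Modified."""
    headers: Dict[str, str] = {}
    if validators.etag:
        headers["If-None-Match"] = validators.etag
    if validators.last_modified:
        headers["If-Modified-Since"] = validators.last_modified
    return headers


def needs_parsing(status_code: int) -> bool:
    """A 304 response means the stored copy is still current; skip parsing."""
    return status_code != 304
```

A server that supports this can reply 304 with an empty body when nothing has changed, so the reader gets fresh data when it exists without altering the URL.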
The cache buster comes from years of experience running a news reader. I’ve seen numerous feeds that wouldn’t work without that cache buster. I’m happy to turn it off for now and will see what the next feed that breaks looks like.
Sure. I’ve also worked on RSS tech for a few decades now. I own a pretty popular RSS reader actually.
The only time I can see something like that cache buster hack being helpful is if a particular feed isn’t responding with the appropriate headers, causing the feed to be cached when it shouldn’t be. That is likely to be a one-off case that Newsblur should handle separately.
But applying the hack for every website doesn’t seem like a good idea and would cause the feeds to be blocked unnecessarily, which would be a very frustrating experience for Newsblur users.