Jwz feed broken, he's mad about URL parameters

The jwz feed used to be blog posts, but now gives just one “post” with this content:

<description>
<![CDATA[ unknown parameter "_" ]]>
</description>

Probably because the blog author was mad about weird URL parameters: jwz: “Just changed my WordPress installs to 403 with un…” - Mastodon

Just changed my WordPress installs to 403 with unrecognized URL parameters. Let’s see how many terribly-behaved RSS readers this breaks.

When I checked site settings, I noticed that NewsBlur was indeed fetching https://cdn.jwz.org/blog/feed/?_=3781 for the site.

If I go to my browser and view that ?_=3781 thing, I get the feed-that’s-basically-an-error-message.
If I strip off the parameter and just view https://cdn.jwz.org/blog/feed/, that looks like blog posts, yay.

When I change my NewsBlur site settings to fetch https://cdn.jwz.org/blog/feed/ without the weird URL parameter, I still get the error messages. I was reeeeally hoping I’d get blog posts.

Uhm, I’m out of ideas.

This isn’t too urgent; the guy relays his blog posts to Tumblr, Mastodon, and probably other follow-able things. So, not urgent. But weird.


Oh sorry, yeah, it seems like NewsBlur may be using a cached version of the old page that needs to be refreshed.

Good catch. I’ve updated the feed to use the correct URL and it should now be updating again.


I see blog posts now. Life is full of love and wonder.

I do see all the posts appearing in my feed, but I also still see an extra entry with the unknown parameter problem. The error includes more now, though: “If you are seeing this,
your feed reader is badly behaved.
Use a different one.”

I don’t know if this is something that will go away on its own, but this entry is marked from 35 minutes ago.


I also still see the “bad” posts popping up, even today:

Today, June 29th, 12:04

Same. It’s still complaining about a “_” parameter.

And indeed somehow the parameter seems to have come back?



I filed “sites rejecting _ (underscore) query parameter” (Issue #1877 in samuelclay/NewsBlur on GitHub) before realizing that folks were chatting about the same problem here.

Also seeing the errors in my view of the feed.


Ok, I’ve made changes (linked in “sites rejecting _ (underscore) query parameter”, Issue #1877 in samuelclay/NewsBlur on GitHub) to both strip the _ (underscore) parameter from feed addresses when they are saved and to special-case jwz’s site so that we never send underscores to it when forcing a refresh.
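
For anyone curious what stripping the parameter from a saved feed address might look like, here is a minimal sketch in Python. It is not the code from the linked issue; the function name `strip_cache_buster` is made up for illustration, and it assumes the cache buster is always a query parameter named `_`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_cache_buster(url: str, param: str = "_") -> str:
    """Return `url` with every query parameter named `param` removed.

    Hypothetical helper, not NewsBlur's actual implementation.
    """
    parts = urlsplit(url)
    # Keep every query pair except the cache buster; preserve blank values.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    # urlencode re-encodes the remaining pairs, so unusual escaping in the
    # original query string may be normalized.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_cache_buster("https://cdn.jwz.org/blog/feed/?_=3781"))
```

Other query parameters are left in place, so a feed address that legitimately needs something like `?format=rss` keeps it.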


Here’s a short description of what happened on this feed:

  1. Jwz didn’t like how NewsBlur performed cache busting. I was adding a _=12345 parameter to the end of the query string when users force-refreshed the feed in order to bust the cache, but sometimes the feed would take that extra query string and return a feed whose own address included the cache-busting parameter. NewsBlur would then diligently change the stored URL to match, and that led to jwz’s feed permanently storing the cache buster.

  2. Because the feed is both popular and prolific, it hit a rare condition where the feed would re-sync its entire archive (6,000+ stories and counting) whenever it had premium archive subscribers. That re-sync took longer than the 10 seconds the add-subscription request had to complete, so the request failed, even though it technically succeeded. If you were to refresh NewsBlur, you would see the feed successfully added, but that’s not a very good user experience. I fixed this issue by only resending the archive when the feed switches from not having any premium archive subscribers to having its first premium archive subscriber, which means the number of stories is much lower. The background feed fetchers will take care of periodically re-syncing those stories anyhow.
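
The fix in item 2 boils down to reacting to a state transition (zero archive subscribers to at least one) rather than to the state itself. A minimal sketch of that idea, with hypothetical names: `should_resend_archive` and its two subscriber counts are illustrative and are not taken from NewsBlur’s code.

```python
def should_resend_archive(previous_archive_subs: int, current_archive_subs: int) -> bool:
    """Decide whether the full archive needs to be resent.

    Hypothetical helper: resend only when the feed goes from having no
    premium archive subscribers to having its first one.
    """
    return previous_archive_subs == 0 and current_archive_subs > 0


# First archive subscriber arrives: do the expensive full resend once.
print(should_resend_archive(0, 1))
# Feed already had archive subscribers: skip it and leave the re-sync
# to the background fetchers.
print(should_resend_archive(5, 6))
```

Under this assumption, a new subscription to a feed that already has archive subscribers no longer has to wait on the full archive.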


His DNA Lounge feed is doing something similar.

I’m not thrilled with Jamie’s approach to this, but I’m curious: how necessary is cache-busting these days? And what’s the relationship between cache-busting and syncing for premium archive subscribers?

I’m also not thrilled by the … thoroughness of his approach, but I like jwz, so he gets some latitude in my book.

Thanks, @samuelclay, for prioritizing this. 🙂

“I was adding a _=12345 parameter to the end of the query string when users force-refreshed the feed in order to bust the cache”

@samuelclay I love NewsBlur, but please don’t do this.

As a website owner who provides RSS feeds, I can confirm that this tactic will lead to NewsBlur being blocked. NewsBlur should adhere to the cache headers in responses.
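
Adhering to the cache headers usually means a conditional GET: the reader stores the `ETag` and `Last-Modified` values from the previous response and sends them back as `If-None-Match` and `If-Modified-Since`, so the server can answer 304 Not Modified when nothing changed. A rough sketch using only the Python standard library follows. The function name and the `ExampleReader/0.1` user agent are placeholders, not anything NewsBlur actually uses.

```python
import urllib.request
from typing import Optional


def build_conditional_request(
    url: str,
    etag: Optional[str] = None,
    last_modified: Optional[str] = None,
    user_agent: str = "ExampleReader/0.1",  # placeholder user agent
) -> urllib.request.Request:
    """Build a GET request that revalidates a previously fetched feed.

    `etag` and `last_modified` are the values saved from the last
    successful response, passed back unchanged.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


req = build_conditional_request(
    "https://cdn.jwz.org/blog/feed/",
    etag='"abc123"',  # made-up validator for illustration
    last_modified="Tue, 02 Jul 2024 13:25:58 GMT",
)
print(req.header_items())
```

The URL stays untouched, so the site only ever sees the address it published. Note that when such a request is sent with `urllib.request.urlopen`, a 304 response surfaces as an `HTTPError` with code 304, which the caller would treat as “no new stories”.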

Still broken. I see nothing but error messages now:


jwz

jwz https://www.jwz.org/blog en jwz@jwz.org (jwz) Tue, 02 Jul 2024 13:25:58 GMT Tue, 02 Jul 2024 13:25:58 GMT https://www.jwz.org/blog/feed/?_=4341 ZoP/5g Error invalid parameter "_"

If you are seeing this,
your feed reader is badly behaved.
Use a different one.

The cache buster comes from years of experience running a news reader. I’ve seen numerous feeds that wouldn’t work without that cache buster. I’m happy to turn it off for now and will see what the next feed that breaks looks like.

I’ve deployed the change.


Sure. I’ve also worked on RSS tech for a few decades now. I own a pretty popular RSS reader actually. 🙂

The only time I can see something like that cache buster hack being helpful is if a particular feed isn’t responding with the appropriate headers, causing the feed to be cached when it shouldn’t be. That is likely to be a one-off case that NewsBlur should handle separately.

But applying the hack for every website doesn’t seem like a good idea and would cause the feeds to be blocked unnecessarily, which would be a very frustrating experience for NewsBlur users.
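
One way to handle that one-off case separately would be a per-host opt-in, where the cache buster is only appended for feeds known to need it. This is just a sketch of the idea: `CACHE_BUSTER_HOSTS`, `refresh_url`, and the example hostname are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical opt-in list: hosts known to serve stale feeds despite
# their cache headers. Everything else is fetched with its URL untouched.
CACHE_BUSTER_HOSTS = {"stale-feed.example.com"}


def refresh_url(url: str, token: int) -> str:
    """Return the URL to fetch on a forced refresh.

    Appends a `_=<token>` cache buster only for opted-in hosts.
    """
    parts = urlsplit(url)
    if parts.hostname not in CACHE_BUSTER_HOSTS:
        return url
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_", str(token)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(refresh_url("https://www.jwz.org/blog/feed/", 12345))
print(refresh_url("https://stale-feed.example.com/rss", 12345))
```

With a default of “off”, a site that rejects unknown parameters never sees one unless someone deliberately adds its host to the list.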