Rachel By the Bay feed challenge (and excessive timeouts, probably not respecting retry-after)

So, she just wrote “So many feed readers, so many bizarre behaviors”.

Looking at the way it’s pinging, NewsBlur is… not super well behaved about respecting Retry-After headers? (A sketch of what honoring Retry-After would look like follows the fetch log below.)

2024-05-29 18:25:50  Timeout (505)
2024-05-29 17:24:51  Timeout (505)
2024-05-29 16:19:01  OK (200)
2024-05-29 13:37:42  Timeout (505)
2024-05-29 08:11:25  Timeout (505)
2024-05-29 02:45:28  Timeout (505)
2024-05-29 02:44:43  Timeout (505)
2024-05-28 17:48:24  Timeout (505)
2024-05-28 07:54:41  Timeout (505)
2024-05-28 03:09:34  Timeout (505)
2024-05-27 22:24:28  Timeout (505)
2024-05-27 17:30:13  Timeout (505)
2024-05-27 17:29:44  Not modified (304)
2024-05-27 15:09:21  Timeout (505)
2024-05-27 12:04:04  Timeout (505)
2024-05-27 10:08:03  Timeout (505)
2024-05-27 09:03:59  Timeout (505)
2024-05-27 07:55:58  Timeout (505)
2024-05-27 06:54:07  Timeout (505)
2024-05-27 05:49:56  Timeout (505)
2024-05-27 04:41:01  Timeout (505)
2024-05-27 03:43:58  Timeout (505)
2024-05-27 02:37:48  Timeout (505)
2024-05-27 02:36:58  Timeout (505)
2024-05-27 01:31:51  Timeout (505)
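
Just to spell out what respecting Retry-After would mean in practice, here is a minimal sketch. This is generic requests-based code, not NewsBlur’s actual fetcher, and the 429/503 handling is an assumption about where the header would show up.

```python
# Minimal sketch of honoring Retry-After before polling a feed again.
# Retry-After can be either a number of seconds or an HTTP-date.
import time
from email.utils import parsedate_to_datetime

import requests


def retry_after_seconds(value: str) -> float:
    """Convert a Retry-After header value into a delay in seconds."""
    if value.strip().isdigit():
        return float(value)
    return max(0.0, parsedate_to_datetime(value).timestamp() - time.time())


def fetch_feed(url: str):
    resp = requests.get(url, timeout=30)
    retry_after = resp.headers.get("Retry-After")
    if resp.status_code in (429, 503) and retry_after:
        # A well-behaved reader schedules its next poll at least this far out,
        # rather than hammering the server again right away.
        delay = retry_after_seconds(retry_after)
        print(f"Backing off for {delay:.0f} seconds as requested")
        return None
    return resp
```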

It might be useful to engage with her challenge, where she says:

Now we get to the part where I pitch a way forward, and nobody takes me up on the offer. The idea is basically this: I get some kind of commitment and support from the people who do feed reader stuff, and in turn, I build a new kind of web site which amounts to a “feed reader correctness score”.

Just dropping it here.


I’m glad to see that NewsBlur fulfills almost all of the criteria for what she calls “nigh-perfect” feed reader behavior. The issue here is that the 505 is a timeout: she publishes so many stories that NewsBlur times out while parsing the feed.

Looking at the feed, it’s enormous: https://rachelbythebay.com/w/atom.xml

I wish it didn’t time out, but without writing some logic to pull out just the top 10 stories and parse only those, the feed isn’t something NewsBlur’s feed fetchers can handle easily.
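
For what it’s worth, the “pull out just the top 10 stories” idea could look roughly like this: trim the Atom document down to its first N entries before handing it to the real feed parser. This is a hypothetical standard-library sketch, not anything in NewsBlur, and it assumes the newest entries come first in the document.

```python
# Hypothetical pre-pass: keep only the first 10 <entry> elements of a huge
# Atom feed before parsing it for real.
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def truncate_atom(xml_text: str, max_entries: int = 10) -> str:
    ET.register_namespace("", ATOM_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(xml_text)
    entries = root.findall(f"{{{ATOM_NS}}}entry")
    # In Atom, <entry> elements are direct children of <feed>, so the extras
    # can simply be removed from the root.
    for entry in entries[max_entries:]:
        root.remove(entry)
    return ET.tostring(root, encoding="unicode")
```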

I’ve been meaning to write a proxy that would impose caching and limit the number of articles, and possibly other things…

But alas, I haven’t, and at this point probably won’t. I would like Rachel’s blog back, though, enough that I might poke her and see if she’d do a mini feed with fewer articles, or something.
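
For the record, here is roughly the shape of the proxy I had in mind: fetch the upstream feed at most once per cache window, trim it to the newest few entries, and serve that to the reader. This is a rough Flask-based sketch; the cache window and entry limit are placeholder values.

```python
# Rough sketch of a caching, entry-limiting feed proxy (placeholder values).
import time
import xml.etree.ElementTree as ET

import requests
from flask import Flask, Response

UPSTREAM = "https://rachelbythebay.com/w/atom.xml"
CACHE_SECONDS = 15 * 60   # refetch upstream at most every 15 minutes
MAX_ENTRIES = 10          # serve only the newest 10 entries
ATOM_NS = "http://www.w3.org/2005/Atom"

app = Flask(__name__)
_cache = {"body": None, "fetched_at": 0.0}


def trimmed_feed() -> str:
    now = time.time()
    if _cache["body"] is None or now - _cache["fetched_at"] > CACHE_SECONDS:
        ET.register_namespace("", ATOM_NS)
        root = ET.fromstring(requests.get(UPSTREAM, timeout=30).content)
        for entry in root.findall(f"{{{ATOM_NS}}}entry")[MAX_ENTRIES:]:
            root.remove(entry)
        _cache["body"] = ET.tostring(root, encoding="unicode")
        _cache["fetched_at"] = now
    return _cache["body"]


@app.route("/atom.xml")
def atom():
    return Response(trimmed_feed(), mimetype="application/atom+xml")
```

Pointing a reader at the proxy’s /atom.xml instead of the origin would keep both the parse size and the hit rate on Rachel’s server down.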

I count 100 stories (<entry> elements) in the feed, at least in the copy I grabbed with wget. That doesn’t seem like a ton? Not a tiny amount, but well within norms for a site that’s been around a while.
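
For anyone who wants to double-check, the count is easy to reproduce from a saved copy of the feed:

```python
# Count <entry> elements in a saved copy of the Atom feed (e.g. from wget).
import xml.etree.ElementTree as ET

root = ET.parse("atom.xml").getroot()
print(len(root.findall("{http://www.w3.org/2005/Atom}entry")))
```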


I could take a stab at implementing some functionality along those lines; is the NewsBlur code on GitHub what’s currently running under the hood?

Yep, what’s on GitHub is live.

Oh hey, the feed updated. Awesome!

Rachel has a new update. She puts NewsBlur in:

Group B: They tend to do spammy unconditional requests at startup, and usually at a needlessly fast rate, too - like less than a second apart. This is what most entries in group B have, and if that’s their only problem, then fixing that would move most of them into group A. (There can be other small anomalies which put something here).
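
For context, the “unconditional requests” she is talking about are fetches that don’t send If-Modified-Since or If-None-Match, so the server never gets the chance to answer with a cheap 304. A generic conditional fetch looks roughly like this (illustrative requests sketch, not NewsBlur’s code):

```python
# Sketch of a conditional feed fetch: echo back the validators from the
# previous poll so the server can reply 304 Not Modified when nothing changed.
import requests


def conditional_fetch(url: str, etag: str | None, last_modified: str | None):
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 304:
        return None, etag, last_modified  # nothing new; keep the old validators
    return (
        resp.content,
        resp.headers.get("ETag", etag),
        resp.headers.get("Last-Modified", last_modified),
    )
```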