I’m subscribed to the Money Stuff email newsletter and forward the emails from Gmail to Newsblur to read them. It looks fine in Newsblur itself, but when I click on the title of a story to read the article by itself, the curly quotes are munged.
For example, the munged text looks like this:
The classic description of a successful investment banking analyst is “detail-oriented.â€
Just deployed a fix for this. Unfortunately, it won’t be retroactive, but you’ll see all new stories come with the correct encoding. Let me know if that’s not the case and give me any URLs where it doesn’t work.
What is the probability that the US Supreme Court will strike down President Donald Trump’s broad tariffs? There are a number of ways of thinking about that question. One is what you might call legal analysis: You can look at the text of the US Constitution and the relevant statutes, as well as relevant legal precedents, and try to figure out whether the tariffs are legal or not. If they seem very illegal, then the probability that the Supreme Court will strike them down is high; if they seem very legal, low; if the law is genuinely ambiguous then maybe it’s 50/50. In previous columns I have sort of gestured in the direction of this analysis, but honestly this might be the worst way of thinking about the question.Â
Well I’ll be. I was so proud of this fix, because it did work, but only on normal RSS feeds. I noticed Techmeme had the same encoding issue and I was thrilled that this fixed it as well.
But of course, I neglected to cover newsletters, which take a slightly different path into becoming stories. So that’s now deployed and will fix future newsletters.
If you see any encoding issues, please holler, I’d love to fix them!
Hi, I’ve been distracted for a while but I was reading Money Stuff today and it’s still not fixed. From the December 1 email:
We have talked before about the business model that Sam Altman proposed for OpenAI in 2019, which was (1) build an artificial superintelligence, (2) ask it how to make money and (3) do that. “We will create God and then ask it for money,†as I put it.Â
Apologies for taking so long on this, but I finally fixed it for good. Looks like we were correctly storing it in the DB but neglected to include the character set encoding in the HTML we were serving. It’s now fixed, and I verified that even that URL looks good now.
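For anyone curious how a missing charset produces this kind of munging, here is a minimal sketch. It is not the NewsBlur code: the helper names are hypothetical, and the fallback to Windows-1252 is an assumption about what a client does when the served HTML does not declare an encoding.

```python
from email.message import Message


def charset_from_content_type(content_type, default="cp1252"):
    """Pull the charset out of a Content-Type value, or fall back to a default.

    The cp1252 default is an assumption standing in for a client's guess
    when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


def decode_like_a_client(body, content_type):
    """Decode the response bytes the way a client relying on the header would."""
    return body.decode(charset_from_content_type(content_type))


story = "it won’t be retroactive"
body = story.encode("utf-8")  # the stored bytes are correct UTF-8

# No charset declared: the three UTF-8 bytes of the curly apostrophe
# are read as three separate Windows-1252 characters.
munged = decode_like_a_client(body, "text/html")

# Charset declared: the same bytes decode back to the original text.
correct = decode_like_a_client(body, "text/html; charset=utf-8")

print(munged)   # it wonâ€™t be retroactive
print(correct)  # it won’t be retroactive
```

The bytes are identical in both cases; only the declared encoding changes, which matches the text being stored correctly but served without a charset.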
Thanks so much for your persistence and patience. And if you spot anything like this again, please please please post it! Same goes for any other concerns, bugs, or feature ideas.
This doesn’t seem to work for headlines; for example, the latest entry for Mieux Donner has <title>De l’INSA à Mieux Donner : quand l’esprit d’ingénieur rencontre la philanthropie</title> but is rendered like so:
Root Cause:
A bug in feedparser 6.0.12 where it does a case-sensitive lookup for the content-type header. We were passing headers with Content-Type (title case), but feedparser only recognizes content-type (lowercase). When it couldn’t find the header, it defaulted to iso-8859-1 encoding instead of respecting the feed’s UTF-8 declaration.
The Fix:
Normalize all HTTP header keys to lowercase before passing them to feedparser. This ensures feedparser correctly detects the UTF-8 charset from the Content-Type header.
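A rough sketch of the idea, for illustration only. `lowercase_header_keys` and `sniff_charset` are hypothetical names; `sniff_charset` is a toy stand-in for a case-sensitive header lookup and is not feedparser’s actual code.

```python
from email.message import Message


def lowercase_header_keys(headers):
    """Return a copy of the headers with every key lowercased."""
    return {str(key).lower(): value for key, value in headers.items()}


def sniff_charset(headers, default="iso-8859-1"):
    """Toy stand-in for a case-sensitive lookup of the content-type header."""
    content_type = headers.get("content-type")  # misses "Content-Type"
    if content_type is None:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


raw_headers = {"Content-Type": "application/rss+xml; charset=utf-8"}
title_bytes = "De l’INSA à Mieux Donner".encode("utf-8")

# Title-case key: the lookup misses, so the default encoding is used
# and the accented characters and curly apostrophe come out munged.
before = title_bytes.decode(sniff_charset(raw_headers))

# Lowercased keys: the UTF-8 charset is found and the title decodes cleanly.
after = title_bytes.decode(sniff_charset(lowercase_header_keys(raw_headers)))

print(sniff_charset(raw_headers))                         # iso-8859-1
print(sniff_charset(lowercase_header_keys(raw_headers)))  # utf-8
print(after)                                              # De l’INSA à Mieux Donner
```

Normalizing at the call site keeps the workaround in one place and leaves the library untouched.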
I’ve also manually corrected the affected story. New stories from this feed will now display correctly.