The Verge RSS feed not updating

John_Roepke · November 5, 2011, 5:13am

When I add http://www.theverge.com/rss/index.xml to NewsBlur, it doesn’t get any new items after Oct. 30th (when the site switched from ThisIsMyNext to The Verge), even though they’re in the RSS feed.

If it helps, my username is “justjohn”

samuelclay · November 5, 2011, 9:56am

A number of users have mentioned that The Verge’s RSS feed doesn’t work. I checked on it and confirmed that something is going on. I have e-mailed them to let them know.

Here’s what’s going on, for those who are interested and can read Python:

>>> import feedfinder
>>> feedfinder.feed('http://theverge.com')
'http://theverge.com/rss/index.xml'

>>> import feedparser
>>> fp = feedparser.parse('http://theverge.com/rss/index.xml')
>>> fp
{'feed': {}, 'status': 200, 'bozo': 1, 'headers': {'status': '200 OK', 'content-length': '19383', 'via': '1.1 sbnation.com', 'content-encoding': 'deflate', 'vary': 'Accept-Encoding', 'x-runtime': '578', 'connection': 'Keep-Alive', 'etag': '"fd2f3a5b33a1a706fea1dec62ad25df3"', 'cache-control': 'private, max-age=0, must-revalidate, private, max-age=0, must-revalidate', 'date': 'Fri, 04 Nov 2011 00:06:59 GMT', 'p3p': 'CP="CAO DSP COR CURa ADMa DEVa PSAa PSDa CONi OUR IND PHY ONL UNI COM NAV INT CNT STA"', 'content-type': 'application/xml; charset=utf-8'}, 'etag': u'"fd2f3a5b33a1a706fea1dec62ad25df3"', 'href': u'http://www.theverge.com/rss/index.xml', 'entries': [], 'bozo_exception': error('Error -3 while decompressing data: incorrect header check',)}
>>> fp.entries
[]
>>> fp.bozo
1
>>> fp.bozo_exception
error('Error -3 while decompressing data: incorrect header check',)

John_Roepke · November 5, 2011, 8:37pm

I just tried the same thing you posted on my machine (Ubuntu 11.10) and I get the correct feed, so their feed isn’t completely broken (at least for me):

import feedfinder
import feedparser

url = feedfinder.feed("http://theverge.com")
fp = feedparser.parse(url)
fp
{'feed': {'updated': u'2011-11-05T20:14:06Z', 'subtitle': u'', 'updated_parsed': time.struct_time(tm_year=2011, tm_mon=11, tm_mday=5, tm_hour=20, tm_min=14, tm_sec=6, tm_wday=5, tm_yday=309, tm_isdst=0), 'language': u'en', 'title': u'The Verge - All Posts', …lots more entries…

I tried with Python 2.7.2 and feedparser 5.0.1.

samuelclay · November 5, 2011, 11:06pm

No dice. Trying on my local Mac (Python 2.7.1):

>>> import requests
>>> r = requests.get('http://www.theverge.com/rss/index.xml')
>>> r.headers
{'status': '200 OK', 'content-length': '5972', 'via': '1.1 sbnation.com', 'content-encoding': 'deflate', 'vary': 'Accept-Encoding', 'x-runtime': '137', 'connection': 'Keep-Alive', 'etag': '"365ff5ec199a2c8d5b50ccbe2a5c3dda"', 'cache-control': 'private, max-age=0, must-revalidate, private, max-age=0, must-revalidate', 'date': 'Sat, 05 Nov 2011 23:06:05 GMT', 'p3p': 'CP="CAO DSP COR CURa ADMa DEVa PSAa PSDa CONi OUR IND PHY ONL UNI COM NAV INT CNT STA"', 'content-type': 'application/xml; charset=utf-8'}
>>> import zlib
>>> zlib.decompress(r.content)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
error: Error -3 while decompressing data: incorrect header check
>>>
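
For reference, one known cause of this particular zlib message is a body sent as a raw DEFLATE stream (RFC 1951) without the zlib wrapper (RFC 1950) that zlib.decompress expects by default. Whether that is what The Verge's server was doing is an assumption, not something confirmed in this thread. A minimal sketch of a fallback follows; the names inflate, original and raw_body are made up for illustration, and the sample body is generated locally rather than fetched:

```python
import zlib


def inflate(data):
    """Decompress a 'deflate' body, accepting zlib-wrapped or raw streams."""
    try:
        # RFC 1950 form: DEFLATE data wrapped in a zlib header and checksum.
        return zlib.decompress(data)
    except zlib.error:
        # Raw RFC 1951 form: negative wbits tells zlib to expect no header.
        return zlib.decompress(data, -zlib.MAX_WBITS)


# Build a raw (headerless) DEFLATE stream to stand in for a server response.
# This is locally generated sample data, not the real feed.
original = b"<feed>example</feed>" * 20
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_body = compressor.compress(original) + compressor.flush()

# Both forms decode to the same bytes.
assert inflate(raw_body) == original
assert inflate(zlib.compress(original)) == original
```

Trying the wrapped form first keeps well-behaved servers on the normal path; the raw retry only runs when the header check fails.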

John_Roepke · November 6, 2011, 4:32am

I don’t know if you need to use zlib to decompress the data from the request. You should be able to access r.content directly.

I was able to get content (albeit with some interesting unicode at the beginning) from the request on my Mac (Python 2.7.1) so it sounds to me like there’s some proxy or out of date cache in the way if you’re not getting any new content.

–
Here are the headers and content I got:

>>> r.headers

{'status': '200 OK', 'content-length': '19519', 'via': '1.1 sbnation.com', 'content-encoding': 'deflate', 'vary': 'Accept-Encoding', 'x-runtime': '108', 'connection': 'Keep-Alive', 'etag': '"9b5325d4194bc6c6dd3c2c29a24c36a7"', 'cache-control': 'private, max-age=0, must-revalidate, private, max-age=0, must-revalidate', 'date': 'Sun, 06 Nov 2011 04:12:49 GMT', 'p3p': 'CP="CAO DSP COR CURa ADMa DEVa PSAa PSDa CONi OUR IND PHY ONL UNI COM NAV INT CNT STA"', 'content-type': 'application/xml; charset=utf-8'}

>>> r.content

u'\x02\x00\x00\x00\ufffd\ufffd\x00\ufffd\x1f\x04\ufffd<?xml version="1.0" encoding="UTF-8"?>\n<feed xml:lang="en" xmlns="http://www.w3.org/2005/Atom">\n
<title>The Verge -
All Posts</title>\n
<subtitle></subtitle>\n
<updated>2011-11-06T03:54:02Z</updated>\n
<id>http://www.theverge.com/rss/index.xml</id>\n
<link type="text/html" rel="alternate" href="http://www.theverge.com/">\n
<entry>\n
 <published>2011-11-06T03:54:02Z</published>\n
 <updated>2011-11-06T03:54:02Z</updated>\n
 <title>US Cellular <br>

John_Roepke · November 16, 2011, 9:44pm

For anyone interested: until this is resolved, I’ve decided to work around the issue by proxying the feed through a server I control. If anyone else wants to do the same, here’s the PHP script I’m using:

<?php
print(file_get_contents('http://www.theverge.com/rss/index.xml'));
?>
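
A rough Python equivalent of the same pass-through, for anyone without PHP on their server. This is only a sketch using the standard library: fetch_feed, make_handler, serve and port 8000 are illustrative choices, not anything NewsBlur or The Verge provides, and there is no caching or error handling.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

FEED_URL = "http://www.theverge.com/rss/index.xml"  # feed URL from this thread


def fetch_feed(url=FEED_URL):
    """Download the feed body as bytes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def make_handler(fetch=fetch_feed):
    """Build a request handler that relays whatever `fetch` returns."""

    class FeedProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            body = fetch()
            self.send_response(200)
            self.send_header("Content-Type", "application/xml; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # suppress per-request console logging

    return FeedProxy


def serve(port=8000):
    """Run the proxy until interrupted (hypothetical port)."""
    HTTPServer(("", port), make_handler()).serve_forever()
```

Calling serve() and pointing the reader at the server's address should relay the feed the same way the PHP script does. The fetch function is passed in so the handler can be tried with canned bytes instead of a live request.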

samuelclay · November 16, 2011, 9:46pm

I just emailed them again yesterday. They don’t see a problem, but I see the issue on both my machine at home and the servers.

samuelclay · November 16, 2011, 10:01pm

And fixed. Dear lord, I had to track the problem down to an error in an out-of-date feedparser. Keep 'em coming! (Also, this means that you might want to check any of your other feeds that are throwing 500s.)

Matt · March 22, 2012, 11:24am

I know this is dragging up an old one, but The Verge doesn’t appear to be updating for me now. Is anyone else seeing this? I’ve not had a new story show up in there since March 7.

RobertWawrzyniak · March 22, 2012, 11:40am

I’m seeing this as well, but because I have more feeds like The Verge in my list, I didn’t notice it…

Seems to be a nasty bug though.

samuelclay · March 22, 2012, 4:49pm

Good catch. Fixed here: https://github.com/samuelclay/NewsBlu….