Language Log feed parse error

The feed for Language Log (http://languagelog.ldc.upenn.edu/nll/) has stopped working in NewsBlur.

Neither the Atom feed nor the RSS 2.0 feed works; both fail with a SAX exception.

http://languagelog.ldc.upenn.edu/nll/…
http://languagelog.ldc.upenn.edu/nll/…

 2014-06-16 10:38:34 SAX Exception (553) 
 2014-06-16 09:37:10 SAX Exception (553) 
 2014-06-16 08:34:57 SAX Exception (553) 
 2014-06-15 23:50:54 SAX Exception (553) 
 2014-06-15 13:47:54 SAX Exception (553) 
 2014-06-15 03:49:32 SAX Exception (553) 
 ... 

The last post available in the feed is from June 5, 2014, if that matters.

It looks like the server is returning something wrong. The URL doesn’t validate: http://validator.w3.org/feed/check.cg…

Well, it’s annoying that it doesn’t validate.

Chrome seems to be able to handle the feed just fine, as does Google’s RSS Subscription Extension.

And if you validate the feed content directly, it passes, with only a few recommendations:

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 13, column 107: Self reference doesn’t match document location
line 83, column 0: item contains more than one enclosure
line 163, column 0: content:encoded should not contain iframe tag

I’m not sure why the W3C feed validator can’t retrieve the feed, but NewsBlur apparently *can* retrieve it; the error happens when it parses the feed afterward.

For what it’s worth, the response has no Content-Encoding header, so nothing here indicates compression:

 ❯ curl "http://languagelog.ldc.upenn.edu/nll/?feed=rss2" -D - -o /dev/null 
 HTTP/1.1 200 OK 
 Date: Tue, 17 Jun 2014 22:16:03 GMT 
 Server: Apache/2.2.17 (Unix) mod_ssl/2.2.17 OpenSSL/1.0.0d PHP/5.3.5 
 X-Powered-By: PHP/5.3.5 
 Expires: Wed, 11 Jan 1984 05:00:00 GMT 
 Cache-Control: no-cache, must-revalidate, max-age=0 
 Pragma: no-cache 
 X-Pingback: http://languagelog.ldc.upenn.edu/nll/xmlrpc.php 
 Last-Modified: Mon, 16 Jun 2014 23:42:40 GMT 
 ETag: "29f988671d23727017257a6e37d24d07" 
 Transfer-Encoding: chunked 
 Content-Type: text/xml; charset=UTF-8

It would be great if NewsBlur could take another look at this and see whether it can work around the issue. I have also asked Mark Liberman of Language Log whether anything changed on their end, but since other feed readers can handle the feed, I think NewsBlur should make another attempt to get it working.

The RSS Validator needs to be able to validate the feed. Let Mark at LL know so he can fix that first. Then NewsBlur will most likely work.

The thing is, the feed *does* validate. And it can be retrieved and read by multiple other means.

@Tim have you heard back from anyone at Language Log?

@Samuel any further news on this front? The feed *can* be retrieved with browsers and curl and other feed readers.

Thomas, if the feed won’t validate on the official RSS validator, then let the publisher know. Oftentimes publishers are happy to fix a feed if the official validator says it’s broken.

I’ve figured this out.

 $ curl -H "Accept-Encoding: gzip,deflate" "http://languagelog.ldc.upenn.edu/nll/?feed=atom" 

That request returns a gzipped response with the following appended to the end:

 <!-- Quick Cache is NOT caching this page, because '$_SERVER['REQUEST_URI']' indicates this is a '/feed'; and the configuration of this site says not to cache XML-based feeds. --> 

Most gzip implementations seem to tolerate this, but I guess the SAX parser and whatever the W3C validator uses choke on it. Browsers display the feeds fine, and gunzip handles the gzipped version without problems, though it warns that it is discarding data from the end.
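The problem can be reproduced offline, without the live site. This is a minimal sketch assuming a POSIX shell with GNU gzip; the tiny feed, the file names, and the shortened comment text are made up for illustration. It compresses a valid feed, appends an uncompressed HTML comment after the gzip data as described above, and then decompresses the result.

```shell
#!/bin/sh
# Offline reproduction: a gzip stream followed by uncompressed junk.
# Assumes GNU gzip; the feed body and file names are illustrative only.
workdir=$(mktemp -d)
feed="$workdir/feed.xml.gz"

# 1. Compress a tiny, valid RSS document into a single gzip member.
printf '<?xml version="1.0"?>\n<rss version="2.0"><channel><title>test</title></channel></rss>\n' \
  | gzip -c > "$feed"

# 2. Append a plain-text HTML comment after the gzip data, mimicking
#    the cache plugin's note that ended up after the compressed response.
printf '<!-- Quick Cache is NOT caching this page -->\n' >> "$feed"

# 3. Decompress. GNU gzip recovers the XML, prints
#    "decompression OK, trailing garbage ignored" on stderr,
#    and exits with status 2 (its code for warnings).
status=0
gzip -dc "$feed" > "$workdir/feed.xml" 2> "$workdir/gzip.err" || status=$?

xml=$(cat "$workdir/feed.xml")
warning=$(cat "$workdir/gzip.err")
rm -rf "$workdir"

echo "gzip exit status: $status"
printf 'gzip stderr: %s\n' "$warning"
printf 'recovered XML: %s\n' "$xml"
```

gunzip treats the extra bytes as a warning, which matches the "stripping stuff from the end" behaviour mentioned above; a decoder that treats trailing bytes as an error would reject the same response.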

I’ve informed Mark of this, but there may be an easy way to fix this on your end as well.
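One possible client-side workaround is to decompress only the gzip data and ignore whatever follows it. The sketch below shows that idea with GNU gzip in the shell; `lenient_gunzip` is a hypothetical helper, not anything NewsBlur actually uses, and a real feed fetcher would apply the same rule inside its own HTTP/gzip handling.

```shell
#!/bin/sh
# lenient_gunzip is a hypothetical helper, not NewsBlur's code.
# It decompresses a gzip file to stdout and tolerates trailing junk after
# the gzip data. It relies on GNU gzip's exit codes: 0 = OK, 1 = error,
# 2 = warning (e.g. "decompression OK, trailing garbage ignored").
lenient_gunzip() {
  rc=0
  gzip -dc "$1" 2>/dev/null || rc=$?
  # Accept a warning as success, but keep real errors (status 1) fatal.
  [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]
}

workdir=$(mktemp -d)

# A small Atom body, gzipped, with a plain-text comment appended afterwards.
printf '<feed xmlns="http://www.w3.org/2005/Atom"></feed>\n' | gzip -c > "$workdir/body.gz"
printf '<!-- cache plugin note -->\n' >> "$workdir/body.gz"

if decoded=$(lenient_gunzip "$workdir/body.gz"); then
  echo "decoded: $decoded"
else
  echo "decode failed"
fi

# A truncated download is a genuine error and is still rejected.
head -c 10 "$workdir/body.gz" > "$workdir/truncated.gz"
if lenient_gunzip "$workdir/truncated.gz" > /dev/null; then
  truncated_ok=yes
else
  truncated_ok=no
fi
echo "truncated file accepted: $truncated_ok"

rm -rf "$workdir"
```

Accepting status 2 but not status 1 is the key choice: it lets a Quick Cache-style trailer through while still refusing to pass a genuinely broken download to the XML parser.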

Thanks!

Mark replied that he has deactivated Quick Cache for now, and NewsBlur now loads Language Log successfully!

Thanks so much for figuring out the Quick Cache thing. I’ve never heard of it, but it looks like it’s a WordPress plugin. Thanks also for getting in touch with Mark at Language Log and getting him the information he needed to fix the feed.

(It might be worth contacting the Quick Cache developers and asking them to come up with a better, more compatible way to indicate that Quick Cache isn’t caching a page, so that this doesn’t become an issue for other WordPress sites that use it.)