Hitting my feed too often

NewsBlur Feed Fetcher - 12 subscribers - http://www.newsblur.com has hit my feed over 50 times in 12 hours. Is there any chance you can fix the reader to avoid hitting sites at this rate for no good reason?

1 Like

Josh Seid–
I know you aren’t Newsblur.

To answer your question: from my point of view as a content provider with a self hosted blog paying for my server resources, a bot should not visit a feed that updates roughly 7 times a week that often.

I don’t post every 15 minutes. I rarely post more than once a day. Just as Sam pays for his server hosting, I pay for mine. I try to limit the scads of robots whose bot masters program them to visit more frequently than could benefit me (or interested readers). I presume that if Newsblur’s feed reader is working properly, it ought to be more courteous and visit no more than once a day.

I think Newsblur, as a commercial entity whose owner has programmed his bot to visit my sites and collect my content for commercial redistribution to his customers, ought to be more courteous.

Note: As far as I am aware, Newsblur has no communication channel other than this one or private email to Sam to permit content providers to communicate their preferences to his bot or service. As a courtesy to the people whose content he copies, Sam could consider creating a channel that is convenient for content providers, placing a link to the resource in the User Agent so that we could conveniently visit it and inform his bots of our preferences.

Options could include general rate limiting for his several bots,
(a) not framing content (‘story’),
(b) not visiting or copying full content for ‘original’,
(c) not visiting ‘feed’ (or trimming the feed), or
(d) not copying for ‘text’ and of course
(e) not visiting at all.

Other nice features could include permitting content creators to request deletion in some set amount of time. Alternatively, he might offer to negotiate licensing fees for his commercial redistribution of content. For example, we could request that he delete all our copyrighted content from his server after 30 days. We could also request that the feeds he copies only be displayed to customers who subscribe to them rather than to anyone and everyone who happens to know the url. Or we could offer to permit him to display in certain ways for a fee. (I realize that he might have difficulty getting customers to pay for his service if he has to pay the content providers. But the fact is, his business model involves much more than merely copying and supplying feeds to his customers.)

In any case, as things currently stand, I see no good reason for Newsblur’s bot to visit at the rate it visits, particularly not for 12 customers. If these are not paying customers, I should think Sam and his customers should thank me for alerting him to the fact that his system is wasting its own resources which it could better use refreshing feeds from blogs that have recently updated. Now that he knows how many useless visits his bot is expending, he may be able to reduce use of his own server resources and so reduce costs of his system.

Samuel,
I should consider myself lucky that your bot is hitting my site? Wow!

I consider it presumptuous for you to tell me how I should feel and presumptuous for you to tell me what I should value.

If visitors who value my content and who for some reason wish to learn I have posted fresh content through your service need to wait 24 hours to learn I posted, that’s ok with me. In fact: I prefer it. This is my preference even if you think I should prefer something that suits your interests better.

My site is rankexploits.com/musings. I already trim my feed when I see your particular feed reader while providing full feeds to more courteous services. I do not plan to implement PuSH for your convenience, particularly as I believe that would prevent me from trimming the feed I provide to your site. (I trim it to the extent that I find it difficult to believe anyone reads my feed from your site.)

Please be courteous and rate limit your bot. Or, if you really wish to send it that often, I would be happy to negotiate a licensing fee for your business use of my content.

If you want to limit RSS feed reader hits to your precious service, you should implement PubSubHubbub, which most smart feed readers support: http://en.wikipedia.org/wiki/PubSubHu


Frederick,

I am happy with the traffic I get. I blog for my own personal interests. My aim is not professional prestige. I also don’t think it is your place to decide what I, the content creator, consider to be the important part of my blog.

With respect to what I deem important to me, as far as I can see, Newsblur brings zero value to me or my readers.

My concern is not forcing all access through a narrow gate. I don’t monitor any measures of impact nor citations counts. I don’t count my readers. I like the readers who visit and I enjoy conversations in comments at my blog.

For the moment: I am willing to permit Newsblur to continue to visit the feed pages. When I see their agent, I will provide trimmed feeds which will not display images. (This by itself makes feeds at Newsblur nearly useless for anyone interested in reading the content, which relies heavily on plots, graphs and so on.)
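For anyone wondering what serving a trimmed feed to one agent might look like, here is a rough sketch in Python. It is not the code running on the blog in question; the function names, the 200-character limit and the regex-based tag stripping are all assumptions made up for illustration. It matches on the "NewsBlur" string in the User Agent and reduces an item's HTML to a short plain-text excerpt, which drops images along with all other markup.

```python
import re

# Crude tag matcher, good enough for an illustration; a real feed
# pipeline would more likely use an HTML parser.
TAG = re.compile(r"<[^>]+>")

def wants_trimmed_feed(user_agent: str) -> bool:
    """Hypothetical policy: trim for any agent naming itself NewsBlur."""
    return "newsblur" in user_agent.lower()

def trim_item(html: str, max_chars: int = 200) -> str:
    """Strip all tags (images included) and cut to a short excerpt."""
    text = " ".join(TAG.sub("", html).split())  # drop tags, collapse spaces
    if len(text) > max_chars:
        text = text[:max_chars].rstrip() + "..."
    return text

ua = "NewsBlur Feed Fetcher - 12 subscribers - http://www.newsblur.com"
if wants_trimmed_feed(ua):
    print(trim_item('<p>Plot: <img src="a.png"> shows the trend.</p>'))
```

Subscribers still see that a post exists and can click through, which is the behaviour described above.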

Meanwhile I block the “NewsBlur Content Fetcher” and the “NewsBlur Page Fetcher” and will continue to do so.

By granting Newsblur this modest access, those readers who wish to be informed I posted can learn I have posted and click on over if they are interested. But I see no good reason for NewsBlur to visit every 15 minutes. If you read my blog you know my posts are not “fast breaking” news and there is no good reason for visits at this rate.

Since you bring up the issue of commercial value: Of course generally speaking I see little commercial value in my blog. I run a few ads to help defray costs of hosting.

Yet it seems to me that Newsblur is a commercial entity. As part of its business practice, it is copying my content, storing material on their servers and providing that content to its customers. Clearly, Newsblur is somehow seeing some sort of commercial value in the content they copy though that commercial value seems to be limited to Newsblur promoting and selling Newsblur. (BTW: I have nothing against free enterprise. I only note that the ‘not commercially valuable content’ Newsblur copies seems to have some commercial value to Samuel Clay.)

Right now, my request is modest. I would like Sam to be courteous and throttle his bots to once a day rather than every 15 minutes. I am willing to let him visit at this rate without charge.

If he wishes to copy my “not commercially valuable content” more frequently to support his commercial venture he should arrange a license. If he wishes to stop visiting entirely: That would be fine with me.

I note the most recent visit is:

#: 78087 @: Sat, 29 Jun 2013 12:18:58 -0700 Running: 0.4.10a1
Host: 198.199.82.46
IP: 198.199.82.46


User Agent: NewsBlur Feed Fetcher - 12 subscribers - http://www.newsblur.com (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/536.2.3 (KHTML, like Gecko) Version/5.2)
Reconstructed URL: http://rankexploits.com/musings/feed/

If this is the true NewsBlur, I request these be throttled. If it is an imposter, let me know so I can ban the IP range immediately. If this is Newsblur’s IP range and the visits continue at the present unwelcome rate, I will ban the agent entirely.

Lucia,
I’m not one of your readers so I can’t speak for them. I do however consume a lot of blogs using NewsBlur, and I host a number of blogs and write for a few too. I find RSS readers a fantastic tool to keep track of the RSS feeds I enjoy reading. I used to use Google Reader and before that I used Thunderbird. They would fetch feeds once an hour or more or less often as I set them.

As a server administrator I’ll tell you the load on the servers was very minimal, even the cheapest of hosting can handle a few hundred requests a minute without blinking. If all your readers who use newsblur used one of those tools you’d probably have more traffic. I’m guessing it’s not server load that’s the issue.

It seems that you would like some control over how NewsBlur accesses your site. Google Reader used to offer this control (I don’t remember how and googling doesn’t seem to bring it up anymore) but the HTTP code 420 is being used by Twitter. I don’t know if that’s going to be a prevailing trend but it’s a 4xx error and that’s good enough. NewsBlur would presumably just try again later.

“420 Enhance Your Calm”
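To make the idea concrete, here is a minimal sketch of a server-side throttle that answers over-eager fetches with a 4xx status and a Retry-After header. Some caveats: 420 is Twitter's own non-standard code, so the sketch defaults to 429 Too Many Requests, which is the standardized equivalent; the `AgentThrottle` class and its parameters are invented for this example; and nothing in this thread confirms that NewsBlur actually backs off when it receives either code.

```python
import math

class AgentThrottle:
    """Allow each agent one successful fetch per min_interval seconds."""

    def __init__(self, min_interval_seconds: float, status: int = 429):
        self.min_interval = min_interval_seconds
        self.status = status          # pass 420 to mimic Twitter's code
        self._last_served = {}        # agent key -> time of last 200

    def check(self, agent: str, now: float):
        """Return (status, headers) for a request arriving at time now."""
        last = self._last_served.get(agent)
        if last is not None and now - last < self.min_interval:
            wait = math.ceil(self.min_interval - (now - last))
            return self.status, {"Retry-After": str(wait)}
        self._last_served[agent] = now
        return 200, {}

# Once a day, as requested elsewhere in this thread.
throttle = AgentThrottle(min_interval_seconds=24 * 60 * 60)
print(throttle.check("NewsBlur Feed Fetcher", now=0))    # first visit is served
print(throttle.check("NewsBlur Feed Fetcher", now=900))  # 15 minutes later is refused
```

The agent key should be something stable; the full NewsBlur User Agent includes a subscriber count that can change, so keying on the whole string would be fragile.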

I suspect Newsblur respects the cache-control header.
http://www.w3.org/Protocols/rfc2616/r


Some quick probing shows you are using Cloudflare, which is great for security and caching. I suspect that’s how you found it.

Perhaps you should remove your RSS feed and implement a paywall?

Lucia –

Please note that while you “find it difficult to believe anyone reads my feed from your site”, NewsBlur does NOT crawl around looking for random blogs to fetch. The ONLY reason it is fetching your blog is because there are NewsBlur users who are reading your blog.

In fact, the headers you post suggest that there are, in fact, at least 12 users who are reading your blog from NewsBlur.

“With respect to what I deem important to me, as far as I can see, Newblur brings zero value to me or my readers.”

I imagine the 12 readers of yours who read your site via Newsblur would argue otherwise.

Why would you have an RSS feed at all if you don’t want RSS Readers and feed aggregating sites syndicating your content?

reconbot,
“They would fetch feeds once an hour or more or less often as I set them.”
It would be nice if Newsblur provided a service that permits the server administrator who runs the blog to instruct them to visit less frequently.

I am happy with my level of traffic and I don’t believe Newsblur affects that level in any way.

You are correct that the problem is not entirely one of load. I dislike Newsblur’s proactive copying of non-feed content, which previously occurred using their “NewsBlur Content Fetcher” and the “NewsBlur Page Fetcher”. I also do not like framing of my content and disliked Samuel’s frame-buster.

I wrote a framebuster-buster-buster in response to Newsblur framing. Though Samuel eventually stopped framing after my previous request, I am glad I wrote it as it prevents some other entities who wish to frame from framing as well.

Samuel,
I do not consider banning your agent a disservice to my readers. If I did, I would not ban it. I also do not think you are the best judge of what is a service to my readers particularly with respect to believing that your product benefits me.

I noticed you are running copies of my text here:
http://www.newsblur.com/site/590534/ under “text”.

I had previously requested you to stop running content under “story” and “original”. It required multiple requests on my part before you did so. I would have thought it courteous for you to grandfather that request to include your new “text” feature. Please cease copying text from my blog and remove all content copied from my blog without permission from your servers. If you come up with new ideas for copying text from my blog to serve in new ways, please consider any such copying forbidden.

I realize you may think it is unreasonable of me to forbid your copying simply because you wish to run a business copying my material with the intention of displaying copies to your customers. But the fact of the matter is:

Copyright law is on my side of this issue.

Because I know that my ability to ban may end up being imperfect, I am making a formal request that you desist from copying my material, either directly from my site or through the intermediary of a third-party entity, that you not store any of my blog content on your servers, and that you not display any of my blog content from any copies you may have stored on any server you own or control.

Lucia, your content has been removed. I have deleted your feed.

2 Likes

Nick,

First: I have nothing against most feed aggregators. Most permit their visitors to read feeds after logging in. However, Newsblur posts the material publicly so that even those who are not subscribers can read the content. For example you can see the feeds here:
http://www.newsblur.com/site/590534/ I do not like this behavior on their part. Since other aggregators tend not to do this, I see no reason why I should cut them off simply because I would prefer to limit Newsblur.

Second: I do not like Newsblur’s strike-out and replace feature for edited content. I realize others may like it, but I don’t. For that reason, I prefer to trim the feed I provide them.

Third: At least in the past, and I believe currently, Newsblur does more than visit RSS feeds. The “NewsBlur Content Fetcher” and the “NewsBlur Page Fetcher” copy non-feed content. They appear to scrape, host content on their servers and display content they host on their servers. Although after asking (more than once) I got Sam to stop scraping the main page to display as “original” or framing content, the event displeased me. Newsblur also frames content in “story” mode. I also dislike this.

I do not like these features of Sam’s business model and I prefer to limit the amount of material I make available to Newsblur, but I would be willing to keep making it available if he would rate-limit his bot. It appears he will not do so.

Meanwhile, I am happy to provide RSS feeds to other feed services that are more respectful of my position as a content provider.

Other content providers may, of course, do as they wish and grant Newsblur whatever access they prefer. My preferences need not be theirs nor theirs mine.

With respect to the 12 people (or bots) who took out subscriptions at Newsblur: It is possible they would argue that Newsblur provides them an important service. However, the fact that someone subscribed to my blog at Newsblur does not mean they read my blog. Many people subscribe to feeds yet rarely read them. In any case, they remain free to read my blog if they wish to do so.

Samuel–
The email in my inbox read:
From: NewsBlur on Get Satisfaction!
Subject: New reply: Hitting my feed too often
Date: June 29, 2013 4:25:50 PM CDT
To: NewsBlur on Get Satisfaction!

Samuel Clay, an employee of NewsBlur, replied to Hitting my feed too often, a problem about NewsBlur.

“Lucia, none of your content is on NewsBlur’s servers. Take a look at that URL you linked to.”

The URL I linked to was:
http://www.newsblur.com/site/590534/ . This begins with “newsblur”.

Based on the comment here, and my visit to http://www.newsblur.com/site/590534/, it appears you removed that content. Thank you.

(I’m not affiliated with NewsBlur, so if this makes you angry, don’t blame them, but:) one feed request every 15 minutes (on average) is really “too often” and a problem?

4 Likes

If your site was more popular, I’d hit it harder. Since you didn’t link to it I can’t find out any more information, so I’ll just say that once every 15 minutes is a fantastic schedule and you should consider yourself lucky that your readers get near instant notification of new stories.

Also, if you want to lessen the load on your server (and every 15m is a drop in the bucket), implement PuSH (PubSubHubbub), which would make NewsBlur hit your site 1/10 as often.
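For reference, the publisher side of PuSH is small: the feed advertises a hub with a `<link rel="hub" href="...">` element, and after each new post the publisher sends the hub a form-encoded POST naming the feed that changed. Below is a minimal sketch of building that ping in Python. The hub URL and feed URL are placeholders, not anything NewsBlur or any particular hub has confirmed, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build (but do not send) a PubSubHubbub 'publish' notification."""
    body = urlencode({
        "hub.mode": "publish",   # tells the hub that a topic has new content
        "hub.url": feed_url,     # the feed (topic) that was updated
    }).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# Placeholder URLs, assumed for illustration only.
ping = build_publish_ping("https://hub.example.org/", "http://example.com/feed/")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
# Sending it would be: urllib.request.urlopen(ping)
```

The hub then fetches the feed and pushes the new entries to its subscribers, so a subscribed reader has less reason to poll.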

5 Likes

It is just incomprehensible that Lucia Liljegren would complain about getting accessibility through Newsblur!

What Newsblur offers is obviously a very valuable service to bloggers. For the kind of material on Lucia’s blog, the important result is “impact”. Her material is of no conceivable commercial value, and to have greater effect her work needs to be as easily accessible as possible (which can translate into greater value as professional prestige).

It is nearly unbelievable that Lucia would prefer to actually limit access by anyone interested, in favor of prescribing exactly how anyone can access the blog. If her concern is to force all access through a narrow gate so she can collect readership statistics, she should turn her attention to other and better measures of impact, such as citation counts. Writing a blog post is like writing an op-ed in a newspaper; counting readers is necessarily imprecise.

From Wikipedia: “Professor Judith Curry, a climatologist at Georgia Tech, calls Lucia ‘probably the least controversial person in the climate blogosphere, because of her cheerfulness and sense of humor, honesty, and open mindedness.’” Those admirable qualities should lead her to understand how valuable it is to get increased distribution through Newsblur, resulting in greater impact. This is especially so since (I judge) her views are likely to be the correct ones, and the world would be a better place if they were more widely understood–not if they were restricted.

3 Likes

Doesn’t the ETag HTTP header help with things like this? If there are no new posts to the feed, the NB requests will get back a 304 NOT MODIFIED code instead of the normal 200 OK code, reducing server load. The already suggested PuSH would reduce load even further.
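A rough sketch of how that works on the server side, in Python. This assumes the fetcher sends back the ETag it last saw in an If-None-Match header, which I have not verified for NewsBlur; the function name and the choice of a truncated SHA-256 hash as the tag are just for illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple

def feed_response(feed_body: bytes,
                  if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for one request to the feed."""
    # Derive the tag from the content, so it changes only when the feed does.
    etag = '"' + hashlib.sha256(feed_body).hexdigest()[:16] + '"'
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match.strip() == etag:
        return 304, headers, b""      # nothing new: send headers only
    return 200, headers, feed_body    # new or first fetch: send the feed

feed = b"<rss><channel><item>post 1</item></channel></rss>"
status, headers, _ = feed_response(feed, None)
print(status)                                            # 200
print(feed_response(feed, headers["ETag"])[0])           # 304
print(feed_response(feed + b"<!-- new -->", headers["ETag"])[0])  # 200
```

The request still arrives, so this saves bandwidth and rendering work rather than reducing the number of hits in the log.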

Also, I fail to see how a single service polling a feed is worse than the alternative of multiple clients each polling the feed, which is what would happen with desktop RSS readers. One request really is a drop in the bucket, assuming a server that is less than 20 years old and on a non-dialup connection.
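A quick back-of-the-envelope comparison, using the numbers mentioned in this thread (12 subscribers, roughly one fetch every 15 minutes) and assuming, purely for comparison, that each desktop reader would poll once an hour:

```python
def requests_per_day(clients: int, interval_minutes: float) -> float:
    """Feed requests per day for `clients` pollers at a fixed interval."""
    return clients * (24 * 60) / interval_minutes

central = requests_per_day(1, 15)    # one service, every 15 minutes
desktop = requests_per_day(12, 60)   # 12 desktop readers, hourly (assumed)
observed_interval = 12 * 60 / 50     # "50 times in 12 hours", in minutes

print(central)            # 96.0
print(desktop)            # 288.0
print(observed_interval)  # 14.4
```

Under those assumptions the single service makes a third as many requests as the 12 separate clients would. The hourly figure is a guess; subscribers who poll less often would change the comparison.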

3 Likes

I suggest you ban the feed fetcher entirely, because your demands are unreasonable and a disservice to your readers.

8 Likes