RE: Netscape-Catalog-Robot

Ian King (iking@microsoft.com)
Mon, 30 Dec 1996 14:25:59 -0800


IMHO, this justification is not supportable. The crawler work I've done
has never contemplated an 'option' to disable compliance with
robots.txt. This is an example of a 'protocol' that allows site
managers to *enforce* some etiquette -- e.g., "don't crawl this tree";
any robot which deliberately disregards this protocol as an 'option' is
effectively saying that Internet courtesy is an 'option.'
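
A minimal sketch of a crawler that consults robots.txt on every fetch, with
no switch to turn the check off. It assumes Python's standard
urllib.robotparser module; the robots.txt content, the "ExampleBot" agent
name, and the make_checker helper are all hypothetical, chosen only for
illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would fetch this
# from the target site before requesting anything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_txt, user_agent):
    """Return a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        # There is deliberately no flag here to bypass the check.
        return parser.can_fetch(user_agent, url)

    return allowed


allowed = make_checker(ROBOTS_TXT, "ExampleBot")
for url in ("http://example.com/index.html",
            "http://example.com/private/report.html"):
    print(url, "->", "fetch" if allowed(url) else "skip")
```

The point of the design is that the check lives in the fetch path itself, so
it cannot be configured away.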

I've seen and heard a lot of talk that timing restrictions or guidelines
should be included, either in robots.txt or its successor. I think that's
a good idea, and again I will disagree with people who treat compliance
with a site's settings as an 'option.'
To address one point: if you want to power-crawl your own site, there
are numerous alternatives.
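
A rough sketch of what a timing guideline could mean on the robot side: a
minimum interval between successive requests to one host. robots.txt as
discussed here defines no such setting, so the PoliteThrottle class and its
min_interval_s parameter are assumptions made up for this example, not part
of any existing protocol or product.

```python
import time


class PoliteThrottle:
    """Enforce a minimum interval between successive requests to one host."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s  # hypothetical per-site setting
        self._clock = clock                   # injectable for testing
        self._sleep = sleep
        self._last = None                     # time of the previous request

    def wait(self):
        """Sleep until the interval has elapsed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval_s - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last = now + delay
        return delay
```

A crawler would call wait() immediately before each request to the host, for
example with PoliteThrottle(2.0) to allow at most one request every two
seconds.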

Ian King, QA Lead <iking@microsoft.com>
Normandy Information Retrieval
Internet Services Business Unit
WARNING: Dates on calendar are closer than they appear.

>-----Original Message-----
>From: Nick Arnett [SMTP:narnett@Verity.COM]
>Sent: Tuesday, December 24, 1996 5:31 PM
>To: robots@webcrawler.com
>Subject: Re: Netscape-Catalog-Robot
>
>At 12:20 PM 12/24/96 +0000, you wrote:
>>
>>Anyone from Netscape listening ?
>>
>>If so, please fix the Catalog-Robot so that it can't send multiple
>>requests per second for hours at a time.
>
>Some customers would regard this as *breaking* the robot, not fixing it.
>Some users will want to run at full-steam-ahead when indexing their own
>servers. Trying to enforce robot etiquette in your code is impractical.
>Instead, we make proper behavior the default and try to educate customers
>about why they usually shouldn't override the defaults. For example, our
>spider can be set to ignore robots.txt, but you have to do it quite
>deliberately. That was a design requirement from some of our customers with
>big intranets.
>
>In short, all it's reasonable to do is ask them to make the *default*
>behavior friendly, which I think is already the case. But we're whistling
>in the wind when we try to ask the code to be the enforcement.
>
>Nick
>
>---------------------------------------
>Verity Inc.
>Connecting People with Information
>
>Product Manager, Advanced Technology
>408-542-2164; home office 408-369-1233
>http://www.verity.com
>
>_________________________________________________
>This message was sent by the robots mailing list. To unsubscribe, send mail
>to robots-request@webcrawler.com with the word "unsubscribe" in the body.
>For more info see http://info.webcrawler.com/mak/projects/robots/robots.html