A reporter interviewed me today about the future of the robots exclusion
file (robots.txt), and as we talked it seemed increasingly that a standard
for notification might remove some of the need for enhancements to
robots.txt while bringing along a number of benefits. Notification was
discussed as a Good Thing to Have at the W3C Distributed Search and
Retrieval Workshop.
For those who are wondering, a notification protocol would be a means
whereby a search service could be notified that a Web resource (a page,
typically) is new or changed, presumably prompting the service to re-index
that resource. Notification could be built into or added onto Web servers
so that as documents are published and changed, notices would be sent.
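To make that concrete, here is a rough sketch (in Python, purely for
illustration; the endpoint, field names, and values are my own invention,
not any proposed standard) of a publishing server sending such a notice as
a simple HTTP POST:

    # Hypothetical change notice sent from a Web server to a search
    # service's (assumed) notification endpoint as a form-encoded POST.
    import urllib.parse
    import urllib.request

    def notify(endpoint, resource_url, event):
        # "event" is "new" or "changed"; both values are illustrative.
        data = urllib.parse.urlencode({
            "url": resource_url,
            "event": event,
        }).encode("ascii")
        req = urllib.request.Request(endpoint, data=data)
        with urllib.request.urlopen(req) as resp:
            return resp.status  # a 2xx status would mean the notice was accepted

    # e.g.: notify("http://search.example.com/notify",
    #              "http://www.example.com/page.html", "changed")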
The great benefit one might imagine from this is that search engines could
spend a lot less time exploring and a lot more time indexing and searching;
indexing latency (the delay between a document changing and the change
appearing in the index) would presumably drop.
One could hope that a notification mechanism might be designed to allow
publishers to indicate whether a change is major or minor (a spelling
correction might warrant re-indexing, for example, but should not set off
people's filtering agents) and which documents are duplicates of one
another; the sketch below shows how a notice might carry that information.
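Continuing the earlier sketch, the notice could carry a couple of extra
fields for this (again, every name here is hypothetical):

    # Extending the hypothetical notice: a severity flag so minor edits
    # can be told apart from substantive ones, and a pointer to a
    # canonical copy for duplicates. All field names are assumptions.
    notice = {
        "url": "http://www.example.com/paper.html",
        "event": "changed",
        "severity": "minor",  # e.g. a spelling fix: re-index, but don't alert agents
        "duplicate_of": "http://mirror.example.org/paper.html",
    }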
This opens the door to a new flavor of spamming, especially as agents are
increasingly available to monitor new and changed documents -- spammers
could submit ads repeatedly as "new" documents...
Does anyone see any movement to make notification happen? What would it take?
Nick