In trying to track down problems and bugs, I've enabled extensive logging
in the meta-search engine at Cyber411.  Frankly, I'm amazed at what I'm
getting.
  The (official) front end (I have found others through this logging) is
located at 'http://www.cyber411.com/search/' and the output comes from 
'http://www.cyber411.com/search/nph-bsframe.cgi' [1].  Here's some select
output from the latest version (with comments added): [2]
Dec 20 11:36:32 5Q:silly 1.0.6C[10891]: badrequest - [HEAD]
Dec 20 11:36:32 5Q:silly 1.0.6C[10891]: badrequest - from 38.11.233.106
Dec 20 11:36:32 5Q:silly 1.0.6C[10891]: badrequest - refered from (unknown)
Dec 20 11:36:32 5Q:silly 1.0.6C[10891]: badrequest - user-agent Mozilla/3.0 (Win 95; U)
                   ----- ------ -----   ----------
                    |      |     |          |
                    |      |     |          +----- error message logged
                    |      |     +---------------- process ID
                    |      +---------------------- version of nph-bsframe.cgi
                    +----------------------------- machine name
	It never occurred to me to check for a HEAD request in a CGI
	program.  The main problem is that the size of the resulting
	document changes depending upon the search criteria.  I suppose
	I should check up on what to return for a HEAD request (there's
	a sketch of what I have in mind after the next excerpt).  But I'm
	surprised that Netscape sends this out.  Mostly, I get:
Dec 20 23:39:32 5Q:silly 1.0.6C[17426]: badrequest - [HEAD]
Dec 20 23:39:32 5Q:silly 1.0.6C[17426]: badrequest - from ax1.healey.com.au
Dec 20 23:39:32 5Q:silly 1.0.6C[17426]: badrequest - refered from (unknown)
Dec 20 23:39:32 5Q:silly 1.0.6C[17426]: badrequest - user-agent Mozilla/3.0 (Win 95; I) via Squid Cache version 1.0.12
	This poor fellow attempted to use the search engine for about five
	minutes straight (maybe two dozen attempts).  Most of the HEAD
	requests seem to be coming from proxy/cache servers (at first
	I thought they were naive (read: stupid) robots.  I added a
	robots.txt file to the site and am still getting HEAD requests.
	Ah well ... )
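	Here's the sketch I mentioned above (not the actual
	nph-bsframe.cgi code, just a minimal illustration): since it's
	an nph- script it writes the status line itself, and for HEAD
	it stops after the headers instead of computing a body.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
  const char *method = getenv("REQUEST_METHOD");

  /* nph- scripts emit the full HTTP response themselves */
  printf("HTTP/1.0 200 OK\r\n");
  printf("Content-Type: text/html\r\n");
  printf("\r\n");

  /* headers only for HEAD - no point building a body whose size
     depends on the search criteria anyway */
  if (method != NULL && strcmp(method, "HEAD") == 0)
    return 0;

  printf("<HTML><BODY>search results would go here</BODY></HTML>\n");
  return 0;
}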
Dec 21 07:40:12 5Q:silly 1.0.6C[19751]: contenttype(003) - from leo.xnet.com
Dec 21 07:40:12 5Q:silly 1.0.6C[19751]: contenttype(003) - refered from http://www.cyber411.com/search/
Dec 21 07:40:12 5Q:silly 1.0.6C[19751]: contenttype(003) - user-agent: Mozilla/2.0 (compatible; MSIE 3.0b1; Mac_PowerPC)
Dec 21 07:40:12 5Q:silly 1.0.6C[19751]: contenttype(003) - data: [text/html]
	Seems like Microsoft doesn't have its act together.  I was under the
	(I suppose) mistaken impression that POSTs required a content-type
	of 'application/x-www-form-urlencoded'.  Actually, I take that
	back.  There is a second method (multipart/form-data), but it
	requires actually parsing MIME-based data.  Ick.  Most of the
	contenttype errors come from MSIE.  Figures.
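	A more forgiving check (again just a sketch, not the real code)
	would compare the media type only, so a trailing ";charset=..."
	parameter doesn't trip it, and would at least report oddballs
	like MSIE's [text/html] cleanly:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* true if the request body claims to be an urlencoded form post,
   ignoring any parameters after the media type */
static int is_urlencoded(const char *ct)
{
  const char *want = "application/x-www-form-urlencoded";
  size_t      len  = strlen(want);

  return ct != NULL
      && strncmp(ct, want, len) == 0
      && (ct[len] == '\0' || ct[len] == ';' || ct[len] == ' ');
}

int main(void)
{
  const char *ct = getenv("CONTENT_TYPE");

  if (!is_urlencoded(ct))
  {
    printf("HTTP/1.0 400 Bad Request\r\n");
    printf("Content-Type: text/plain\r\n\r\n");
    printf("unsupported content type: %s\n", ct ? ct : "(none)");
    return 0;
  }

  /* ... read CONTENT_LENGTH bytes from stdin and decode the form ... */
  return 0;
}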
Dec 23 07:03:33 5Q:silly 1.0.6C[10706]: badquerytype - [Hyper        ]
Dec 23 07:03:33 5Q:silly 1.0.6C[10706]: badquerytype - from villella.rnd.aetc.af.mil
Dec 23 07:03:33 5Q:silly 1.0.6C[10706]: badquerytype - refered from http://www.cyber411.com/search/
Dec 23 07:03:33 5Q:silly 1.0.6C[10706]: badquerytype - user-agent Mozilla/1.2 (compatible; PCN-The PointCast Network 1.2/win16/1)
	Another strange browser.  Most browsers seem to truncate trailing
	white space (my program is expecting 'Hyper', not 'Hyper       ')
	but not this one.  This is fairly easy to fix on my side though.
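	The fix is just to strip trailing whitespace from the decoded
	form value before comparing it; something along these lines:

#include <ctype.h>
#include <string.h>

/* strip trailing whitespace in place, so 'Hyper        ' compares
   equal to 'Hyper' */
static void trim_trailing(char *s)
{
  size_t len = strlen(s);

  while (len > 0 && isspace((unsigned char)s[len - 1]))
    s[--len] = '\0';
}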
  Sigh.  I wonder how many CGI programmers are aware of these types of
problems?
  -spc (I won't even mention the non-official front ends to the 
	search engine that aren't even correctly set up ... )
[1]	The name has almost no relation to the project anymore, except
	for the 'nph-' and '.cgi' parts.  Technically, it stands for:
	Non-Parse-Headers-BrainStormFrameVersion.CommonGatewayInterface
	Only it's no longer called Brainstorm (that was the original name)
	nor is there a frames version anymore.  Ahh, the price of
	progress 8-)
[2]	Logging is done via syslogd.  The name of the machine is silly [3]
[3]	silly being an SGI.
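	For anyone wanting to do the same: the lines above are roughly
	what the standard syslog(3) calls produce.  A sketch only (the
	ident string and priority here are my guesses, not what
	nph-bsframe.cgi actually uses):

#include <stdlib.h>
#include <syslog.h>

int main(void)
{
  const char *method = getenv("REQUEST_METHOD");
  const char *host   = getenv("REMOTE_HOST");

  openlog("1.0.6C", LOG_PID, LOG_USER);
  syslog(LOG_ERR, "badrequest - [%s]", method ? method : "(none)");
  syslog(LOG_ERR, "badrequest - from %s", host ? host : "(unknown)");
  closelog();
  return 0;
}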