When is Honesty the Best Policy?
In the August 1995 issue of Communications of the ACM, Pamela Samuelson draws an interesting distinction between interfaces that users see and those that they do not. She suggests that copyright lawyers have perceived differences between end-user interfaces and interfaces internal to the software because "lawyers can see user interfaces of programs whereas they generally cannot see internal interfaces" (p. 15). For the computer science technical community, on the other hand, "an interface is an interface" (p. 16).
I do not intend to discuss intellectual property law here. But as a non-computer science person myself, I am taken with her comments as emblematic of differences between the world that outsiders see and use and the world that technical professionals create. It seems to me that this distinction poses a very interesting problem for system designers: what do you disclose to the user, given that you cannot fully anticipate the needs and values that users will bring to the system?
Clearly, the point of end-user interfaces is to isolate information of interest to the user (sometimes known as "content") from the information necessary to make the system work. But the boundary between the two types of information is not always obvious, and what may be satisfactory to one class of end-users may not work for another. Imagine a distributed system of 100 servers containing information accessible through a common search engine, which supports parallel processing of queries.
We submit a query and get back 9 hits from 15 independent collections. That appears to be an acceptable response. But is it? The designers may have set up the system to return the "nearest" record, and by controlling for duplicates -- or rather, apparent duplicates -- in the returned set, the response has dropped information about multiple instances of a given record that an end-user might find useful. Some users may prefer to search the collection at site A because of its authoritative, updated records -- particularly if the database contains full texts instead of just bibliographic records. On the other hand, editors and fact-checkers routinely conduct simple known-item searches, for which it may be sufficient to verify that an item exists. The point is that a system decision, designed to increase efficiency, may drop information important to one class of users but irrelevant to others.
We could probably fix this scenario by setting the default to "nearest" but offering the user the option of retrieving all hits without controlling for duplicates. Such unrestricted searching is probably not a problem in short, well-constructed or known-item searches. Still, users who are early in the research process or who are browsing in search of ideas might tie up the system with vague queries, so some restrictions are probably necessary -- particularly if users have to pay for the system, or if there is a block payment covering many individual users. It is more a matter of when these restrictions are invoked and who has permission to change them. And that might well be a decision taken by the client's setup rather than by the server's architecture.
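The policy sketched above -- collapse apparent duplicates to the "nearest" copy by default, but let the user ask for every instance -- can be illustrated in a few lines. This is only a hypothetical sketch; the function and data layout (each collection as a mapping from query to hit tuples of record id, site, and "distance") are illustrative assumptions, not the design of any actual system.

```python
def federated_search(query, collections, dedupe=True):
    """Query every collection in a federated system.

    Each collection is modeled as a dict mapping a query string to a
    list of (record_id, site, distance) hit tuples -- an illustrative
    stand-in for a real search protocol.

    By default, apparent duplicates (same record_id) are collapsed to
    the 'nearest' copy; with dedupe=False, every instance is returned,
    preserving information about multiple holdings.
    """
    # Gather raw hits from all collections.
    hits = [hit for coll in collections for hit in coll.get(query, [])]
    if not dedupe:
        return hits  # all instances, duplicates included

    # Keep only the nearest instance of each record.
    nearest = {}
    for record_id, site, distance in hits:
        best = nearest.get(record_id)
        if best is None or distance < best[2]:
            nearest[record_id] = (record_id, site, distance)
    return list(nearest.values())
```

The point of the `dedupe` flag is exactly the editorial's suggestion: the efficient behavior is the default, but the information the default discards remains available to the class of users who want it.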
What else do we tell users? Do we tell them how many queries have been submitted, hoping that large numbers will foster confidence? Do we also tell them which servers are not available at the time of the search? The value of a networked system increases with the addition of nodes to it, but in an extensive system, it is unlikely that all of the servers will be up all of the time. If, at the time of our hypothetical search, 93 of the potential collections are available, and if none of the 7 absent collections contains information pertinent to the search, then knowing which servers are down is irrelevant. However, we cannot always assume that absent servers will contain irrelevant information. An implied "no data" is not equivalent to the actual "data not available," and few things undermine confidence more than finding out that one has been gulled into a false sense of security. On the other hand, will users inexperienced in the research process lose confidence in the system itself if they are told that several servers are down? And are we condemning users to multiple search sessions? For exploratory investigations, that is probably a fact of life, but for known-item searching, it could become a serious irritant.
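The distinction between an implied "no data" and an actual "data not available" is, at bottom, a question of what the search response carries back to the interface. A minimal sketch, with hypothetical names (an unreachable server is modeled here simply as None):

```python
def search_with_status(query, servers):
    """Search a set of named servers and report coverage.

    'servers' maps a server name to its collection (a dict of
    query -> list of hits), or to None when the server is
    unreachable -- an illustrative stand-in for a failed connection.

    Returns (hits, searched, unavailable) so the interface can choose
    to disclose which servers were actually covered, rather than
    silently conflating 'no hits' with 'server down'.
    """
    hits, searched, unavailable = [], [], []
    for name, collection in servers.items():
        if collection is None:
            unavailable.append(name)  # data not available
            continue
        searched.append(name)         # searched, possibly no data
        hits.extend(collection.get(query, []))
    return hits, searched, unavailable
```

Whether the interface then displays the `unavailable` list, summarizes it, or suppresses it is precisely the disclosure decision the editorial raises; the sketch only shows that the system can preserve the distinction for the interface to use.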
Full disclosure carries other risks besides annoying some users. If we consistently publicize the fact that server X is down, are we doing our users a favor, or are we undermining confidence in server X and damaging the parent institution's reputation? As Samuelson's comments remind us, there are more interests at work on systems than those of the engineers who build them. And what you see is not always what you get.
Amy Friedlander
Editor
hdl:cnri.dlib/august95-friedlander