The surge in dot com funding has brought us commercial ventures
such as netLibrary [1], ebrary [2], and Questia [3], which are
creating electronic libraries with collections that include tens of
thousands of books. And they are growing fast: although modest by the
standards of print collections, these commercial digital libraries
already dwarf even the largest non-profit collections.
Whatever their technical strategies, these enterprises provide at
least one enormous service. They make available in electronic form
materials that are protected by copyright and controlled by
publishers who depend upon sales revenues. The challenge is
enormous: these new digital libraries must negotiate agreements for
tens of thousands of books and then convert these materials into a
useful electronic format. Nevertheless, developing a good business
plan and building the library, challenging as those tasks may be,
may prove to be the easy part. Developing a productive
relationship with higher education requires more.
First, academics have grown wary of commercial publishers. Many
feel that the chain from author to audience has already grown too
long and expensive. The appearance of new for-profit digital
libraries means that the consumers of information will be asked to
support yet another organization (with its accompanying salaries and
investors). Universities are more complex than publishing houses, and
much as they could use extra revenue, it is by no means clear that it
is in the interest of universities to mortgage their intellectual
property to subscription services. The better a university, the less
dependent it is upon income streams such as tuition: big universities
derive their fundamental strength from augmenting their endowments
and attracting research funds. Nevertheless, the ultimate source of
their strength is prestige. Prestige draws gifts and grants. The
prestige of academia as a whole strengthens the social contract
between higher education and society. In cases where the
social contract is clear and popular, the results are tremendous:
projects such as the Human Genome Project and agencies such as NIH
flourish because the American people understand that such research
may help them live longer, healthier lives. By contrast, the
National Endowment for the Humanities (NEH) was almost eliminated
and the National Endowment for the Arts (NEA) permanently crippled
in 1995 because
the population felt (and not without some justification) that
humanists and artists largely disdained the general public. The
near-death experience forced the NEH and (at least some)
humanists to begin rethinking the relationship of their work to
society. The strategic mission of institutions
individually, and of higher education as a whole, is to reestablish
public prestige by demonstrating that they are contributing to
society. To achieve this goal, the broader the audience is, the better.
The implications for the new commercial digital libraries are
substantial. Publishers need an income stream; universities and
academics, while happy to have income, fundamentally need exposure.
As a rule of thumb, anything published in university presses is
written for prestige. Economic pressures have forced university
presses to worry more about print runs, but this pressure has, I
believe, lessened our ability to communicate complex arguments to our
colleagues. Insofar as the new commercial digital libraries increase
exposure, they advance the interests of higher education.
Every new entrant into the commercial market needs to reflect on
the problem of who owns scholarly output. Universities, in my
experience, are less interested in owning their faculty's work
than in making sure the faculty assign only non-exclusive rights.
Although publishers hold exclusive rights to most content, that model
has provoked widespread dissatisfaction. In the print world, where
libraries build up permanent collections of inexpensive books,
copyright is not viewed as a problem: the physical reproduction of
print has been so expensive that it is simpler to buy books from a
publisher. In the electronic world, however, the marginal cost of
additional copies is essentially zero; when (some) print publishers
use their monopolistic positions to drive up prices and create
mechanisms that protect copyright by restricting the spread of
information, many academics question the tradition whereby faculty
transfer exclusive rights to publishers.
If push comes to shove, universities have a very strong case that
they contribute the overwhelming majority of resources to scholarly
publication. Consider this rough and very conservative
estimate: Assume that it takes two years of labor to produce a solid
academic book (this is conservative as it assumes a professor
publishes a major book every six years, working part time during the
academic year, full time in the summer and full time during a six
month sabbatical). Even paying a minimal salary
($40,000/year before benefits), the university has invested at least
$100,000 in that book. How much money does the publisher really
invest with editorial advice and even modern copyediting (i.e.,
stylistic editing and intensive XML tagging)? Note that many of the
most prestigious book series are edited by professors. Faculty
advisors do the work for free or for a pittance because of the
immense power and patronage they receive. The money people at
universities can do the math also. Universities end up paying for the editing
through their library acquisitions budgets. They are perfectly
capable of reallocating that money so that they do the editing
themselves and publish the books electronically.
Consider another figure. If 25,000 of the books in the original
release for a commercial digital library, for example, are university
press books or written by professors, then you are looking at a $2.5
billion investment of academic labor. Huge as a commercial digital
library's investment may be, it is at least an order of magnitude
smaller than the investment made by the university community. When
we consider the cost of creating content, a strong case can be made
that any commercial entity adds only a drop of value to the immense
pool of knowledge fed and funded by academia.
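The two estimates above reduce to a few lines of arithmetic. The Python sketch below reproduces them. The 25% benefits rate is an assumption that does not appear in the text; it is chosen only because it brings two years of a $40,000 salary up to the $100,000 floor. The constant and function names are illustrative.

```python
# Back-of-the-envelope sketch of the labor-cost estimate in the text.
YEARS_PER_BOOK = 2          # from the text: two years of labor per book
SALARY_PER_YEAR = 40_000    # from the text: minimal salary, before benefits
BENEFITS_RATE = 0.25        # ASSUMPTION (not in the text): benefits as a share of salary
BOOKS_IN_RELEASE = 25_000   # from the text: academic books in an initial release


def cost_per_book(years=YEARS_PER_BOOK, salary=SALARY_PER_YEAR, benefits=BENEFITS_RATE):
    """University labor cost embodied in one academic book, in dollars."""
    return years * salary * (1 + benefits)


def total_investment(books=BOOKS_IN_RELEASE):
    """Aggregate academic labor behind a collection of `books` titles, in dollars."""
    return books * cost_per_book()


if __name__ == "__main__":
    print(f"Per book:   ${cost_per_book():,.0f}")     # $100,000
    print(f"Collection: ${total_investment():,.0f}")  # $2,500,000,000
```

Under these inputs, a commercial investment an order of magnitude smaller would be on the order of $250 million. Changing any input, such as the salary or the number of books, scales the total proportionally.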
Already, projects such as the Open Archives Initiative, the
American Memory Project at the Library of Congress, the Perseus
Digital Library, and others make their data freely available. These
may or may not be the wave of the future, but the new commercial
digital libraries need to decide how (and whether) they can survive
if such projects do become the rule. What happens if all the data is
made freely available and all the core software is available under
open source licenses? Can they make money?
I suspect that they can. The academic world invests a substantial
amount of money, through both individual and library purchases, to
acquire third-party information. The new digital library companies
have an opportunity to explore new models of partnership with higher
education; the current economic and social structures certainly
seem problematic. I do not know what those new models should be, but
I offer as a kind of "open letter" to commercial digital libraries
the following ideas for ways they might pursue productive
relationships with higher education:
- "Dot coms" must recognize that they are commercial
firms and that they will be viewed as such. Always start by making it clear
that you need to make money, even if you go on to stress the public
benefit you hope to provide.
- Academics are clannish and pay close attention to who starts and
who runs a company. Corporate web sites that present a team of managers
with few academics may attract investors but can alienate academics.
Many times I have heard librarians and administrators remark that
they really don't want to work with commercial firms if they can help
it; if, however, a web company was founded by an academic, they take
a second and much more positive look at that company.
- Academics in general and humanists in particular are also
professional cynics. They may make short-term deals based on your firm's
current management team, but no responsible leader will establish a
long-term relationship with you based on what she thinks of the
current management team or of your personal ideals. Everyone has to
assume that, if a start-up company is profitable, it will be acquired
by a larger commercial concern; for this reason, no one can give any
commercial digital library the benefit of the doubt on any long-term
deal or assume that it won't exercise the full power of its
market position as ruthlessly as possible.
- Make it a public policy that your company seeks only non-exclusive rights. Academics want a "second source" to
information. They have had plenty of experience with monopoly
capitalism, and while they may accept your position as a necessary
evil, in their
eyes, you will be just another publisher out to make a buck. Your power will be the aggregate of materials and services that
you provide. You may get market advantage from exclusive deals, but taking the high road of non-exclusivity will net
you more in the long run.
- Entering large numbers of books is "the easy part." The real
issue is how to help people improve the questions they can ask.
Giving scholars an on-line research library -- an electronic
environment where all the books were on-line and all the citations
had been converted into active links -- would be an important first
step, but still only a first step. The hard part comes in
developing the tools people really need. Such tools tend to be
domain specific and often are not obvious. Some commercial
systems are beautifully organized to perform the staggering job of
converting and delivering hundreds of thousands of books, but I am
not sure whether any are really designed to reach the next level.
- In structuring data, avoid proprietary formats. Use of
proprietary structures will provoke resentment and resignation at the
best, public outrage at the worst. Academics love open standards;
that's one reason why UNIX has flourished in universities.
Publish your format. Make it possible for other people to create
documents that conform to your service without your having to do the
data conversion. If you have a brilliant new Document Type Definition (DTD) that lets you add
new services, for example, the XML community will figure out what the
underlying data structure is anyway. It may be that your tags will
only bear fruit in the future (that's a situation we know very well,
having worked with SGML for almost fifteen years) but the ill-will
caused by a proprietary DTD could really damage your standing. Many
academics love Adobe, for example, because PostScript is a public
standard and because Adobe is pushing other open standards (such as
SVG).
- Learn from the Genome debate. There is an emerging consensus
about how to balance public good and private interest, allowing people to patent
some things but making sure that enough information is in the public
domain to help push science and industry. The analogue in the
humanities would be the following: "the primary sources are
free; copyright protects the secondary sources with the notes
and interpretation."
- Make a sharp distinction between public domain data and the
copyrighted data you license. Allocate 10% of your data
conversion program to public domain materials and give the XML source
texts out freely under something like the GNU General Public License (GPL),
meaning that you have to get credit and any modifications
people make must be freely distributed. Academics love things like
open source. The more you can do with something like the GPL, the
more people will trust you and the more you will dramatize the fact
that you aren't just another publisher. One member of a commercial
digital library firm mentioned with justified disgust that one
publisher wanted royalties for its reprint of Moby Dick. Lots
of people have had the same experience; use the
short-sightedness of the publishers to strengthen your own company. If you
start with 90% protected and 10% fully open materials, you still have
a huge added value to sell. And you get the advantage that people
(who unreasonably want everything for free) will direct their
resentment against the publishers and not your company. Every
person who uses the public domain sources will see your company's
name and URL. Banner ads may not prove effective for many companies,
but discreet "PBS style" sponsorship links may be ideal for new
commercial firms seeking credibility as well as exposure. Imagine the
press that you would receive if you were able to "tithe" your data
conversion. "Questia/netLibrary/ebrary, etc. presents the Library of
Congress with 5,000 key source texts for literature and history, with
promises of 20,000 more in the coming two years."
- Make tagged sources available. However much money you invest, you
can't possibly pay for the kind of careful tagging and editing that
many complex sources really require. Such "level 5" tagging (to use
an LC taxonomy) requires knowledge of the source materials and the
subject. Even with clever software, this sort of editing is very
laborious. But this sort of editing is exactly what scholars have
traditionally done. There is a growing number of young academics who
combine domain knowledge and technical expertise; let them do the hard
work and let them validate the documents (another reason to publish
your formats!). Make fully tagged sources available, however:
individuals are going to be less likely to add value to your e-texts
if they know they are putting back elaborate tags that you have
stripped out.
- In general, look for places where you can help institutions as
well as students. A lot of the most expensive tasks that drive up overhead and eat up budgets could, in a world of fast
networks, be outsourced. Library functions are only one example.
- Your investment in technical infrastructure is very important.
Even in the most radical open source community, people accept the
notion of paying for better service. The Perseus Digital Library,
for example, has many strengths but we are outgrowing our
infrastructure. Thus far, we have been able to add capacity in tiny ($5-10K)
increments. The next jump probably reflects a $500K investment.
Furthermore, a free site such as ours cannot guarantee 24/7 service.
People use the site at their own risk with no guarantees.
- Provide access to expert librarians. My own institution, for
example, is a major university, but we have only one humanities
librarian. It is impossible for him to keep abreast of everything
going on in the humanities. If I want to find out if there are
microfilms of nineteenth century British newspapers, he is not much
better placed to answer this question than I am. If I had access to
a librarian specializing in 19th century British history
and literature, the answer would be authoritative and fast in coming.
Collect a group of domain experts to provide authoritative
information. They could do so by e-mail and phone; perhaps better
still might be having queries be public so that everyone could see
the questions and answers. You can collect domain experts very
quickly and immediately provide a service far more expert than all
but the strongest schools have. Thus, even before you have the
on-line books, you can have on-line reference librarians to level
the playing field between the small schools and the great
universities.
- If you have particular expertise at structuring information,
offer this as a service. Projects such as mine can get basic XML
tagging from India and China, but the important work, the
structure that really adds value to the document, has to be
done in-house. Real tagging (like cataloguing) requires serious
expertise and (for now at least) benefits greatly from an investment in
customized software tools. OCLC has done pretty well selling
catalogue records. Creating XML is even harder. The key is being
able to draw on expertise not available in a data entry firm
off-shore. I know how much we pay for outsourcing my data entry.
You ought to be able to get great rates given your volume. Why not
turn this into a profit center? My university, for example, is
developing the infrastructure to convert its publications into high
quality XML. You could probably do this more efficiently than we
could in-house. Even after you have entered every back list book you
want, you will have plenty of documents coming through. This industry
plays to the insecurities and aspirations of the universities. They
want to own the data. Let them own it. Come to us and we will
structure your documents for you. You can then put them up on your
own sites. We will, however, also offer added value services
that people can pay for and, if we do the work, then we
guarantee that the data will derive the maximum benefit from our
system. If someone else can do better with their software, more
power to them. Ultimately, publishers may have to go this route
anyway. Why not get there first? And helping universities maintain
non-exclusive rights to their intellectual property would make you
heroes in many circles.
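The advice above to publish your format and let outside scholars validate their own documents can be made concrete with a small sketch. The Python below assumes a hypothetical published list of element names and reports any element in a contributed document that falls outside it. The tag set, the function name, and the sample documents are all invented for illustration and do not come from any real service. It checks element names only; it is not full DTD validation, which Python's standard library does not provide.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL published tag set; a real service would publish its own DTD or schema.
PUBLISHED_TAGS = {"text", "body", "div", "head", "p", "name", "date"}


def unknown_tags(xml_source, allowed=PUBLISHED_TAGS):
    """Return the sorted element names in xml_source that are not in `allowed`.

    Raises xml.etree.ElementTree.ParseError if the document is not well-formed.
    """
    root = ET.fromstring(xml_source)
    # root.iter() walks the root element and all of its descendants.
    return sorted({element.tag for element in root.iter()} - set(allowed))


if __name__ == "__main__":
    conforming = (
        "<text><body><div><head>Chapter 1</head>"
        "<p>Call me <name>Ishmael</name>.</p></div></body></text>"
    )
    print(unknown_tags(conforming))  # []
    print(unknown_tags("<text><body><stanza>...</stanza></body></text>"))  # ['stanza']
```

A contributor could run a check of this kind before submitting a document, so that conformance does not depend on the company doing the data conversion itself.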
References
[1] netLibrary <http://www.netLibrary.com>
[2] ebrary <http://www.ebrary.com>
[3] Questia <http://www.questia.com>
Copyright © 2001 Gregory Crane