Caslon Analytics Publishing guide




Digitisation and archiving

This page looks at digitisation and at archiving of the net.

It covers -

introduction
benchmarks
private projects
archiving the net

introduction

Large-scale projects to 'digitise the past' and thereby ensure future generations have networked access to print publications, photographs, sound recordings, cinefilms and other material have proved contentious. 

Digitisation means that users view a 'digital surrogate' (preserving often fragile originals), that access is not tied to physical proximity (ie greater convenience and savings in staff costs) and that physical storage requirements are reduced. However, cost savings have not been as great as anticipated, and there has been considerable criticism of institutions - such as the British Library - that digitised and then destroyed major parts of their collections.

The Preserving Digital Information report of the CPA & RLG suggests that digitisation by individual institutions is often not cost effective. However, resource sharing (ie collaborative digitisation and access to shared material through an intranet or a global digital library) is attractive.

Andrew Odlyzko echoed Michael Lesk, author of Practical Digital Libraries: Books, Bytes & Bucks (San Francisco: Morgan Kaufmann 1997), in noting that

the costs of just the buildings of the new British Library in London and the new French National Library in Paris are two or three times higher than the costs of converting their book collections to a digital format. In a more rational world, the money going into bricks and mortar would have gone into scanning the books, which would have provided much more rapid and convenient access to the data for scholars. The physical volumes themselves could be housed in cheap warehouses, for the rare occasions when they might have to be consulted. However, user resistance to new media, copyright constraints, and the politicians' and the public's liking for visible edifices and for solid books make it hard to take that step.

.... the entire mathematical literature collected over the centuries is perhaps 30 million pages, so digitizing it at a cost of $0.60 per page would cost $18 million, less than ten percent of the annual journal bill
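The arithmetic in that quotation can be checked with a short sketch. The page count and per-page cost come from the quoted passage; the function name and the use of whole cents are illustrative assumptions, and the journal-bill figure is only the lower bound implied by "less than ten percent", not a number given in the source.

```python
# Figures from the quoted passage; everything else here is illustrative.
PAGES = 30_000_000        # "perhaps 30 million pages"
CENTS_PER_PAGE = 60       # "$0.60 per page", held in cents to avoid float rounding


def digitisation_cost_dollars(pages: int, cents_per_page: int) -> int:
    """Return the total scanning cost in whole dollars."""
    return pages * cents_per_page // 100


total = digitisation_cost_dollars(PAGES, CENTS_PER_PAGE)

# If the total is under ten percent of the annual journal bill,
# that bill must exceed ten times the total (an inference, not a quoted figure).
implied_min_annual_bill = total * 10

print(f"digitisation cost: ${total:,}")
print(f"implied annual journal bill: more than ${implied_min_annual_bill:,}")
```

On those figures the one-off scanning cost matches the $18 million quoted, and the comparison implies an annual journal bill above $180 million.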

benchmarks

In the US the American Memory (AM) project, aimed at providing digital access to millions of items held by the Library of Congress and other institutions, has for political as well as technological reasons concentrated on the digitisation of images - including maps, paintings and photographs - and some manuscripts of literary or historic significance.

Locally the National Archives of Australia (NAA) has digitised key federation documents and commenced the daunting task of providing digital colour facsimiles of the millions of documents in its custody, while the National Library's PictureAustralia (PA) is a gateway for images from the State Library of Victoria, University of Queensland Library, Australian War Memorial and other institutions. 

The University of California's Alexandria Digital Library project (Pharos) aims to create a digital library encompassing maps and pictorial material for use by institutions across the US.

Yale University's Project Open Book (POB) is exploring the conversion of microfilm, hitherto the medium of choice among the archival mafia, to digital imagery.

The Mellon Foundation, noted earlier in this guide, has funded the large-scale Journal Storage (JSTOR) Project, with universities coming together to provide ongoing electronic access in a secure environment to over 147 law, science and humanities journals. Imaging of that print material is now close to the target of 750,000 journal pages, with access by over 1,000 institutions. In April the Foundation announced establishment of artSTOR, a large-scale digital image library.

As part of the Making of America Project a consortium of US universities, including Cornell and the University of Michigan, is placing the text of several thousand magazines and books online.

private projects

Most media attention has focussed on two private initiatives - Bartleby and Gutenberg - although they're dwarfed by major academic and commercial digitisation projects. 

Project Bartleby (Bartleby) began with online publication of Whitman's Leaves of Grass and now features a full-text searchable database containing over 200,000 web pages, including over 22,000 quotations and 4,765 poems. Most of the content is out of copyright: Bartleby's essentially capturing old publications.

Project Gutenberg (Gutenberg) also draws on public domain works. Presentation is in ASCII rather than HTML or PDF and material is added to the database by volunteers, so the coverage is eclectic rather than comprehensive. Gutenberg has around 3,000 titles. It's unrelated to the academic Gutenberg-E project. In a characteristically incisive analysis Bradford DeLong comments that founder Michael Hart's dream "has failed to achieve any form of critical mass", in contrast to Linux, and continues to move ahead at a snail's pace.

The more ambitious Universal Library Project (UL) aims to

start a worldwide movement to make available ALL the Authored Works of Mankind on the Internet so that anyone can access these works from any place at any time.

Searching and viewing would be free; individuals and existing libraries would be able to purchase digital copies.

Everynote ambitiously aims to provide scores for all classical music, with a collection that as of July 2004 encompassed over 4,000 compositions for piano and violin (in pre-1924 editions).

archiving the net

There is increasing interest in archiving the net, with projects providing thematic/sectoral collections, offering snapshots or, more grandiosely, attempting to capture the entire web.

An example of the latter is the US-based Internet Archive, under the leadership of Brewster Kahle.

His 2001 Public Access to Digital Material article (with Rick Prelinger & Mary Jackson) claimed that universal digital access is attainable and is the "epic opportunity of our digital age", since

the technology has reached the point where scanning all books, digitizing all audio recordings, downloading all websites, and recording the output of all TV and radio stations is not only feasible but less costly than buying and storing the physical versions.

That is an intriguing but very problematical vision, with major questions regarding intellectual property and resource identification. We have explored some of the issues in a more detailed profile.



version of October 2004
© Caslon Analytics