Metadata profile: Overview

overview

This profile looks at metadata, the information that identifies web pages and thus forms the basis of some search engines, directories and content management systems.

The profile covers -

this overview - an introduction to metadata
online - questions about use of metadata on the web
Dublin Core, AGLS, ANZMETA, IEEE Learning Object Metadata (LOM) and other metadata sets
Resource Description Framework (RDF)
Platform for Internet Content Selection (PICS), SOIF and MCF - the problematical proposal for identifying offensive material and other online content, and other schemas
Permanent Uniform Resource Locator (PURL) Scheme
Numbers - Uniform Resource Naming (URN) and other numbering schemes for identifying documents on the web independent of their location
Proposals for thesauri that would assist resource identification over the web as a whole or within particular sectors, such as the visual arts or biotechnology
landmarks - highlights in the history of metadata and information identification

It is complemented by a separate profile on directories, search engines and search behaviour.

what is metadata?

Metadata is literally information about information. It may be very restricted in scope, such as a simple identification number. Or it may be descriptive, allowing the creation of indexes, lists and other tools that can be used for identification and for evaluation of information.

If you've used a library catalogue you've used such a tool. The catalogue is based on metadata - subject, author, publisher etc - about the books and other documents held by that institution. Metadata predates Gutenberg and over the past hundred years has been used in a variety of applications, such as:

dictionary, directory and encyclopaedia publishing - used to organize information on topics and terms
libraries - used to identify and arrange books and journals (eg each item in a collection was tagged with a unique number and identified in a catalogue that enabled users to locate material by title, author, classification number or a subject heading, providing intellectual and physical access to a collection)
database publishing - underpinning user searching of transaction, bibliographic or other databases
editorial functions - used in "back-of-the-book" indexes.
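
The catalogue idea can be sketched in a few lines of Python. This is only an illustration, not a description of any real library system: the records, field names and the build_index helper are all hypothetical. It shows how descriptive metadata (author, subject) attached to uniquely numbered items lets a user locate material through more than one access point.

```python
from collections import defaultdict

# Hypothetical catalogue records; titles, names and field names are invented.
RECORDS = [
    {"id": 1, "title": "A History of Printing", "author": "Smith, A.", "subject": "printing"},
    {"id": 2, "title": "Indexing for Editors", "author": "Jones, B.", "subject": "indexing"},
    {"id": 3, "title": "Printing and the Press", "author": "Jones, B.", "subject": "printing"},
]


def build_index(records, field):
    """Map each value of `field` to the sorted list of record ids carrying it."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record["id"])
    return {key: sorted(ids) for key, ids in index.items()}


# Two access points built from the same metadata.
by_author = build_index(RECORDS, "author")
by_subject = build_index(RECORDS, "subject")

print(by_author["Jones, B."])
print(by_subject["printing"])
```

The same records support both lookups because the index is derived from the metadata rather than from the items themselves, which is the sense in which a catalogue provides intellectual as well as physical access.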

The broadness of characterisations such as "information about information" or "data about data" has led some analysts to suggest that online metadata has the following characteristics.

It -

exists in the electronic environment
describes the attributes of an electronic resource
characterizes its relationships to other resources
supports the discovery, management and efficient use of that resource

Metadata is one of the key features of the web and, as we suggest in the next page of this profile, is the basis of the semantic web - the next generation of the net. It is found within individual web pages, at varying levels of detail and using varying standards, highlighted below. And it is found in the search engines, directories and other tools for finding sites and individual pages.
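
As a rough illustration of metadata found within an individual web page, the Python sketch below reads the name/content pairs from a page's meta elements using the standard library's html.parser module. The sample page, its values and the extract_metadata helper are hypothetical; the "DC." prefix is assumed here as one common way of embedding Dublin Core elements in HTML, and real pages vary widely in what they supply.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip meta elements that lack a name or content (eg charset declarations).
        if name and content is not None:
            self.metadata[name] = content


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Metadata profile: Overview</title>
<meta charset="utf-8">
<meta name="DC.title" content="Metadata profile: Overview">
<meta name="DC.subject" content="metadata; resource discovery">
<meta name="description" content="An introduction to metadata.">
</head><body></body></html>
"""


def extract_metadata(html_text):
    """Return a dict of the named meta elements found in html_text."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


for name, content in extract_metadata(SAMPLE_PAGE).items():
    print(f"{name}: {content}")
```

A search engine or directory would do something far more elaborate, but the principle is the same: the description travels with the page and can be harvested without reading the body text.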

This site, indeed, can be viewed as metadata about information on the web and offline, since it identifies and evaluates several thousand sites, web documents and print publications.

In the Metrics & Statistics guide on this site we highlight some of the studies about the growth of the web.

There are now many millions of sites and hundreds of millions of pages. Many of those documents change periodically (eg one study suggests that the 'half life' of a page is less than two years, roughly half the time it takes for most books to go out of print and one reason why many big sites - such as this one - have links that have "rotted"). Neither TLDs nor domain names reveal all the treasures (or lack of them) within a site. The size and volatility of the web mean that it is beyond anyone to list the contents of all sites/pages and to provide an evaluation.
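
To make the 'half life' figure concrete, the sketch below works through the arithmetic in Python. It rests on two assumptions that are ours rather than the study's: that pages disappear at a constant rate (simple exponential decay) and that the half life is exactly two years, which the figure quoted above treats as an upper bound. The function name is hypothetical.

```python
def surviving_fraction(years, half_life_years=2.0):
    """Fraction of pages expected to survive after `years`.

    Assumes constant-rate (exponential) decay: the fraction halves
    every `half_life_years`.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years / half_life_years)


# Illustrative only: expected number of working links out of 1000
# at a few ages, under the two-year assumption.
for age in (1, 2, 4, 6):
    working = 1000 * surviving_fraction(age)
    print(f"after {age} years: about {working:.0f} of 1000 links still work")
```

On those assumptions only a quarter of links survive four years and an eighth survive six, which gives some sense of why a large site that is not continually checked accumulates dead links.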

classification and its consequences

The importance of identification and evaluation - so that your customers can search in a particular part of the haystack rather than attempting to scrutinise every piece of straw - is discussed in Elaine Svenonius' The Intellectual Foundation of Information Organization (Cambridge: MIT Press 2000).

She offers a demanding but comprehensive introduction to the theory underlying attempts to identify, categorise and retrieve the resources in the 'global digital library', ie information accessed via the web.

There is a more accessible overview of identification/evaluation issues and that library in Christine Borgman's From Gutenberg to the Global Information Infrastructure: Access To Information in the Networked World (Cambridge: MIT Press 2000) - strongly recommended - and in Web Search: Public Searching of the Web (London: Springer 2004) by Amanda Spink & Bernard Jansen. Both are more persuasive than pop sci tracts such as David Weinberger's Everything Is Miscellaneous: The Power of the New Digital Disorder (New York: Holt 2007), which neglect the extent to which much online content (such as photos on Flickr) is tagged or otherwise categorised.

Richard Belew's Finding Out About: Search Engine Technology From A Cognitive Perspective (Cambridge: Cambridge Uni Press 2001) is a more theoretical study of search processes, complemented by Donald Case's broader Looking for Information: A Survey of Research on Information Seeking, Needs, and Behavior (New York: Academic Press 2002). It can be supplemented by reference to Managing Cataloging and the Organization of Information: Philosophies, Practices and Challenges at the Onset of the 21st Century (New York: Haworth 2000) edited by Ruth Carter and How Reference Works: Explanatory Models for Indexicals, Descriptions, and Opacity (Albany: State Uni of New York Press 1993) by Lawrence Roberts.

The Advanced Internet Searcher's Handbook (London: Library Association 2002) by Phil Bradley and The Invisible Web (2001) by Chris Sherman & Gary Price provide guidance about online search techniques and resources.

Dieter Fensel's Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce (New York: Springer 2001) is provocative; for us there is a more convincing exposition in Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential (Cambridge: MIT Press 2002) edited by Fensel, James Hendler, Henry Lieberman & Wolfgang Wahlster.

Murtha Baca edited the Getty Research Institute's valuable guide Introduction to Metadata: Pathways To Digital Information.

Among specialist and general journals we recommend the Journal of Internet Cataloging (JIC), D-LIB and the very earnest Information Technology and Libraries (ITAL).

For a historical perspective see works such as Rudolf Blum's Kallimachos: The Alexandrian Library and the Origins of Bibliography (Madison: Uni of Wisconsin Press 1991) and Hope Olson's The Power to Name: Locating the Limits of Subject Representation in Libraries (Dordrecht: Kluwer Academic 2002).
