This profile looks at metadata, the information that identifies
web pages and thus forms the basis of some search engines,
directories and content management systems.
The profile covers -
overview - an introduction to metadata
- questions about use of metadata on the web
Dublin Core, AGLS, ANZMETA, IEEE Learning Object Metadata
(LOM) and other metadata sets
Resource Description Framework (RDF)
Platform for Internet Content Selection (PICS), SOIF and
MCF - the problematical proposal for identifying offensive
material and other online content, and other schemas
Persistent Uniform Resource Locator (PURL) Scheme
- Uniform Resource Naming (URN) and other numbering
schemes for identifying documents on the web independent
of their location
for thesauri that
would assist resource identification over the web as
a whole or within particular sectors, such as the visual
arts or biotechnology
- highlights in the history of metadata and information
It is complemented by a separate profile on directories,
search engines and search
what is metadata?
Metadata is literally information about information.
It may be very restricted in scope, such as a simple identification
number. Or it may be descriptive, allowing the creation
of indexes, lists and other tools that can be used for
identification and for evaluation of information.
If you've used a library catalogue you've used such a
tool. The catalogue is based on metadata - subject, author,
publisher etc - about the books and other documents held
by that institution. Metadata predates Gutenberg
and over the past hundred years has been used in a variety
of applications, such as:
directory and encyclopaedia publishing - used to organize
information on topics and terms
- used to identify and arrange books and journals (eg
each item in a collection was tagged with a unique number
and identified in a catalogue that enabled users to
locate material by title, author, classification number
or a subject heading providing intellectual and physical
access to a collection)
publishing - underpinning user searching of transaction,
bibliographic or other databases
functions - used in "back-of-the-book" indexes.
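The catalogue idea described above can be sketched in a few lines of Python. This is an illustration only, not a description of any real library system: the field names, the `CatalogueRecord` class, the `build_index` helper and the three sample records are all invented for the example. It shows how a handful of descriptive fields lets a reader locate items by author or subject without examining every item in the collection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueRecord:
    """Metadata about one held item (field names are illustrative)."""
    item_number: str      # unique number tagged on the item
    title: str
    author: str
    subject: str
    classification: str   # a classification or shelf number


def build_index(records, field):
    """Map each value of one metadata field to the records that carry it."""
    index = defaultdict(list)
    for record in records:
        index[getattr(record, field).lower()].append(record)
    return index


# Invented sample records, purely for demonstration.
records = [
    CatalogueRecord("0001", "Example Title One", "Author A", "metadata", "025.3"),
    CatalogueRecord("0002", "Example Title Two", "Author B", "indexing", "025.4"),
    CatalogueRecord("0003", "Example Title Three", "Author A", "metadata", "025.3"),
]

by_author = build_index(records, "author")
by_subject = build_index(records, "subject")

# Look up items through the index instead of scanning the collection.
print([r.item_number for r in by_author["author a"]])
print([r.item_number for r in by_subject["indexing"]])
```

The same records could be indexed by title or classification number by passing a different field name to `build_index`.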
The broadness of characterisations such as "information
about information" or "data about data"
has led some analysts to suggest that online metadata
has the following characteristics:
in the electronic environment
describes the attributes of an electronic resource
characterizes its relationships to other resources
supports the discovery, management and efficient use
of that resource
Metadata is one of the key features of the web and, as we suggest
in the next page of this profile, is the basis of the
semantic web - the next generation of the net. It is found
within individual web pages, at varying levels of detail
and using varying standards, highlighted below. And it
is found in the search engines, directories and other
tools for finding sites and individual pages.
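As a rough illustration of metadata embedded in an individual page, the sketch below reads the `<meta>` elements from a small HTML document using Python's standard `html.parser` module. The sample page, its field values and the `MetaTagParser` class are invented for the example. The `DC.creator` name follows a common convention for Dublin Core in HTML, but real pages vary widely in which names they use, if any.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip meta elements that carry no name/content pair (eg charset).
        if name and content is not None:
            self.metadata[name.lower()] = content


# A made-up page, not taken from any real site.
SAMPLE_PAGE = """
<html><head>
<title>Sample page</title>
<meta charset="utf-8">
<meta name="description" content="A made-up page used for illustration">
<meta name="keywords" content="metadata, cataloguing">
<meta name="DC.creator" content="Example Author">
</head><body><p>Body text.</p></body></html>
"""

parser = MetaTagParser()
parser.feed(SAMPLE_PAGE)
for name, content in parser.metadata.items():
    print(f"{name}: {content}")
```

A search tool or directory could store pairs like these alongside the page address, although in practice many tools rely on the page text itself as well as, or instead of, such tags.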
This site, indeed, can be viewed as metadata about information
on the web and offline, since it identifies and evaluates
several thousand sites, web documents and print publications.
In the Metrics & Statistics guide
on this site we highlight some of the studies about the
growth of the web.
There are now many millions of sites and hundreds of millions
of pages. Many of those documents change periodically
(eg one study
suggests that the 'half life' of a page is less than two
years, roughly half the time it takes for most books to
go out of print and one reason why many big sites - such
as this one - have links that have "rotted").
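To give a feel for what a 'half life' implies, the sketch below assumes that pages are lost or changed at a steady exponential rate and takes two years as the half life. Both are assumptions made for illustration: the study cited above says only that the figure is less than two years, so the numbers here are an optimistic upper bound rather than a finding. The function name `surviving_fraction` is our own.

```python
def surviving_fraction(years, half_life_years=2.0):
    """Fraction of pages expected to remain after `years`.

    Assumes simple exponential decay, which is an illustrative
    model and not something the cited study is known to claim.
    """
    return 0.5 ** (years / half_life_years)


for years in (1, 2, 4, 6):
    share = surviving_fraction(years)
    print(f"after {years} years: about {share:.0%} of pages remain")
```

Under those assumptions half of a site's outbound links would point at a missing or altered page within two years and three quarters within four, which is consistent with the link rot noted above.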
Neither TLDs nor domain
names reveal all the treasures (or lack of them)
within a site. The size and volatility of the web mean
that it is beyond anyone to list the contents of all sites/pages
and to provide an evaluation.
classification and its consequences
The importance of identification and evaluation -
so that your customers can search in a particular part
of the haystack rather than attempting to scrutinise every
piece of straw - is discussed in Elaine Svenonius' The
Intellectual Foundation of Information Organization
(Cambridge: MIT Press 2000).
She offers a demanding but comprehensive introduction
to the theory underlying attempts to identify, categorise
and retrieve the resources in the 'global digital library',
ie information accessed via the web.
There is a more accessible overview of identification/evaluation
issues and that library in Christine Borgman's From
Gutenberg to the Global Information Infrastructure: Access
To Information in the Networked World (Cambridge:
MIT Press 2000) - strongly recommended - and Web Search:
Public Searching of the Web (London: Springer 2004)
by Amanda Spink & Bernard Jansen. Both are more persuasive
than pop sci tracts such as David Weinberger's Everything
Is Miscellaneous: The Power of the New Digital Disorder
(New York: Holt 2007), which neglect the extent to which
much online content (such as photos on Flickr) is tagged
or otherwise categorised.
Richard Belew's Finding Out About: Search Engine Technology
From A Cognitive Perspective (Cambridge: Cambridge
Uni Press 2001) is a more theoretical study of search
processes, complemented by Donald Case's broader Looking
for Information: A Survey of Research on Information Seeking,
Needs, and Behavior (New York: Academic Press 2002).
It can be supplemented by reference to Managing Cataloging
and the Organization of Information: Philosophies, Practices
and Challenges at the Onset of the 21st Century (New
York: Haworth 2000) edited by Ruth Carter and How Reference
Works: Explanatory Models for Indexicals, Descriptions,
and Opacity (Albany: State Uni of New York Press 1993)
by Lawrence Roberts.
The Advanced Internet Searcher's Handbook (London:
Library Association 2002) by Phil Bradley and The Invisible
Web (2001) by Chris Sherman & Gary Price provide
guidance about online search techniques and resources.
Dieter Fensel's Ontologies: A Silver Bullet for Knowledge
Management and Electronic Commerce (New York: Springer
2001) is provocative; for us there is a more convincing
exposition in Spinning the Semantic Web: Bringing the
World Wide Web to Its Full Potential (Cambridge: MIT
Press 2002) edited by Fensel, James Hendler, Henry Lieberman
& Wolfgang Wahlster.
Murtha Baca edited the Getty Research Institute's valuable
Introduction to Metadata: Pathways To Digital Information.
Among specialist and general journals we recommend the
Journal of Internet Cataloging (JIC), D-LIB
and the very earnest Information Technologies &
For a historical perspective see works such as Rudolf
Blum's Kallimachos: The Alexandrian Library and the
Origins of Bibliography (Madison: Uni of Wisconsin
Press 1991) and Hope Olson's The Power to Name: Locating
the Limits of Subject Representation in Libraries
(Dordrecht: Kluwer Academic 2002).