This page looks at web directories - hyperlinked listings that
point to websites and individual pages or other resources.
In essence, a web directory is a personal, institutional
or corporate list of websites or online documents. That
list is online; entries on the list are typically hyperlinked
so that users can readily access the sites/documents without
keying the URL.
That list may be personal and small-scale or may aspire
to cover much of the web. It may be publicly accessible
or may instead be restricted to particular users.
Some directories have a flat structure, with all entries
being given equal weight. Other directories feature categorisations
of varying complexity, with the largest commercial directories
for example comprising multi-level hierarchies organised
by subject and nation/region.
Directories predate the search
engine (eg specialist buyers guides, research guides
and 'colour pages')
and co-exist with it. For many early adopters of the web
they were the embodiment of e-commerce, with large commercial
directories being promoted as 'portals' or virtual shopping
malls and expecting to garner substantial revenue from
paid advertising and a share of merchant turnover.
Increasing sophistication of the online population (and
the growth of the web) means that over the past decade
the large-scale commercial directories have become less
significant as a major mechanism for finding information
online. Some demographics, indeed, have abandoned those
directories in favour of search engines such as Google
and specialist directories – in particular those
compiled by subject experts.
The revenue of major directories such as Yahoo! (and access
to capital during the dotcom boom) has, however, resulted
in a blurring of demarcations between the major directories,
search engines and messaging facilities. In particular
the major commercial directories acquired engines –
offering users multiple routes to information –
and sought to underpin their share of the desktop by offering
email or other services, consistent with Andrew Odlyzko's
aphorism that for many people connectivity rather than
content is king.
a brief history of internet directories
In considering the history of directories we can identify
several themes -
'portalisation', as commercial whole-of-web directories expanded from
basic categorised listings to offer search, webmail,
retail and other functionalities;
market acceptance of industry/subject-specific directories;
a move away from standalone non-commercial directories
to listings within richer academic, professional and
enthusiast sites;
the persistence of directories that are online but not public,
with access on a subscription/sessional fee basis.
The first pages of the web had the characteristics of directories,
pointing to other resources. Given the precedent provided
by 'contents' and 'mall' pages on private networks such
as AOL - the maps for navigating the walled gardens -
it is unsurprising that commercial operators developed
directories as the web grew. Those operators faced challenges
in maximising initial and recurrent traffic to their directories.
One response was to increase the breadth and depth of
the individual directory, typically being marketed as
covering all of the web ... or all of the web worth visiting.
Growth of particular directories without commensurate
improvements in usability had three consequences. Savvy
users moved towards search engines, a move recognised
in predictions that normalisation of the online population
would ultimately see directory visitations decline unless
the major directories incorporated a search engine based
on automated spidering of the web.
Another response was churn by users to competing directories,
to smaller directories with a more specific focus or to
'localised' versions of the parent directory (eg those that
emphasise information for a specific nation or state).
A third response was to increase the 'stickiness' of individual
directories by making them true portals for activity online.
That 'portalisation' involved expansion from basic categorised
listings through inclusion of news, webmail, personal
ads, 'infotainment' such as horoscopes and other functionalities.
Concerns about navigation, accuracy and authority were
reflected in market acceptance of industry and subject-specific
directories. Recent years have seen a move away from stand-alone
non-commercial directories to listings within richer academic,
professional and enthusiast sites.
Contrary to claims that the internet necessarily means
the death of 'paid publishing' (in reality the demise
of the publishing model based on direct payments for access
by end users, rather than advertisers) it is clear that
'closed' online directories have persisted and even flourished.
Those directories are online but not public, with access on a
subscription/sessional fee basis.
After a decade of the web one striking conclusion is that many
of the communitarian and commercial forecasts have simply
not been realised.
numbers and demographics
How many directories are available on the web?
The answer is that no one knows. That is for three reasons.
The first is disagreement about what constitutes a directory.
Is it restricted to major commercial portals such as Yahoo!
and multi-sector non-commercial resources such as DMOZ?
Does it encompass for-profit directories, often of significant
value, that are not publicly accessible? Does it also
include lists that are not much more than a publicly accessible
set of personal bookmarks?
A second reason is the volatility of the web, with pages
(and directories) appearing and disappearing.
A third reason is academic and industry fashion: there
are few commercial incentives for comprehensive mapping
of directories across the web and they are less exciting
than blogs, soft networks, P2P or other recent developments.
Claims that there are 12,500 (or 125,000) web directories
should thus be regarded with caution, particularly since
the few lists substantiating such claims are decidedly
uneven and unsystematic.
What of user reliance on directories? How many people
are using directories? Are user demographics changing?
There is similar uncertainty about the size and attributes
of the online directory population. Industry studies have
focussed on a handful of major 'whole of web' sites.
Extrapolation from those sites or from figures about smaller
sites is contentious; much information is anecdotal. Confusion
is exacerbated by many published statistics, which for
example conflate traffic to the directory proper with
traffic to an ancillary feature such as webmail.
Overall it appears likely that the major commercial sites
gain substantially more traffic than their more numerous
smaller commercial competitors, some of which appear to
have appropriated parts of their content. That pattern
reflects the greater visibility of the major portals -
attributable to their age (longer time in the public gaze;
perceptions that recent market entrants are copycats),
larger funds for marketing, better opportunities for alliance
building, size of their lists and greater resources.
In 2006 MySpace inched past Yahoo!, recording 38.7 billion
page views in the US.
Are users happy with directories?
Happiness has been taken for granted, given the proliferation
of large-scale commercial directories and their market
valuation during the dot-com
boom and aftermath. There have, however, been few
convincing and independent studies about effectiveness
and behaviour. Much
'research' about commercial directories has recycled media
releases and some claims appear to be inconsistent with
Commercial directories understandably treat specifics
of search -
what people are looking for;
whether they are finding that information;
how they are navigating the directory;
how quickly they are finding information or giving up
- as commercially confidential.
derivation and management
In contrast to search engines, which are often fully automated,
the creation and maintenance of large commercial directories
and smaller specialist directories have a substantial human
element.
With large directories such as Yahoo! information about
individual sites/pages - the basis of entries in the listing
- is typically harvested by a web spider (software that
moves from one resource to another by following hyperlinks
and/or domain names) or submitted by site owners/agents.
Some directories charge a fee for early processing of
information submitted to them; the wait for inclusion
in a major directory may be up to six months. Some commercial
services specialise in submitting information on behalf
of site owners to multiple directories and search engines,
often claiming that their submission process will secure
listing ahead of time or gain a favourable ranking. Such
claims are problematical and have resulted in trade practices
litigation in some jurisdictions.
The information is then assigned to one or more categories,
supposedly of most relevance, with different categorisations
including subject hierarchies, geographical location,
alphabetical order and even age. Some directories rely
on automated assignment (eg based on keywords found by
the spider or in a submission form), with or without close
human oversight. The categorised information is then placed
in an HTML page or a database, held on one or more servers,
for access by users of the internet or an intranet.
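The automated assignment described above can be sketched in a few lines. This is a minimal illustration only, assuming a simple keyword-counting approach; the category names, keyword sets and the function name assign_categories are hypothetical and are not drawn from any actual directory.

```python
import re
from collections import Counter

# Hypothetical category keywords; a real directory would hold far larger lists.
CATEGORY_KEYWORDS = {
    "Travel": {"hotel", "flight", "tour"},
    "Computing": {"software", "linux", "server"},
    "Health": {"clinic", "medicine", "doctor"},
}


def assign_categories(text, max_categories=2):
    """Return up to max_categories category names, best match first."""
    # Count each lower-cased word found by the spider or in a submission form.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score a category by how often its keywords occur in the text.
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    # Highest score first; ties are broken alphabetically for repeatability.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, score in ranked[:max_categories] if score > 0]
```

A description that matches no keywords is left uncategorised (an empty list), which is the point at which a human editor would ordinarily step in.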
Smaller directories, particularly those without a commercial
basis, are often compiled wholly by hand, with information
being identified and evaluated in a way that reflects
the directory owner's expertise and contacts.
Maintenance of directories involves periodic automated
or manual checking of links. That checking, in principle,
encompasses whether sites/pages are still online and whether
the categorisation is still pertinent. One problem, for
example, is non-renewal of domain
registrations by site owners, with the domain being re-registered
by adult content or
other site operators seeking a free ride.
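Automated link checking of the kind mentioned above might look like the following sketch. It assumes that a simple HTTP HEAD request is an adequate test of whether a page is online; the function names fetch_status and find_dead_links are illustrative, and the fetch parameter exists only so the check can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout or malformed URL.
        return None


def find_dead_links(urls, fetch=fetch_status):
    """Return the URLs that could not be reached or gave a 4xx/5xx status."""
    dead = []
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            dead.append(url)
    return dead
```

A check of this sort only shows that something answers at the address. It cannot tell whether a lapsed domain now carries unrelated content, so the question of whether the categorisation is still pertinent remains a manual one.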
Directories pose a range of issues for users and site owners.
As we have discussed in the Internet Metrics & Statistics
guide elsewhere on this site, the web is large, volatile
(pages/sites appear and disappear) and continues to grow.
No search engine or directory covers all of the web; most
estimates suggest that the largest engines cover only
a small part of the web. None are truly comprehensive.
Specialist directories may, however, cover all major resources
relating to a particular subject ... or all the resources
that an author considers to be significant.
Questions of authority relate to the
expertise (or merely dedication) of both the directory
operator and user. In essence, can you - and should you
- trust what you see online? As with bibliographies, some
specialist directories are of outstanding value because
they have been compiled by subject experts who are equipped
to make accurate assessments and whose sources of information
are both deep and broad. Some of the larger commercial
directories - and poorly-maintained smaller competitors
- emphasise volume rather than quality. Categorisation
may use ineffective algorithms, rely on information submitted
by site owners (which may be inaccurate) or involve people
with an inadequate grasp of the directory/site's language.
Questions of bias arise because directories
are compiled manually or using algorithms that embody
particular values. Bias can be evident in inclusion of
an entry in an inappropriate part of a hierarchy or in
a placement that is weighted towards payment rather than
notions of 'merit'.
The human element in directory management is expensive
and many directories - particularly those that have 'screen
scraped' information from another directory - are not
closely maintained. The latency of information
in major portals and in smaller competitors varies. Some
directories (or parts of directories) are frequently updated.
Others are littered with dead links because sites have
gone offline or URLs have changed. As noted above, the
link may point to a 'live' site/page whose content has
changed, sometimes for the worse.
Questions of usability encompass basic accessibility:
some directories fail to meet basic guidelines for access
by people with visual or motor problems (or who merely
have a low bandwidth or an expensive connection). They
also encompass navigation through directories that seek
to maximise revenue by crowding advertisements, listings
and other features such as a webmail gateway onto an entry
page and subsidiary pages. The past decade has accordingly
seen an oscillation between very cluttered and 'noisy'
pages - to the extent that some users found them unusable
and moved to search engines or competing directories -
and more austere layouts.
An associated issue is user understanding of the hierarchies
used by the directory owner. Few people have a background
in taxonomy; many find directory hierarchies to be non-intuitive.
Confusion is exacerbated by 'best guess' classification
by directory editors, resulting in uneven or contradictory
arrangement of items in listings. Some users continue
to mistake paid placement for entries whose ranking is
unaffected by payment.
Fraud is an issue because paid placement
schemes are susceptible to poor performance by directory
operators and to 'click
fraud' by competitors (typically clicking on a paid
link until the site owner's payment is exhausted and the
link moves to a lower position in the ranking).
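The mechanics of that form of click fraud can be illustrated with a toy model. The sketch below assumes a prepaid scheme in which listings that still have funds rank ahead of those whose payment is exhausted; the function name rank_after_clicks, the site names and the amounts (in cents) are all hypothetical, and actual placement schemes vary.

```python
def rank_after_clicks(listings, target, clicks, cost_per_click):
    """Return site names in ranked order after `clicks` clicks on `target`.

    listings is a list of (site, prepaid_cents) pairs in their current order.
    """
    remaining = dict(listings)
    # Each click on the target drains its prepaid balance, never below zero.
    remaining[target] = max(0, remaining[target] - clicks * cost_per_click)
    # Sites that can still pay for a click keep their relative order ...
    funded = [site for site, _ in listings if remaining[site] >= cost_per_click]
    # ... and sites whose payment is exhausted drop below them.
    exhausted = [site for site, _ in listings if remaining[site] < cost_per_click]
    return funded + exhausted
```

With three listings prepaid at 500, 300 and 300 cents and a cost of 25 cents per click, twenty clicks on the first listing are enough to exhaust it and push it to the bottom, while ten clicks leave the order unchanged.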
It is also an issue because of a proliferation of businesses
(or published guides) that claim to be able to get top/high
rankings for sites listed on directories and in search
engines. Consumer organisations
have noted that inclusion in some commercial directories
is simply a question of money or being found by a spider.
It is not equivalent to a trustmark
and does not necessarily signify that a site is legitimate
or that the site owner's undertakings should be trusted.
A final issue is relevance, the nub of
much internet searching and questions about search
behaviour. Categorisation in some major directories
often seems hit and miss. A range of studies have demonstrated
that few users are expert in online searching or committed
to extensive searching (and thus generally do not venture
more than a few clicks into a hierarchy).
That is a reason why paid placement - whether through
an online advertisement or through high ranking in a list
- has been attractive to directory operators and site
owners. A better match between user needs and available
information may be available if the user can be persistent
and grapple with navigation and other issues noted above.
Questions about relevance, sharp practice (or outright
fraud), latency and navigation are not restricted to online
directories. They are found in dealing with printed directories
and with CD-ROM directories, which inevitably start to
go out of date as soon as they are published and which may not
meet expectations regarding quality.
a community catalogue?
Large-scale directories are not exclusively commercial.
Under the banner of "The Republic of the Web"
the Open Directory Project (ODP) proclaimed -
Instead of fighting the explosive growth of the Internet, the
Open Directory provides the means for the Internet to
organize itself. As the Internet grows, so do the number
of net-citizens. These citizens can each organize a
small portion of the web and present it back to the
rest of the population, culling out the bad and useless
and keeping only the best content. ...
The Open Directory was founded in the spirit of the
Open Source movement, and is the only major directory
that is 100% free. There is not, nor will there ever
be, a cost to submit a site to the directory, and/or
to use the directory's data. The Open Directory data
is made available for free to anyone who agrees to comply
with our free use license.
The Open Directory is the most widely distributed data
base of Web content classified by humans. Its
editorial standards body of net-citizens provide the
collective brain behind resource discovery on the Web.
The Open Directory powers the core directory services
for the Web's largest and most popular search engines
and portals, including Netscape Search, AOL Search,
Google, Lycos, HotBot, DirectHit, and hundreds of others.
ODP - often badged as DMOZ (Directory.Mozilla) - is a global
'open' directory compiled and maintained by volunteer
editors. It originated as Gnuhoo in 1998, based loosely
on Usenet categorisation, and was rebadged as Newhoo after
it was savaged as riding on the coat-tails of the GNU
free software project, with claims
that it was a commercial product based on volunteer labour.
Further rebadging as ODP occurred after it was acquired
for US$1 million by Netscape,
now part of Time Warner. ODP content was released under
an open content license.
In discussing Wikipedia,
John Tobler commented -
at the time, the ODP permitted volunteers to sign up
as editors with individual, or sometimes joint, responsibility
over categories of knowledge within the Open Directory
... What we built is now used by others, most notably
Google. Google's hierarchical directory uses the ODP
as its starting point but modifies it to suit its perceived
sense of the needs of Google users. The lasting victory
of the original ODP concept is a tribute to the idea
of working openly and together for the benefit of human
knowledge. We did it with *human* editors, not just
algorithms and machines.
A meritocracy emerged over time. Great effort was made
to keep contributions to the ODP within certain boundaries.
The Netscape employees and others who were most responsible,
tried very hard to educate volunteer editors about such
arcanities as ontology and categorization theory. A
sort of peer-enforcement evolved that allowed the volunteer
community to self-police the system, eliminating the
bogus "contributions" of self-serving, and
often profit-making, induhviduals who sought to corrupt
the directory for their own sometimes malicious purposes.
Sigh. Inevitably, the meritocracy became competitive
in nature and certain people who managed to insert themselves
fairly close to the root of the tree got into playing
power games. Some lorded it over other editors who,
perhaps because they held full time jobs, were not able
to devote their entire lives to the project. People,
and I must reveal that this included me, got dissed
for not making enough edits within a certain arbitrary
time period. And then some of this now middle-layer
power group got the power to remove editors who did
not meet their quantity standards.
The Newhoo/ODP model inspired a number of competitors, including
Go, Zeal and MusicMoz.
the web directory business
The economics of the commercial directory business encompass
revenue, development/maintenance, marketing and facilitation.
The directory sector comprises a large number of enterprises
- unsurprising given perceptions of low entry costs and
potential revenue - but most traffic and most revenue
appear to involve a handful of major operators. It is
thus common to see metrics studies suggesting that the
top four or five directories in a nation attract around
90% of all traffic to commercial directories. (Traffic
to non-commercial directories is inadequately tracked
by the large metrics companies, primarily because they
see little market interest in that data.)
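The concentration figure quoted in such studies is a simple ratio: visits to the busiest few directories divided by visits to all directories measured. A minimal sketch follows; the function name top_n_share is hypothetical and any traffic figures fed to it would need to come from a metrics provider.

```python
def top_n_share(visits, n):
    """Return the fraction of total visits captured by the n busiest sites.

    visits maps a directory name to its visit count for the period measured.
    """
    total = sum(visits.values())
    if total == 0:
        return 0.0
    # Take the n largest visit counts, regardless of which sites they belong to.
    top = sorted(visits.values(), reverse=True)[:n]
    return sum(top) / total
```

For example, six directories with 400, 250, 150, 100, 60 and 40 visits (invented numbers, purely for illustration) give a top-four share of 900 out of 1,000, or 90%.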
As with search engines, the sector includes directory
operators and businesses that specialise in submission
of information to directories or advising site owners
on maximising their chances for favourable ranking in
the major directories.
Revenue comes from a number of sources, which include -
subscriptions (or search-specific fees) to online directories that
are not publicly available;
payment by a site to appear on a publicly available directory
or for expedited processing of a submission;
'paid placement', whether through provision of a link adjacent
to the top of a list or by buying an appearance within the list;
sale of advertising, in particular for inclusion of banner
ads on a directory's front page and/or on the main pages
for major categories, and fees for click-through from
banner or other ads (from a fraction of a cent to dollars);
a share in revenue from sales through online stores;
sale of 'deidentified' demographic data about traffic to
the directory;
revenue that is attributable to advertising or other aspects
of ancillary services such as web mail and search engines.