  search behaviour 
 This 
                        page considers search strategies and online search behaviour, 
                        including questions about how users navigate and their 
                        assessment of 'good enough' information retrieval.
 
 It covers -

                         introduction
                         how we know
                         finding resources online
                         efficiency in searching
                         the death of the specialist?
                         studies

 Questions about search behaviour metrics are explored in the separate guide on Metrics & Statistics. Some myths about what is online (and whether that content is accessible) are explored in a supplementary profile.
 
  introduction 
 Online search behaviour is of interest for a range of 
                        reasons.
 
 They include what might be characterised as the politics of information, encompassing questions about -

                         money - being found on the net has a commercial value (whether direct or indirect) and much electronic commerce is built on visibility, competing for audiences in the 'attention economy' or providing tools (such as advertising) that channel traffic to particular locations
                         cultural values - domain spaces (eg dot au versus dot com) and restrictions on domain names (eg trademarks, profanity) embody particular expectations; the categorisation of directories and blind spots in search engines are weighted to commerce or against endorsement of value by a grand cataloguer
                         what is available online and how easily it can be found, with for example claims that many search tools are biased towards the 'anglosphere' and thus discriminate against non-English speakers or against 'alternative' lifestyles
                         accessibility - with perceived biases against searching by users with visual, motor or cognitive impediments
                         failure - making sure that users do not find what they are seeking (eg adult content filters or government blocking of dissident and news sites) or muddying the search (eg action by some record/film companies to seed P2P exchanges with mislabelled and corrupted files).

 More broadly, how people conceptualise cyberspace and navigate it offers insights of value for the cognitive sciences, whether you are an adherent of Chomsky's views on language or Shneiderman on computer-human interfaces.

 Online resource identification is also of interest because of search patterns that some specialists describe as "inept" and others as "good enough". It is clear that many users - including those who have been online for several years - misread navigational clues and conduct rather shallow searches. Persistence and fine-tuning of queries in search engines would often produce results that better meet their stated needs.
 
 A range of academic and industry studies have thus shown 
                        that some people still expect to intuit a search by deconstructing 
                        domain names and that when using search engines the majority 
                        of people/searches (eg a claimed 85% of around a billion 
                        Altavista queries in 1998) do not progress beyond the 
                        first screen of search results.
 
 Web Search: Public Searching of the Web (London: 
                        Springer 2004) by Amanda Spink & Bernard Jansen similarly 
                        reports little evolution over time in search behaviour. 
                        Users typically conduct a handful of simple short searches 
                        with one to two words per search (two searches per session) 
                        and examine only the first page of results.
 
 Much searching through hyperlinks - pointers from one site to another (or merely from one page to another) - is serendipitous ... going for a random walk. That is not necessarily a bad thing, as anyone who has contrasted reliance on a catalogue with grazing the stacks in a library can attest.
 
 
  how we know 
 Knowledge about search objectives, search strategies and impediments to successful online navigation comes from a range of sources.
 
 One source - still of major value - is observation of how people interact with information devices and questioning about what they were trying to do, what they achieved and how they felt. That observation might simply involve a human observer watching a user or employment of technology that monitors keystrokes or maps eye movement (one example is research under Poynter auspices, criticised by Jakob Nielsen).
 
 Another source is site-specific examination of server 
                        logs, identifying the points of entry (home page or subsidiary 
                        pages?), how users moved through the site (what path did 
                        they follow, how long did they stay) and where/why they 
                        departed (eg because a link was broken or page loading 
                        was too slow). A corollary is examination of logs provided 
                        by site-specific search engines, illustrating what users 
                        were seeking (or merely appeared to be seeking) and what 
                        response they received.
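
 As a rough illustration of that kind of log examination, the sketch below groups requests by client address and reports each visitor's entry page, path, exit page and time on site. The sample lines, the assumption that one address equals one visitor and the function name summarise_visits are all hypothetical; real logs vary in format and need more careful session handling.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical sample lines in Common Log Format; a real log would be read from a file.
SAMPLE_LOG = """\
10.0.0.1 - - [01/Mar/2005:10:00:00 +0000] "GET /index.htm HTTP/1.0" 200 5120
10.0.0.1 - - [01/Mar/2005:10:00:40 +0000] "GET /search.htm HTTP/1.0" 200 7300
10.0.0.2 - - [01/Mar/2005:10:01:00 +0000] "GET /metrics.htm HTTP/1.0" 200 8100
10.0.0.1 - - [01/Mar/2005:10:02:10 +0000] "GET /missing.htm HTTP/1.0" 404 310
"""

# host, timestamp, requested page and status code of a GET request
LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+) [^"]*" (\d{3}) \S+')


def summarise_visits(log_text):
    """Group requests by client address and describe each visitor's path."""
    visits = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        host, stamp, page, status = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        visits[host].append((when, page, int(status)))

    summary = {}
    for host, hits in visits.items():
        hits.sort()  # chronological order
        summary[host] = {
            "entry": hits[0][1],                       # point of entry
            "path": [page for _, page, _ in hits],     # route through the site
            "exit": hits[-1][1],                       # last page requested
            "exit_status": hits[-1][2],                # eg 404 hints at a broken link
            "seconds": (hits[-1][0] - hits[0][0]).total_seconds(),
        }
    return summary


if __name__ == "__main__":
    for host, info in summarise_visits(SAMPLE_LOG).items():
        print(host, info)
```

 Treating the final request as the point of departure is only an approximation, since a log cannot show how long the user spent on the last page or why they left.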
 
 At a broader level insights are provided by logs maintained 
                        by 'whole of web' directory and search engines. Much of 
                        that information is closely guarded as a commercial asset 
                        but some data is commoditised by the operators or third 
                        parties or released as promo, for example the 'hottest 
                        search terms' of the year/quarter.
 
 It complements information collected by metrics companies 
                        through manual questionnaires or logging traffic going 
                        through major gateways (eg selected ISPs) or through selected 
                        personal computers (the user agrees to install software 
                        that reports to the metrics aggregator about navigation 
                        by those lab rats). As we have noted in the more detailed 
                        discussion of the metrics 
                        (and online polling) 
                        industries, extrapolation from those figures is contentious 
                        because of disagreements about the accuracy of data collection 
                        and whether the sample is truly representative of national/global 
                        online populations.
 
 
  finding resources online 
 Users find online resources - including web sites, music 
                        files, embedded graphics, PDF or Excel documents - in 
                        a range of ways that include -
 
                         offline pointers
                         previous exposure to the resource
                         following hyperlinks from another resource
                         reference from an email or chat message
                         targeted or unstructured use of a search engine
                         grazing a large scale or specialist directory
                         following a link from an online advertisement
                         intuiting or deconstructing an address

 Offline pointers

 Normalisation of the internet - and opportunities for exposure offline - has seen increasing understanding of domain names in the general community (most Australians, for example, have been exposed to and appear to have some understanding of a URL) and widespread adoption of coffee cups, caps, billboards, posters, restaurant menus, invoices, business cards, newspaper/magazine advertisements, movie trailers and vehicle signage for pointing people to particular locations in cyberspace.
 
 There has been surprisingly little research on the ubiquity 
                        and effectiveness of such offline signalling. However 
                        for particular demographics it appears to be more effective 
                        (in terms of basic recognition and cost) than much online 
                        advertising.
 
 Previous exposure to the resource
 
 Paradoxically, the best way for many users to find a resource is to have been there previously.
 
 Some users enter URLs into a bookmarks tool on a systematic 
                        or random basis (we confess that much of our bookmarking 
                        is unstructured, with this site being used as a surrogate 
                        for a truly coherent set of marks).
 
 Other users do not bookmark, instead relying on their 
                        browser to provide a prompt when they start to enter a 
                        similar address into the location bar or look at the 'history' 
                        of past surfing.
 
 Some browsers and search engines offer 'predictive searching', 
                        suggesting addresses on the basis of information gained 
                        from past searches or past navigation. Prediction is contentious, 
                        given the difficulty of matching past navigation with 
                        other resources (most prediction algorithms are simplistic 
                        and are inhibited by poor information) and claims that 
                        some services are biased towards particular addresses 
                        (eg a site owner has paid to optimise the likelihood that 
                        the user will find that site when conducting a search).
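
 The simplest form of such prompting can be sketched as a prefix match over past navigation, ranked by how often each address was visited. This is an assumption about how a basic prompt might work rather than a description of any particular browser; the function name suggest and the sample history are hypothetical.

```python
from collections import Counter


def suggest(prefix, history, limit=3):
    """Suggest previously visited addresses that start with the typed text."""
    visits = Counter(history)
    matches = [addr for addr in visits if addr.startswith(prefix)]
    # Most frequently visited first; ties broken alphabetically for stable output.
    matches.sort(key=lambda addr: (-visits[addr], addr))
    return matches[:limit]


# Hypothetical browsing history (one entry per visit).
history = [
    "example.org/search",
    "example.org/metrics",
    "example.org/search",
    "example.net/",
]

print(suggest("example.org", history))
```

 A scheme this simple shows why prediction is easily criticised: it knows nothing about resources the user has not already visited.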
 
 Following hyperlinks from another resource
 
 The web is built around hyperlinks. As this site demonstrates, 
                        one mechanism for effective identification of online resources 
                        is to follow menus and other links from one page to another 
                        within a specific site or to move from that site to external 
                        resources using such links.
 
 Such linkage can be particularly valuable if the referring 
                        site is based on deep understanding of a subject, has 
                        a better awareness of what is available online or is updated 
                        more frequently than most search engines, which as pointed 
                        out earlier in this profile do not cover all of the web 
                        and may have latency periods of around six months.
 
 Reference from an email or chat message
 
 Many people use addresses to which they are pointed through 
                        email or chat messages, either copying the address and 
                        then pasting it into the address bar on their browser 
                        or clicking on a hyperlink within the message. That method 
                        embodies what is arguably the best and worst of searching.
 
 At its best the user is relying on an endorsement of quality or interest by a colleague, friend or contact with some authority.
 
 At its worst the link appears in spam 
                        ... sufficient people make the mistake of responding to 
                        unsolicited bulk messages (clicking on the link or naively 
                        confirming their address through an unsubscribe action) 
                        to make spamming commercially worthwhile and thereby pervasive.
 
 Targeted or unstructured use of a search engine
 
 There is disagreement about non-specialist user reliance 
                        on 'whole of web' and specialist search engines. Some 
                        authorities claim that over 70% of resources are found 
                        using search engines (and that questions of keywords, 
                        for example, are of commercial significance). Others claim 
                        that engines are far less important, with users identifying 
                        sites and individual files through a range of means.
 
 The truth probably lies somewhere in between, given different 
                        experience, objectives and patience of users.
 
 Some clearly start and stop with a single search. Others, 
                        as noted in preceding pages of this profile, may systematically 
                        work through selected entries on a succession of search 
                        screens, conduct multiple searches (sometimes using different 
                        search engines), or use initial results from a search 
                        engine as a point of departure for more extended navigation 
                        through hyperlinks from one site to another.
 
 It is clear that many users - particularly those without 
                        a detailed search strategy or tight objectives - conduct 
                        shallow and unstructured searches, typically entering 
                        a single search term, avoiding 'advanced search' features 
                        (eg boolean text searching and date or other delimiters) 
                        and not progressing beyond the first two screens of search 
                        results. Accurately Interpreting Clickthrough Data 
                        as Implicit Feedback (PDF) 
                        by Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke 
                        & Geri Gay for example comments that 42% of users 
                        clicked the first item in the listing on a search engine 
                        results page (SERP), with 8% of users selecting the second 
                        item.
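
 Figures of that kind are shares of clicks by position on the results page. A minimal sketch of the arithmetic, using invented data rather than anything from the study, follows; the function name click_share_by_rank is hypothetical.

```python
from collections import Counter


def click_share_by_rank(clicked_ranks):
    """Return the fraction of clicks that landed on each result position."""
    counts = Counter(clicked_ranks)
    total = sum(counts.values())
    if total == 0:
        return {}  # no clicks observed
    return {rank: counts[rank] / total for rank in sorted(counts)}


# Hypothetical data: the position clicked in each of ten observed searches.
observed = [1, 1, 2, 1, 3, 1, 5, 2, 1, 4]
shares = click_share_by_rank(observed)
print(shares)
```

 A share calculated this way describes where users clicked, not whether the item they chose was the most relevant one on the page.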
 
 Grazing a large or specialist directory
 
 As preceding pages have noted, prior to emergence of large-scale 
                        search engines most people relied on directories for identifying 
                        internet resources. Those directories provide categorised 
                        listings of selected web sites or other resources.
 
 Their value for searches is affected by the latency of 
                        the information (most are manually compiled, with delays 
                        in the entry of new information and deletion of superseded 
                        information), the basis of selection and difficulties 
                        encountered by users in navigating through the categorisation.
 
 As with search engines, research suggests that non-specialist users of the 'whole of web' directories such as Yahoo! tend not to systematically graze a hierarchy of categories. Much searching accordingly ceases on the first or second page. The big directories have accordingly emphasised 'regionalisation', with discrete versions for different nations/regions.
 
 In practice such directories are now significantly less 
                        important for many users, who instead rely on other search 
                        mechanisms identified on this page and on specialist directories 
                        that are often small scale and subject specific (eg a 
                        directory of Mahler manuscripts or papers on the blue-ringed 
                        octopus). Whole of web directories continue to garner 
                        much traffic because they have morphed into broader portals 
                        (eg including webmail access and news) and because they 
                        serve as the default entry pages from many private and 
                        commercial machines, eg in cybercafes.
 
 Following a link from an online advertisement
 
 The online advertising industry is based on the notion that users

                         will gain some awareness of a product/service through a banner/pop-up advertisement encountered while surfing or
                         by clicking on such an ad will pass to a discrete site, an animation or other advertising content.

 Sufficient people report encountering online advertisements or click through to make ads viable. There is disagreement about their value as a search mechanism, reflected in changing user responses to online ads (eg adoption by some demographics of 'ad washer' or anti-popup software) and the evolution of 'paid placement' - banners or other strategically positioned links that may not appear to be ads but if clicked take the user to a location chosen by the advertiser.

 Intuiting or deconstructing an address
 
 Many users attempt to intuit the address of an online 
                        resource by assuming that there is a close match between 
                        a brand/corporate name and the online address, for example 
                        simply adding 'www' and 'com' (or the appropriate ccTLD/gTLD) 
                        on either side of an offline name. The shape of the domain 
                        name system and factors such as trademarks mean that 
                        such a match is not always correct.
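
 The guesswork described above can be sketched in a few lines. The brand name, the list of suffixes and the function name guess_addresses are hypothetical, and nothing here checks whether a guessed address exists or belongs to the organisation the user has in mind.

```python
import re


def guess_addresses(name, suffixes=("com", "com.au")):
    """Build the addresses a user might type for an offline brand name."""
    # Keep only letters and digits, mirroring the habit of dropping
    # spaces and punctuation from a brand name.
    label = re.sub(r"[^a-z0-9]", "", name.lower())
    return [f"www.{label}.{suffix}" for suffix in suffixes]


print(guess_addresses("Acme Books"))
```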
 
 Others assume that there will be an appropriate match 
                        between the domain name and the type of service/commodity 
                        of interest (or even the person of interest). Users thus 
                        sometimes resort to a generic domain name (such as books.com, 
                        cars.com or flowers.com), which may lead them to an appropriate 
                        site or merely to a site that features advertising for 
                        an unrelated service/product. Such behaviour was common 
                        during the early years of the web, with some users assuming 
                        that domain names embodied a subject directory. More recently 
                        it has been the basis for development of commercial 'keyword' 
                        portfolios, large collections of sites with subject 
                        names and misspelled brand/subject names that feature 
                        advertising.
 
 Eszter Hargittai sagely remarks that

                         the most straightforward way of getting to a page is by having it as the default page on one's browser. Although the user may change the original default page, it is often a page specified by the browser's manufacturer, the user's internet service provider, or the institution where the machine is operated


  efficiency in searching 
 Online resource identification is a competition for the 
                        user's attention, complicated by -
 
                         the large number (and volatility) of sites, documents and other resources
                         the impatience or lack of expertise of many users
                         the willingness of users to accept search results that are 'good enough' rather than true 'best fit'

 It is clear that many users -

                         use simple searches (eg one or two search terms) and avoid 'advanced' search features such as date delimiters
                         do not systematically work through a large number of search results (eg move beyond the first two screens of results from a search engine)
                         have difficulty distinguishing between sponsored and unsponsored links
                         rely on a handful of sites when conducting research online (or citing research).

 The 1999 Analysis of a very large web search engine query log study by Craig Silverstein, Hannes Marais, Monika Henzinger and Michael Moricz for example identified around one billion queries on Altavista, with users sticking with the first screen of results in 85% of searches and 77% of sessions containing only one query.
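
 Percentages of that kind can be derived from a query log along the following lines. This is a sketch under stated assumptions: the record layout (session identifier, query text, number of result screens viewed), the sample records and the function name query_log_summary are all hypothetical and are not drawn from the study.

```python
from collections import defaultdict


def query_log_summary(records):
    """Summarise (session_id, query, screens_viewed) records from a query log."""
    queries_per_session = defaultdict(int)
    first_screen_only = 0
    total_queries = 0
    for session_id, _query, screens_viewed in records:
        queries_per_session[session_id] += 1
        total_queries += 1
        if screens_viewed <= 1:
            first_screen_only += 1
    if total_queries == 0:
        return {"first_screen_share": 0.0, "single_query_session_share": 0.0}
    single = sum(1 for n in queries_per_session.values() if n == 1)
    return {
        # share of queries where the user stayed on the first results screen
        "first_screen_share": first_screen_only / total_queries,
        # share of sessions that contained exactly one query
        "single_query_session_share": single / len(queries_per_session),
    }


# Hypothetical records for illustration only.
sample = [
    ("s1", "mahler manuscripts", 1),
    ("s2", "blue-ringed octopus", 1),
    ("s2", "blue ringed octopus venom", 2),
    ("s3", "flowers", 1),
    ("s4", "cars", 1),
]

print(query_log_summary(sample))
```

 How a 'session' is delimited (eg by a cookie or by a gap in activity) is itself a design decision that affects the resulting figures.
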
 Research such as Andy Cockburn & Bruce McKenzie's 
                        paper (PDF) 
                        on What Do Web Users Do? An Empirical Analysis of Web 
                        Use, published in the International Journal of 
                        Human-Computer Studies, and Google's PageRank 
                        and Beyond: The Science of Search Engine Rankings 
                        (Princeton: Princeton Uni Press 2006) by Amy Langville 
                        & Carl Meyer indicates that -
 
                         site revisitation is common, with up to 81% of pages being revisited by a particular user
                         most visits are of only a few seconds' duration
                         although some users manage revisitation through large lists of bookmarks, those lists are rarely culled and are thus often out of date


  the death of the specialist? 
 A recurrent meme since the 1960s has been the 'death of 
                        the library' (along with other myths 
                        such as the death of the book and death of the author). 
                        We have also seen hype about the net as a universal library, 
                        a digital repository accessible by all and containing 
                        all the fruits of creativity (along with episodes of Neighbours).
 
 Can we then talk of the death of the librarian? The answer 
                        is clearly no - librarians and other information specialists 
                        are not going to disappear. They will not be replaced 
                        by contemporary search engines or new search technology 
                        based on artificial intelligence.
 
 That is because many specialists have -
 
                         expertise in searching (particularly non-public legal, scientific, financial or other technical databases), underpinned by a professional ethos that emphasises appropriateness, comprehensiveness and accuracy
                         institutional access to firewalled content (including large-scale bibliographic, textual and image collections and databases that involve subscription, sessional or item-based payment)

 It is also because much content is likely to remain offline, with for example the cost of digitisation (and disagreement about rights) inhibiting retrospective capture of many 'historic' texts, still/moving images and sound recordings.

 Notions of 'power searching' and 'digital literacy' are explored in the following page of this profile.
 
 
  studies 
 As points of entry for questions about information seeking we recommend Elaine Svenonius' The Intellectual Foundation of Information Organization (Cambridge: MIT Press 2000), Christine Borgman's From Gutenberg to the Global Information Infrastructure: Access To Information in the Networked World (Cambridge: MIT Press 2000), Human Interaction with Complex Systems: Conceptual Principles & Design Practice (Hague: Kluwer 1996) by Celestine Ntuen & Eui Park, Web Search: Multidisciplinary Perspectives (Berlin: Springer 2008) edited by Amanda Spink & Michael Zimmer and Preferred Placement: Knowledge Politics on the Web (Maastricht: Jan van Eyck Akademie Editions 2000) edited by Richard Rogers.
 
 They are complemented by the Berkshire Encyclopedia 
                        of Human-Computer Interaction (Great Barrington: 
                        Berkshire 2004) edited by William Bainbridge, Donald Case's 
                        Looking for Information: A Survey of Research on Information 
                        Seeking, Needs, and Behavior (New York: Academic 
                        Press 2002) and the exhaustive ACM Human-Computer Interaction 
                        Bibliography (HCIB).
 
 Lara Catledge & James Pitkow's 1995 paper Characterizing browsing strategies in the World Wide Web, Richard Belew's Finding Out About: Search Engine Technology From A Cognitive Perspective (Cambridge: Cambridge Uni Press 2001), Linda Tauscher & Saul Greenberg's 1997 paper on Revisitation Patterns in World Wide Web Navigation, Andrew Treloar's June 2000 paper on Spinning the Right Path: Investigating the Effectiveness & Impact of Web Navigation Systems, Andy Cockburn & Bruce McKenzie's What Do Web Users Do? An Empirical Analysis of Web Use (PDF), Lucas Introna & Helen Nissenbaum's 2000 Shaping the Web: Why the Politics of Search Engines Matters (PDF) and Erik Selberg's 1999 dissertation Towards Comprehensive Web Search (PDF) explore particular issues.
 
 Research by Chun Wei Choo, Brian Detlor & Don Turnbull 
                        may also be of interest. Apart from their Web Work: 
                        Information Seeking & Knowledge Work on the World 
                        Wide Web (New York: Kluwer 2000) we commend the paper 
                        on Information Seeking on the Web, the paper 
                        on Information Seeking on the Web - An Integrated Model 
                        of Browsing & Searching  and their First Monday 
                        article 
                        on Information Seeking on the Web - An Integrated Model 
                        of Browsing & Searching.
 
 Two starting points for understanding issues and processes 
                        are the 1999 paper 
                        on  Results & Challenges in Web Search Evaluation 
                        by Hawking, Craswell, Thistlewaite & Harman, and the 
                        1999 study 
                        by Lawrence & Giles on  Accessibility of Information 
                        on the Web.
 
 Annabel Pollock & Andrew Hockley's 1997 What's Wrong with Internet Searching paper and Modern Information Retrieval (London: Longman 1999) by Ricardo Baeza-Yates & Berthier Ribeiro-Neto are valuable in understanding retrieval principles and effectiveness studies. Bernard Jansen's 2000 paper A Review of Web Searching Studies is a useful literature review.
 
 
 
 
 
 
 