 lies, spin and web stats 
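 This page considers internet statistics and their abuse.

 It covers -

   introduction
   common fudges
   glossy factoids
   primers
   sectoral studies and standards
   measuring the information economy
   DIY spin generators
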
  introduction 
 The internet is a young technology, with unfamiliar terms, 
                        uncertain measures and markets where the desire for information 
                        often outweighs an ability to critically evaluate data.
 
 It is also a technology where some people place an almost 
                        religious faith in numbers. It is one where many people 
                        have come to expect that figures will be both large and 
                        inconsistent with data from life offline, because the 
                        internet is supposedly 'special', eg during the dot com 
                        bubble -
 
   pundits forecast that traffic would double every hundred days during the coming decade

   gurus claimed that dot-com alchemy would allow enterprises to make substantial profits even though costs stubbornly remained greater than sales revenue.

 It is thus unsurprising that some observers have concerns regarding the abuse of internet statistics (in particular demographic projections) and conflicting reports about particular markets, where figures from different vendors frequently diverge by over a thousand per cent. As with past media revolutions such as radio and television, many audience measurement mechanisms are fuzzy and there is a temptation to lie or simply echo dubious claims, which if repeated enough are embodied in conventional wisdom.

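 The 'double every hundred days' forecast can be tested with simple compounding. What follows is a minimal sketch in Python rather than anything drawn from the pundits concerned: the function name, the 365 day year and the annual doubling comparison are all assumptions made for illustration.

```python
def growth_factor(elapsed_days: float, doubling_period_days: float) -> float:
    """Multiple by which traffic grows if it doubles every doubling_period_days."""
    return 2 ** (elapsed_days / doubling_period_days)


DAYS_PER_YEAR = 365  # assumption: leap days ignored for simplicity

# The forecast quoted above: doubling every hundred days, sustained for a decade.
one_year_factor = growth_factor(DAYS_PER_YEAR, 100)
decade_factor = growth_factor(10 * DAYS_PER_YEAR, 100)

# Hypothetical comparison (not a figure from the source): doubling once a year.
annual_doubling_decade_factor = growth_factor(10 * DAYS_PER_YEAR, DAYS_PER_YEAR)

print(f"one year at 100-day doubling: about {one_year_factor:.1f} times")
print(f"a decade at 100-day doubling: about {decade_factor:.2e} times")
print(f"a decade at annual doubling:  about {annual_doubling_decade_factor:.0f} times")
```

 On those assumptions a decade of hundred day doubling implies growth of roughly 97 billion times, against roughly a thousand times for annual doubling. The gap shows how sensitive a long-range projection is to the assumed growth rate.
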
 Instances of spin and outright lies reflect factors such as -

   the audience's unfamiliarity with statistical concepts and discomfort with statistical analysis, characterised by some as an aspect of digital literacy

   the absence of authoritative benchmarks

   uncritical propagation of problematical data by government agencies (including Australia's NOIE and DCITA) and by other gatekeepers

   the nature of much mass and specialist media, with journalists and publishers having an interest in 'exciting' news or striking figures (and on occasion being captured by their sources)

   hype by vendors of products and services and by promoters such as brokers, venture capital and private equity fund managers

   triumphalism, with some observers failing to recognise similarities with past economic and technological developments and thus not scrutinising some of the more outrageous claims

   cheerleading by analysts and advocacy organisations, with bodies such as ISOC feeling a need to defend 'their' internet

   the absence, particularly prior to the 2000 Crash, of penalties for naivety, characterised by one Canberra official as "no one ever got fired for believing Gartner but people get monstered for pointing out that the king is wearing digital clothes"

   subversion through click fraud

 Pages throughout this site highlight conflicting claims regarding infrastructure, online publishing (eg the number of sites), commercial activity (adult industry advocates and critics both have an incentive to exaggerate the size of the online erotica business) and characteristics of online populations.

 A simple example is the number of "internet users" in Australia as of early 2007. eMarketer estimated that the number of users was 13.1 million. The Nielsen//NetRatings figure was 11.5 million; the Australian Bureau of Statistics estimate of 10.6 million users was some 2.5 million fewer than eMarketer's.
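
 The spread between those three estimates can be made explicit. This is a minimal sketch using only the figures quoted above; the variable names are illustrative, and expressing the spread as a percentage of the lowest estimate is one choice among several.

```python
# Estimates of Australian "internet users" in early 2007, in millions, as quoted above.
estimates_millions = {
    "eMarketer": 13.1,
    "Nielsen//NetRatings": 11.5,
    "Australian Bureau of Statistics": 10.6,
}

lowest_source = min(estimates_millions, key=estimates_millions.get)
highest_source = max(estimates_millions, key=estimates_millions.get)
lowest = estimates_millions[lowest_source]
highest = estimates_millions[highest_source]

# Absolute spread, and the spread relative to the most conservative estimate.
spread_millions = highest - lowest
spread_pct_of_lowest = spread_millions / lowest * 100

print(f"{highest_source} exceeds {lowest_source} by {spread_millions:.1f} million users")
print(f"that is {spread_pct_of_lowest:.1f}% of the lowest estimate")
```

 A gap of nearly a quarter between the highest and lowest estimate leaves ample room for selective quotation.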
 
 As with traditional teledensity counts a polemicist can pick a figure to illustrate a particular argument - Australia's ahead of the pack, lagging behind peers, digital divides are widening or narrowing, market opportunities beckon ...
 
 
  common fudges 
 What are some common fudges? They include -

   confusion in terms

   extrapolation from an unrepresentative sample

   mistaking correlation for causation

   assuming that growth rates will remain constant

   providing a gross rather than a per capita figure

   assuming that the availability of connectivity (or access to hardware and software) equals ongoing use or a specific type of use

 Examples are

   the Australian government's announcement that all agencies are "online" (a metric that does not differentiate between whether a single official has a dialup connection or every officer has broadband, whether "online" equals a single web page or a rich resource for citizens, or the quality of what is online)

   acknowledgement that approximately 50% of people who download Firefox actually try it and that 25% actively use it on an ongoing basis

   claims that one in 10 players who regularly play online games start a physical relationship with a fellow gamer

 Such abuses are evident elsewhere. One London tabloid for example shrilled in 2006 that "Britain's plumbers, electricians and locksmiths drink the equivalent of 1.3 baths of tea" each year, a figure that is somewhat less exciting when you do the maths and recognise that annual consumption of 120 litres of Darjeeling equals roughly a soft drink can per day. Announcement in 2007 of a £300 million increase in UK spending on childcare failed to impress people with a calculator, who did the maths and recognised that it meant only £1.15 per child per week.

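 Both calculations are easy to reproduce. The sketch below is illustrative only. It assumes a 365 day year and a 52 week year, and treats the £300 million as an annual sum. The number of children is not stated above, so the code infers it from the two quoted figures.

```python
DAYS_PER_YEAR = 365   # assumption
WEEKS_PER_YEAR = 52   # assumption

# Tea: "1.3 baths" a year is given above as 120 litres.
tea_litres_per_year = 120
tea_ml_per_day = tea_litres_per_year * 1000 / DAYS_PER_YEAR

# Childcare: GBP 300 million (assumed to be per year), quoted as GBP 1.15 per child per week.
extra_spending_gbp = 300_000_000
per_child_per_week_gbp = 1.15

# Inferred, not quoted: the number of children implied by those two figures.
implied_children = extra_spending_gbp / (per_child_per_week_gbp * WEEKS_PER_YEAR)

print(f"tea: about {tea_ml_per_day:.0f} ml per day")
print(f"childcare: about {implied_children / 1e6:.1f} million children implied")
```

 Dividing the headline figure by the relevant population and period is usually enough to deflate a large gross number.
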
 Many of the web traffic statistics accepted by advertisers and scholars are artefacts from a 'faith based science', as the user is reliant on claims that cannot be readily tested and compared. Those claims might be made by a site owner (whose figures are not independently audited) or by third party web tracking services (which may use different mechanisms or merely different definitions to those of their competitors and thus not enable ready benchmarking).
 
 As noted earlier in this guide, site operators have claimed 
                        that their figures are accurate because they see the number 
                        of hits on their pages, rather than inferring hits from 
                        toolbars used by an unrepresentative demographic or data 
                        provided by individual ISPs. That has provoked questions 
                        about whether advertisers can trust an individual site 
                        operator not to 'cook' its figures and whether it is possible 
                        for advertisers to choose between competing sites on the 
                        basis of claimed figures.
 
 In the US Forbes famously claimed some 15 million visitors 
                        per month to its sites, more than double the 7.3 million 
                        that metrics specialist comScore reported for the same 
                        sites. Confidence in claims and counterclaims is eroded 
                        by 'restatements' from specialists, with Nielsen//NetRatings 
                        for example in 2006 restating its reported figures regarding 
                        Entrepreneur.com from 7.6 million monthly visits to 2 
                        million visits. That is a substantial change if you were 
                        paying for ad exposure or investing in the site operator 
                        on the basis of claimed traffic. (The discussion elsewhere 
                        on this site regarding audience measurement notes that 
                        similar restatements have occurred in relation to radio, 
                        television and newspaper readership figures: net data 
                        restatements are merely the most egregious).
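
 The scale of those discrepancies can be quantified from the quoted figures. This is a minimal sketch with illustrative variable names; the 'cost per visit' multiplier assumes, hypothetically, an advertiser who paid a fixed sum on the basis of the originally reported traffic.

```python
# Forbes: monthly visitors in millions, own claim versus comScore, as quoted above.
forbes_claimed_millions = 15.0
forbes_comscore_millions = 7.3
claim_to_comscore_ratio = forbes_claimed_millions / forbes_comscore_millions

# Entrepreneur.com: monthly visits in millions, before and after the 2006 restatement.
entrepreneur_before_millions = 7.6
entrepreneur_after_millions = 2.0
restatement_fall_pct = (
    (entrepreneur_before_millions - entrepreneur_after_millions)
    / entrepreneur_before_millions
    * 100
)

# Hypothetical advertiser paying a fixed sum: the effective cost per visit rises by this multiple.
cost_per_visit_multiplier = entrepreneur_before_millions / entrepreneur_after_millions

print(f"Forbes claim is {claim_to_comscore_ratio:.2f} times the comScore figure")
print(f"Entrepreneur.com restatement cut reported visits by {restatement_fall_pct:.0f}%")
print(f"effective cost per visit rises {cost_per_visit_multiplier:.1f} times")
```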
 
 Confidence is also eroded by potential partiality in much 
                        sponsored research. Sponsorship of some studies has led 
                        some savvy observers to suggest that the data should be 
                        labelled as 'vendor research' or simply as promo.
 
 Conflicts in claims about what people are searching for 
                        are highlighted here.
 
 
  glossy factoids 
 Why is problematical research influential? One reason 
                        is that users want to believe. Another reason is that 
                        much output from commercial research firms is wrapped 
                        in the trappings of authority: priced out of the reach 
                        of many scholars or other independent analysts, replete 
                        with jargon and buzzwords, hyped as commissioned or used 
                        by leading private and public sector organisations, embodying 
                        a range of charts and tables, drawing on proprietary data 
                        analysis mechanisms and surveys.
 
 Influence can be self-reinforcing: users refer to studies 
                        and to specialists because they know their peers use them. 
                        The more a report is cited the more likely it will be 
                        referred to and the greater the authority for its author 
                        to gain support for further research (alas, often research 
                        that just massages the initial figures and that may not 
                        be relevant in another location).
 
 Many journalists and (more importantly) most end-users 
                        seem unwilling or unable to articulate why they believe 
                        such studies and the extent to which they believe. That 
                        is perhaps because many of the statistics are pulled from 
                        media releases (free) rather than the full reports (expensive).
 
 A more significant reason is that the basis of the data 
                        and compliance with any standards are usually opaque, 
                        even if an observer has access to the full text of the 
                        particular report and has had an opportunity to scrutinise 
                        past reports from the vendor in order to identify 'restatements' 
                        and anomalies.
 
 
  primers 
 Darrell Huff's How To Lie With Statistics (New 
                        York: Norton 1993) has not been substantially updated 
                        since its first appearance in the early 1950s but is of 
                        excellent value. John Paulos' A Mathematician Reads 
                        The Newspaper (New York: Anchor 1996) and The 
                        Tiger That Isn't: Seeing Through a World of Numbers 
                        (London: Profile 2007) by Michael Blastland & Andrew 
                        Dilnot are other lighthearted looks at the use and abuse 
                        of mathematics in the mass and specialist media, complemented 
                        by Gene Epstein's more splenetic Econospinning: How 
                        to Read Between the Lines When the Media Manipulate the 
                        Numbers (New York: Wiley 2006).
 
 Joel Best's Damned Lies & Statistics: Untangling 
                        Numbers From The Media, Politicians & Activists 
                        (Berkeley: Uni of California Press 2001) and Jane Miller's 
                        The Chicago Guide to Writing about Numbers: The Effective 
                        Presentation of Quantitative Information (Chicago: 
                        Uni of Chicago Press 2004) are harder going but perhaps 
                        more valuable.
 
 The Design guide on this site points 
                        to recommended studies about the interpretation and creation 
                        of statistical graphics. Three of particular note are 
                        Edward Tufte's 
                        The Visual Display of Quantitative Information 
                        (1992), Envisioning Information (1990) and 
                        Visual Explanations: Images & Quantities, Evidence 
                        & Narrative (1997) - all published by Graphics 
                        Press (Cheshire, Connecticut).
 
 For an overview of data collection and interpretation 
                        issues we recommend Andrew Odlyzko's important 2000 paper 
                        on Internet Growth: Myth & Reality, Use & Abuse 
                        and Michael Dahn's paper 
                        Counting Angels on a Pinhead: Critically Interpreting 
                        Web Size Estimates.
 
 For another perspective see Alain Desrosières' The Politics of Large Numbers - a History of Statistical Reasoning (Cambridge: Harvard Uni Press 1998), Margo Anderson's The American Census: A Social History (New Haven: Yale Uni Press 1988) and essays in Statistics in Society: The Arithmetic of Politics (London: Arnold 1999) edited by Daniel Dorling and Stephen Simpson.
 
 
  sectoral studies and standards 
 The US White Paper on Electronic Journal Statistics (WPEJS), reflecting the International Coalition of Library Consortia's 1998 Guidelines for Statistical Measures of Usage of Web-Based Indexed, Abstracted & Full Text Resources (ICOLC), deals with library statistics.
 
 The Australian Internet Industry Association (IIA) is encouraging development of a set of standard measures for the local online industry, including agreed standards for "Site Centric/Rating, and Ad Server Measurement". The University of Southern California has published a paper (PDF) mapping competing US industry measures. It should be read in conjunction with the outstanding paper by Thomas Novak & Donna Hoffman on New Metrics for New Media: Toward the Development of Web Measurement Standards.
 
 
  measuring the information economy 
 Questions about mapping the size, shape and volatility 
                        of the 'new economy' are explored in the Information Economy 
                        guide elsewhere on this 
                        site.
 
 
  DIY spin generators 
 Robert Orenstein's 'Irresponsible Internet Statistics 
                        Generator' (IISG) 
                        retains its value for those trying to make sense of some 
                        of the loopier government, academic and business projections.
 
 
 
 