images and sounds

This page considers searching of audio, video and still images.
 
It covers an introduction, identification of non-text content, sorting of that content and likely futures.
  introduction 
Searching of online content, for the moment, remains resolutely text-based despite the flood of personal snaps, sound recordings, archival film, contemporary video and maps pouring onto the web through institutional/corporate sites and services such as Flickr or YouTube.
 
It is likely that the amount of such data, in terms of bytes, already accounts for over half of the bright and dark web. It will continue to increase as individuals and organisations unlock their archives or merely assume that peers are interested in video blogs. One indication of growth is Google's announcement in February 2005 (predating global uptake of YouTube) that its cache of the web had reached over a billion images, up from some 880 million in February 2004. Another has been the announcement that the BBC and other major moving image owners will place much of their collections online.
 
Searching is text-based because existing web search engines are not good at -

- comparing non-text content, in particular comparing items that are not exact copies
- making sense of non-text content, eg determining that a picture is upside down

That means the 'universal library' or 'global jukebox' remains a myth: there is a lot of content online but if you cannot find it, for practical purposes it does not exist.
Searching is text-based because the major search engines rely on text associated with an image or an audio/video recording, rather than independent interpretation of that content. That text enables identification of the images and sounds. It also underpins much of the sorting by the engines of that content.
 
 
  identification 
Basic identification of content by whole-of-web search engines such as Google involves determination of the file type: the .gif, .jpg, .html, .wmv or other suffix that forms part of an individual file's name. Without that suffix the file will not be recognised.
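That suffix-to-type mapping can be sketched in a few lines. This is an illustrative toy, not any engine's actual code: the extension table and function name are assumptions for the example.

```python
# Hypothetical sketch of suffix-based file-type identification.
# The extension table is illustrative, not exhaustive.
from pathlib import Path

MEDIA_TYPES = {
    ".gif": "image", ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".wmv": "video", ".mp4": "video",
    ".mp3": "audio", ".wav": "audio",
    ".html": "page", ".htm": "page",
}

def classify(filename: str) -> str:
    """Return a coarse media type, or 'unknown' when the suffix is absent or unrecognised."""
    suffix = Path(filename).suffix.lower()
    return MEDIA_TYPES.get(suffix, "unknown")
```

A file without a recognised suffix simply falls through to 'unknown', mirroring the point above that such files are not identified at all.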
 
In spidering the web (or a particular set of files, which might be on an intranet) search engines typically parse file directories and individual files, such as web pages that comprise a mix of html code and 'embedded' graphics.
 
The content or other attributes (eg date of creation) of digital audio, video or still image files can sometimes be inferred from the title of the individual file, from any 'alt' tag intended to enhance the file's accessibility or more broadly from the domain name and from the type of links pointing to that domain (eg to an adult content site). Much of the time the file title is not useful, either being generic (eg catpicture.jpg or header.gif) or meaningless from the engine's perspective (eg 01457.gif or 3257.mp3). Many images do not have 'alt' tags; those tags are often non-descriptive or generic.
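The harvesting of those cues can be sketched as follows. The parser class, the list of 'generic' names and the usefulness test are assumptions for the example, not a description of any real crawler.

```python
# Illustrative sketch: collecting the textual cues an engine might attach to
# an image - its file name and any 'alt' text. Names here are hypothetical.
from html.parser import HTMLParser

GENERIC = {"header", "spacer", "image", "img", "logo"}

class ImageCueParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cues = []  # (src, alt, useful) triples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src, alt = a.get("src", ""), (a.get("alt") or "").strip()
        stem = src.rsplit("/", 1)[-1].rsplit(".", 1)[0].lower()
        # A title is unhelpful if generic (header.gif) or meaningless (01457.gif)
        useful = bool(alt) or (stem not in GENERIC and not stem.isdigit())
        self.cues.append((src, alt, useful))

p = ImageCueParser()
p.feed('<img src="01457.gif"><img src="pear.jpg" alt="ripe pear">')
```

Run against the two images above, the first yields no usable cue while the second is identifiable from both its file name and its alt text.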
 
Some files do feature metadata, which might include detailed DRM information. Only a small proportion of all image/audio files on the net feature useful data, in particular DRM tags (which are thus not a fundamental resource for the major engines but may be used by specialist art history or other cultural database engines).
 
Most engines accordingly identify still images through associated text. That text might be the name of a web page and the words within a web page (engines typically assume that the words nearest to an image, particularly what appears to be a caption, relate to that image). The text might instead be the wording in a link to the still/moving image or audio file.
 
Contributors of files to services such as YouTube are encouraged to include concise keywords during the submission process. Those keywords - metadata - may not adequately describe the particular file, for example because they are misspelt, confusing or do not provide enough information for an appropriately granular search where there are a large number of files with similar content or similar keywords.
 
Outside those cues the major search engines do not have the capacity to consistently differentiate between images (and between audio files), for example determining purely from the pixels that one image represents a pear, another a penguin and a third the more intimate parts of a porn star.
 
 
  sorting 
Having identified audiovisual files, how do the major search engines sort them (eg rank those files so that the results most closely reflect the user's search terms or other criteria such as date)?
 
As with the discussion earlier in this profile, most search engine algorithms are proprietary and are refined to reflect research and feedback from users. As a result there is some agreement about basic principles but specifics of current and past sorting mechanisms for the major engines are unavailable. Some sense of those principles is provided by the industry and academic studies cited elsewhere in this profile.
 
Much image searching is implicitly a subset of standard searches of web pages, with the engine for example identifying all image-bearing pages that match the user's search criteria. Image-specific searching in major engines such as Google trawls the particular engine's cache of web pages and then presents thumbnail images from that cache in a rank that usually reflects factors such as -

- the ranking of the page with which the image is associated
- whether users have clicked on the thumbnail during identical past image searches
- any weighting given to text that is likely to be directly associated with the image, eg whether it has a relevant caption, whether the image was 'embedded' close to multiple instances of the search term in the page

Searches of moving image collections, such as YouTube, typically rank the files by the closeness of the match between the user's search request and the terms supplied by the person who uploaded each file onto the collection. Other rankings (for example by date of upload, place of upload, most viewed, most commented or most linked-to) are possible; that ranking is mechanistic and does not involve the engine interpreting the content of that video.
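The weighted combination of factors listed above can be sketched as a simple scoring function. The factor names and the weights are assumptions for illustration; the actual algorithms, as noted, are proprietary.

```python
# Hedged sketch of factor-weighted thumbnail ranking. The weights are
# invented for the example, not drawn from any published algorithm.
def score(page_rank: float, past_clicks: int, caption_match: bool,
          nearby_term_hits: int) -> float:
    """Combine the ranking cues listed above into a single sortable score."""
    s = page_rank                       # ranking of the page hosting the image
    s += 0.1 * past_clicks              # clicks in identical past image searches
    s += 0.5 if caption_match else 0.0  # a relevant caption
    s += 0.2 * nearby_term_hits         # search terms 'embedded' near the image
    return s

results = [
    ("pear.jpg", score(1.0, 3, True, 2)),
    ("header.gif", score(0.4, 0, False, 0)),
]
results.sort(key=lambda r: r[1], reverse=True)
```

The point of the sketch is that the ranking is entirely mechanistic: every input is an external cue, and none requires the engine to interpret the pixels themselves.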
The same mechanisms are used to rank audio file collections, with engines leveraging external cues rather than sorting on the basis that a particular performance 'sounds like' David Bowie rather than Iggy Pop or Enrico Caruso.
 
 
  futures 
A less mechanistic identification and ranking of non-text files is often characterised as the "next frontier" in search or even in artificial intelligence, with ambitious claims being made for imminent breakthroughs in content analysis or for application of neural networks and other technologies that at the moment are just buzzwords in search of a research grant.
 
What is often characterised as 'pattern recognition' has attracted major commercial, military and academic attention. That is unsurprising given -

- awareness that interpretation of images is a key for robotics
- arguments that the algorithms and hardware used in image and audio recognition will be associated with, if not drive, breakthroughs in artificial intelligence
- perceptions that problems associated with recognition are conceptually demanding (and thus attract the best researchers) and if solved are likely to be extremely lucrative (thus attracting venture capital or other investment)
- awareness that matching and interpreting images is central to some forms of biometrics and satellite-based or other geospatial intelligence systems

Some researchers have placed their faith in a combination of falling hardware costs and technologies such as SMIL, with software for example automatically 'listening' to the soundtrack of a video, converting that sound into text (ie a script synchronised to the particular image) and thereby being able to index a film without substantial human intervention.
In practice current applications centre on 'brute force' comparisons between a known audio or video recording and one that is suspected to be a copy. That is attractive for intellectual property rights owners concerned to place the copyright or trademark genie back into the digital bottle. Some enthusiasts have envisaged enforcement agents able to automatically and comprehensively -

- trawl the web in search of copies of musical performances, movies, television programs or even trademarks
- determine whether those copies or uses were authorised
- instruct associates (eg social software services) and third parties (eg ISPs, ICHs, search engines) to delete or merely block unauthorised copies/uses

Apart from fundamental technical challenges, that vision conflicts with a range of commercial and legal realities.
Other enthusiasts have suggested that software can or shortly will be able to readily interpret the content of still/moving images without external cues and without the preliminary selection (and analysis of standard data in a highly circumscribed field) used by ANPR systems. Competing promoters of content filters have for example recurrently referred to artificial intelligence in claiming that their products can very accurately discriminate between adult content and youth-friendly pictures on the basis of shape, colour (in particular "skin tones") or luminosity. One vendor accordingly claims to distinguish with 98% accuracy between backgrounds and foreground, identify faces, differentiate between a beach scene and pornography, and between a puppy dog's pink tummy, a peach and a naughty Miss America.
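The 'skin tone' heuristic those vendors invoke can be sketched very simply: flag an image when the share of skin-coloured pixels crosses a threshold. The RGB bounds and the 40% cut-off below are assumptions for the sketch, and the false positives (peaches, puppy tummies) follow directly from them.

```python
# Illustrative skin-tone filter. The colour bounds and threshold are
# invented for the example; real filters are more elaborate but share
# the same weakness: pink pixels are not pornography.
def looks_like_skin(r: int, g: int, b: int) -> bool:
    """Crude RGB test for skin-like colour."""
    return r > 95 and g > 40 and b > 20 and r > g and r > b and (r - min(g, b)) > 15

def flag_image(pixels: list, threshold: float = 0.4) -> bool:
    """Flag the image if the proportion of skin-like pixels exceeds the threshold."""
    skin = sum(1 for p in pixels if looks_like_skin(*p))
    return skin / len(pixels) > threshold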
 
As Rea, Lacey, Lambe & Dahyot in 'Multimodal Periodicity Analysis for Illicit Content Detection in Videos' (PDF) and James Wang in Integrated Region-Based Image Retrieval (New York: Kluwer 2006) note, that task is challenging, particularly for video rather than still images and on a real-time basis. As a search mechanism such technologies accordingly offer exclusion - accurate or otherwise - rather than a useful way of finding all still images of peaches (by Cézanne or otherwise) and any video which features plums on the kitchen table.
 
 
 