& the GII
This page considers privacy aspects of online searching:
who knows what you are doing online and what is the status
of that information?
This page is complemented by a discussion of consumer anxieties
regarding search engines and comments on 'social
search' (aka 'people mining').
It is common to encounter the meme that use of the net
is necessarily anonymous or pseudonymous - "on the
net no one knows you're a dog". We have suggested
elsewhere on this site that the scope for such anonymity
is often overstated: in practice government agencies and
marketers can often 'triangulate' identity and, particularly
through accessing data maintained by internet service
providers and other intermediaries, can sometimes track
online activity in great detail.
That electronic reader peering over the user's shoulder
may, for example, be able to identify -
when a user was online
when email was sent, and
even where it was sent to
how much data was downloaded or uploaded, often with a close
indication of the nature of that data and its point of origin
what sites were visited
what searches were conducted.
Google, for example, is reported
to have the capacity to -
produce a list of people who searched for a specific
term, identified by IP address and/or Google cookie
produce a list of the terms searched by the user of
a specified IP address or Google cookie value.
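To see why such lookups are cheap once logs are retained, the following minimal Python sketch builds the two lists described above from a handful of log entries. The record layout (IP address, cookie value, query string), the word-level indexing and the sample data are all hypothetical; real search engine logs are far larger and their schemas are not public.

```python
from collections import defaultdict
from typing import NamedTuple


class SearchRecord(NamedTuple):
    """One hypothetical search-log entry."""
    ip: str
    cookie: str
    query: str


def build_indexes(records):
    """Return (term -> identifiers, identifier -> queries) lookups."""
    searchers_by_term = defaultdict(set)
    queries_by_id = defaultdict(list)
    for rec in records:
        for term in rec.query.lower().split():
            # Each search term points back to both the IP address and the cookie.
            searchers_by_term[term].update({rec.ip, rec.cookie})
        # Each identifier accumulates the full text of its queries.
        queries_by_id[rec.ip].append(rec.query)
        queries_by_id[rec.cookie].append(rec.query)
    return searchers_by_term, queries_by_id


log = [
    SearchRecord("203.0.113.7", "cookie-a1", "depression symptoms"),
    SearchRecord("203.0.113.7", "cookie-a1", "job vacancies sydney"),
    SearchRecord("198.51.100.4", "cookie-b2", "depression treatment"),
]

by_term, by_id = build_indexes(log)
print(sorted(by_term["depression"]))  # who searched for the term
print(by_id["cookie-a1"])             # what one cookie searched for
```

A single pass over the log is enough to answer both questions, which is why retention rather than analysis is usually the point at which privacy is decided.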
Privacy advocates have worried about server logs, scattered across
cyberspace, that record visits to a particular site, the pages
viewed and the visitor's IP address.
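A single line in a typical web server log shows how much one visit reveals. The sketch below parses one line in the Apache/NCSA Common Log Format with a regular expression; the sample line (address, date and page) is invented for illustration.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Return the visitor IP, timestamp, requested page and status from one line."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError("not a Common Log Format line")
    return {
        "ip": match["ip"],
        "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "page": match["path"],
        "status": int(match["status"]),
    }


sample = ('203.0.113.7 - - [10/Aug/2006:14:02:11 +1000] '
          '"GET /health/std-clinics.html HTTP/1.1" 200 5120')
entry = parse_clf(sample)
print(entry["ip"], entry["page"], entry["time"].isoformat())
```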
Those concerns gained attention in 2006, when it was revealed
that the US government had ordered Google to supply information
regarding a million random web addresses and records of
all Google searches over a one-week period. That order
would supposedly enable the government to determine how
often pornography shows up in online searches, thereby
substantiating a defence of the Child Online Protection
Act (which, as noted earlier in this guide, was blocked
by the Supreme Court in 2004). The government has
argued that COPA is the only viable way to combat child
Tim Wu of Columbia Law School and co-author with Jack
Goldsmith of Who Controls the Internet? Illusions
of a Borderless World (New York: Oxford Uni Press
2006) commented that
the big news for most Americans shouldn't be that the
administration wants yet more confidential records.
It should be the revelation that every single search
you've ever conducted—ever—is stored on
a database, somewhere. Forget e-mail and wiretaps—for
many of us, there's probably nothing more embarrassing
than the searches we've made over the last decade ...
Americans today feel great freedom to tell their deepest
secrets; secrets they won't share with their spouses
or priests, to their computers. The Luddites were right—our
closest confidants today are robots. People have a place
to find basic anonymous information on things like sexually
transmitted diseases, depression, or drug addiction.
The ability to look in secret for another job is not
merely liberating, it's economically efficient. But
all this depends on our feeling free to search without
The other alternative is that we all just accept this
limitation on our freedom and learn to be more careful.
If you go around googling "gay cowboy," perhaps
you're just asking for trouble. Perhaps one should live,
as they say, as if everything you do will soon show
up on Page A1 of the New York Observer. But
living like that—as if everything you do will
be publicly aired one day—is wretched, and the
exact opposite of what it means to be living in a free
Today's search engines are close to an "always
on" wiretap. Even for someone like myself who's
hardly a privacy activist, that's a bit too scary. Google,
and the rest of the search engine industry need to learn
how to better keep our secrets.
In the same year AOL inadvertently released search log
data covering searches by 658,000 AOL subscribers from
March to May. A spokesperson confessed:
This was a screw up, and we're angry and upset about
it. It was an innocent enough attempt to reach out to
the academic community with new research tools, but
it was obviously not appropriately vetted, and if it
had been, it would have been stopped in an instant.
Although there was no personally-identifiable data linked
to these accounts, we're absolutely not defending this.
It was a mistake, and we apologize. We've launched an
internal investigation into what happened, and we are
taking steps to ensure that this type of thing never
happens again.
The information featured identification numbers rather
than names or user IDs, although critics sniffed that
people could be readily identified on the basis of their
searches. Rebecca Jeschke of the EFF claimed that "It's
reasonably easy for people to see what their neighbors
are searching for, since most people usually google themselves".
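Jeschke's point can be illustrated with a minimal sketch. Once queries are grouped by their pseudonymous ID, a single "vanity" search for a name, combined with other queries that reveal a locality, is enough to link the number to a person. The IDs, names, queries and the crude substring-matching rule below are all hypothetical; real re-identification draws on many more signals.

```python
from collections import defaultdict


def group_by_user(rows):
    """Group (user_id, query) rows so each pseudonymous ID has a search history."""
    history = defaultdict(list)
    for user_id, query in rows:
        history[user_id].append(query.lower())
    return history


def candidate_matches(history, known_names):
    """Return IDs whose history contains a search for one of the known names."""
    matches = {}
    for user_id, queries in history.items():
        for name in known_names:
            if any(name.lower() in q for q in queries):
                matches[user_id] = name
    return matches


rows = [
    (1000001, "Jane Citizen"),             # a search for one's own name
    (1000001, "plumbers in springfield"),  # locality clues
    (1000001, "springfield school zones"),
    (1000002, "weather forecast"),
]

history = group_by_user(rows)
print(candidate_matches(history, ["Jane Citizen", "John Smith"]))
```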
Legislation in Europe and elsewhere over the past five
years has sought to mandate long-term retention by ISPs
and telecommunication providers of traffic records, including
information about messaging (SMS and email) and server
logs. Governments have sought access to that data under
cybercrime or national security legislation, which also
provides for privileged access to personal and corporate
computers and networks.
Legal regimes typically provide some protection for email
(on the model of written correspondence) but little or
no protection for search information.
Both government and civil litigators are increasingly
finding logs a valuable target for subpoenas. That is
of interest because of the growing capacity to "wring
every ounce of useful information out of such logs",
eg identifying a user from an IP address by
correlating data from different sources.
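That kind of correlation can be sketched as a join between two data sets: a site's log showing which IP address made a request at a given moment, and an ISP's address-assignment records showing which subscriber held that dynamic address during that interval. The `Lease` record, its fields and the sample data are hypothetical, invented for illustration.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Lease(NamedTuple):
    """Hypothetical ISP record: a subscriber held an IP for a time window."""
    ip: str
    start: datetime
    end: datetime
    subscriber: str


def subscriber_for(ip: str, when: datetime, leases: List[Lease]) -> Optional[str]:
    """Find the subscriber who held `ip` at time `when`, or None if unknown."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.subscriber
    return None


leases = [
    Lease("203.0.113.7", datetime(2006, 8, 10, 9, 0), datetime(2006, 8, 10, 13, 0), "account 0042"),
    Lease("203.0.113.7", datetime(2006, 8, 10, 13, 0), datetime(2006, 8, 10, 18, 0), "account 0917"),
]

# A request seen in a web server log at 14:02 from that address.
print(subscriber_for("203.0.113.7", datetime(2006, 8, 10, 14, 2), leases))
```

The same address maps to different subscribers at different times, which is why a timestamp matters as much as the IP address itself.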
Some advocates have suggested that anonymisation services
are the most effective response to concerns, although
the effectiveness of those services is uncertain and they
are arguably beyond the capacity of most users.
Critics have suggested a number of potential responses
by search engines such as Google and Yahoo! in order to
minimise privacy-related risks while not significantly
inhibiting research and development.
Wu asked Google to
stop keeping quite so much information attached to our
IP addresses; please modify logging practices so that
all identifying information is stripped. And please
run history's greatest "search and delete,"
right now, and take out the IP addresses from every
file that contains everyone's last five years of searches.
Lauren Weinstein similarly suggests the following measures for search services -
1) Minimize the length of time that full log records
are maintained for users not using enhanced services.
For instance, full records might be maintained for 30
days (an arbitrary figure for this example). These would
be available to law enforcement queries and the like
for ongoing investigations. However, after the expiration
period, records would be anonymized (stripped of IP,
cookie, and other connection-related data identifying
the user). Logged search query strings (though they
also can contain personal information, as we know) would
not be affected at this stage and would continue to
be available for R&D and other purposes, but now
with a significantly lower outside abuse potential.
2) After some longer period of time (a year? - again,
an arbitrary period for the sake of this example) the
remaining portion of the records for non-enhanced service
users would be deleted. I of course cannot address the
non-trivial issues of system and related data backups
in this regard, since I have no idea how Google has
structured backup activities across their enterprise,
but this aspect in particular might make for an interesting
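Weinstein's two-stage proposal can be expressed as a simple rule applied to each log record according to its age. The sketch below uses his explicitly arbitrary figures (30 days of full records, deletion after about a year) as parameters; the record fields are hypothetical, and, as he notes, backups are a separate problem this does not address.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional

FULL_RETENTION = timedelta(days=30)    # Weinstein's illustrative 30 days
TOTAL_RETENTION = timedelta(days=365)  # his tentative "a year?"


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical search-log record for a non-enhanced-service user."""
    logged_on: date
    query: str
    ip: Optional[str]
    cookie: Optional[str]


def apply_retention(record: LogRecord, today: date) -> Optional[LogRecord]:
    """Return the record as it should be kept today, or None to delete it."""
    age = today - record.logged_on
    if age > TOTAL_RETENTION:
        return None  # stage 2: delete what remains
    if age > FULL_RETENTION:
        # stage 1: strip connection-related identifiers, keep the query string
        return replace(record, ip=None, cookie=None)
    return record  # still within the full-record window


records = [
    LogRecord(date(2007, 3, 1), "flu symptoms", "203.0.113.7", "cookie-a1"),
    LogRecord(date(2006, 11, 1), "cheap flights", "198.51.100.4", "cookie-b2"),
    LogRecord(date(2005, 12, 1), "tax help", "192.0.2.55", "cookie-c3"),
]
today = date(2007, 3, 20)
processed = (apply_retention(rec, today) for rec in records)
kept = [rec for rec in processed if rec is not None]
for rec in kept:
    print(rec)
```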
Weinstein notes that protecting users of enhanced search-history-based
services poses other problems. In order for those services
to work some form of detailed data must be maintained
for the users. It has been suggested that the potential
for abuse could be greatly reduced through various cryptographic
techniques.
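The proposals referred to above are not spelled out. One commonly discussed option, shown here only as an illustration, is keyed pseudonymisation: stored history is indexed by an HMAC of the account identifier rather than by the identifier itself, so the history store alone does not name the user, and rotating or destroying the secret key severs the link. The key handling, function names and identifiers below are assumptions, not a description of any search engine's practice.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict

# Hypothetical server-side secret; in practice it would be held in a key store
# separate from the logs, and could be rotated or destroyed.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonym(account_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Derive a stable pseudonym for an account with HMAC-SHA256."""
    return hmac.new(key, account_id.encode("utf-8"), hashlib.sha256).hexdigest()


history_store = defaultdict(list)  # pseudonym -> list of past queries


def record_search(account_id: str, query: str) -> None:
    """Store a query under the pseudonym, never under the raw account ID."""
    history_store[pseudonym(account_id)].append(query)


def history_for(account_id: str) -> list:
    """Fetch history for a signed-in user by recomputing the pseudonym."""
    return history_store.get(pseudonym(account_id), [])


record_search("alice@example.com", "knee surgery recovery")
record_search("alice@example.com", "physiotherapists nearby")
print(history_for("alice@example.com"))
print("alice@example.com" in history_store)  # the raw ID is never a key
```

This protects the link between account and history, not the query strings themselves, which, as the AOL episode showed, can identify someone on their own.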
In March 2007 Google announced that all identifying data
would be erased after 18-24 months, commenting that
privacy is one of the cornerstones of trust. We will
be retroactively going back into our log database and
anonymising all the information there.
In June 2007 it informed the
European Union privacy watchdog group, the Article 29 Data
Protection Working Party, that it would cut back retention
of identifiable web search histories from 24 months to 18,
with data being anonymised after a year and a half.
Google noted that it faced a significant lack of legal clarity,
with some of its services potentially falling under EU
data retention rules that require organisations to keep
some electronic communication records for up to two years,
and that "We looked at what other companies in the
industry do, and we were not able to find explicit and
clear privacy policies".
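The announcements quoted above do not say how the data would be anonymised. A technique often used for IP addresses, shown purely as an illustration rather than as Google's documented method, is to zero the final octet so that a logged address points only to a block of 256 addresses. The 18-month threshold mirrors the figure in the text; the function names are hypothetical.

```python
import ipaddress
from datetime import date

ANONYMISE_AFTER_MONTHS = 18  # figure from the June 2007 announcement


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed between two dates."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def truncate_ipv4(ip: str) -> str:
    """Zero the last octet, keeping only the /24 network the address is in."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def maybe_anonymise(ip: str, logged_on: date, today: date) -> str:
    """Truncate the address once the record is older than the threshold."""
    if months_between(logged_on, today) >= ANONYMISE_AFTER_MONTHS:
        return truncate_ipv4(ip)
    return ip


print(maybe_anonymise("203.0.113.57", date(2005, 12, 1), date(2007, 6, 15)))
print(maybe_anonymise("203.0.113.57", date(2007, 1, 1), date(2007, 6, 15)))
```

Truncation is weaker than the full stripping Weinstein proposes: the remaining network prefix still narrows the user down to an ISP and often a region.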