Explosion in Data Collection and Data Sharing
Keywords: data collection practices, privacy
Citation:
The term Global Disk Storage per Person (GDSP) is defined
in this paper as the amount of rigid disk drive space sold
in a year divided by the adult world population.
This measure provides a means for characterizing the amount of
disk drive storage that could be used to store information on individuals.
Below is a graph of GDSP over time. Also reported is how that
figure translates into the amount of data storage available
to record each minute of a person's time.
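The GDSP definition above can be sketched as a short calculation. This is a minimal illustration, not the paper's own procedure: the function names, the input figures, and the conversion to storage per minute (dividing GDSP evenly over the minutes in one non-leap year) are all assumptions made for the example.

```python
# Minutes in a non-leap year (assumption: storage is spread evenly over one year).
MINUTES_PER_YEAR = 365 * 24 * 60


def gdsp_bytes(disk_bytes_sold: float, adult_population: float) -> float:
    """Global Disk Storage per Person: disk space sold in a year
    divided by the adult world population."""
    if adult_population <= 0:
        raise ValueError("adult_population must be positive")
    return disk_bytes_sold / adult_population


def bytes_per_minute(gdsp: float) -> float:
    """Storage available per minute of one person's time, assuming the
    year's GDSP is divided evenly across every minute of that year."""
    return gdsp / MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical placeholder inputs, NOT figures from the paper.
    sold = 1.0e18    # bytes of disk space sold in one year
    adults = 4.0e9   # adult world population
    g = gdsp_bytes(sold, adults)
    print(f"GDSP: {g / 1e6:.1f} MB per person")
    print(f"Storage per minute: {bytes_per_minute(g):.1f} bytes")
```

Substituting actual yearly sales and population figures for the placeholders gives one point on the GDSP curve per year.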
Abstract
This paper examines the tremendous growth in information being collected on individuals. The examples provided in the paper make clear that many details in the lives
of most people are being documented in databases somewhere. The collection of
person-specific data reveals three recent behavioral tendencies:
(1) the "collect more" trend: given an existing person-specific data collection, expand the number of fields being collected; (2) the "collect specifically" trend:
replace an existing aggregate data collection with a person-specific one; and
(3) the "collect it if you can" trend: given a question or problem to solve, or merely the opportunity, gather information by starting a new person-specific data collection related to that question, problem, or opportunity.
All three tendencies result in more and more information being collected on individuals
and make it increasingly difficult to provide useful data with privacy protections.
Information Explosion. In Confidentiality, Disclosure, and Data Access:
Theory and Practical Applications for Statistical Agencies,
L. Zayatz, P. Doyle, J. Theeuwes and J. Lane (eds.), Urban Institute, Washington, DC, 2001.
Paper: 26 pages, available in PS or PDF.
Sample of data collections discussed
Characterizing the amount of information collected on individuals