Explosion in Data Collection and Data Sharing

Information Explosion

by Latanya Sweeney


This paper examines the tremendous growth in the information being collected on individuals. The examples provided in the paper make clear that many details in the lives of most people are being documented in databases somewhere. The collection of person-specific data reveals three recent behavioral tendencies: (1) the "collect more" trend: given an existing person-specific data collection, expand the number of fields being collected; (2) the "collect specifically" trend: replace an existing aggregate data collection with a person-specific one; and (3) the "collect it if you can" trend: given a question or problem to solve, or merely the opportunity, start a new person-specific data collection related to that question, problem or opportunity. All three tendencies result in more and more information being collected on individuals, and they make it increasingly difficult to provide useful data with privacy protections.

Keywords: data collection practices, privacy

Information Explosion. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, L. Zayatz, P. Doyle, J. Theeuwes and J. Lane (eds), Urban Institute, Washington, DC, 2001. Paper: 26 pages, in PS or PDF.

Sample of data collections discussed

Characterizing the amount of information collected on individuals

The term Global Disk Storage per Person (GDSP) is defined in this paper as the amount of rigid disk drive space sold in a year divided by the adult world population. This measure provides a means for characterizing the amount of disk drive storage that could be used to store information on individuals. Below is a graph of GDSP over time. Also reported is how that figure translates into how much data storage is available to record a minute of a person's time.
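The GDSP definition amounts to a single division, which the following minimal Python sketch illustrates. The function names and the input figures are hypothetical and chosen only for illustration; they are not values reported in the paper. The per-minute conversion assumes the yearly figure is spread evenly over the minutes of a 365-day year, which may differ from the conversion the paper uses.

```python
# Assumption: a 365-day year, giving 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60


def gdsp_bytes(disk_bytes_sold: float, adult_population: float) -> float:
    """Global Disk Storage per Person: disk space sold in a year
    divided by the adult world population (bytes per person)."""
    if adult_population <= 0:
        raise ValueError("adult_population must be positive")
    return disk_bytes_sold / adult_population


def bytes_per_minute(gdsp: float) -> float:
    """Storage available per minute of a person's time, assuming the
    yearly GDSP figure is spread evenly over every minute of the year."""
    return gdsp / MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical inputs, NOT taken from the paper.
    sold = 1e18    # bytes of disk drive space sold in one year
    adults = 4e9   # adult world population
    per_person = gdsp_bytes(sold, adults)
    print(f"GDSP: {per_person:.3e} bytes per person")
    print(f"Per minute: {bytes_per_minute(per_person):.1f} bytes")
```

Substituting actual sales and population figures for a series of years would give the kind of GDSP-over-time curve the text describes.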

