Carnegie Mellon University

Data Privacy Center

Data Privacy Course


Lab: Information Explosion




Objective

The objective of this lab is for you to get some first-hand experience with the amount of person-specific information available over the World Wide Web.

Considerations that should come to mind:


Parallel activity

While you complete the in-class activities in this lab, which involve examining collections of information available on the World Wide Web, we will be capturing images for the next round of activities on the Tracemetrics project. Please take a moment to have your left hand imaged.


Part I. Scavenger Hunt on All the People

For the first part of the lab, we will examine an on-line repository of links to publicly available person-specific information. This is termed the DataWatch effort.

Identifying Computer Science Undergraduates

Another project spawned from DataWatch is the Identifying Computer Science Undergraduates project, ICU, which is a database of undergraduate computer science students across the USA. Take a peek at the ICU project and see if you are listed. This information was automatically harvested from on-line lists of students. Notice that the motivation behind this project concerned Princeton's illegal access to student information at Yale. The project also exposes school violations of FERPA statements.

Trial version of DataWatch

A trial version of DataWatch is available for your use. It has very few links in this version. You will go on a scavenger hunt to locate additional links and include them in the database. Before we begin, take a look at the database. Each URL included in the database contains the following information:

In the scavenger hunt, your team will locate new URLs about person-specific data that may be available for the following kinds of information.


birth information
death information
homicides
assaults
lawsuits
criminal records
political contributions
voting records
real estate holdings
education or school information
medical or health information

road travel
video surveillance cameras of a location
attendance at an event
financial, income, taxes
employment
memberships
clubs or activities
automotive vehicle
demographics: date of birth, gender, ethnicity, race
web sites that list these kinds of URLs

The competition.

  1. Divide yourselves up into teams of no more than 3 people.
    You can work alone if you want to. Once you have your team identified, register your team by listing your names on the computer provided.

  2. Eligible Content.
    Use your favorite search engine and do some web mining to find sources of person-specific data.

    The information you are looking for may be immediately available on-line, such as real estate information or death information. Or, you may find a web page that tells you how to order the data. The data themselves need not be available over the Internet; the provider may send a CD, for example. Both of these kinds of web pages are considered a successful find in the hunt.

    The kind of information you are looking for must be for an identified population. For example, the data may be based on people within a particular community, ZIP code, city or town, county, state, profession, or other identifiable grouping of people.

    The data values must be person-specific. That is, if the data appears in a tabular format, each row would relate to a specific person, family or household. Aggregate tabular information is not acceptable.

  3. The hunt.
    Organize your team in any way that seems best for you. The team with the most URLs, covering at least 9 of the categories of information, wins. We will also see which teams found the most interesting kinds of data. Prizes will be awarded to the first and second place teams. Use the submit option in DataWatch to submit your team's URLs.

    SUBMIT URL | VIEW RESULTS

Awards will be given.
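
For teams that want to track their own standing during the hunt, the winning rule in step 3 can be sketched in a few lines of Python. This is only an illustration of one reading of the rule (a team must cover at least 9 categories to qualify, and qualifying teams are ranked by their number of URLs). The tuple layout, the function name rank_teams, the treatment of duplicate URLs (counted once), and the alphabetical tie-break are all assumptions; they are not part of DataWatch.

```python
from collections import defaultdict

# From step 3 of the hunt: a team needs URLs in at least 9 categories.
MIN_CATEGORIES = 9


def rank_teams(submissions, min_categories=MIN_CATEGORIES):
    """Return qualifying team names, best first.

    submissions is an iterable of (team, category, url) tuples.
    This layout is hypothetical and is not the DataWatch format.
    """
    urls = defaultdict(set)        # team -> distinct URLs submitted
    categories = defaultdict(set)  # team -> distinct categories covered

    for team, category, url in submissions:
        urls[team].add(url)            # assumption: a repeated URL counts once
        categories[team].add(category)

    # Only teams that cover enough categories are eligible to win.
    eligible = [t for t in urls if len(categories[t]) >= min_categories]

    # Most URLs first; ties broken alphabetically (an assumption).
    return sorted(eligible, key=lambda t: (-len(urls[t]), t))
```

Under this reading, a team with many URLs in only a few categories does not place, so spreading the search across categories matters as much as volume.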


Part II. Finding Information on Targeted Individuals

In the first part of this Lab you found lots of information on lots of people, but what can you learn about a targeted individual?

Review the URLs of the student Face Book examples. These will be our targets. Put together a dossier on each student. See how much information you can find out about the student. You should include the basics and then add from there. Here is the basic information:

Compose your dossier on the student in a single email message. There should be one email message for each of the students on which you compile a dossier. Each fact you list should include the URL where the information was found.
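
One way to keep each fact paired with its source is to record both together and generate the message body from those records. The sketch below is only a suggestion and not a required format: the Fact class, the format_dossier function, and the layout of the resulting text are all hypothetical, and the labels you record should follow the basic information requested for the lab.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """One finding about a student, with the URL where it was found."""
    label: str
    value: str
    source_url: str


def format_dossier(student, facts):
    """Build the plain-text body of one dossier email (hypothetical layout).

    Raises ValueError if any fact lacks a source URL, since every fact
    in the dossier should cite where it was found.
    """
    missing = [f.label for f in facts if not f.source_url.strip()]
    if missing:
        raise ValueError("facts without a source URL: " + ", ".join(missing))

    lines = ["Dossier: " + student, ""]
    for f in facts:
        lines.append(f.label + ": " + f.value)
        lines.append("  Source: " + f.source_url)
    return "\n".join(lines)
```

Calling format_dossier once per student yields one message body per dossier, which matches the one-email-per-student requirement.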

You may not use any insider access to Face Book. Use only publicly available information. You may use Face Book information if the information is generally available to everyone.

Send the email message to dp1staff@dataprivacylab.org. We will review and award prizes for the most complete dossiers as well as for the most interesting.


Spring 2006 Privacy Technology
Professor: Latanya Sweeney, Ph.D. [latanya@dataprivacylab.org]