Carnegie Mellon University

Data Privacy Center

Data Privacy Course


Project Track 1: Privacy Concerns of the Past




In Lab 1, you searched for articles in the New York Times Historical Archive that contained the word privacy. You then summarized the privacy issues present in each article. You also identified the entities (people or organizations) involved and the notion of 'privacy space' that was the subject of the article. In this project, you will build tools that perform these tasks semi-automatically. The end result is an indexed abstract of privacy articles from the archive.


Assignment 1

The first project assignment, which you must complete if you want to do a project in this track, is to extract all the privacy articles from the online archive. You may use any semi-automated, automated, or even manual system that works best for you. Save a local copy of each article that contains the word 'privacy' for further processing.
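As one possible starting point, the check for whether a saved article qualifies can be sketched in a few lines of Python. The function name mentions_privacy is hypothetical, and the sketch assumes you already have the article text as a string and that a case-insensitive whole-word match is what you want; adjust it if you also wish to catch hyphenated or plural forms.

```python
import re

# Whole-word, case-insensitive match for "privacy" (an assumed matching rule).
PRIVACY_PATTERN = re.compile(r"\bprivacy\b", re.IGNORECASE)

def mentions_privacy(text):
    """Return True if the article text contains the word 'privacy'."""
    return PRIVACY_PATTERN.search(text) is not None
```

A whole-word match keeps out words such as 'private', which the archive search from Lab 1 would not have returned for the query privacy either.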

Build a database containing the extracted articles. The fields in the database should include title and citation sub-fields, as well as date and content. These fields should be correctly filled with the information from the extracted articles.
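The database layout is up to you. Purely as an illustration, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the column names, and the choice of section and page as the citation sub-fields are all assumptions made for this example; use whatever sub-fields the archive's citations actually give you.

```python
import sqlite3

def create_article_db(path=":memory:"):
    """Create a hypothetical 'articles' table and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            id           INTEGER PRIMARY KEY,
            title        TEXT NOT NULL,
            cite_section TEXT,           -- assumed citation sub-field
            cite_page    TEXT,           -- assumed citation sub-field
            pub_date     TEXT NOT NULL,  -- stored as YYYY-MM-DD
            content      TEXT NOT NULL
        )
        """
    )
    return conn

def add_article(conn, title, pub_date, content, cite_section=None, cite_page=None):
    """Insert one article and return its row id."""
    cur = conn.execute(
        "INSERT INTO articles (title, cite_section, cite_page, pub_date, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, cite_section, cite_page, pub_date, content),
    )
    conn.commit()
    return cur.lastrowid
```

Storing dates as YYYY-MM-DD text keeps them sortable as plain strings, which is convenient when you later look at privacy concerns over time.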

From your database, report summary statistics on the extracted articles. Validate your results using searches from the original database.
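Which statistics you report is your choice. As a sketch of two simple ones, the hypothetical helpers below count articles per decade and compute the mean article length in words. They assume dates are strings that begin with a four-digit year (for example YYYY-MM-DD) and that article contents are plain strings.

```python
from collections import Counter

def articles_per_decade(dates):
    """Count articles per decade from date strings that start with a 4-digit year."""
    counts = Counter()
    for d in dates:
        year = int(d[:4])
        counts[year - year % 10] += 1  # e.g. 1909 -> 1900
    return dict(sorted(counts.items()))

def mean_word_count(contents):
    """Mean number of whitespace-separated words per article (0.0 if no articles)."""
    if not contents:
        return 0.0
    return sum(len(c.split()) for c in contents) / len(contents)
```

Per-decade counts are also easy to validate: a date-restricted search for privacy in the original archive should return the same number of hits for each decade.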

Submit a text dump of your database by FTP'ing the results into your space on dataprivacylab.org. The text dump may have one file per article or all articles in one file, whichever is easiest for you. Submit your dump on the day Project Assignment 1 is due.
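No particular dump format is required. If you choose the single-file option, one possible layout is sketched below. The field labels, the ===== separator, and the dictionary keys are assumptions for illustration only.

```python
def dump_articles(records):
    """Format article records as one text dump.

    Each record is a dict with the (assumed) keys
    'title', 'citation', 'date', and 'content'.
    """
    blocks = []
    for r in records:
        blocks.append("\n".join([
            "TITLE: " + r["title"],
            "CITATION: " + r["citation"],
            "DATE: " + r["date"],
            "",              # blank line between header fields and body
            r["content"],
        ]))
    # A separator line between articles keeps the file easy to split again.
    return "\n=====\n".join(blocks) + "\n"
```

Write the returned string to a file before transferring it. If an article body could itself contain a line of five equals signs, pick a different separator.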

Submit a one-page abstract describing your methods and the statistics of your results. Send the abstract to pad@dataprivacylab.org by the due date for Project Assignment 1.

Note. You may complete assignment 1 and later change your mind about which project you will submit as your term project, provided your final decision occurs prior to the second project assignment and is approved by the instructor. See the course schedule.

Note. You will have to provide a public 5-minute presentation of your work to the course. You may elect to present your results from assignment 1 or 2.


Project: Classifying Privacy Articles

In the remainder of this project, you will develop an annotated database to facilitate more intelligent retrieval of articles. The resulting database should enable more insight into the nature of privacy concerns over time than was possible by searching the original database.

You may elect to do any one of the following:

Assignment 2. Provide a description of your method and the algorithm you will use for your system. Include some initial results. Discuss what you perceive as the advantages and disadvantages of your approach. Submit a summary report (3 to 5 pages) by email. Include the text that provided the basis of your initial results as an appendix. Send your report to pad@dataprivacylab.org.

Final report. Feel free to revise and modify your method and algorithm as you deem appropriate. Use your final version on all the articles in your database. Review the results by providing meaningful descriptive statistics. Also, analyze the usefulness of your resulting database in terms of the uses you can foresee. Report on a couple of interesting facts learned as a result of your final database. Submit your final report by email to pad@dataprivacylab.org. Provide your supporting final database in a text format by FTP'ing the results to your personal space on dataprivacylab.org.

Graduate credit: If you are taking this course for graduate credit, you must also provide a rigorous analysis of the results in terms of the methods used, as well as in terms of the benefits afforded by your new database. Rather than writing a project report, you will write a conference-style paper on your work.


Fall 2004 Privacy and Anonymity in Data
Professor: Latanya Sweeney, Ph.D. [latanya@dataprivacylab.org]