Carnegie Mellon University

Data Privacy Center

Data Privacy / Privacy Technology Course


Identity Angel Project




Objective

The objective of this project is to develop an automated, benevolent program that locates resumes on the Web that contain Social Security numbers and then, for each resume found, sends an automated email message alerting the subject that they are needlessly placing themselves at risk of identity theft.

This project builds on the Identity Angel Project of Prof. Sweeney and on work done by students in earlier versions of this course. The goal of the work this semester is to automate the system, have it operate 24/7 for the remainder of the semester, and measure its overall performance and impact.

Slides: Introduction, Update 2, Update 3

Assignments: Assignment 1, Assignment 2, Assignment 3

Resources:
Video, Paper
Database slides, test0.java, test1.java, test2.java, test3.java, test4.java

Access database

Java code:



Assignment 1 (Due Thursday 1/26/2006)

The following assignment is due next week, by Friday 1/27/2006 at 9am. No extensions!

As a student in the course, select any one of the following activities and complete it.

Notify the TAs (dp1staff@dataprivacylab.org) as soon as possible of which activity you intend to do. All students must report which activity they have selected by Tuesday 1/24/2006.

Select an activity

Some activities are more involved than others. Some require programming and others do not. You may select any one of these activities to complete.

  1. Identifying "real" resumes. Write a Java program that attempts to identify real resumes from documents that are retrieved from a Google search "resume vitae." [Hard] Additional resource: Filtered Search using Google API

  2. Harvest email address from resumes. Given a resume text file, write a Java program that harvests the email address of the subject of the resume, if the email address is present. The goal is to identify the email address of the subject of the resume. So, if there are multiple email addresses present, the program should return the email address it considers the subject's. You may also want your program to have an option to return all email addresses found. [Easy] Additional resource: databases of resumes

  3. Harvest Social Security number from resumes. Given a resume text file, write a Java program that harvests the Social Security number of the subject of the resume, if present. Your program should not return values that are not Social Security numbers. [Medium] Additional resource: databases of resumes

  4. Harvest date of birth from resumes. Given a resume text file, write a Java program that harvests the date of birth of the subject of the resume, if present. The goal is to identify the date of birth of the subject of the resume. So, if there are multiple dates present, the program should return only the date of birth of the subject. [Medium] Additional resource: databases of resumes

  5. Estimate the number of on-line resumes. Review literature and/or conduct your own experiment to determine the number of on-line resumes. Be careful not to just cite a source. Instead, you will want to find a means to verify claims made by others, especially job banks. Alternatively, you may come up with your own method to estimate the number. [Easy]

  6. Job bank review. Survey on-line job banks. Decide on a set of questions to ask about a job bank and then get answers from each job bank (most likely using their on-line materials). Sample questions: How many job banks are there? How many resumes do they contain? How is access to resumes controlled? How many of the resumes contain Social Security numbers? [Medium]

  7. Investigate credit card application fraud. The nature of the investigation is up to you. It should be insightful. Some ideas: Identify some criminal cases and report on them. What are common ways in which credit card fraud is conducted? What, if anything, have credit card companies attempted to do to combat the problem? Brainstorm on ways credit card companies might combat the problem. [Medium]

  8. Convert PDF into text. Write a Java program that, given a resume in PDF format, produces a text file containing the content. You may write this from scratch, pipe through Google's server, or use any other means. The format does not have to be preserved. [Medium]

  9. Convert HTML into text. Write a Java program that, given a resume in HTML format, produces a text file containing the content (no HTML tags). The format does not have to be preserved. [Easy]

  10. Send email messages. Write a Java program that sends an email message. [Easy]
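
As a possible starting point for Activity 3, the sketch below shows one way to separate likely Social Security numbers from other digit strings such as phone numbers. It is only an illustration, not the course's reference solution: the class name SsnHarvester and the method findSsns are hypothetical, and the sketch assumes the number is written as three, two, and four digits separated by hyphens or spaces. It also discards values that are never issued as Social Security numbers (area 000, 666, or 900 through 999; group 00; serial 0000). A real resume may format the number differently, so treat the pattern as an assumption to refine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for Activity 3; names are illustrative only.
class SsnHarvester {
    // Assumed layout: 3 digits, 2 digits, 4 digits, separated by a hyphen or a space.
    // The lookarounds reject candidates embedded in longer digit/hyphen runs,
    // which keeps phone numbers such as 412-555-0100 from matching.
    private static final Pattern CANDIDATE =
            Pattern.compile("(?<![\\d-])(\\d{3})[- ](\\d{2})[- ](\\d{4})(?![\\d-])");

    // Returns every plausible SSN in the text, normalized to NNN-NN-NNNN.
    static List<String> findSsns(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            int area = Integer.parseInt(m.group(1));
            int group = Integer.parseInt(m.group(2));
            int serial = Integer.parseInt(m.group(3));
            // Skip values that are never issued as SSNs.
            if (area == 0 || area == 666 || area >= 900) continue;
            if (group == 0 || serial == 0) continue;
            found.add(m.group(1) + "-" + m.group(2) + "-" + m.group(3));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample text, not taken from any real resume.
        String sample = "Jane Doe\nSSN: 123-45-6789\nPhone: 412-555-0100\nRef: 000-12-3456";
        System.out.println(findSsns(sample)); // prints [123-45-6789]
    }
}
```

The validity checks reduce false positives but cannot prove that a match is an actual Social Security number, so surrounding words such as "SSN" or "Social Security" would be a natural additional signal.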

See Architecture and Format for more details.
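
For Activity 2, a minimal sketch of email harvesting might look like the following. The class EmailHarvester and its methods are hypothetical names chosen for illustration. The pattern is a deliberately simple approximation of an email address, not a full implementation of the address grammar. The rule for choosing the subject's address (take the first one in the file) is only an assumption, based on the idea that contact details usually appear near the top of a resume; a stronger solution would compare addresses against the subject's name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for Activity 2; names are illustrative only.
class EmailHarvester {
    // Simplified address pattern: local part, '@', dot-separated domain labels,
    // and an alphabetic top-level domain of at least two letters.
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,}");

    // Returns all addresses in the order they appear in the text.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Assumption: the first address in the file belongs to the resume's subject.
    static Optional<String> guessSubjectEmail(String text) {
        List<String> all = findAll(text);
        return all.isEmpty() ? Optional.empty() : Optional.of(all.get(0));
    }

    public static void main(String[] args) {
        // Made-up sample text using reserved example domains.
        String sample = "Jane Doe\njane.doe@example.com\n"
                + "References: Prof. Smith (smith@example.org)";
        System.out.println(guessSubjectEmail(sample).orElse("(none)"));
        System.out.println(findAll(sample));
    }
}
```

Here findAll corresponds to the optional "return all email addresses found" mode described in the activity, while guessSubjectEmail returns the single address the program considers the subject's.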

What to submit


Spring 2006 Data Privacy (Privacy Technology) Course
Professor: Latanya Sweeney, Ph.D. [latanya@dataprivacylab.org]