Carnegie Mellon University

Data Privacy Center

Data Privacy Course


Project Track 5: Identifiability of IP Addresses




Many network tools provide inferences about an IP address. In this project, you will use these tools, along with other publicly available recorded information, to learn about the people using the IP addresses found in publicly available weblogs.

Assignment 1

You must complete the first assignment to do a project in this track. You may complete Assignment 1 and later change your mind about which project you will submit as your term project, provided your final decision is made before the second project assignment and is approved by the instructor. See the course schedule.

If you perform a Google search using robots.txt get, you will see many web pages. Many are about how to write robots.txt files, but among them will be actual weblogs as well. An example of a weblog appears at https://www.ursaoutdoors.com/logs/access.log. Below are the first few lines of the log.


216.39.48.161 - - [18/Oct/2003:11:54:26 -0700] "GET /robots.txt HTTP/1.1" 404 347 "-" "Scooter/3.2"
216.39.48.161 - - [18/Oct/2003:11:54:29 -0700] "GET /logs/ HTTP/1.1" 200 1367 "-" "Scooter/3.2"
152.163.253.69 - - [18/Oct/2003:18:49:33 -0700] "GET /rainforestgentle2.html HTTP/1.0" 404 343 "https://www.google.com/search?q=free+brids+or+quaker+hl=en&lr=&ie=UTF-8&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1; Hotbar 4.3.5.0)"

The first line informs us that the machine whose IP address was 216.39.48.161 visited this site on October 18, 2003.
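
The fields in each line can also be pulled apart by a program. Below is a minimal Python sketch, assuming the weblog uses the common "combined" access-log layout seen in the sample lines above (host, two placeholder fields, timestamp, request, status, size, referer, user agent). The names LOG_PATTERN and parse_log_line are illustrative choices, not part of any tool, and other weblogs you find may use a different layout.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 18/Oct/2003:11:54:26 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields


# First line of the sample weblog shown above.
sample = ('216.39.48.161 - - [18/Oct/2003:11:54:26 -0700] '
          '"GET /robots.txt HTTP/1.1" 404 347 "-" "Scooter/3.2"')
entry = parse_log_line(sample)
print(entry["host"], entry["time"].date(), entry["request"], entry["status"])
```

The host field is the piece this project cares about; the timestamp, request, and user agent are useful context when you later interpret what you learn about an address.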

In this assignment, you will locate on-line weblogs and report information about the IP addresses they contain. Below are the steps you must perform.

  1. Perform a manual Google search on robots.txt get and examine the retrieved pages. Pay particular attention to the webpages that are actual weblogs. Select 3 of the weblogs for your use. Each weblog should contain at least 30 distinct IP addresses. These weblogs will be your sample. Make a local copy of the information contained in your sample and record the URL of each of the 3 weblogs.

  2. Research available network tools, such as reverse nslookup and tools that report network connection and allocation information. See what tools are available to tell you information about an IP address based on routing and assignment information. Summarize your findings by constructing a chart showing, for each tool, what kind of information it accepts as input and which fields it provides as output.

  3. For 30 distinct IP addresses found in your sample, report the information found in step 2. Note the order in which you may have to perform look-ups and searches. Record the completeness of the information.
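
Two parts of the steps above lend themselves to a short sketch: checking how many distinct IP addresses a weblog contains, and seeing what a reverse lookup actually asks for. The Python below is one possible approach, not a required method. It assumes the IP address is the first whitespace-separated field on each line, as in the sample weblog; the function names are illustrative. The reverse_lookup function contacts your system's resolver, so its result depends on your network and on whether a name is registered for the address.

```python
import ipaddress
import socket


def distinct_ips(log_lines):
    """Return the distinct, valid IP addresses in first-seen order."""
    seen = {}
    for line in log_lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # blank line
        try:
            ip = ipaddress.ip_address(parts[0])
        except ValueError:
            continue  # first field is a host name or other text, not an IP
        seen.setdefault(str(ip), None)
    return list(seen)


def reverse_name(ip):
    """Return the DNS name that a reverse (PTR) lookup queries for this address."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Ask the resolver for a host name; return None if none is found."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


# The three lines of the sample weblog shown earlier.
SAMPLE_LINES = [
    '216.39.48.161 - - [18/Oct/2003:11:54:26 -0700] "GET /robots.txt HTTP/1.1" 404 347 "-" "Scooter/3.2"',
    '216.39.48.161 - - [18/Oct/2003:11:54:29 -0700] "GET /logs/ HTTP/1.1" 200 1367 "-" "Scooter/3.2"',
    '152.163.253.69 - - [18/Oct/2003:18:49:33 -0700] "GET /rainforestgentle2.html HTTP/1.0" 404 343 "https://www.google.com/search?q=free+brids+or+quaker+hl=en&lr=&ie=UTF-8&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1; Hotbar 4.3.5.0)"',
]

ips = distinct_ips(SAMPLE_LINES)
print(len(ips), "distinct IP addresses")
for ip in ips:
    print(ip, "->", reverse_name(ip))
```

Running distinct_ips over a local copy of a candidate weblog is a quick way to confirm that it meets the 30-address minimum before you commit to it as part of your sample.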

Write a 2-3 page report of your findings. Provide a summary discussion about the usefulness of the results you have found and identify any privacy concerns you may have. Include as an appendix the information for each of the 30 IP addresses you tested.


The Project

Below is an overview of steps for this project.
  1. Sometimes an email address can be associated with an IP address if the user has posted an email message. To uncover such information, write a program to search the web on an IP address, and identify whether the IP address (or its name) appears in an email header of an email message that is on-line. If so, return the email message content.

  2. Write a program that given an IP address automatically queries the tools you found in Assignment 1 to provide information on the given IP address. In this step, you are automating the task you performed manually in Assignment 1.

  3. Enhance your program by combining your program from step 2 (immediately above) with the one from step 1. Given an IP address, your enhanced program should report various network information about the IP address as well as any associated email postings.

  4. Run your enhanced program on the 30 IP addresses you used in Assignment 1. Record your results and compare them to your manual findings. Did you learn any new information? Run your enhanced program on other IP addresses in your sample. Report your results. Note how much and what kind of information was found on each IP address.

  5. Graduate credit. If you are enrolled for graduate credit, enhance your program further by allowing it to operate on a weblog rather than a single IP address. This will allow you to check many IP addresses quickly because you can fetch the URLs of numerous weblogs and see how many IP addresses give you interesting information. Be sure to use distinct IP addresses. People tend to visit the same sites regularly, so you don't want to count the same IP address multiple times. Record how much information is typically gained. See if you can make more generalized statements as supported by your findings.
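
For step 1, the core check is whether a given IP address appears in the headers of an email message you have already retrieved. The Python sketch below shows one way that check might look, using the standard email module. It covers only the header test, not the web search itself. It assumes the address of interest shows up in a Received header or in the non-standard X-Originating-IP header that some mail services add; your own research may turn up other headers worth checking. The message text, host names, and addresses in the example are made up for illustration (they use reserved documentation ranges), and the function names are hypothetical.

```python
import re
from email import message_from_string

# Loose IPv4 pattern; it does not check that each number is at most 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Headers assumed to carry sender or relay addresses.
HEADERS_TO_SCAN = ("Received", "X-Originating-IP")


def header_ips(raw_message):
    """Return the IPv4 addresses found in the scanned headers, in order."""
    msg = message_from_string(raw_message)
    found = []
    for name in HEADERS_TO_SCAN:
        for value in msg.get_all(name, []):
            for ip in IPV4.findall(value):
                if ip not in found:
                    found.append(ip)
    return found


def content_if_mentioned(raw_message, ip):
    """Return the full message text if ip appears in its headers, else None."""
    return raw_message if ip in header_ips(raw_message) else None


# Made-up message for illustration only.
RAW = (
    "Received: from host.example.org (host.example.org [192.0.2.10])\n"
    "\tby mail.example.net with SMTP; Sat, 18 Oct 2003 11:54:26 -0700\n"
    "From: someone@example.org\n"
    "Subject: test\n"
    "\n"
    "Body text mentioning 198.51.100.7, which is not a header.\n"
)

print(header_ips(RAW))
print(content_if_mentioned(RAW, "198.51.100.7") is None)
```

Scanning only the headers, rather than the whole page, keeps an address quoted in a message body from being mistaken for the address of the machine that sent the message.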


Assignment 2. Report on your results from step 1, immediately above. Describe your algorithmic design and report what kind of information is available by email and what can be learned about people because their IP address is included.

Final report. Complete the steps above and gather findings. Be selective in what information you present. You may elect to report on other aspects not necessarily those listed above. Write a final report for the project and prepare and conduct an in-person poster presentation of your work.

Graduate credit: If you are taking this course for graduate credit, you must write a conference-style paper on your work, rather than a report.


Spring 2004 Privacy and Anonymity in Data
Professor: Latanya Sweeney, Ph.D. [latanya@dataprivacylab.org]