Project Overview

The goal of the ICU project is to automatically identify computer science undergraduates in attendance at selected universities by mining publicly available, official university web pages. Student web pages are not included. This work was inspired by a paradox: Princeton illegally attempted to obtain the identities of Yale's accepted students over the web, even though the same information is often available publicly and legally over the web. It was also inspired by the fact that some online registers of student names appear despite university privacy policies that prohibit them.

It seems ironic that Princeton illegally attempted to get the identities of Yale's accepted students over the web at a time when the same information is often available publicly and legally over the web. Examining how easy or difficult it is to acquire this information from official school web pages raises questions about the role technology plays, the quality of the information obtained, uses of information beyond its intended purpose, and inconsistency in public policies with respect to information access and privacy.

In July 2002, a representative from admissions at Yale University complained to the US Federal Bureau of Investigation that someone in admissions at Princeton University had illegally gained access to Yale's admissions system in order to snoop on students who had been admitted to both schools. Such information provided strategic knowledge to Princeton, but the price of acquiring it illegally was public embarrassment and legal action against Princeton. (See Washington Post and Wired news articles for more information.)

Without minimizing the ethical issues this incident underscores, we at the Data Privacy Lab found Princeton's actions additionally disturbing because they were unnecessary from an information acquisition perspective. We believe that the very information Princeton sought illegally over the web can often be easily acquired over the web legally. By simply reviewing publicly available, official web pages maintained at a school's web domain, we can identify the students in attendance by name and therefore know who was admitted.

One question we seek to answer in this project is how easy it is for schools to gain strategic information on students accepted for admission at other schools, using information provided by those schools themselves. A second question is how often a university's stated privacy policy conflicts with its practice in releasing student information, and what tools can help.

In the United States, the Family Educational Rights and Privacy Act (FERPA) addresses rights parents and students may have concerning school records. Sections 5(a) and 5(b) of FERPA require any school that makes directory information about students publicly available to give public notice of the categories of information that will be provided. Some schools expressly state that no information, including the names of the students in attendance, will be made publicly available, yet seemingly complete rosters of students may appear on official web pages of these schools anyway. This is particularly disturbing because, despite the school's intention to keep the names of its students confidential, there is no existing technology to assist in enforcing this policy in a large organization. By providing automated tools to detect the existence of such rosters, we help a school enforce its stated policy.

