You can access the archive through the Carnegie Mellon Library's set of on-line databases located at https://www.library.cmu.edu/Search/AZ.html. Select the New York Times Historical archive, which is a database of articles appearing in the New York Times from 1851 through 1999.
An extraction of citations from the database (along with abstracts) for articles related to privacy is available for your use. This is more complete and accurate than the results from Lab 1. Contact Prof. Sweeney for a copy of this database by sending a message to latanya@dataprivacylab.org. You will want to use this version of the database as the subject of your work.
Your project presentation will show interesting results you found. Additionally, you should allow visitors to your project booth to browse the articles related to the peaks.
Your project presentation will show interesting results you found. Additionally, you should allow visitors to your project booth to browse the articles related to the technologies.
Your project presentation will show interesting results you found. Additionally, you should allow visitors to your project booth to browse the articles that are clustered by keywords.
Below are the keyword groups to use (though you may augment the list as you deem appropriate). A citation "belongs" to a keyword group if one or more of the keywords in the group appears in the title or abstract of the citation. Matching may be based on parts of words (i.e., substring matching).
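The membership rule described above can be sketched in Python as follows. This is only a minimal illustration: the group names and most of the keywords in KEYWORD_GROUPS are hypothetical placeholders (only the police/judge/crime/law enforcement set is taken from this write-up), and the function name groups_for_citation is an assumption, not part of the assignment.

```python
# Hypothetical keyword groups for illustration only; substitute the
# groups supplied for the project. Only the first group's keywords
# come from this write-up.
KEYWORD_GROUPS = {
    "law_enforcement": ["police", "judge", "crime", "law enforcement"],
    "technology": ["computer", "wiretap", "camera"],  # assumed keywords
}


def groups_for_citation(title, abstract, groups=KEYWORD_GROUPS):
    """Return the sorted names of the groups this citation belongs to.

    A citation belongs to a group if at least one of the group's
    keywords occurs as a substring of the title or abstract
    (case-insensitive).
    """
    text = f"{title} {abstract}".lower()
    return sorted(
        name
        for name, keywords in groups.items()
        if any(keyword.lower() in text for keyword in keywords)
    )


if __name__ == "__main__":
    print(groups_for_citation("Policeman Testifies on Wiretapping", ""))
```

Because matching is by substring, "Policeman" matches the keyword "police" and "Wiretapping" matches "wiretap". The same property can produce false matches for short keywords, so it may be worth reviewing a sample of matches by hand.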
There are several techniques you may use to cluster articles. Here is one based on cosine similarity. It is augmented because of the brevity of the text appearing in the titles and abstracts of the citations.
[1] the [2] privacy [3] of [4] rights [5] court ...
[1] privacy [2] rights [3] court ...
[1] 1 [2] 1 [3] 0 ...
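The three listings above appear to show a word index, the same index with common words removed, and a 0/1 vector for one citation. Under that reading, a minimal Python sketch of the cosine similarity approach might look like the following. The stop-word list and all function names are assumptions made for illustration, not the required implementation.

```python
import math
import re

# Assumed list of common words to drop; the listings above only show
# that "the" and "of" are removed.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to"}


def tokenize(text):
    """Lower-case the text, split it into words, and drop common words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(texts):
    """Assign each distinct remaining word a position, in order of first appearance."""
    index = {}
    for text in texts:
        for word in tokenize(text):
            index.setdefault(word, len(index))
    return index


def to_vector(text, index):
    """Encode a citation as a 0/1 vector: 1 where the indexed word occurs."""
    vector = [0] * len(index)
    for word in tokenize(text):
        if word in index:
            vector[index[word]] = 1
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    citations = ["The privacy of rights", "Court rights"]
    index = build_index(citations)
    vectors = [to_vector(c, index) for c in citations]
    print(index, vectors, cosine_similarity(vectors[0], vectors[1]))
```

Citations whose vectors have a cosine similarity near 1 share most of their indexed words and are candidates for the same cluster; a value of 0 means they share none.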
Modification! Because the titles and abstracts are brief, few words appear for each citation. Therefore, you may want to repeat the steps above, but instead of having each word serve as the basis of the index, combine related words into groups and let each group be the basis of the index. For example, instead of "police" and "crime" each having a separate index position, combine these so that any occurrence of {police, judge, crime, law enforcement} counts together. You may want to use the keyword groupings identified further above.
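One way to sketch this modification in Python is to give each group of related words a single index position. In the sketch below, the first group is the example from the text, while the other two groups and the function name group_vector are hypothetical. Substring matching is assumed, in keeping with the keyword-group rule.

```python
# Each inner list is one index position. The first group is the example
# given in the text; the other two are hypothetical placeholders.
GROUPS = [
    ["police", "judge", "crime", "law enforcement"],
    ["computer", "database"],
    ["court", "rights"],
]


def group_vector(text, groups=GROUPS):
    """Encode a citation as a 0/1 vector with one position per word group.

    A position is 1 if any keyword in that group occurs as a substring
    of the text (case-insensitive), otherwise 0.
    """
    lowered = text.lower()
    return [1 if any(keyword in lowered for keyword in group) else 0 for group in groups]


if __name__ == "__main__":
    print(group_vector("Police arrest suspect"))
    print(group_vector("Judge hears crime case"))
```

The two sample titles share no literal word, yet both map to the same group position, so a cosine similarity computed over these vectors would treat them as related. That is the intended effect of grouping when the texts are short.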