Talks on Technology Science (ToTS) and Topics in Privacy (TIP)

Fall 2016

This is the schedule of weekly talks on Technology Science from expert researchers, public interest groups, and others on the social impact of technology and its unforeseen consequences.

Join us on most Monday afternoons 2:30-4 PM at CGIS Knafel, Room K354 (1737 Cambridge St, Cambridge, MA).

Date | Speakers | Topic
9/19 | Virgilio Almeida, Harvard University | Re-Identification and the Right to Be Forgotten: a data driven study
9/26 | Alan Mislove, Northeastern University | Increasing the transparency of online algorithms
10/3 | Christo Wilson, Northeastern University | Caught Red Handed: Tracing Information Flows Between Ad Exchanges Using Retargeted Ads
10/10 | Columbus Day | None
10/17 | Mike Katell, University of Washington | Reputational Justice: Transparency vs. Equity in the Information Society
10/24 | John Byers, Boston University | Witnessing the Rise of the Sharing Economy: Empirical Observations of Airbnb
10/31 | Joe Hand, Dat | Persistent, decentralized data sharing of research: Introducing Dat
11/7 | Cancelled | Cancelled
11/18 (Fri) | Jean-Paul Schmetz | Tracking the Trackers - 11AM - 12PM in room K262
11/21 | Gabriel Magno | Identifying Stereotypes in the Online Perception of Physical Attractiveness
11/28 | Susan Crawford, Harvard Law School, and Holly St. Clair, Commonwealth of Massachusetts | How Data and Technology Can Help Improve Government
12/5 | Andrea Matwyshyn, Northeastern University | Changing the Paradigm Between Cybersecurity and Law
12/12 | Edda Grabar, journalist | Tracking Google's Accumulation of Healthcare Data



Re-Identification and the Right to Be Forgotten: a data driven study

Data analysis can be an important tool for policy-makers to evaluate and propose policies for cyberspace governance. In this talk I take a quantitative approach to analyze the content that is being delisted by traditional media outlets (i.e., newspapers and broadcasters) as a consequence of the “Right to be Forgotten” policy. Specifically, I discuss a detailed data analysis of a number of delisted links and the corresponding articles. Based on the delisted links, it is possible to determine the names of requesters and perform a demographic analysis on these requesters. I also discuss possible data-driven attacks that “transparency activists” or other third parties could take to discover delisted links.

Speaker: Virgilio Almeida is a full professor in the Computer Science Department at the Federal University of Minas Gerais (UFMG), Brazil. His areas of research interest include large-scale distributed systems, Internet governance, social computing, autonomic computing, and performance modeling and analysis. He received a Ph.D. in Computer Science from Vanderbilt University, an MS in Computer Science from the Pontifical Catholic University in Rio de Janeiro, and a BS in Electrical Engineering from UFMG, Brazil. He was a visiting professor at Boston University, the Technical University of Catalonia (UPC) in Barcelona, and the Polytechnic Institute of NYU, and held visiting appointments at the Santa Fe Institute, Hewlett-Packard Research Laboratory, and Xerox Research Center.

He is a former National Secretary for Information Technology Policies of the Ministry of Science, Technology and Innovation of Brazil (2011 to 2015). He is the chair of the Brazilian Internet Governance Committee. He was the chair of NETmundial, the Global Multistakeholder Conference on the Future of Internet Governance, which was held in Sao Paulo in 2014.

He has published over 150 technical papers and co-authored five books on performance modeling of computer systems, including "Performance By Design" (2004), "Capacity Planning for Web Services" (2002), and "Scaling for E-business" (2000), published by Prentice Hall. He has supervised more than 50 PhD theses and MSc dissertations. Prof. Almeida is a full member of the Brazilian Academy of Sciences.

He is currently a Visiting Professor at the School of Engineering and Applied Sciences at Harvard University and a Fellow at the Berkman Center for Internet & Society.


Increasing the transparency of online algorithms

We have recently entered the era of "big data", where the online activities of billions of people are now routinely collected and analyzed. This explosion of data has led to the development of numerous algorithms for tasks as diverse as online content recommendations, dynamic pricing of goods, and prediction of criminal activity. However, external observers---including researchers, lawmakers, and regulators---typically have only limited visibility into such systems, as both the algorithm itself and the input data are typically considered proprietary. As a result, the increasing popularity of these systems has brought up significant concerns about their fairness, transparency, and potential discrimination.

In this talk, I discuss my group's recent work that aims to increase the transparency of these systems via online algorithmic auditing. We have developed techniques that allow an external observer to determine properties of the algorithms, such as the extent to which outputs vary between users and the most important input features used to generate outputs. I describe our results from applying our techniques to three different real-world systems: content personalization in Google search, price discrimination in popular e-commerce retailers, and the surge pricing algorithm in Uber. Overall, our results offer a first step towards increasing the transparency of big data algorithms.

Speaker: Alan Mislove is an Associate Professor at the College of Computer and Information Science at Northeastern University. He received his Ph.D. from Rice University in 2009. Prof. Mislove’s research concerns distributed systems and networks, with a focus on using social networks to enhance the security, privacy, and efficiency of newly emerging systems. He is a recipient of an NSF CAREER Award (2011), and his work has been covered by the Wall Street Journal, the New York Times, and the CBS Evening News.


Caught Red Handed: Tracing Information Flows Between Ad Exchanges Using Retargeted Ads

Numerous surveys have shown that Web users are seriously concerned about the loss of privacy associated with online tracking. Alarmingly, these surveys also reveal that people are unaware of the amount of data sharing that occurs between ad exchanges, and thus underestimate the privacy risks associated with online tracking.

In reality, the modern ad ecosystem is fueled by a flow of user data between trackers and ad exchanges. Although online tracking itself is a well-studied phenomenon, the relationships between trackers and ad exchanges remain opaque, and the implications of this data sharing for user privacy are poorly understood.

In this study, we develop a methodology that is able to detect client- and server-side flows of information between arbitrary ad exchanges. Our key insight is to leverage retargeted ads as a mechanism for identifying information flows. Intuitively, our methodology works because it relies on the semantics of how exchanges serve ads, rather than focusing on specific cookie matching mechanisms. Using crawled data on 35,448 ad impressions, we show that our methodology can successfully categorize four different kinds of information sharing between ad exchanges, including cases where existing heuristic methods fail.

Speaker: Christo Wilson is an Assistant Professor in the College of Computer and Information Science at Northeastern University. Professor Wilson's research focuses on Algorithmic Auditing, which is the process of examining black box systems to understand how they work, the data they use, and ultimately how these algorithms impact individuals. To date, he has examined systems like personalization on Google Search, price discrimination in e-commerce, and surge pricing on Uber. Professor Wilson received his PhD from the University of California, Santa Barbara, and his research is supported by the NSF, the European Commission, the Knight Foundation, and the Data Transparency Lab.


Reputational Justice: Transparency vs. Equity in the Information Society

The data industry has evolved the practice of online user profiling from its humble origins as a means to target advertising into the “reputation economy,” where, in order to succeed, individuals must work to build and maintain positive digital profiles while data brokers aim to render them completely transparent. While there may be social benefits to increasing transparency and reducing information asymmetries among transactants in business and social interactions, aspects of the reputation economy also present serious risks to cherished values, legal protections, and hard-fought struggles for social equity. Questions arise about data bias and the power of machine inference to surface sensitive or protected information that would be unavailable or off-limits to decision makers in a less connected world. In this talk I discuss the emergence of reputation as an important socio-technical feature of the information society. I also offer a blueprint for confronting some of the moral hazards of algorithmically derived reputation using a multi-agent negotiation approach in order to perpetuate and assert the intent of existing social policies and legal norms in the data ecosystem.

Speaker: Mike Katell is a PhD student at the University of Washington Information School where he is a research assistant in the Tech Policy Lab and a member of the Value Sensitive Design Lab. His work concerns the ethics of information systems and employs the tools of critical design to address questions of race, gender, and class equity in the information society.


Witnessing the Rise of the Sharing Economy: Empirical Observations of Airbnb

Peer-to-peer markets, collectively known as the sharing economy, have emerged as alternative suppliers of goods and services traditionally provided by long-established industries. Hosts offering short-term accommodations on Airbnb, for example, act as hoteliers on a micro-entrepreneurial scale: they market their properties, set prices, manage their online reputation, and decide how much to invest in cleaning, customer service and upkeep. Moreover, they receive continuous public feedback in the form of ratings and reviews left by their guests.

With the unprecedented visibility enabled by datasets collected from sharing economy platforms, data scientists are busy investigating research questions ranging from racial discrimination to reputation management strategies to the economic impacts on incumbent firms. I will discuss our research on Airbnb's differentiated impacts on the hotel industry in the state of Texas, where we identify a causal impact on hotel revenue in the 8-10% range in Austin, the city seeing the greatest impact. I will also touch on some of the nuances of interpreting user-generated ratings in our study of reputation at over 600,000 Airbnb properties worldwide.

Speaker: John Byers is a Professor of Computer Science at Boston University, which he joined in 1999. He is also founding Chief Scientist of Cogo Labs, a technology incubator in Kendall Square, where he has held an executive role since 2005. Professor Byers's academic research centers on data-analytic and algorithmic challenges in two disciplines: the empirical study of Internet platforms and the science of computer networking. His recent research studies the effectiveness of e-commerce platforms such as Groupon, the utility of rating and review sites such as Yelp and TripAdvisor, and the broader impact of sharing economy firms such as Airbnb. His research has been covered in the New York Times, The Economist, and TIME magazine, on NPR, and on Bloomberg TV. Dr. Byers received his B.A. from Cornell University and his Ph.D. in Computer Science from the University of California at Berkeley (1997).


Persistent, decentralized data sharing of research: Introducing Dat

Collaboration in the scientific community requires accessing data and reproducing results. Citations and data linked on the web are not persistent, so when servers break or repositories go offline, data is at risk of becoming inaccessible with time. This talk will cover Dat, free and open source software and infrastructure to preserve, version-control and distribute research data. We will discuss how decentralized data sharing models differ from traditional web infrastructure and the benefits of decentralized tools for privacy, security, and preservation. The talk will include demonstrations of the Dat publishing workflow and Beaker, a peer-to-peer applications platform and web browser.

Dat is a grant-funded, open-source, decentralized data sharing infrastructure for efficiently versioning and syncing changes to data. Dat is designed to be integrated into the workflow of scientific analysis and publishing, as well as to be an easy-to-use tool for distributing datasets. Dat shares data through a free, redundant, distributed network that ensures the integrity and openness of data.

Speaker: Joe Hand is a developer, researcher, and open data enthusiast. Joe's work with Dat focuses on scientific reproducibility and helping researchers share data for collaboration and publishing. Previously Joe worked at the Santa Fe Institute studying cities, urbanization, and development. During his time there he helped develop tools for slum communities worldwide to collect data about their cities. He also works with local governments and volunteer developers to improve the accessibility of civic data.


Tracking the Trackers - 11AM - 12PM in room K262

Online tracking poses a serious privacy challenge that has drawn significant attention in both academia and industry.

In this talk, I discuss my company's recent work in detecting tracking and exposing both the extent of tracking and the (mostly unseen) profiles that it generates for the trackers. I will also reflect on whether the benefits of "Big Data" actually require a massive privacy breach on a global scale, or whether this is just a convenience for the companies involved to have all the data at hand. I will also discuss the interesting case of the browser (a major component/accomplice in tracking) as a precursor of things to come when everything becomes a computer (IoT, etc.).

Speaker: Jean-Paul Schmetz is the Chief Scientist of Burda GmbH (a major German Media Company) and the founder/CEO of Cliqz GmbH (a browser/search engine company owned by Burda and Mozilla). He received his MS in Computer Science from Stanford University and his MA in Philosophy from the University of Louvain.


Identifying Stereotypes in the Online Perception of Physical Attractiveness

Stereotypes can be viewed as oversimplified ideas about social groups. They can be positive, neutral or negative. The main goal of our work is to identify stereotypes for female physical attractiveness in images available on the Web. We look at search engines as possible sources of stereotypes. We conducted experiments on Google and Bing by querying the search engines for beautiful and ugly women. We then collected images and extracted information about the faces. We propose a methodology and apply it to analyze photos gathered from search engines to understand how race and age manifest in the observed stereotypes and how they vary according to countries and regions. Our findings demonstrate the existence of stereotypes for female physical attractiveness, in particular negative stereotypes about black women and positive stereotypes about white women in terms of beauty.

In a follow-up study, we examine the local and global impact of the internet on the formation of female physical attractiveness stereotypes in search engine results. By investigating datasets of images collected from two major search engines in 42 countries, we identify a significant fraction of replicated images. We find that common images are clustered around countries with the same language. We also show that the existence of common images among countries is practically eliminated when the queries are limited to local sites. In summary, we show evidence that results from search engines are biased towards the language used to query the system, which leads to certain attractiveness stereotypes that are often quite different from the majority of the female population of the country.

Speaker: Gabriel Magno is a PhD student in Computer Science at the Federal University of Minas Gerais (UFMG), Brazil. He is interested in studying social interactions, language patterns and privacy issues in social media and online social networks. He received an MS and a BS in Computer Science from UFMG, Brazil. He was a research assistant in the Social Computing group of the Qatar Computing Research Institute.


How Data and Technology Can Help Improve Government

This will be a brainstorming session on the ways data and technology can improve local government. The session begins with a presentation by Susan Crawford. Her recent book The Responsive City highlights the promising intersection of government and data through vivid case studies featuring municipal pioneers and big data success stories from Boston, Chicago, New York, and more. She explores topics including:

  • Building trust in the public sector and fostering a sustained, collective voice among communities
  • Using data-smart governance to preempt and predict problems while improving quality of life
  • Creating efficiencies and saving taxpayer money with digital tools
  • Spearheading these new approaches to government with innovative leadership

Holly St. Clair will respond and provide a few words about her thoughts and vision for the State of Massachusetts.

Then, the remainder of the session will be spent brainstorming ideas for how data and technology can help improve government. What are some low-hanging opportunities?

Speakers: Susan Crawford is a Professor at Harvard Law School and co-director of the Responsive Communities Initiative of the Berkman Klein Center for Internet & Society at Harvard University. She is the author of Captive Audience: The Telecom Industry and Monopoly Power in the New Gilded Age, co-author of The Responsive City: Engaging Communities Through Data-Smart Governance, and a contributor to Condé Nast's Backchannel. She was a partner at Wilmer, Cutler & Pickering (now WilmerHale) before becoming a law professor.

Holly St. Clair is the Chief Digital Officer overseeing the Commonwealth of Massachusetts's activities in data management, data analysis, research, and public access to data. Ms. St. Clair has pioneered the use of advanced decision support tools in Metropolitan Boston, managing a variety of projects that use scenario modeling, community indicators, and innovative meeting formats to engage stakeholders in dialogue about policy choices. She has an excellent track record in public sector innovation and is recognized by Planetizen as one of the Leading Thinkers and Innovators in the field of Urban Planning and Technology.


Changing the Paradigm Between Cybersecurity and Law

This talk challenges the basic assumptions of the emerging legal area of “cyber” or “cybersecurity.” I argue that the two dominant “cybersecurity” paradigms – information sharing and deterrence – channel law and policy in misguided directions. In their current form they will neither meaningfully thwart technology-mediated attacks on our national security nor meaningfully bolster consumer protection. Drawing insights from the work of philosopher of science Michael Polanyi, I reverse engineer the “cybersecurity” conversation. I identify four flaws that are currently pervasive in the legal academic and policy analysis of security – privacy conflation, incommensurability, internet exceptionalism, and technology unsuitability. I then offer a radically new paradigm – reciprocal security inducement. Reciprocal security inducement reframes the information security conversation around two key elements: information vigilance infrastructure and defense primacy. I conclude with a series of concrete legal and policy proposals embodying the reciprocal security inducement paradigm.

Speaker: Andrea Matwyshyn is an academic and author whose work focuses on technology and innovation policy, particularly information security, consumer privacy, intellectual property, and technology workforce pipeline policy. Professor Matwyshyn is a US-UK Fulbright Commission Cyber Security Scholar award recipient in 2016-2017, collaborating with the University of Oxford Global Cyber Security Capacity Centre. She is a (tenured full) professor of law / professor of computer science (by courtesy) at Northeastern University, a faculty affiliate of the Center for Internet and Society at Stanford Law School, and a visiting research collaborator at the Center for Information Technology Policy at Princeton University, where she was the Microsoft Visiting Professor of Information Technology Policy during 2014-15.

She has worked in both the public and the private sector. In 2014, she served as the Senior Policy Advisor/Academic in Residence at the U.S. Federal Trade Commission. As a public service, she has testified in Congress on issues of information security regulation, and she maintains ongoing policy engagement. Prior to entering the academy, she was a corporate attorney in private practice, focusing her work on technology transactions. She continues to maintain collaborative technology industry relationships.

Professor Matwyshyn has previously held primary appointments at the University of Pennsylvania's Wharton School, Northwestern University School of Law, and the University of Florida Levin College of Law. She has also held visiting appointments or affiliations at the University of Oxford, University of Cambridge, University of Edinburgh, Singapore Management University, the Indian School of Business, and the University of Notre Dame.


Tracking Google's Accumulation of Healthcare Data

Just three years ago, a newspaper article appeared on my desk: Google had developed smart glasses, and the only ones who would use them were doctors. A short time later, the pharmaceutical group Novartis announced a cooperation with Google. Then things moved quickly: Google and 23andme; the life science center Calico; a large Google study to investigate what health is; cooperations with Genentech and Sanofi; investments in biotech and health insurance. Many media outlets reported on these developments, but each covered only one small part. I have tried to map out the network that Google, as a data company, is using to penetrate the health care system. This led to the question: What happens when large IT corporations capture and process such data?

Speaker: Edda Grabar has been investigating this and other scientific and health issues since 2001. Before that, she studied biology and journalism at the Johannes Gutenberg-University of Mainz, where she researched the prions that cause BSE (and possibly also Alzheimer's) and eventually received practical training in health care. In recent years, she has written for Die Zeit, Süddeutsche Zeitung, Focus, Financial Times Deutschland (Germany), Der Standard (Vienna), Technology Review, and the related online editions.

In 2008, she received the Prize for Science Journalism of the French Foreign Ministry for her work on neurological research. In 2013, she was awarded the Expopharm Media Prize for a critical examination of drug tests in the GDR. In 2015, she won a fellowship for journalistic investigations into unneeded and badly conducted hip implantations in Germany. In 2016, she received the highest-remunerated fellowship for science journalists in German-speaking countries for her investigation of how Google is beginning to penetrate health care systems.

Prior Sessions

Spring 2016 | Fall 2015 | Spring 2014 | Fall 2013 | Spring 2013 | Fall 2012 | Spring 2012 | Fall 2011

Copyright © 2012-2015. President and Fellows Harvard University.   |   IQSS   |   Data Privacy Lab   |   Technology Science