Data futures: Transforming digital traces into public goods in the age of commercial surveillance
Author(s)
Berke, Alex
Advisor
Larson, Kent
Abstract
For decades, government agencies have conducted surveys to produce datasets and statistics that serve as public goods, enabling research and empowering the communities from whom data are collected. These data sources are costly to collect and are in decline as survey response rates drop. In contrast, companies collect increasing quantities of data from the public: data we unavoidably generate by making purchases, using the Internet, or simply operating a mobile phone. This data collection might be considered a form of surveying the public, but one in which the resulting privatized datasets empower corporations rather than communities, and whose potential harms cannot be empirically assessed without access to the data.
This thesis considers a future in which corporations can track populations and estimate statistics more accurately than the government agencies traditionally tasked with such efforts. It illustrates how near this future may be and explores the resulting questions through case studies. Chief among them: are there more privacy-preserving, equitable, or cooperative ways to manage these data so that they benefit the public from whom they are sourced?
The first set of case studies uses location data from mobile phones. The first of these develops a more privacy-preserving approach that uses recurrent neural networks to generate realistic synthetic data. The second develops aggregated mobility metrics to improve country-level population estimates and COVID-19 epidemic models. The next set of case studies uses web browser data to evaluate the risks of cross-site user tracking that persist despite privacy-enhancing browser developments. The first web study repurposes data collected by a data broker; the second uses a dataset we crowdsourced and openly published to support this research and future research. For the next set of case studies, we crowdsourced and published a first-of-its-kind open dataset of purchase histories from thousands of Amazon.com users, along with their sociodemographics. We use this dataset to demonstrate how corporate data can provide insights into societal changes, and to evaluate the privacy risks posed by inferring sensitive consumer information from purchases.
The data used in this thesis (mobile device locations, web browsing data, purchase histories) are examples of digital traces collected continuously from people during everyday activities, without explicit consent. This work points toward cooperative data sharing as a paradigm for empowering research that benefits the public while prioritizing consent. Could such a paradigm exist with public support and participation? To study this question and inform future crowdsourcing efforts, we embedded behavioral experiments and surveys into our crowdsourcing tools. These shed light on what affects users' likelihood of sharing their data, how users believe their data should be used, and how results differ across demographics.
Throughout these studies, this thesis asks a broader question: Can we envision, and build towards, a future with alternative data economies that shift the power dynamics of data collection, along with the control and benefits of these data? To begin to address this question, this thesis proposes speculative, privacy-enhancing, and cooperative commerce networks. Such system changes may incur new costs for consumers. The final case study measures consumers' willingness to pay for privacy in new package delivery networks.
Date issued
2025-02
Department
Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Publisher
Massachusetts Institute of Technology