Under-Coverage of Double Machine Learning Due to Implementation Choices
Author(s)
Siegmann, Charlotte B.
Advisor
Andrews, Isaiah
Abstract
Double machine learning (DML) estimators can estimate coefficients of interest with far fewer functional-form assumptions than linear econometric methods. However, DML requires researchers to make a range of implementation choices, including the function class, the random seed, and the hyperparameter configuration. While asymptotic theory suggests these choices should not affect final estimates, we show that in 10 economic analyses (8 of them published and peer-reviewed), implementation choices do affect the results. In half of the datasets, different implementation choices even change whether a finding is interpreted as a negative, null, or positive effect. We link these results to a framework for empirically assessing the performance of machine-learning-based estimators in terms of precision, coverage, and susceptibility to manipulation, intended to complement asymptotic theory. We demonstrate that the coverage of DML confidence intervals is too low, placing an upper bound of 48% on the expected coverage of conventional 95% confidence intervals in published DML economics papers. We also show that, under the status quo, DML is highly susceptible to manipulation by researchers, and we propose ways to mitigate this susceptibility.
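To make concrete where an implementation choice such as the random seed enters a DML estimate, here is a minimal sketch of the cross-fitted partialling-out estimator, assuming a partially linear model Y = θD + g(X) + ε (one common DML setting; the abstract does not say which models the thesis studies). The simulated data-generating process, the polynomial-ridge nuisance learner, and the names `simulate`, `dml_plr`, `degree`, and `alpha` are hypothetical illustrations. They are not the thesis's code or any library's API. The seed only controls how observations are assigned to folds. `degree` stands in for the choice of function class and `alpha` for a hyperparameter.

```python
import numpy as np


def simulate(n, rng):
    """Hypothetical DGP: Y = 0.5*D + g(X) + e and D = m(X) + v."""
    X = rng.uniform(-2.0, 2.0, size=(n, 3))
    m = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
    g = np.cos(X[:, 0]) + X[:, 2]
    D = m + rng.normal(size=n)
    Y = 0.5 * D + g + rng.normal(size=n)
    return X, D, Y


def poly_features(X, degree):
    """Intercept plus elementwise powers 1..degree of each covariate (no interactions)."""
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        cols.extend(X[:, j] ** d for j in range(X.shape[1]))
    return np.column_stack(cols)


def ridge_predict(X_tr, y_tr, X_te, degree, alpha):
    """Fit ridge regression on polynomial features of the training folds, predict on the held-out fold."""
    A, B = poly_features(X_tr, degree), poly_features(X_te, degree)
    coef = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y_tr)
    return B @ coef


def dml_plr(X, D, Y, seed, n_folds=5, degree=3, alpha=1.0):
    """Cross-fitted partialling-out DML estimate of theta and its standard error."""
    n = len(Y)
    # The seed only determines which observations land in which fold.
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    D_res, Y_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisances E[D|X] and E[Y|X] are fit out of fold (cross-fitting).
        D_res[test] = D[test] - ridge_predict(X[train], D[train], X[test], degree, alpha)
        Y_res[test] = Y[test] - ridge_predict(X[train], Y[train], X[test], degree, alpha)
    # Regress the outcome residual on the treatment residual.
    theta = np.sum(D_res * Y_res) / np.sum(D_res ** 2)
    # Sandwich variance from the orthogonal score psi = (Y_res - theta * D_res) * D_res.
    psi = (Y_res - theta * D_res) * D_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(D_res ** 2) ** 2 / n)
    return theta, se


# Same data, different seeds and polynomial degrees (a stand-in for the function class).
X, D, Y = simulate(500, np.random.default_rng(0))
for seed in range(3):
    for degree in (1, 3):
        theta, se = dml_plr(X, D, Y, seed=seed, degree=degree)
        print(f"seed={seed} degree={degree}: theta={theta:.3f}, "
              f"95% CI=[{theta - 1.96 * se:.3f}, {theta + 1.96 * se:.3f}]")
```

Because the data stay fixed across the loop, any differences in the printed estimates and intervals come solely from the fold split and the nuisance learner's specification. Each run reports a nominal 95% interval that ignores this extra source of variation. How large that variation is in real applications is the empirical question the thesis addresses. This toy simulation makes no claim about its size.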
Date issued
2025-09
Department
Massachusetts Institute of Technology. Department of Economics
Publisher
Massachusetts Institute of Technology