Limits of Literature-Conditioned Large Language Models for Predicting Behavioral Experiments
Author(s)
Na, Robin
Advisor
Almaatouq, Abdullah
Abstract
Large language models (LLMs) have recently shown potential to contribute to scientific progress in several ways. Recent work shows that they can predict experimental outcomes as accurately as human forecasters. Other work shows that retrieval-augmented generation (RAG), which conditions LLMs on relevant documents or databases, can improve the quality of model outputs across various research synthesis tasks. Here, we combine these two streams of work and ask: does conditioning LLMs on published research articles improve their predictions of outcomes in new behavioral experiments? We test this using 20 new experiments on peer punishment in cooperation dilemmas, where the prediction task is to determine how much punishment mechanisms increase or decrease group welfare across different settings. Consistent with prior findings, the baseline off-the-shelf GPT-4.1 model matches or exceeds the performance of every human forecaster we tested, laypeople and experts alike. We then condition the model on 1,398 published papers studying punishment, testing both individual papers and collections constructed by grouping papers in different ways (e.g., theory-focused versus empirical studies, recent versus older publications, high-impact versus lower-impact journals). To our surprise, conditioning on individual papers rarely reduces prediction error, and in many cases it makes predictions worse. Conditioning on collections substantially increases the model’s confidence without increasing its accuracy. Simply providing research articles to language models does not seem to improve predictions of outcomes in new experiments, suggesting that more effective systems may require different approaches to representing and processing scientific evidence.
Date issued
2026-02
Department
Sloan School of Management
Publisher
Massachusetts Institute of Technology