Artificial Intelligence for Derivative Security Classification: Applications to DoD

Gelbard, Andrew; Hamilton, Lei

dc.contributor.author	Gelbard, Andrew
dc.contributor.author	Hamilton, Lei
dc.date.accessioned	2025-09-10T13:47:08Z
dc.date.available	2025-09-10T13:47:08Z
dc.date.issued	2025-09-10
dc.identifier.uri	https://hdl.handle.net/1721.1/162628
dc.description.abstract	The accurate classification of government documents according to their sensitivity (e.g., UNCLASSIFIED, SECRET, TOP SECRET) is critical for national security, yet historically has relied on time-intensive manual review. The current manual classification process consumes millions of labor hours annually within the U.S. government, significantly diverting skilled personnel from essential analytical tasks. This research explores automating this security classification task using recently available declassified materials from the DISC dataset [1], addressing practical challenges such as noisy Optical Character Recognition (OCR) output, imbalanced data distributions, and potential leakage of explicit classification markers within document text. This dataset contains declassified government documents sourced from the Digital National Security Archive, providing authentic textual examples representative of actual classification scenarios. We evaluate both traditional machine learning approaches and advanced transformer-based language models to classify documents accurately across multiple sensitivity levels. Our results highlight that transformer-based models, particularly DeBERTa, effectively improve identification of the minority but critical TOP SECRET class, achieving recall over 70% and an overall balanced performance (macro F1 score of 0.75), while traditional methods exhibit similar overall accuracy but struggle with minority class recall. Despite promising findings, we caution that conclusions drawn here remain constrained by limited training data size and inherent uncertainties in human-labeled documents. We emphasize the need for larger, rigorously preprocessed datasets and suggest future research integrating authoritative classification guidelines directly into model training, potentially via retrieval-augmented methods.
This work thus contributes a foundational, reproducible framework that demonstrates significant potential for machine-assisted security classification, guiding future research and practical applications in the information security domain.	en_US
dc.description.sponsorship	The Department of the Air Force Artificial Intelligence Accelerator	en_US
dc.language.iso	en_US	en_US
dc.subject	Air Force Artificial Intelligence Accelerator	en_US
dc.subject	Artificial Intelligence	en_US
dc.subject	Derivative Security Classification	en_US
dc.title	Artificial Intelligence for Derivative Security Classification: Applications to DoD	en_US
dc.type	Technical Report	en_US
dc.contributor.department	Lincoln Laboratory	en_US

DSpace@MIT
