MIT Libraries, DSpace@MIT

Artificial Intelligence for Derivative Security Classification: Applications to DoD

Author(s)
Gelbard, Andrew; Hamilton, Lei
Abstract
The accurate classification of government documents according to their sensitivity (e.g., UNCLASSIFIED, SECRET, TOP SECRET) is critical for national security, yet historically has relied on time-intensive manual review. The current manual classification process consumes millions of labor hours annually within the U.S. government, significantly diverting skilled personnel from essential analytical tasks. This research explores automating this security classification task using recently available declassified materials from the DISC dataset [1], addressing practical challenges such as noisy Optical Character Recognition (OCR) output, imbalanced data distributions, and potential leakage of explicit classification markers within document text. This dataset contains declassified government documents sourced from the Digital National Security Archive, providing authentic textual examples representative of actual classification scenarios. We evaluate both traditional machine learning approaches and advanced transformer-based language models to classify documents accurately across multiple sensitivity levels. Our results highlight that transformer-based models, particularly DeBERTa, effectively improve identification of the minority but critical TOP SECRET class, achieving recall over 70% and an overall balanced performance (macro F1 score of 0.75), while traditional methods exhibit similar overall accuracy but struggle with minority class recall. Despite promising findings, we caution that conclusions drawn here remain constrained by limited training data size and inherent uncertainties in human-labeled documents. We emphasize the need for larger, rigorously preprocessed datasets and suggest future research integrating authoritative classification guidelines directly into model training, potentially via retrieval-augmented methods.
This work thus contributes a foundational, reproducible framework that demonstrates significant potential for machine-assisted security classification, guiding future research and practical applications in the information security domain.
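The abstract contrasts overall accuracy with minority-class recall and macro F1. The sketch below illustrates why those metrics can diverge on imbalanced labels. It is not the authors' evaluation code: the ten toy labels and both sets of predictions are invented for illustration, and the function names (per_class_scores, macro_f1, accuracy) are hypothetical.

```python
from collections import Counter

LABELS = ["UNCLASSIFIED", "SECRET", "TOP SECRET"]


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} from two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so each class counts equally."""
    scores = per_class_scores(y_true, y_pred, labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented, imbalanced toy data: 6 UNCLASSIFIED, 3 SECRET, 1 TOP SECRET.
y_true = ["UNCLASSIFIED"] * 6 + ["SECRET"] * 3 + ["TOP SECRET"]

# Hypothetical model A misses the single TOP SECRET document.
pred_a = ["UNCLASSIFIED"] * 6 + ["SECRET"] * 3 + ["SECRET"]

# Hypothetical model B finds it but mislabels one UNCLASSIFIED document.
pred_b = ["UNCLASSIFIED"] * 5 + ["SECRET"] + ["SECRET"] * 3 + ["TOP SECRET"]

for name, pred in (("A", pred_a), ("B", pred_b)):
    ts_recall = per_class_scores(y_true, pred)["TOP SECRET"][1]
    print(name, accuracy(y_true, pred), ts_recall, round(macro_f1(y_true, pred), 3))
```

Both toy models reach the same accuracy, yet only model B recovers the TOP SECRET document, and macro F1 reflects that difference because it weights every class equally regardless of size.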
Date issued
2025-09-10
URI
https://hdl.handle.net/1721.1/162628
Department
Lincoln Laboratory
Keywords
Air Force Artificial Intelligence Accelerator, Artificial Intelligence, Derivative Security Classification

Collections
  • Reports

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.