LLM-Based Entity Extraction for Cyber Threat Reports
Author(s)
Alperin, Kenneth; de Silva, Alexis
Abstract
As the cyber threat landscape and capabilities
of advanced persistent threats continue to expand, applying
cutting-edge technology to the domain of cyber intelligence
is necessary for the United States Space Force
to keep pace in the Great Power Competition. Cyber
intelligence analysts spend an estimated total of nearly
840 man-hours annually on the extraction and validation
of relevant intelligence from cyber threat reports (CTRs).
Named entity recognition (NER) is a natural language
processing technique capable of automatically extracting
and labeling all relevant information from a given text.
Although the idea is not novel, this paper aims to expand
the limited existing research on the applications of
NER to the domain of cyber intelligence. This study
uses a new openly licensed dataset, AnnoCTR, to fine-tune
a cybersecurity-specific, transformer-based model,
CYBERT. The performance of the model is compared
to that of models reported in prior literature. The
model achieved an F1 score of 0.733, lower than
previous models, and further work remains toward
reducing the production time of intelligence analysis
by half.
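
For readers unfamiliar with how an F1 score is typically computed for NER, the following is a minimal sketch of entity-level F1 over BIO-tagged tokens. It assumes exact span-and-label matching, which is a common convention but is not confirmed by the abstract as the report's evaluation protocol. The function names and the MALWARE and ORG labels are hypothetical and are not taken from AnnoCTR.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Harmonic mean of precision and recall over exactly matching spans."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only.
gold = ["B-MALWARE", "I-MALWARE", "O", "B-ORG", "O"]
pred = ["B-MALWARE", "I-MALWARE", "O", "O", "B-ORG"]
score = entity_f1(gold, pred)  # one of two spans matches on each side
```

Under this convention a prediction counts as correct only if both its boundaries and its label match a gold entity, so a score such as 0.733 reflects whole-entity agreement rather than per-token accuracy.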
Date issued
2025-09-10
Department
Lincoln Laboratory
Keywords
Artificial Intelligence, LLSC, Machine Learning, Intelligence community