LLM-Based Entity Extraction for Cyber Threat Reports

Alperin, Kenneth; de Silva, Alexis

Abstract

As the cyber threat landscape and the capabilities of advanced persistent threats continue to expand, applying cutting-edge technology to cyber intelligence is necessary for the United States Space Force to keep pace in the Great Power Competition. Cyber intelligence analysts are estimated to spend nearly 840 man-hours annually extracting and validating relevant intelligence from cyber threat reports (CTRs). Named entity recognition (NER) is a natural language processing technique that automatically extracts and labels relevant entities in a given text. Although applying NER to cyber intelligence is not a novel idea, this paper aims to expand the limited existing research in this area. This study uses a new, openly licensed dataset, AnnoCTR, to fine-tune CYBERT, a cybersecurity-specific, transformer-based model, and compares its performance with models from prior literature. Although the model's F1 score of 0.733 falls short of previous models, further work remains to be explored toward reducing the production time of intelligence analysis by half.
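
The abstract does not state how the 0.733 F1 score was computed. NER systems are commonly scored at the entity level: a predicted entity counts as correct only if its label and its exact token span both match the gold annotation, and F1 is the harmonic mean of precision and recall over all entities. The sketch below illustrates that convention under stated assumptions (BIO tags, strict span matching, micro-averaging); it is not the report's evaluation code, and the entity labels used (`ThreatActor`, `Malware`, `Tool`) are hypothetical. Libraries such as seqeval provide tested implementations of this kind of scoring.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span representation: (entity label, start token, end token exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence (e.g. B-Malware, I-Malware, O) into labeled spans."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        # An open span ends on "O", on a new "B-", or on an "I-" with a different label.
        if label is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != label):
            spans.add((label, start, i))
            label = None
        # Assumption: a stray "I-" with no matching open span starts a new span (lenient decoding).
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match entity precision, recall, and F1 over all sentences."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # same label and same span
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-ThreatActor", "I-ThreatActor", "O", "B-Malware"]]
    pred = [["B-ThreatActor", "I-ThreatActor", "O", "O"]]  # the Malware entity is missed
    print(entity_f1(gold, pred))  # (1.0, 0.5, 0.6666666666666666)
```

In this toy example the model finds one of two gold entities without any false positives, so precision is 1.0, recall is 0.5, and F1 is about 0.667. Strict span matching is a deliberate choice: a prediction that covers only part of an entity earns no credit, which tends to yield lower scores than token-level metrics.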

Date issued

2025-09-10

URI

https://hdl.handle.net/1721.1/162632

Department

Lincoln Laboratory

Keywords

Artificial Intelligence, LLSC, Machine Learning, Intelligence community

Collections

Reports