LLM-Based Entity Extraction for Cyber Threat Reports

Alperin, Kenneth; de Silva, Alexis

dc.contributor.author	Alperin, Kenneth
dc.contributor.author	de Silva, Alexis
dc.date.accessioned	2025-09-10T15:40:09Z
dc.date.available	2025-09-10T15:40:09Z
dc.date.issued	2025-09-10
dc.identifier.uri	https://hdl.handle.net/1721.1/162632
dc.description.abstract	As the cyber threat landscape and the capabilities of advanced persistent threats continue to expand, applying cutting-edge technology to the domain of cyber intelligence is necessary for the United States Space Force to keep pace in the Great Power Competition. Cyber intelligence analysts spend an estimated 840 man-hours, or nearly that, each year extracting and validating relevant intelligence from cyber threat reports (CTRs). Named entity recognition (NER) is a natural language processing technique capable of automatically extracting and labeling all relevant information from a given text. Although applying NER to this domain is not a novel idea, this paper aims to expand the current but limited research on the applications of NER to cyber intelligence. This study uses a new openly licensed dataset, AnnoCTR, to fine-tune a cybersecurity-specific, transformer-based model, CYBERT. The performance of the model is compared to that of models reported in the literature. Although the results showed an F1 score of 0.733, a weaker performance than that of previous models, there is still more work to explore toward reducing the production time of intelligence analysis by half.	en_US
dc.description.sponsorship	The Department of the Air Force Artificial Intelligence Accelerator	en_US
dc.language.iso	en_US	en_US
dc.subject	Artificial Intelligence	en_US
dc.subject	LLSC	en_US
dc.subject	Machine Learning	en_US
dc.subject	Intelligence community	en_US
dc.title	LLM-Based Entity Extraction for Cyber Threat Reports	en_US
dc.type	Technical Report	en_US
dc.contributor.department	Lincoln Laboratory	en_US

DSpace@MIT
