Show simple item record

dc.contributor.author: Alperin, Kenneth
dc.contributor.author: de Silva, Alexis
dc.date.accessioned: 2025-09-10T15:40:09Z
dc.date.available: 2025-09-10T15:40:09Z
dc.date.issued: 2025-09-10
dc.identifier.uri: https://hdl.handle.net/1721.1/162632
dc.description.abstract: As the cyber threat landscape and the capabilities of advanced persistent threats continue to expand, applying cutting-edge technology to the domain of cyber intelligence is necessary for the United States Space Force to keep pace in the Great Power Competition. Cyber intelligence analysts spend an estimated total of nearly 840 man-hours annually on the extraction and validation of relevant intelligence from cyber threat reports (CTRs). Named entity recognition (NER) is a natural language processing technique capable of automatically extracting and labeling all relevant information from a given text. Although applying NER in this way is not a novel idea, this paper aims to expand the current but limited research on the applications of NER to the domain of cyber intelligence. This study uses a new openly licensed dataset, AnnoCTR, to fine-tune a cybersecurity-specific, transformer-based model, CYBERT. The performance of the model is compared to that of the models from the related literature. Although the model achieved an F1 score of 0.733, lower than that of previous models, more work remains to be explored toward reducing the production time of intelligence analysis by half. [en_US]
dc.description.sponsorship: The Department of the Air Force Artificial Intelligence Accelerator [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Artificial Intelligence [en_US]
dc.subject: LLSC [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Intelligence community [en_US]
dc.title: LLM-Based Entity Extraction for Cyber Threat Reports [en_US]
dc.type: Technical Report [en_US]
dc.contributor.department: Lincoln Laboratory [en_US]

