<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="https://hdl.handle.net/1721.1/145263">
<title>AIA</title>
<link>https://hdl.handle.net/1721.1/145263</link>
<description/>
<items>
<rdf:Seq>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/165294"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/165232"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/165231"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/165230"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/165229"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/165228"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164903"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164902"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164901"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164900"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164899"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164898"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164897"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164896"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/164868"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/162653"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/162634"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/162632"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/162631"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/162630"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/162629"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/162628"/>
<rdf:li rdf:resource="https://hdl.handle.net/1721.1/162627"/>
</rdf:Seq>
</items>
<dc:date>2026-04-03T20:57:08Z</dc:date>
</channel>
<item rdf:about="https://hdl.handle.net/1721.1/165294">
<title>Synthetic Network Data Generation for Analyst Training</title>
<link>https://hdl.handle.net/1721.1/165294</link>
<description>Synthetic Network Data Generation for Analyst Training
Smith, Liam; Wright, Matthew
Rapidly evolving cyber threats demand continuous,&#13;
high-fidelity training for defense analysts. However, generating&#13;
realistic network traffic datasets creates a significant barrier&#13;
to entry, often requiring extensive virtualization infrastructure,&#13;
specialized hardware, and knowledge in cyber range administration.&#13;
This paper introduces a streamlined architecture, called&#13;
Generative Packet Captures (GenCap), built upon the foundational&#13;
capabilities of the FOSR benign traffic generator and&#13;
the ID2T attack injector. By abstracting these complex tools&#13;
behind an automated orchestration layer, it enables users to&#13;
generate scenario-specific PCAP files on demand. This approach&#13;
democratizes access to training data, allowing analysts to create&#13;
rigorous network defense scenarios without the need for complex&#13;
provisioning or systems engineering knowledge.
</description>
<dc:date>2026-04-01T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/165232">
<title>Office AI Automation using Existing DAF-Approved Software</title>
<link>https://hdl.handle.net/1721.1/165232</link>
<description>Office AI Automation using Existing DAF-Approved Software
Cui, Wei; Kennedy, Laura
The Department of the Air Force (DAF) continues&#13;
to face mounting administrative workloads that hinder mission&#13;
focus and operational efficiency. Executive officers and staff&#13;
officers spend substantial time generating reports, managing&#13;
emails, routing documents, and organizing taskers across multiple&#13;
systems. This paper presents the Smart Executive Assistant, an&#13;
office AI initiative to automate repetitive administrative tasks using&#13;
existing DAF-approved technologies without a new Authority-&#13;
To-Operate (ATO). By integrating DAF 365 applications, Power&#13;
Automate, and approved large language models (LLMs) within&#13;
secure IL5 and IL6 environments, this solution seeks to reduce&#13;
time spent on low-value administrative processes by 90% while&#13;
maintaining compliance and data security.
</description>
<dc:date>2026-03-20T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/165231">
<title>AI for Scalable Defensive Cyber Log Analysis</title>
<link>https://hdl.handle.net/1721.1/165231</link>
<description>AI for Scalable Defensive Cyber Log Analysis
Schofield, Catherine; Jananthan, Hayden; Kepner, Jeremy
Centralized cyber logging platforms ingest large&#13;
volumes of heterogeneous telemetry, yet high dimensionality&#13;
and query-driven workflows often limit scalable analytic insight&#13;
on these systems. This work presents an automated pipeline&#13;
for ingesting, characterizing, and analyzing large-scale host-based
logs using sparse representations and distribution-aware
statistics. A systematic dimensional analysis reduces hundreds of&#13;
raw log fields to a small set of informative dimensions suitable&#13;
for aggregation across extended time windows. Temporal analysis&#13;
of the reduced representation reveals coordinated deviations&#13;
in activity volume and distributional behavior that are not&#13;
apparent in individual log streams. The results demonstrate that&#13;
dimensional reduction enables scalable, interpretable analysis&#13;
of enterprise cyber telemetry. Furthermore, these results were&#13;
obtained using host-based sensors designed for event-oriented&#13;
point-defense and demonstrate the feasibility of integrating such&#13;
sensors to enable long-range, long-duration area defense.
</description>
<dc:date>2026-03-20T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/165230">
<title>Cross-Aircraft Flight Phase Classification Using ADS-B Data and Transfer Learning</title>
<link>https://hdl.handle.net/1721.1/165230</link>
<description>Cross-Aircraft Flight Phase Classification Using ADS-B Data and Transfer Learning
Kiefer, Jacob; Alemany, Sheila
Flight phase identification (FPI) approaches that&#13;
apply traditional machine learning techniques are expensive to&#13;
scale, difficult to generalize across platforms, and frequently&#13;
unavailable in permissive or distributed training environments.&#13;
We propose a scalable, data-driven pipeline for automatic FPI&#13;
using open-source Automatic Dependent Surveillance-Broadcast&#13;
(ADS-B) data, with an emphasis on cross-aircraft generalization&#13;
through transfer learning. Leveraging ADS-B telemetry from&#13;
USAF Initial Flight Training aircraft, a neural network classifier&#13;
is trained on Diamond DA-20 flight data and evaluated on Texan&#13;
T-6 aircraft under zero-shot and fine-tuned transfer learning&#13;
conditions. We describe a robust ADS-B preprocessing pipeline&#13;
integrating digital elevation model (DEM) data, a data labeling&#13;
strategy using unsupervised learning, and a transfer learning&#13;
approach enabling adaptation across aircraft types with limited&#13;
labeled data. Our results demonstrate that transfer learning significantly&#13;
improves classification accuracy for flight phases with&#13;
limited data, highlighting the potential of ADS-B-based models&#13;
to support scalable, behavior-aware airspace intelligence across&#13;
heterogeneous fleets and permissive environments. This research&#13;
advances FPI capabilities for USAF training analysis and broader&#13;
operational priorities in autonomy, situational awareness, and&#13;
data-driven decision support.
</description>
<dc:date>2026-03-20T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/165229">
<title>Neural Networks for Stress Intensity Factor Vertex Prediction</title>
<link>https://hdl.handle.net/1721.1/165229</link>
<description>Neural Networks for Stress Intensity Factor Vertex Prediction
Hokaj, Ian; Ghanem, Janelle
Structural fatigue in aging metallic aircraft is a&#13;
primary driver of sustainment costs for the U.S. Air Force,&#13;
significantly impacting fleet readiness. Fatigue life prediction tools&#13;
like AFGROW depend on interpolating between computationally&#13;
expensive stress intensity factors (K-solutions) to approximate&#13;
unknown values. However, interpolation errors in the current&#13;
approach introduce uncertainty and force overly conservative&#13;
maintenance schedules. This paper investigates the use of a&#13;
machine learning surrogate to replace AFGROW’s dimension-reduction
interpolation for the finite-width corner-cracked hole
geometry. We developed a robust data processing pipeline for a&#13;
large FEA dataset and trained a neural network model.&#13;
Our results reveal a critical insight: the surrogate model offers&#13;
substantial performance gains over AFGROW’s interpolation&#13;
in low-data regimes, emphasizing both the potential of the&#13;
model and its sensitivity to dataset size. For the original, sparse&#13;
dataset—which is characteristic of computationally expensive&#13;
problems—the neural network significantly outperformed the
baseline interpolation, reducing the mean absolute percentage&#13;
error (MAPE) by over 40% (from 2.77% to 1.60%) and achieving&#13;
an R² value exceeding 0.99. However, experiments on synthetically&#13;
generated dense datasets showed that the traditional interpolation&#13;
method becomes more accurate as the data grid becomes less&#13;
sparse.&#13;
This study concludes that while neural network surrogates&#13;
offer a powerful, high-fidelity solution for computationally intensive&#13;
engineering problems, their adoption should be guided by&#13;
a careful analysis of data density after the dataset has been cleaned
of outliers. It also highlights the necessity of employing rigorous,&#13;
application-relevant validation strategies that move beyond&#13;
simplistic random splits to accurately assess model performance.
</description>
<dc:date>2026-03-20T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/165228">
<title>Evaluating Adaptive AI for Contracting Officer Readiness: Design and Pedagogical Proposal for the Warrant Board RAG Chatbot</title>
<link>https://hdl.handle.net/1721.1/165228</link>
<description>Evaluating Adaptive AI for Contracting Officer Readiness: Design and Pedagogical Proposal for the Warrant Board RAG Chatbot
Mullen, Julia; Grosvenor, Sarah
The United States Air Force (USAF) requires a&#13;
sustained and expanding pool of warranted Contracting Officers&#13;
(COs) to meet growing operational and fiscal demands across&#13;
its global enterprise. The authority to obligate funds and bind&#13;
the government contractually—granted through the issuance&#13;
of a warrant—requires successful completion of a multi-stage&#13;
evaluation process culminating in a scenario-based oral board.&#13;
This final interview assesses a candidate’s ability to interpret&#13;
and apply acquisition policy under complex and ambiguous&#13;
conditions.&#13;
This paper proposes the design of an adaptive artificial&#13;
intelligence (AI) training system—the Warrant Board Retrieval-&#13;
Augmented Generation (RAG) Chatbot—to serve as a simulated&#13;
board-preparation environment. The chatbot inverts the common
’user question, AI answer’ model and instead asks the learner
a series of critical-thinking, scenario-based
questions. The prototype design adopts a model-agnostic&#13;
LLM gateway capable of operation through either commercial&#13;
APIs (e.g., OpenAI) or secure, government-hosted environments&#13;
such as GenAI.mil, ensuring accessibility within unclassified Air&#13;
Force networks. This research contributes to the emerging field of&#13;
AI-assisted professional education by developing a transparent,&#13;
auditable, and pedagogically grounded framework for formative&#13;
learning in acquisition training.
</description>
<dc:date>2026-03-20T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164903">
<title>AgentNexus: Accelerating AI Agent Development and Enhancing Interoperability with MCP</title>
<link>https://hdl.handle.net/1721.1/164903</link>
<description>AgentNexus: Accelerating AI Agent Development and Enhancing Interoperability with MCP
Yae, Jung; Hamilton, Lei
The DoD faces significant challenges in its pursuit&#13;
of AI superiority, as disparate data and development platforms&#13;
create redundant efforts and limit interoperability. Additionally,&#13;
existing DoD systems are ill-equipped to handle the recent&#13;
paradigm shift toward agentic AI, which requires modern standards&#13;
and tools. To address these gaps, this paper introduces&#13;
AgentNexus, an application designed to streamline the development,&#13;
deployment, and servicing of AI agents. AgentNexus
features an advanced agent-processing
backend, a scalable service layer, and an intuitive user interface.
It provides pre-built toolkits, a sophisticated RAG pipeline, and
MCP for enhanced interoperability. The successful development
of an Education Assistant agent validates the application’s capacity&#13;
to support the rapid implementation of multi-agent workflows.&#13;
By fostering a collaborative and standardized environment,&#13;
AgentNexus mitigates critical barriers of interoperability and&#13;
duplicated effort, accelerating the delivery of multi-agent AI to&#13;
warfighters.
</description>
<dc:date>2026-02-17T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164902">
<title>Intelligent C-17 Load Planning for Flight Optimization</title>
<link>https://hdl.handle.net/1721.1/164902</link>
<description>Intelligent C-17 Load Planning for Flight Optimization
McAlister, Catherine; Jones, Mathew; McConville, Sean
C-17 Globemaster III cargo capacity is significantly&#13;
underutilized, with many sorties transporting only a few pallets&#13;
despite the aircraft’s 170,900-pound payload capability. Historical&#13;
flight data analysis reveals inefficient scheduling practices that&#13;
increase operational costs and crew workload, and negatively
affect overall mission capability. This paper details the development
of an AI-powered optimization model to improve C-17 cargo&#13;
utilization and reduce required flight operations. We analyzed&#13;
historical C-17 transportation data and created both traditional&#13;
optimization algorithms and predictive AI models to determine&#13;
optimal flight scheduling for 3-week operational periods. The AI&#13;
model achieved 97.9% accuracy in predicting optimal flight count&#13;
requirements and 89.3% accuracy in predicting optimal flight&#13;
assignment for specific cargo, representing a 23% reduction in&#13;
total flights and a 15% increase in average cargo utilization.&#13;
These results demonstrate that data-driven flight scheduling&#13;
can significantly improve C-17 operational efficiency, reduce&#13;
costs across the airlift community, and enable additional time
for advanced training, contingency support, and critical
warfighter operations, ultimately increasing the lethality and&#13;
readiness of the Department of Defense.
</description>
<dc:date>2026-02-17T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164901">
<title>Securing Intelligence: The Strategic Necessity of Air-Gapped AI Systems in the Age of Cloud-Based LLMs</title>
<link>https://hdl.handle.net/1721.1/164901</link>
<description>Securing Intelligence: The Strategic Necessity of Air-Gapped AI Systems in the Age of Cloud-Based LLMs
Viggh, Herbert; Tsagaratos, Jennifer
The increasing use of large language models (LLMs)&#13;
in applications, from military strategy to customer service, raises&#13;
concerns about data sovereignty, security, and privacy. Cloud-based
API models, created by companies such as OpenAI, pose
significant risks due to training data exposure and prompt&#13;
injection attacks, which can compromise sensitive information,
and hidden biases that could influence reporting or executive
decision-making processes. Real-world incidents, such as the&#13;
leakage of Samsung’s proprietary source code through ChatGPT,&#13;
highlight the dangers of relying on cloud providers with complete&#13;
visibility into client queries. Furthermore, data localization laws&#13;
and regulations, such as the General Data Protection Regulation&#13;
(GDPR), underscore the risks associated with outsourcing&#13;
intelligence and decision support systems to foreign entities. Air-gapped
AI solutions, which run on isolated networks disconnected
from the outside world, offer a secure alternative for sensitive&#13;
environments such as national defense, research laboratories,&#13;
and critical infrastructure. By maintaining control over AI&#13;
processes, organizations can ensure information safety, comply&#13;
with regulations, and mitigate risks associated with cloud-based&#13;
AI infrastructure, ultimately safeguarding their data integrity,&#13;
privacy, and operational independence.
</description>
<dc:date>2026-02-17T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164900">
<title>RAIMOND Requirements AI for Military Operational Needs Development</title>
<link>https://hdl.handle.net/1721.1/164900</link>
<description>RAIMOND Requirements AI for Military Operational Needs Development
Garcia, Fabio; Steilberg, Jackson
The Joint Capabilities Integration and Development&#13;
System (JCIDS) was created as a means to overhaul military&#13;
procurement processes. Ideally, the requirements development&#13;
process is meant to take a total of 2-4 years from concept&#13;
to manufacturing. However, the actual length of concept development
is much longer. As a result, technologies that are
conceptualized through the analytical process often enter the
acquisition process too late to meet the warfighter’s needs. To reduce the
lengthy timeline in requirements development, we used Large&#13;
Language Models (LLMs) to conduct the necessary research&#13;
and synthesize documents that abide by strict JCIDS guidelines.&#13;
Prompt engineering can achieve these results as a proof of&#13;
concept. However, the output responses lack the content length&#13;
and depth necessary to pass through the requirements validation&#13;
process. Therefore, a combination of agentic workflows, prompt&#13;
engineering, and sufficient context is needed to achieve the desired&#13;
outcomes. This project utilizes a novel framework to derive&#13;
Capabilities Based Assessments (CBAs) at an approximate 80&#13;
percent readiness level requiring the final steps of validation and&#13;
verification by subject matter experts.
</description>
<dc:date>2026-02-17T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164899">
<title>Machine Learning for the Enhancement of Adaptive Optics</title>
<link>https://hdl.handle.net/1721.1/164899</link>
<description>Machine Learning for the Enhancement of Adaptive Optics
Hall, Robert; Chen, Justin
Optical systems (telescopes, lasers, microscopes,&#13;
etc.) have degraded performance over long distances&#13;
due to scintillation caused by Earth’s atmosphere,&#13;
where adaptive optics (AO) is often used to enhance&#13;
the signal-to-noise ratio (SNR) or image quality. Astronomers
have found success in laser-based adaptive&#13;
optics where they survey the atmosphere with a laser&#13;
and subtract its effects on the resultant image. Although&#13;
effective in most cases, these systems can be extremely&#13;
costly, are computationally intensive in real time, and&#13;
fall short in some edge cases. We propose an autoencoder/decoder
and a generalized sequence-to-sequence
model (LSTM) as a cost-effective method to off-load
computational complexity from real time and enhance&#13;
performance in edge cases. This study utilizes four&#13;
simulated datasets of wavefront sensor frames for a&#13;
variety of atmospheric conditions, produced in collaboration
with MIT Lincoln Laboratory [1]. We found auto-encoding
performance just shy of traditional methodology, and
LSTM predictions that capture the general
shape of the WFS frames well but suffer from scaling issues.
</description>
<dc:date>2026-02-17T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164898">
<title>From Hype to Reality: Real-World Lessons and Recommendations for AI in Military Applications</title>
<link>https://hdl.handle.net/1721.1/164898</link>
<description>From Hype to Reality: Real-World Lessons and Recommendations for AI in Military Applications
Lynch, Joshua; Niss, Laura
The current use cases, limitations, and future capacity&#13;
of large language models (LLMs) as assistants to military&#13;
personnel remain an open question. This paper presents a case&#13;
study of an Airman’s interaction with and trust calibration of&#13;
LLMs over three months, both as an everyday assistant and&#13;
for development of ROMAD-AI, a tactical military application.&#13;
Through intuitive, AI-generated software development, an approach
that relies on iterative code generation via natural
language prompting of LLMs by a technical novice rather
than hand-written programming from a technical expert,
the research reveals significant gaps between industry-curated
AI capability demonstrations and operational reality, requiring
systematic trust calibration and realistic scope management.&#13;
Outcomes are analyzed through operational and technical expertise&#13;
perspectives to provide practical guidance for both military&#13;
service members seeking effective AI integration and researchers&#13;
developing military-focused AI systems.
</description>
<dc:date>2026-02-17T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164897">
<title>Large Language Models and Defense Strategy: Escalation Risks and National Security Challenges</title>
<link>https://hdl.handle.net/1721.1/164897</link>
<description>Large Language Models and Defense Strategy: Escalation Risks and National Security Challenges
Hou, Jonathan; Lax, Edwin
This literature review examines the strategic vulnerabilities&#13;
posed by Large Language Models (LLMs) in military&#13;
and national security contexts. It synthesizes recent research&#13;
on their propensity for escalatory reasoning, cultural misalignment,&#13;
semantic manipulation, and dual-use ambiguity. Findings&#13;
from conflict simulations and coalition planning models reveal
how LLMs may default to aggressive or biased outputs under&#13;
ambiguity. These tendencies threaten alliance cohesion, distort&#13;
decision-making, and undermine trust in AI-enabled operations.&#13;
The review concludes by advocating for safeguards such as culturally&#13;
calibrated training, rigorous output verification, and the
integration of human-AI intermediaries to prevent destabilizing&#13;
outcomes.
</description>
<dc:date>2026-02-17T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164896">
<title>Synchronization-Aware Diffusion Models for Intra-Family RF Signal Classification</title>
<link>https://hdl.handle.net/1721.1/164896</link>
<description>Synchronization-Aware Diffusion Models for Intra-Family RF Signal Classification
Hayden, Hunter; Botero, Joey
Classification of radio frequency (RF) signals in the&#13;
presence of channel-induced synchronization errors remains a&#13;
critical challenge in spectrum awareness systems. Traditional&#13;
classification pipelines generally rely on fixed synchronization&#13;
algorithms or assume aligned signals, which limits robustness&#13;
under real world timing, phase, and frequency distortions.&#13;
We introduce SyncDiff, a novel encoder-only diffusion model&#13;
architecture that predicts synchronization parameters through&#13;
iterative denoising steps prior to classification. By replacing&#13;
conventional synchronization algorithms with a learned, data-driven
correction mechanism, our approach enables adaptive&#13;
signal alignment based on current channel distortions in unsynchronized&#13;
input data. SyncDiff employs a UNet based encoder&#13;
to refine synchronization parameters across multiple inference&#13;
steps, dynamically reducing channel-induced alignment errors&#13;
while preserving the inherent modulation-specific characteristics
that allow these signals to be discriminable. Evaluations of the&#13;
RadioML2018 RF standard benchmark data set [1] demonstrate
improved classification accuracy across varying SNRs, modulation&#13;
schemes and synchronization impairments. Our findings&#13;
highlight the potential of diffusion-based synchronization learning&#13;
to improve downstream RF classification without reliance on&#13;
expert-engineered synchronization routines.
</description>
<dc:date>2026-02-17T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/164868">
<title>ML Prediction Models to Identify Novel Beyond Visual Range Tactics and Error Analysis for DARPA AIR Agents</title>
<link>https://hdl.handle.net/1721.1/164868</link>
<description>ML Prediction Models to Identify Novel Beyond Visual Range Tactics and Error Analysis for DARPA AIR Agents
Li, William; Castor, Jeremy
This paper investigates the utility of using machine&#13;
learning models to predict the outcome of simulated 2 vs. 2&#13;
Tactical Intercept engagements flown by autonomous agents in&#13;
support of the DARPA Artificial Intelligence Reinforcements&#13;
(AIR) program. We investigated the performance of four models:&#13;
Feed Forward Neural Network, Random Forest, Extreme&#13;
Gradient Boost, and Long Short Term Memory (LSTM). We&#13;
examined their ability to successfully predict the outcomes of&#13;
simulated engagements, tactical errors, and the execution of novel&#13;
game plans by autonomous agents. The models were trained on&#13;
53 features pertaining to the agents including distance between&#13;
aircraft, altitude, speed, missile availability, and other event-based
features from simulated runs. The LSTM model had the&#13;
best performance towards the beginning of a run and was able to&#13;
predict the correct winner with 87.8% accuracy only one minute&#13;
into a run while the XGBoost model achieved the best overall&#13;
performance with a 91.7% classification accuracy and an R² of&#13;
0.712. The XGB model was also able to correctly predict the&#13;
winner of 84.7% of the runs after only seven minutes into the&#13;
simulated engagement. These results demonstrate the utility of, and
need for, further investigation into other ML models’ potential
to identify unique attributes and provide predictive analysis of more
complex multi-agent scenarios. Such scenarios include additional criteria
such as varying rules of engagement, acceptable
levels of risk, and other requirements fighter pilots must take
into account during offensive and defensive operations needed to
gain air superiority and support the objectives of the Joint Forces
Commander.
</description>
<dc:date>2026-02-12T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/162653">
<title>Artificial Intelligence for Tactical Network Troubleshooting</title>
<link>https://hdl.handle.net/1721.1/162653</link>
<description>Artificial Intelligence for Tactical Network Troubleshooting
Jaimes, Rafael; Mendez, Maximillian
The tactical network is a key component of most&#13;
United States Marine Corps missions. It is critical to expeditiously&#13;
stand up a robust communications architecture for both voice&#13;
and data transmissions across a variety of classification levels.&#13;
However, when there are unforeseen or induced faults in network&#13;
configurations, the establishment time can increase by hours&#13;
if not days. The research described in this report sought to&#13;
determine if a large language model (LLM), when provided&#13;
the correct baseline network configurations, would be able to&#13;
identify errors in active working network configurations and&#13;
reduce network establishment time. A/B testing was conducted to&#13;
see whether teams assisted by artificial intelligence (AI) or control&#13;
teams with no AI assistance could establish the network faster.&#13;
The LLM hosted by NIPRGPT decreased the establishment time&#13;
by 50 percent (p &lt;0.05) compared to warfighters unaided by AI.&#13;
The results conclude that AI agents such as LLMs can be useful&#13;
in providing commanders with a course of action to establish&#13;
command, control, communications, and computers (C4) faster.
</description>
<dc:date>2025-09-12T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/162634">
<title>A KNOWLEDGE GRAPH IS ALL YOU NEED</title>
<link>https://hdl.handle.net/1721.1/162634</link>
<description>A KNOWLEDGE GRAPH IS ALL YOU NEED
Streilen, William; Brooks, Nicholas; Burill, Daniel; Smith, Corey
The Department of the Air Force (DAF) faces&#13;
unique challenges in adopting Large Language Models&#13;
(LLMs). Commercially available models often lack the&#13;
domain-specific knowledge necessary to support airmen,&#13;
as this information is not inherently embedded. To maintain&#13;
a competitive edge, the integration of LLMs to&#13;
improve efficiency and decision making is a critical priority.&#13;
This presentation explores two innovative methodologies&#13;
designed to better integrate domain-specific knowledge&#13;
into language models and improve the discovery of&#13;
relevant information. The first is EntiGraph Continuous&#13;
Pretraining, which leverages continuous training to embed&#13;
specialized knowledge into language models. The second&#13;
is the GFM-RAG Graph RAG Framework, a novel approach&#13;
to knowledge retrieval and synthesis that enhances&#13;
model performance by improving multi-hop retrieval and&#13;
complex information connections.&#13;
Through both quantitative and qualitative evaluations, we&#13;
assess their impact on retrieval accuracy and response&#13;
relevance. Our findings demonstrate the potential of these&#13;
customized approaches to streamline information access,&#13;
improve decision making, and better support the operational&#13;
needs of the DAF.
</description>
<dc:date>2025-09-10T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/162632">
<title>LLM-Based Entity Extraction for Cyber Threat Reports</title>
<link>https://hdl.handle.net/1721.1/162632</link>
<description>LLM-Based Entity Extraction for Cyber Threat Reports
Alperin, Kenneth; de Silva, Alexis
As the cyber threat landscape and capabilities&#13;
of advanced persistent threats continue to expand, applying
cutting-edge technology to the domain of cyber intelligence&#13;
is necessary for the United States Space Force&#13;
to keep pace in the Great Power Competition. Cyber&#13;
intelligence analysts spend an estimated
840 man-hours annually on the extraction and validation&#13;
of relevant intelligence from cyber threat reports (CTRs).&#13;
Named entity recognition (NER) is a natural language&#13;
processing technique capable of automatically extracting&#13;
and labeling all relevant information from a given text.&#13;
Although not a novel idea, this paper aims to expand&#13;
the current but limited research on the applications of&#13;
NER to the domain of cyber intelligence. This study&#13;
uses a new openly licensed dataset, AnnoCTR, to fine-tune
a cybersecurity-specific, transformers-based model,&#13;
CYBERT. The performance of the model is compared&#13;
to models from the literature. Although the
results showed an F1 score of 0.733 – a lower
performance compared to previous models – further work
remains to reduce the production time
of intelligence analysis by half.
</description>
<dc:date>2025-09-10T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/162631">
<title>Democratizing Data: An Intelligent Querying System for Marine Corps Data</title>
<link>https://hdl.handle.net/1721.1/162631</link>
<description>Democratizing Data: An Intelligent Querying System for Marine Corps Data
Johnson, Lane; Nam, Kevin
This research presents the development and implementation&#13;
of a text-to-Structured Query Language (SQL)&#13;
system tailored for Marine Corps logistics, capitalizing upon&#13;
the proven capabilities of Large Language Models (LLMs). By&#13;
fine-tuning an open-source LLM on a curated Global Combat&#13;
Support System - Marine Corps supply and maintenance dataset,&#13;
we demonstrate how non-technical users can intuitively interact&#13;
with Marine Corps data through natural language queries,&#13;
enhancing data accessibility and operational decision-making.&#13;
Our approach assumes a resource-constrained environment,&#13;
demonstrating that fine-tuning and deploying the model on a&#13;
single NVIDIA A100 graphics processing unit (GPU) is not&#13;
only feasible, but also highlights the potential for local or edge-based&#13;
artificial intelligence (AI) solutions. We further identify the&#13;
critical importance of high-quality, representative datasets and&#13;
propose a hybrid approach combining prompt engineering with&#13;
fine-tuning to improve performance. Our findings culminate in&#13;
concrete recommendations for the Marine Corps regarding data&#13;
governance, AI integration, and workforce development.
</description>
<dc:date>2025-09-10T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/162630">
<title>Pixels to Places: Improving Zero-Shot Image Geolocalization using Prior Knowledge</title>
<link>https://hdl.handle.net/1721.1/162630</link>
<description>Pixels to Places: Improving Zero-Shot Image Geolocalization using Prior Knowledge
Cha, Miriam; Borg, Trent
The ability to predict the geographic origin of&#13;
a photo is critical for open-source investigation applications.&#13;
However, image geolocalization is highly challenging due to&#13;
the vast diversity of images captured worldwide. While vision&#13;
transformer-based approaches have demonstrated success—&#13;
even outperforming grandmasters in geolocation games like&#13;
GeoGuessr—their performance does not generalize well to unseen&#13;
locations. Prior methods rely solely on visual cues, neglecting&#13;
broader contextual knowledge that image analysts typically&#13;
employ. To bridge this gap, our research integrates the contextual&#13;
understanding of geographic regions that imagery analysts&#13;
possess into the geolocalization model. Specifically, we develop a&#13;
variant of StreetCLIP, which embeds CLIP within geolocalization&#13;
tasks and facilitates the incorporation of user-supplied prior&#13;
knowledge such as continental or national boundaries. Our&#13;
results on the IM2GPS3K benchmark dataset demonstrate a&#13;
10.66% improvement in regional prediction (within 200 km)&#13;
and a 15.27% improvement in country-level prediction (within&#13;
750 km) over baseline models. Our results suggest that human-provided&#13;
supervision can enhance image geolocalization accuracy,&#13;
highlighting the potential of interactive systems where human&#13;
expertise and AI work collaboratively to refine predictions.&#13;
Index Terms: image geolocalization, CLIP, human-machine&#13;
teaming, vision transformers
</description>
<dc:date>2025-09-10T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/162629">
<title>Improved Automatic Electronic Intelligence Collection System for Internal and External Forward Fusion and Collaborative Geolocation of Adversary Emitters</title>
<link>https://hdl.handle.net/1721.1/162629</link>
<description>Improved Automatic Electronic Intelligence Collection System for Internal and External Forward Fusion and Collaborative Geolocation of Adversary Emitters
Botero, Joey; Benge, Arianne; Heisey, Curtis
With the 2022 National Defense Strategy shifting&#13;
focus from counterinsurgency operations to near-peer adversaries,&#13;
airborne ISR platforms within the USAF and DoD must&#13;
be improved for effectiveness in a near-peer conflict. They&#13;
need to be able to operate quickly and effectively in contested&#13;
environments with longer-range threats, act as a forward-edge&#13;
intelligence node for blue forces, and provide DoD Research&#13;
and Development efforts with cutting-edge data regarding new&#13;
adversary signals and technology.&#13;
To aid in tackling these challenges, this project introduces a&#13;
Machine Learning (ML)-driven approach that revamps the Automatic&#13;
Electronic Intelligence Collection System (ACS) on U.S.&#13;
Airborne ISR platforms in four ways. First, by providing nodal&#13;
analysis in real time, automatically aggregating existing data&#13;
across the aircraft to decrease operator cognitive load.&#13;
Second, by augmenting internal aircraft database information&#13;
with external intelligence database information to increase&#13;
confidence in targeting. Third, by providing automatic signal&#13;
anomaly detection that uses a support vector machine algorithm&#13;
to cue operators to potential signals of interest based on&#13;
previous activity and pattern-of-life prediction. Lastly, by&#13;
providing better surface-versus-air identification through the&#13;
incorporation of cone angle into the system, helping operators&#13;
with faster threat warning and situational awareness of the&#13;
environment.&#13;
Findings include Support Vector Machines being the most&#13;
effective tested binary classifier for single-signal anomaly&#13;
detection at 84% AUC, and a rule-based method of averages&#13;
successfully classifying 1089 surface-versus-air ELINT samples&#13;
with a success rate of 89%, compared to other tested methods&#13;
such as Gaussian Mixture Models at 68% and&#13;
K-Nearest Neighbor at 66%.
</description>
<dc:date>2025-09-10T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/162628">
<title>Artificial Intelligence for Derivative Security Classification: Applications to DoD</title>
<link>https://hdl.handle.net/1721.1/162628</link>
<description>Artificial Intelligence for Derivative Security Classification: Applications to DoD
Gelbard, Andrew; Hamilton, Lei
The accurate classification of government&#13;
documents according to their sensitivity (e.g., UNCLASSIFIED,&#13;
SECRET, TOP SECRET) is critical for national&#13;
security, yet historically has relied on time-intensive&#13;
manual review. The current manual classification process&#13;
consumes millions of labor hours annually within the&#13;
U.S. government, significantly diverting skilled personnel&#13;
from essential analytical tasks. This research explores&#13;
automating this security classification task using recently&#13;
available declassified materials from the DISC&#13;
dataset [1], addressing practical challenges such as&#13;
noisy Optical Character Recognition (OCR) output,&#13;
imbalanced data distributions, and potential leakage&#13;
of explicit classification markers within document text.&#13;
This dataset contains declassified government documents&#13;
sourced from the Digital National Security Archive, providing&#13;
authentic textual examples representative of actual&#13;
classification scenarios. We evaluate both traditional&#13;
machine learning approaches and advanced transformer-based&#13;
language models to classify documents accurately&#13;
across multiple sensitivity levels. Our results highlight&#13;
that transformer-based models, particularly DeBERTa,&#13;
effectively improve identification of the minority but&#13;
critical TOP SECRET class, achieving recall over 70%&#13;
and an overall balanced performance (macro F1 score of&#13;
0.75), while traditional methods exhibit similar overall&#13;
accuracy but struggle with minority class recall. Despite&#13;
promising findings, we caution that conclusions drawn&#13;
here remain constrained by limited training data size&#13;
and inherent uncertainties in human-labeled documents.&#13;
We emphasize the need for larger, rigorously preprocessed&#13;
datasets and suggest future research integrating&#13;
authoritative classification guidelines directly into model&#13;
training, potentially via retrieval-augmented methods.&#13;
This work thus contributes a foundational, reproducible&#13;
framework that demonstrates significant potential for&#13;
machine-assisted security classification, guiding future&#13;
research and practical applications in the information&#13;
security domain.
</description>
<dc:date>2025-09-10T00:00:00Z</dc:date>
</item>
<item rdf:about="https://hdl.handle.net/1721.1/162627">
<title>The Area-of-Measurable-Performance (AOMP) Method Standard as a Foundational Archetype for the Cyclical Enhancement of the State of the Art Joint Simulation Environment (JSE) Technology</title>
<link>https://hdl.handle.net/1721.1/162627</link>
<description>The Area-of-Measurable-Performance (AOMP) Method Standard as a Foundational Archetype for the Cyclical Enhancement of the State of the Art Joint Simulation Environment (JSE) Technology
Li, William; Johnson, Kevin; Picardo, Christopher; Ambion, Francis
The Department of the Air Force (DAF) envisions the need to incorporate Artificial Intelligence and Machine Learning (AI/ML) models into the novel systems it develops in order to enhance them and meet its primary goal of maintaining total air superiority [2]. There is currently a need for a standard process for designing successful AI/ML models capable of enhancing these novel systems. In this white paper we introduce the Area of Measurable Performance (AOMP) Method Standard and apply it to the Joint Simulation Environment (JSE) Technology, a state-of-the-art system of systems under test, to identify AOMPs and their modular requirements [3] and metrics. This process leads to the accurate characterization of modular AI/ML models while offering a high degree of trust and reuse, resulting in a method standard that organically promotes the development of successful modular AI/ML models for improving the performance of the JSE technology or other systems of systems [4] under test.
</description>
<dc:date>2025-09-10T00:00:00Z</dc:date>
</item>
</rdf:RDF>
