CovertComBench: A First Domain-Specific Testbed for LLMs in Wireless Covert Communication
Zhaozhi Liu, Jiaxin Chen, Yuanai Xie, Yuna Jiang, Minrui Xu, Xiao Zhang, Pan Lai, Zan Zhou

TL;DR
This paper introduces CovertComBench, a specialized benchmark for evaluating LLMs in wireless covert communication tasks, highlighting current limitations and suggesting future directions for trustworthy AI systems in security-sensitive applications.
Contribution
We developed CovertComBench, a comprehensive benchmark for assessing LLMs in wireless covert communication, and analyzed their performance across various tasks and security constraints.
Findings
LLMs excel at conceptual understanding and code generation, exceeding 80% accuracy.
Accuracy on security-constrained mathematical derivations is much lower, ranging from 18% to 55%.
Current LLMs are better as assistants than autonomous solvers for security-critical tasks.
Abstract
The integration of Large Language Models (LLMs) into wireless networks presents significant potential for automating system design. However, unlike conventional throughput maximization, Covert Communication (CC) requires optimizing transmission utility under strict detection-theoretic constraints, such as Kullback-Leibler divergence limits. Existing benchmarks primarily focus on general reasoning or standard communication tasks and do not adequately evaluate the ability of LLMs to satisfy these rigorous security constraints. To address this limitation, we introduce CovertComBench, a unified benchmark designed to assess LLM capabilities across the CC pipeline, encompassing conceptual understanding (MCQs), optimization derivation (ODQs), and code generation (CGQs). Furthermore, we analyze the reliability of automated scoring within a detection-theoretic "LLM-as-Judge" framework.…
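To make "optimizing transmission utility under a KL-divergence limit" concrete, here is a minimal sketch of a textbook covert-power problem. It is not taken from the benchmark, and the setup is an assumption: a single-antenna AWGN link in which warden Willie observes noise of variance sigma_w2 when Alice is silent (P0) and sigma_w2 + p when she transmits (P1). A common covertness requirement is D(P0 || P1) <= 2*epsilon^2, which by Pinsker's inequality guarantees that Willie's total detection error (false alarm plus missed detection) is at least 1 - epsilon. The Gaussian KL divergence increases monotonically in p, so the largest covert power can be found by bisection and then plugged into Bob's rate log2(1 + p*gain_b/sigma_b2). The function names, the parameters, and the choice of D(P0 || P1) rather than D(P1 || P0) are illustrative assumptions.

```python
import math


def kl_gaussian(sigma_w2: float, p: float) -> float:
    """D(P0 || P1) in nats, with P0 = N(0, sigma_w2) and P1 = N(0, sigma_w2 + p).

    P0 models Willie's per-sample observation when Alice is silent (noise only);
    P1 models it when Alice transmits with power p (hypothetical toy model).
    """
    s1 = sigma_w2 + p
    return 0.5 * (math.log(s1 / sigma_w2) + sigma_w2 / s1 - 1.0)


def max_covert_power(sigma_w2: float, epsilon: float, p_max: float,
                     tol: float = 1e-12) -> float:
    """Largest p in [0, p_max] satisfying D(P0 || P1) <= 2 * epsilon**2.

    Uses bisection, which is valid because kl_gaussian is increasing in p.
    """
    budget = 2.0 * epsilon ** 2
    if kl_gaussian(sigma_w2, p_max) <= budget:
        return p_max  # the power cap, not covertness, is the binding constraint
    lo, hi = 0.0, p_max  # invariant: lo is feasible, hi is infeasible
    while hi - lo > tol * max(1.0, p_max):
        mid = 0.5 * (lo + hi)
        if kl_gaussian(sigma_w2, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def covert_rate(p: float, gain_b: float, sigma_b2: float) -> float:
    """Bob's achievable rate in bits per channel use at transmit power p."""
    return math.log2(1.0 + p * gain_b / sigma_b2)


if __name__ == "__main__":
    p_star = max_covert_power(sigma_w2=1.0, epsilon=0.1, p_max=10.0)
    print(f"covert power: {p_star:.4f}, rate: {covert_rate(p_star, 1.0, 1.0):.4f} bit/use")
```

With unit noise at Willie and epsilon = 0.1, the KL budget (0.02 nats) rather than the 10-unit power cap limits the transmit power, which is the behavior that separates CC problems from plain throughput maximization.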
