Transformers for End-to-End InfoSec Tasks: A Feasibility Study
Ethan M. Rudd, Mohammad Saidur Rahman, Philip Tully

TL;DR
This study explores the feasibility of using transformer models for end-to-end cybersecurity tasks involving URLs and PE files, highlighting the importance of tailored training strategies and architectural adaptations for optimal performance.
Contribution
The paper introduces novel end-to-end transformer architectures for InfoSec data formats and proposes a mixed-objective training method, reporting results that are competitive with existing benchmarks.
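To make the idea of a mixed objective concrete, here is a minimal sketch of a classification loss combined with an auxiliary next-character prediction loss. It is an illustration under assumptions, not the paper's implementation: the weighted-sum form, the `aux_weight` parameter, and all function names are hypothetical, and the summary does not state how the two terms are actually balanced.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def next_char_loss(step_logits, targets):
    # Mean cross-entropy of predicting the next character at each position.
    # step_logits[i] holds vocabulary logits at position i; targets[i] is
    # the index of the character that actually follows.
    total = 0.0
    for logits, target in zip(step_logits, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)

def binary_cross_entropy(logit, label):
    # Stable binary cross-entropy on a raw logit:
    # max(x, 0) - x*y + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mixed_objective(cls_logit, label, step_logits, targets, aux_weight=0.5):
    # Assumed form: classification loss plus a weighted auxiliary
    # auto-regressive loss. aux_weight is a hypothetical hyperparameter.
    cls_loss = binary_cross_entropy(cls_logit, label)
    aux_loss = next_char_loss(step_logits, targets)
    return cls_loss + aux_weight * aux_loss
```

In a sketch like this, setting `aux_weight=0.0` recovers plain binary classification training, which gives a simple baseline for checking whether the auxiliary term helps.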
Findings
Auto-regressive pre-training on URLs does not transfer well to classification tasks.
Auxiliary auto-regressive loss improves URL classification performance.
Adaptive span self-attention enables transformers to handle longer byte sequences in PE files.
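As a rough illustration of the last finding, the sketch below shows the soft masking function commonly used in adaptive attention span (the formulation of Sukhbaatar et al., 2019), where each head attends only to keys within a learned distance. Whether the paper uses exactly this formulation is an assumption; the summary only names the technique. The names `span_mask`, `masked_attention`, `span`, and `ramp` are hypothetical.

```python
import math

def span_mask(distance, span, ramp):
    # Soft mask: 1 inside the span, falling linearly to 0 over `ramp`
    # positions beyond it. `span` would be a learned parameter per head.
    return min(max((ramp + span - distance) / ramp, 0.0), 1.0)

def masked_attention(scores, span, ramp):
    # scores[d] is the raw attention score for the key d positions back.
    # Each exponentiated score is scaled by the mask before normalizing,
    # so keys far outside the span get exactly zero weight.
    top = max(scores)
    weights = [span_mask(d, span, ramp) * math.exp(s - top)
               for d, s in enumerate(scores)]
    total = sum(weights)
    return [w / total for w in weights]
```

Because keys with zero mask contribute nothing, an implementation can skip computing them entirely, which is what makes long byte sequences tractable when the learned spans stay short.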
Abstract
In this paper, we assess the viability of transformer models in end-to-end InfoSec settings, in which no intermediate feature representations or processing steps occur outside the model. We implement transformer models for two distinct InfoSec data formats, specifically URLs and PE files, in a novel end-to-end approach, and explore a variety of architectural designs, training regimes, and experimental settings to determine the ingredients necessary for performant detection models. We show that, in contrast to conventional transformers trained on more standard NLP-related tasks, our URL transformer model requires a different training approach to reach high performance levels. Specifically, we show that 1) pre-training on a massive corpus of unlabeled URL data for an auto-regressive task does not readily transfer to binary classification of malicious or benign URLs, but 2) that using an…
