SecEncoder: Logs are All You Need in Security
Muhammed Fatih Bulut, Yingqi Liu, Naveed Ahmad, Maximilian Turner, Sami Ait Ouahmane, Cameron Andrews, Lloyd Greenwald

TL;DR
SecEncoder is a specialized small language model pretrained on security logs that outperforms general models on security-specific tasks and even some natural language tasks, demonstrating the value of domain-specific pretraining.
Contribution
The paper introduces SecEncoder, a security log pretrained language model that improves performance on security and natural language tasks compared to general models.
Findings
SecEncoder outperforms BERT-large, DeBERTa-v3-large, and OpenAI's text-embedding-ada-002 on security tasks.
Pretraining on logs enhances performance on incident prioritization and threat intelligence retrieval.
Domain-specific pretraining with logs benefits a range of security-related NLP applications.
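To make the retrieval finding concrete, here is a minimal sketch of embedding-based retrieval over log-like text. It is not the paper's method: `toy_embed` is a hypothetical bag-of-tokens stand-in for an encoder such as SecEncoder, the log lines are made up for illustration, and `retrieve` simply ranks documents by cosine similarity to the query.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    # Hypothetical stand-in for an encoder: sparse bag-of-tokens counts.
    # A real setup would call a pretrained model here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, embed=toy_embed, top_k=2):
    # Score every document against the query and keep the best top_k.
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Illustrative, made-up log lines (not from the paper's data).
corpus = [
    "failed logon for user admin from 10.0.0.5",
    "powershell encoded command spawned by winword",
    "dns query to rare domain from workstation",
]

results = retrieve("multiple failed logon attempts for admin", corpus, top_k=1)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Swapping `toy_embed` for a function that returns model embeddings (with a matching dense `cosine`) would leave the ranking logic unchanged, which is why the embedding function is passed in as a parameter.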
Abstract
Large and Small Language Models (LMs) are typically pretrained on extensive volumes of text sourced from publicly accessible platforms such as Wikipedia and Book Corpus, or obtained through web scraping. Because of their exposure to a wide range of language data, these models exhibit impressive generalization capabilities and can perform a multitude of tasks. However, they often fall short on domain-specific tasks because their training data is so broad. This paper introduces SecEncoder, a specialized small language model that is pretrained using security logs. SecEncoder is designed to address the domain-specific limitations of general LMs by focusing on the unique language and patterns found in security logs. Experimental results indicate that SecEncoder outperforms other LMs, such as BERT-large, DeBERTa-v3-large and OpenAI's Embedding (text-embedding-ada-002)…
Taxonomy
Topics: Network Security and Intrusion Detection · Information and Cyber Security
