LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis
Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan

TL;DR
LogSieve is a lightweight, semantics-preserving log reduction technique for CI logs that significantly reduces data volume, lowers computational costs, and enhances sustainability while maintaining high relevance for downstream analysis.
Contribution
We introduce LogSieve, a novel log reduction method tailored for CI logs that preserves semantic content and improves efficiency compared to existing approaches.
Findings
Achieves 42% line reduction and 40% token reduction with minimal semantic loss.
Preserves high semantic fidelity with cosine similarity of 0.93.
Automates relevance detection with 97% accuracy.
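As a rough illustration of how the reduction and fidelity figures above could be computed for one log, here is a minimal sketch. It makes assumptions that are not taken from the paper: tokens are approximated with a simple word regex rather than an LLM tokenizer, and cosine similarity is computed over bag-of-words counts rather than learned embeddings. The function names (tokenize, reduction_pct, cosine_similarity, reduction_report) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a simple word tokenizer stands in for a real LLM tokenizer.
    return re.findall(r"\w+", text.lower())


def reduction_pct(before, after):
    # Percentage of items removed, relative to the original count.
    return 100.0 * (before - after) / before if before else 0.0


def cosine_similarity(a, b):
    # Assumption: bag-of-words count vectors, not sentence embeddings.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reduction_report(original, reduced):
    # Compare an original log with its reduced version.
    return {
        "line_reduction_pct": reduction_pct(
            len(original.splitlines()), len(reduced.splitlines())
        ),
        "token_reduction_pct": reduction_pct(
            len(tokenize(original)), len(tokenize(reduced))
        ),
        "cosine_similarity": cosine_similarity(original, reduced),
    }
```

Calling reduction_report(original_log, reduced_log) on a pair of log strings returns the three metrics as a dictionary. Absolute values will differ from the reported ones, since the tokenizer and similarity measure here are simplified stand-ins.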
Abstract
Logs are essential for understanding Continuous Integration (CI) behavior, particularly for diagnosing build failures and performance regressions. Yet their growing volume and verbosity make both manual inspection and automated analysis increasingly expensive, time-consuming, and environmentally costly. While prior work has explored log compression, anomaly detection, and LLM-based log analysis, most efforts target structured system logs rather than the unstructured, noisy, and verbose logs typical of CI workflows. We present LogSieve, a lightweight, RCA-aware, and semantics-preserving log reduction technique that filters low-information lines while retaining content relevant to downstream reasoning. Evaluated on CI logs from 20 open-source Android projects using GitHub Actions, LogSieve achieves an average 42% reduction in lines and 40% reduction in tokens with minimal semantic loss.…
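To make the idea of filtering low-information lines while retaining relevant content concrete, here is a toy sketch. It is not LogSieve's method: it swaps in hand-written regular expressions for whatever relevance detection the authors use, and the pattern lists (NOISE_PATTERNS, KEEP_PATTERNS) and the function name sieve are hypothetical choices made for illustration only.

```python
import re

# Assumption: hand-picked examples of low-information CI log lines.
NOISE_PATTERNS = [
    re.compile(r"^\s*$"),                              # blank lines
    re.compile(r"^\s*Download(ing|ed)?\b", re.IGNORECASE),  # dependency downloads
    re.compile(r"^\s*\d{1,3}%"),                       # progress indicators
]

# Assumption: keywords that suggest a line matters for failure diagnosis.
KEEP_PATTERNS = [
    re.compile(r"\b(error|fail(ed|ure)?|exception|warning)\b", re.IGNORECASE),
]


def sieve(lines):
    # Keep lines that look relevant; otherwise drop lines that match noise.
    kept = []
    for line in lines:
        if any(p.search(line) for p in KEEP_PATTERNS):
            kept.append(line)  # relevance overrides noise rules
        elif not any(p.search(line) for p in NOISE_PATTERNS):
            kept.append(line)  # neither relevant nor noisy: keep by default
    return kept
```

The keep rules are checked first so that a line such as a failed download is retained even though it also matches a noise pattern. Keeping unmatched lines by default is a conservative choice in this sketch: it favors preserving content over maximizing reduction.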
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Software Engineering Research
