TL;DR
TEMPEST is a transformer-based approach that learns directly from compressed data streams, reducing token count and computational cost while maintaining competitive accuracy across various datasets.
Contribution
It presents a new method to leverage compressed file structures for semantic representation learning using transformers, bypassing full decoding.
Findings
Achieves competitive accuracy on multiple datasets.
Reduces token count and computational resources needed.
Demonstrates broad applicability across data types and coding schemes.
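To see why a lower token count can reduce compute, recall that full self-attention scores every pair of tokens, so that part of the cost grows with the square of the sequence length. The sketch below works through that arithmetic; the function names and token counts are hypothetical and are not taken from the paper.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of query-key pairs scored by full self-attention (per head, per layer)."""
    return num_tokens * num_tokens


def relative_attention_cost(baseline_tokens: int, reduced_tokens: int) -> float:
    """Pairwise attention cost of the shorter sequence as a fraction of the baseline."""
    return attention_pairs(reduced_tokens) / attention_pairs(baseline_tokens)


# Hypothetical example: a 4x shorter sequence (1000 -> 250 tokens).
ratio = relative_attention_cost(1000, 250)
print(ratio)  # prints 0.0625, i.e. 1/16 of the baseline pairwise cost
```

This covers only the pairwise attention term; embedding and feed-forward costs scale linearly with token count, so the overall saving depends on the model configuration.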
Abstract
Compressed file formats are the cornerstone of efficient data storage and transmission, yet their potential for representation learning remains largely underexplored. We introduce TEMPEST (TransformErs froM comPressed rEpreSenTations), a method that exploits the inherent byte-stream structure of compressed files to design an effective tokenization and encoding strategy. By leveraging this compact encoding, a standard transformer can directly learn semantic representations from compressed data streams, bypassing the need for raw byte-level processing or full media decoding. Our proposal substantially reduces the number of tokens required for semantic classification, thereby lowering both computational complexity and memory usage. Through extensive experiments across diverse datasets, coding schemes, and modalities, we show that TEMPEST achieves accuracy competitive with the…
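As a rough illustration of the block-as-token idea in the abstract, the sketch below cuts a byte stream at known block boundaries and pads or truncates each block to a fixed width, so that each block could feed a single embedding step. The boundary offsets, the `width` parameter, and both function names are hypothetical; the paper's actual boundary detection and tokenizer are not described here.

```python
from typing import List


def split_into_blocks(stream: bytes, starts: List[int]) -> List[bytes]:
    """Cut `stream` at the given block start offsets (assumed sorted, beginning at 0)."""
    ends = starts[1:] + [len(stream)]
    return [stream[s:e] for s, e in zip(starts, ends)]


def blocks_to_tokens(blocks: List[bytes], width: int) -> List[bytes]:
    """Truncate or zero-pad every block to `width` bytes: one fixed-size token per block."""
    return [block[:width].ljust(width, b"\x00") for block in blocks]


# Toy stream with three variable-length "blocks" (offsets are made up for illustration).
stream = bytes(range(24))
starts = [0, 10, 16]

blocks = split_into_blocks(stream, starts)
tokens = blocks_to_tokens(blocks, width=8)

# Byte-level tokenization would yield 24 tokens; block-level yields 3.
print(len(stream), len(tokens))
```

In a real codec the start offsets would come from parsing frame or block headers, which is format-specific and omitted here.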
Peer Reviews
Decision·Submitted to ICLR 2026
1. The paper’s central idea, treating self-contained codec blocks as tokens, is clean and broadly appealing, because it avoids byte-boundary misalignment while preserving the semantics that matter.
2. Despite its simplicity, the approach delivers strong efficiency: it cuts sequence length and attention cost substantially yet remains competitive in accuracy, especially in data-limited regimes like ESC-50.
3. The small reconstruction head is a practical touch that consistently sharpens the block emb
1. The approach assumes block-structured formats with discoverable boundaries. Not all compression schemes expose clean block markers (even JPEG MCU boundaries require approximation), which may limit generality and complicate deployment outside MP3/Opus/JPEG.
2. TEMPEST trails AST on SC2 and AudioSet at ~65% FLOPs and much shorter sequences; the paper doesn’t explore accuracy–compute scaling laws (e.g., what happens if TEMPEST matches AST’s FLOPs, tokens, or parameters?).
3. Ablations show accu
- Directly embedding features from compressed files instead of raw data avoids the additional storage and transmission overhead introduced by the decoding process, which is a meaningful and practical advantage for real-world applications.
- Experimental results demonstrate that TEMPEST can improve efficiency while maintaining competitive performance.
- The method relies heavily on structural characteristics of specific compression formats, which limits its generality. TEMPEST’s core idea is to use compression blocks rather than bytes as token units, which requires the compression format to have explicit and easily parsable minimal independent decoding units. The authors may need to explicitly clarify which compression formats are supported and whether extra engineering adaptation is required for different compression algorithms.
- The experi
1. **Novel Conceptual Framework**: The paper introduces TEMPEST, a fundamentally new paradigm for multimedia processing that works directly with compressed formats, e.g., MP3 and JPEG. The authors also incorporate innovative training techniques like bit rate augmentation and multi-bit rate inference that further improve model performance, generalization, and robustness.
2. **Practical Utility**: The demonstrated efficiency gains have clear practical applications in real-world systems with 3x re
1. **Limited Baseline Comparisons:** The current evaluation primarily benchmarks TEMPEST against AST. While AST is a direct and relevant baseline, it was published in 2021, and the field has seen significant advancements since then. To fully contextualize TEMPEST's contributions and thoroughly assess its performance, it would be beneficial to include comparisons with more recent methods in the relevant domain.
2. **Dataset Complexity:** For the image experiment, the evaluation largely relies
