DWTSumm: Discrete Wavelet Transform for Document Summarization
Rana Salama, Abdou Youssef, Mona Diab

TL;DR
This paper introduces a DWT-based multi-resolution framework that improves domain-specific document summarization with LLMs by preserving semantics and reducing hallucinations.
Contribution
The paper presents a novel DWT-based approach that decomposes text embeddings into global and local components, enhancing summarization quality and factual grounding in domain-specific documents.
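To make the global/local split concrete: the excerpt does not name the wavelet family, so what follows assumes the simplest one, the orthonormal Haar wavelet. It is applied along the sentence axis of a sequence of sentence embeddings $\mathbf{x}_0, \dots, \mathbf{x}_{n-1}$ ($n$ even), and a single level produces approximation coefficients $\mathbf{a}_k$ and detail coefficients $\mathbf{d}_k$:

```latex
\mathbf{a}_k = \frac{\mathbf{x}_{2k} + \mathbf{x}_{2k+1}}{\sqrt{2}}, \qquad
\mathbf{d}_k = \frac{\mathbf{x}_{2k} - \mathbf{x}_{2k+1}}{\sqrt{2}}, \qquad
k = 0, \dots, \tfrac{n}{2} - 1,
\qquad
\|\mathbf{a}_k\|^2 + \|\mathbf{d}_k\|^2 = \|\mathbf{x}_{2k}\|^2 + \|\mathbf{x}_{2k+1}\|^2 .
```

Here $\mathbf{a}_k$ is a smoothed average of neighbouring sentences (the global trend) and $\mathbf{d}_k$ records how they differ (local detail). Because the transform is orthonormal, no energy is lost. Repeating the step on the $\mathbf{a}_k$ sequence yields the multi-resolution hierarchy.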
Findings
DWT-based summaries achieve ROUGE-L scores comparable to baselines.
Semantic similarity (BERTScore) and factual grounding (Semantic Fidelity) improve by over 2% and more than 4%, respectively.
Fidelity reaches up to 97%, indicating reduced hallucinations.
Abstract
Summarizing long, domain-specific documents with large language models (LLMs) remains challenging due to context limitations, information loss, and hallucinations, particularly in clinical and legal settings. We propose a Discrete Wavelet Transform (DWT)-based multi-resolution framework that treats text as a semantic signal and decomposes it into global (approximation) and local (detail) components. Applied to sentence- or word-level embeddings, DWT yields compact representations that preserve overall structure and critical domain-specific details; these representations are used directly as summaries or to guide LLM generation. Experiments on clinical and legal benchmarks show ROUGE-L scores comparable to the baselines. Compared to a GPT-4o baseline, DWT-based summarization consistently improves semantic similarity and grounding, achieving gains of over 2% in BERTScore, more than 4% in Semantic Fidelity,…
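The abstract says the compact DWT representations can be used directly as summaries. Here is one hypothetical way to do that, assuming sentence embeddings have already been produced by some encoder (not specified here). It runs a multi-level Haar DWT along the sentence axis and then, for each coarse approximation coefficient, keeps the sentence whose embedding is most cosine-similar to it. The Haar wavelet, the pad-by-repetition boundary rule, and the helper names `haar_step`, `haar_dwt`, and `select_representatives` are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]

# Hypothetical illustration: wavelet choice, padding, and selection rule are assumptions.


def haar_step(signal: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """One level of an orthonormal Haar DWT along the sentence axis.

    Each embedding dimension is treated as its own 1-D signal over sentence
    positions. Odd-length inputs are padded by repeating the last embedding
    (an assumed boundary rule).
    """
    if len(signal) % 2 == 1:
        signal = signal + [signal[-1]]
    scale = 1.0 / math.sqrt(2.0)
    approx: List[Vector] = []
    detail: List[Vector] = []
    for left, right in zip(signal[0::2], signal[1::2]):
        approx.append([(x + y) * scale for x, y in zip(left, right)])  # global trend
        detail.append([(x - y) * scale for x, y in zip(left, right)])  # local change
    return approx, detail


def haar_dwt(signal: List[Vector], levels: int) -> Tuple[List[Vector], List[List[Vector]]]:
    """Multi-level decomposition: returns the coarsest approximation and the
    detail coefficients of every applied level, finest level first."""
    approx = signal
    details: List[List[Vector]] = []
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to split
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def select_representatives(embeddings: List[Vector], levels: int) -> List[int]:
    """Hypothetical extractive step: for each coarse approximation coefficient,
    keep the index of the sentence in its window most similar to it."""
    approx, details = haar_dwt(embeddings, levels)
    window = 2 ** len(details)  # sentences covered by one coarse coefficient
    chosen: List[int] = []
    for k, coeff in enumerate(approx):
        start, end = k * window, min((k + 1) * window, len(embeddings))
        best = max(range(start, end), key=lambda i: cosine(embeddings[i], coeff))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
    print(select_representatives(toy, levels=1))  # [0, 3]
```

With `levels=L`, each kept sentence stands in for a window of up to 2^L consecutive sentences, so larger `L` gives shorter extracts. The detail coefficients returned by `haar_dwt` are where sentence-to-sentence deviations appear, which is the kind of local signal the framework aims to preserve.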
