Measuring Information Distortion in Hierarchical Ultra long Novel Reconstruction: The Optimal Expansion Ratio
Hanwen Shen, Ting Ying

TL;DR
This paper analyzes how different compression-expansion ratios affect semantic preservation in ultra-long novel reconstruction, proposing an information-theoretic approach to optimize the process.
Contribution
It introduces an information-theoretic framework to quantify semantic distortion in hierarchical novel reconstruction and identifies optimal ratios for minimal information loss.
Findings
Optimal compression-expansion ratio reduces semantic distortion
Outline length impacts information preservation
Experiments validate the effectiveness of the proposed ratio
Abstract
A two-stage novel generation framework (outline -> section outline -> manuscript) is widely used in long novel generation (e.g., DOME, Plan&Write, Long Writer), but such frameworks have been little studied for ultra-long novel (>1M words) reconstruction. Building on recent text compression methods (LLMZip, LLM2Vec), we conduct an information-theoretic analysis to quantify semantic distortion under different compression-expansion ratios. We examine how outline length affects information preservation. Experiments on ultra-long novels show that the optimal compression-expansion ratio significantly reduces semantic distortion compared to non-optimal ratios.
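To make the quantities in the abstract concrete, here is a minimal sketch of how one might compare candidate expansion ratios. It is not the authors' method: this summary does not state the paper's actual distortion measure, so a bag-of-words cosine distance is used as a crude stand-in, and all function names (`expansion_ratio`, `bow_distortion`, `pick_ratio`) are hypothetical.

```python
import math
from collections import Counter


def expansion_ratio(outline: str, manuscript: str) -> float:
    """Manuscript length divided by outline length, both counted in words."""
    outline_words = len(outline.split())
    if outline_words == 0:
        raise ValueError("outline must contain at least one word")
    return len(manuscript.split()) / outline_words


def bow_distortion(original: str, reconstructed: str) -> float:
    """Assumed proxy for semantic distortion: 1 - cosine similarity of
    word-count vectors. 0 means identical word usage, 1 means no overlap."""
    a = Counter(original.lower().split())
    b = Counter(reconstructed.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norm == 0:
        return 1.0  # at least one text is empty; treat as maximal distortion
    return 1.0 - dot / norm


def pick_ratio(original: str, reconstructions: dict) -> float:
    """Given {ratio: reconstructed_text}, return the ratio whose
    reconstruction has the lowest proxy distortion against the original."""
    return min(
        reconstructions,
        key=lambda r: bow_distortion(original, reconstructions[r]),
    )
```

In practice, one would likely replace `bow_distortion` with an embedding-based or likelihood-based measure and obtain each reconstruction by compressing the novel to an outline of a given length and expanding it again; the selection step in `pick_ratio` stays the same.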
