Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method
Chris Forrester, Octavia Sulea

TL;DR
This paper introduces a token compression method for NLP that can reduce prompt size by over 90% while retaining high semantic similarity to the source text, with results reported across multiple genres and models.
Contribution
The paper presents a new word-level semantic compression scheme based on hypernyms. According to the authors, the compression can be lossless and its detail granularity is controllable.
Findings
Achieves over 90% token reduction in prompts
Maintains high semantic similarity after compression
Effective across multiple genres and language models
Abstract
Compute optimization using token reduction of LLM prompts is an emerging task in the fields of NLP and next generation, agentic AI. In this white paper, we introduce a novel (patent pending) text representation scheme and a first-of-its-kind word-level semantic compression of paragraphs that can lead to over 90% token reduction, while retaining high semantic similarity to the source text. We explain how this novel compression technique can be lossless and how the detail granularity is controllable. We discuss benchmark results over open source data (i.e. Bram Stoker's Dracula available through Project Gutenberg) and show how our results hold at the paragraph level, across multiple genres and models.
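The two quantities the abstract reports, token reduction and semantic similarity, can be illustrated with a minimal sketch. It does not reproduce the paper's compression method, which is not described on this page. Whitespace tokenization and bag-of-words cosine similarity are stand-in assumptions, and the function names are hypothetical; an actual evaluation would presumably use a model's own tokenizer and an embedding-based similarity.

```python
import math
from collections import Counter


def token_count(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def token_reduction(source: str, compressed: str) -> float:
    # Fraction of tokens removed: 0.9 means a 90% reduction.
    src = token_count(source)
    if src == 0:
        raise ValueError("source text has no tokens")
    return 1.0 - token_count(compressed) / src


def cosine_similarity(a: str, b: str) -> float:
    # Assumption: bag-of-words cosine as a crude proxy for semantic similarity.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, a compressed text that keeps one token out of every ten gives `token_reduction` of about 0.9. The bag-of-words proxy penalizes any change of vocabulary, so it would likely understate similarity for a hypernym-based rewrite.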
Taxonomy
Topics: Natural Language Processing Techniques
