Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method

Chris Forrester; Octavia Sulea

arXiv:2505.08058·cs.CL·May 16, 2025


Open Access

TL;DR

This paper introduces a token compression method for NLP that can cut prompt length by over 90% of tokens while retaining high semantic similarity to the source text, with results reported across multiple genres and models.

Contribution

The paper presents a word-level semantic compression scheme for paragraphs, built on hypernyms, which the authors state can be lossless and whose detail granularity is controllable.

Findings

01. Over 90% token reduction in prompts

02. High semantic similarity to the source text after compression

03. Results hold across multiple genres and language models

Abstract

Compute optimization through token reduction of LLM prompts is an emerging task in NLP and next-generation agentic AI. In this white paper, we introduce a novel (patent-pending) text representation scheme and a first-of-its-kind word-level semantic compression of paragraphs that can lead to over 90% token reduction while retaining high semantic similarity to the source text. We explain how this compression technique can be lossless and how the detail granularity is controllable. We discuss benchmark results on open-source data (i.e., Bram Stoker's Dracula, available through Project Gutenberg) and show how our results hold at the paragraph level, across multiple genres and models.
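The two quantities the abstract reports, token reduction and semantic similarity, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: whitespace splitting stands in for a real LLM tokenizer, bag-of-words cosine stands in for an embedding-based similarity, and the `compressed` string is a hypothetical placeholder rather than output of the paper's method, which this listing does not describe.

```python
import math
from collections import Counter


def token_reduction(original: str, compressed: str) -> float:
    """Fraction of tokens removed; whitespace tokens are an assumed stand-in tokenizer."""
    n_orig = len(original.split())
    n_comp = len(compressed.split())
    if n_orig == 0:
        raise ValueError("original text has no tokens")
    return 1.0 - n_comp / n_orig


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a crude proxy for semantic similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical inputs: a 20-token sentence and a 2-token stand-in for its compressed form.
original = (
    "the old castle stood on a steep cliff above the dark river "
    "and the wind howled through its broken towers"
)
compressed = "castle cliff"

print(f"token reduction: {token_reduction(original, compressed):.0%}")
print(f"bag-of-words cosine: {cosine_similarity(original, compressed):.3f}")
```

A word-overlap score like this one penalizes any rewording, so it will understate similarity for a compression that substitutes more general terms; a sentence-embedding model would be the more plausible choice for the kind of comparison the abstract describes.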

Taxonomy

Topics: Natural Language Processing Techniques