ECR: Manifold-Guided Semantic Cues for Compact Language Models

Chung-Wei Victor Yuan

arXiv:2601.00543·cs.CL·January 5, 2026

TL;DR

This paper introduces Embedding Consistency Regulation (ECR), a framework that preserves the semantic manifold structure of compact multilingual models, improving their stability, task alignment, and efficiency without altering the architecture.

Contribution

ECR is a new method that maintains the geometric structure of embeddings in compact models, independent of distillation, enhancing their semantic fidelity and task performance.

Findings

01. ECR stabilizes training across multilingual tasks.

02. ECR produces more task-aligned and compact representations.

03. ECR improves semantic structure preservation in low-capacity models.

Abstract

Compact models often lose the structure of their embedding space, especially when capacity is tight or the data spans several languages. This collapse makes it difficult for downstream tasks to build on the resulting representations. Existing compression methods focus on aligning model outputs at a superficial level but fail to preserve the underlying manifold structure. This mismatch often leads to semantic drift in the compact model, causing both task behavior and linguistic properties to deviate from the reference model. To address these issues, we propose a new framework called Embedding Consistency Regulation (ECR). The framework first derives a set of semantic anchors from teacher embeddings (computed once offline). The compact model then learns to maintain consistent geometry around these anchors, without relying on matching logits or internal features. ECR adds…
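The two steps the abstract names (offline anchors from teacher embeddings, then a geometry-consistency objective around them) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the paper: anchors are obtained with a simple spherical k-means, "geometry around the anchors" is approximated by each embedding's softmax similarity profile over the anchors, and the loss is a KL divergence between teacher and student profiles. The names derive_anchors, anchor_profile, consistency_loss, and the temperature parameter are hypothetical, and student embeddings are assumed to share (or be projected into) the teacher's dimensionality.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def derive_anchors(teacher_emb, k, iters=20, seed=0):
    """Hypothetical offline step: spherical k-means over teacher embeddings."""
    rng = np.random.default_rng(seed)
    x = l2_normalize(teacher_emb)
    # Initialise anchors from k distinct teacher points (fancy indexing copies).
    anchors = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its most similar anchor (cosine similarity).
        assign = np.argmax(x @ anchors.T, axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old anchor if a cluster is empty
                anchors[j] = members.mean(axis=0)
        anchors = l2_normalize(anchors)
    return anchors

def anchor_profile(emb, anchors, temperature=0.1):
    # Softmax over cosine similarities to the anchors, one row per embedding.
    logits = l2_normalize(emb) @ anchors.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def consistency_loss(student_emb, teacher_emb, anchors, temperature=0.1, eps=1e-12):
    """Assumed objective: mean KL(teacher profile || student profile)."""
    p_t = anchor_profile(teacher_emb, anchors, temperature)
    p_s = anchor_profile(student_emb, anchors, temperature)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1)
    return float(np.mean(kl))
```

In this sketch, derive_anchors would run once on a held-out set of teacher embeddings, and consistency_loss would be added to the compact model's task loss during training. Note that it compares only anchor-relative similarity profiles, so it involves no teacher logits or intermediate features, which matches the abstract's description at a high level; the actual anchor construction and loss used by ECR may differ.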

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications