Controlled Paraphrase Geometry in Sentence Embedding Space: Local Manifold Modeling and Latent Probing
Leonid Bedratyuk

TL;DR
This paper explores the local geometric structure of sentence embeddings induced by controlled paraphrases, proposing models and methods for explicit local manifold modeling and latent space analysis.
Contribution
It introduces nonlinear local geometric models, a surface-based latent probing method, and a new dataset, CoPaGE-300K, for analyzing sentence embedding spaces.
Findings
Nonlinear models better describe embedding clouds than affine models.
Surface-based generation maintains geometric fidelity.
Geometric validity does not necessarily improve classification performance.
Abstract
The paper studies the local geometry of embedding clouds induced by controlled local classes of semantically close sentences. The central question is how controlled paraphrase-like semantic variation is organized in sentence embedding space and whether this local structure can be explicitly modeled by low-degree fitted carriers. We introduce a local geometric modeling scheme based on affine, quadratic, and cubic fitted models. We also use a surface-based latent probing procedure that constructs synthetic latent points in a reduced local PCA space with respect to the fitted carrier. The procedure is intended as an offline method for representation-space analysis, local manifold modeling, and geometry-aware latent probing. Generated latent points are evaluated using criteria that measure consistency with the fitted surface, preservation of neighborhood structure, agreement with…
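To make the idea of low-degree fitted carriers in a reduced local PCA space concrete, here is a minimal sketch, not the paper's implementation. It assumes a graph-form carrier, in which the last retained PCA coordinate is fitted as a polynomial of the remaining ones; the abstract does not specify the carrier's exact form. The function names (`local_pca`, `poly_features`, `fit_carrier`) are hypothetical, and the cloud is synthetic data rather than CoPaGE-300K.

```python
import numpy as np
from itertools import combinations_with_replacement


def local_pca(X, k):
    """Center the cloud X (n x D) and project it onto its top-k principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def poly_features(U, degree):
    """Design matrix of all monomials in the columns of U up to the given total degree."""
    n, d = U.shape
    cols = [np.ones(n)]  # constant term
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(U[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)


def fit_carrier(Z, degree):
    """Assumed graph-form carrier: least-squares fit of the last coordinate of Z
    as a polynomial of the other coordinates. Returns (coefficients, RMS residual)."""
    U, y = Z[:, :-1], Z[:, -1]
    A = poly_features(U, degree)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rms = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
    return coef, rms


# Synthetic stand-in for one local embedding cloud: a curved 2-D sheet in R^16.
rng = np.random.default_rng(0)
u = rng.normal(size=(500, 2))
X = 0.01 * rng.normal(size=(500, 16))          # small ambient noise
X[:, :2] += u                                  # two dominant directions
X[:, 2] += 0.3 * (u[:, 0] ** 2 - u[:, 1] ** 2)  # quadratic bending
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random rotation of the ambient space
X = X @ Q

Z = local_pca(X, k=3)
rms = {deg: fit_carrier(Z, deg)[1] for deg in (1, 2, 3)}
for deg, err in rms.items():
    print(f"degree {deg}: RMS residual = {err:.4f}")
```

Because the monomial bases are nested, the training residual cannot increase with degree, so a fair comparison between affine, quadratic, and cubic models would also need held-out points or a complexity penalty; this sketch only illustrates the fitting step.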
