Gram2Vec: An Interpretable Document Vectorizer

Peter Zeng; Hannah Stortz; Eric Sclafani; Alina Shabaeva; Maria Elizabeth Garza; Daniel Greeson; Owen Rambow

arXiv:2406.12131 · cs.CL · November 27, 2025

TL;DR

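Gram2Vec is an interpretable document embedding method that represents texts by the normalized relative frequencies of their grammatical features. The paper applies it to explainable authorship verification and to AI detection, where classifiers trained on its features outperform models trained on a comparable set of Biber features.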

Contribution

Introduces Gram2Vec, a grammatical style embedding system that offers interpretability and demonstrates its effectiveness in authorship verification and AI detection tasks.

Findings

01

AI-detection classifiers trained on Gram2Vec features outperform machine learning models trained on a comparable set of Biber features.

02

In authorship verification, Gram2Vec features explain why a pair of documents is judged to be by the same author or by different authors.

03

Offers a higher-dimensional, interpretable document representation.

Abstract

We present Gram2Vec, a grammatical style embedding system that embeds documents into a higher-dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec offers inherent interpretability based on how the feature vectors are generated. In this paper, we use authorship verification and AI detection as two applications to show how Gram2Vec can be used. For authorship verification, we use the features from Gram2Vec to explain why a pair of documents is by the same or by different authors. We also demonstrate how Gram2Vec features can be used to train a classifier for AI detection, outperforming machine learning models trained on a comparable set of Biber features.
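The idea of embedding a document as normalized relative frequencies of grammatical features, and then reading an explanation off the vector, can be illustrated with a small sketch. This is not the authors' implementation: the feature inventory (`FEATURES`, five part-of-speech tags), the z-score normalization in `standardize`, the `largest_differences` helper, and the toy pre-tagged documents are all assumptions made for illustration, and the sketch takes part-of-speech tags as given rather than running a tagger.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical feature inventory; the real system's feature set is not
# specified on this page.
FEATURES = ["NOUN", "VERB", "ADJ", "ADV", "PRON"]


def relative_frequencies(tags):
    """Map a document's tag sequence to one relative frequency per feature."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[f] / total if total else 0.0 for f in FEATURES]


def standardize(vectors):
    """Z-score each feature across the corpus (an assumed normalization)."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(vec, means, stds)]
        for vec in vectors
    ]


def largest_differences(vec_a, vec_b, k=2):
    """Name the k features on which two document vectors differ most."""
    ranked = sorted(
        zip(FEATURES, vec_a, vec_b),
        key=lambda item: abs(item[1] - item[2]),
        reverse=True,
    )
    return [name for name, _, _ in ranked[:k]]


# Toy corpus of pre-tagged documents (made-up data).
docs = {
    "a": ["NOUN", "VERB", "NOUN", "ADJ"],
    "b": ["PRON", "VERB", "ADV", "VERB"],
    "c": ["NOUN", "VERB", "ADJ", "ADV"],
}

raw = {name: relative_frequencies(tags) for name, tags in docs.items()}
normalized = dict(zip(docs, standardize(list(raw.values()))))

if __name__ == "__main__":
    print(normalized["a"])
    print(largest_differences(raw["a"], raw["b"], k=1))
```

Because every dimension corresponds to a named grammatical feature, the features returned by `largest_differences` can be read directly as a candidate explanation for why two documents look stylistically different, which is the kind of interpretability the abstract contrasts with neural embeddings.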

Taxonomy

Topics: Natural Language Processing Techniques