Gram2Vec: An Interpretable Document Vectorizer
Peter Zeng, Hannah Stortz, Eric Sclafani, Alina Shabaeva, Maria Elizabeth Garza, Daniel Greeson, Owen Rambow

TL;DR
Gram2Vec is an interpretable document embedding method that represents a text by the normalized relative frequencies of its grammatical features. It supports explainable authorship verification and AI detection, where classifiers trained on its features outperform models trained on a comparable set of Biber features.
Contribution
Introduces Gram2Vec, a grammatical style embedding system that offers interpretability and demonstrates its effectiveness in authorship verification and AI detection tasks.
Findings
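Classifiers trained on Gram2Vec features outperform machine learning models trained on a comparable set of Biber features in AI detection.
In authorship verification, Gram2Vec features explain why a pair of documents is judged to be by the same author or by different authors.
Because each vector is built from grammatical feature frequencies, the document representation is interpretable by construction, unlike neural embeddings.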
Abstract
We present Gram2Vec, a grammatical style embedding system that embeds documents into a higher dimensional space by extracting the normalized relative frequencies of grammatical features present in the text. Compared to neural approaches, Gram2Vec offers inherent interpretability based on how the feature vectors are generated. In this paper, we use authorship verification and AI detection as two applications to show how Gram2Vec can be used. For authorship verification, we use the features from Gram2Vec to explain why a pair of documents is by the same or by different authors. We also demonstrate how Gram2Vec features can be used to train a classifier for AI detection, outperforming machine learning models trained on a comparable set of Biber features.
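The vectorization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature inventory `FEATURES`, the function names, and the choice of z-scoring as the normalization step are all assumptions made for illustration, and the input is assumed to be a list of part-of-speech tags already produced by some tagger.

```python
from collections import Counter
import math

# Hypothetical feature inventory: a few coarse POS tags.
# A real grammatical feature set would be much larger.
FEATURES = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "PUNCT"]


def relative_frequencies(tags):
    """Return the share of tokens carrying each tag in FEATURES.

    Tags outside FEATURES still count toward the document length.
    """
    total = len(tags)
    if total == 0:
        return [0.0] * len(FEATURES)
    counts = Counter(tags)
    return [counts[f] / total for f in FEATURES]


def zscore_columns(matrix):
    """Standardize each feature column across documents.

    Assumed normalization step; columns with zero variance map to 0.0.
    """
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n)
        for c, m in zip(cols, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in matrix
    ]


def closest_features(u, v, k=2):
    """Name the k features on which two document vectors differ least."""
    ranked = sorted(zip(FEATURES, u, v), key=lambda t: abs(t[1] - t[2]))
    return [name for name, _, _ in ranked[:k]]


# Toy usage with pre-tagged documents.
doc_a = ["NOUN", "NOUN", "VERB", "PUNCT"]
doc_b = ["PRON", "VERB", "ADJ", "PUNCT"]
vec_a = relative_frequencies(doc_a)
vec_b = relative_frequencies(doc_b)
print(vec_a)
print(closest_features(vec_a, vec_b))
```

Because every dimension is a named feature, a comparison such as `closest_features` can point to the specific grammatical habits two documents share, which is the kind of explanation the abstract describes for authorship verification.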
Taxonomy
Topics: Natural Language Processing Techniques
