PUCP-Metrix: An Open-source and Comprehensive Toolkit for Linguistic Analysis of Spanish Texts
Javier Alonso Villegas Luis, Marco Antonio Sobrevilla Cabezudo

TL;DR
PUCP-Metrix is an open-source toolkit providing 182 linguistic metrics for detailed analysis of Spanish texts, enhancing interpretability and supporting various NLP tasks.
Contribution
PUCP-Metrix introduces a comprehensive, extensible set of linguistic features for Spanish, filling coverage gaps in existing tools and demonstrating its usefulness on readability assessment and machine-generated text detection.
Findings
Competitive performance in readability assessment
Strong results in machine-generated text detection
Extensible framework for diverse NLP applications
Abstract
Linguistic features remain essential for interpretability and tasks that involve style, structure, and readability, but existing Spanish tools offer limited coverage. We present PUCP-Metrix, an open-source and comprehensive toolkit for linguistic analysis of Spanish texts. PUCP-Metrix includes 182 linguistic metrics spanning lexical diversity, syntactic and semantic complexity, cohesion, psycholinguistics, and readability. It enables fine-grained, interpretable text analysis. We evaluate its usefulness on Automated Readability Assessment and Machine-Generated Text Detection, showing competitive performance compared to an existing repository and strong neural baselines. PUCP-Metrix offers a comprehensive and extensible resource for Spanish, supporting diverse NLP applications.
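To make "interpretable linguistic metrics" concrete, here is a minimal sketch of two simple surface features of the kind the abstract groups under lexical diversity and readability: type-token ratio and mean sentence length. It does not use PUCP-Metrix or reproduce its API. The function names, the regex tokenizer, and the sentence splitter are all assumptions made for illustration, and a real toolkit would rely on a proper Spanish NLP pipeline.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase runs of Unicode letters,
    # so accented Spanish characters (á, ñ, ü) stay inside words.
    return re.findall(r"[^\W\d_]+", text.lower())


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def type_token_ratio(tokens: list[str]) -> float:
    # Lexical diversity: distinct word forms divided by total tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_sentence_length(text: str) -> float:
    # Average number of tokens per sentence, a common readability signal.
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def extract_features(text: str) -> dict[str, float]:
    # Bundle the metrics into a named feature vector, which is what makes
    # downstream classifiers built on such features easy to inspect.
    tokens = tokenize(text)
    return {
        "n_tokens": float(len(tokens)),
        "type_token_ratio": type_token_ratio(tokens),
        "mean_sentence_length": mean_sentence_length(text),
    }


if __name__ == "__main__":
    print(extract_features("El gato come. El perro come pescado."))
```

Feature dictionaries like this can be fed to any standard classifier for tasks such as readability assessment, with each input dimension keeping a human-readable meaning.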
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Authorship Attribution and Profiling
