A Scalable Pipeline for Estimating Verb Frame Frequencies Using Large Language Models

Adam M. Morgan; Adeen Flinker

arXiv:2507.22187·cs.CL·July 31, 2025

TL;DR

This paper introduces a scalable, automated pipeline that uses large language models to estimate verb frame frequencies (VFFs). The pipeline outperforms two widely used syntactic parsers, requires far fewer resources than manual parsing, and yields a new VFF database with broader verb coverage.

Contribution

The authors develop a novel LLM-based pipeline for estimating verb frame frequencies that surpasses existing tools in scale, accuracy, and resource efficiency.

Findings

01 Outperforms two widely used syntactic parsers across multiple evaluation datasets.

02 Requires far fewer resources than manual parsing, the gold standard.

03 Produces a comprehensive, fine-grained VFF database.

Abstract

We present an automated pipeline for estimating Verb Frame Frequencies (VFFs), the frequency with which a verb appears in particular syntactic frames. VFFs provide a powerful window into syntax in both human and machine language systems, but existing tools for calculating them are limited in scale, accuracy, or accessibility. We use large language models (LLMs) to generate a corpus of sentences containing 476 English verbs. Next, we instruct an LLM to behave like an expert linguist and analyze the syntactic structure of the sentences in this corpus. This pipeline outperforms two widely used syntactic parsers across multiple evaluation datasets. Furthermore, it requires far fewer resources than manual parsing (the gold standard), thereby enabling rapid, scalable VFF estimation. Using the LLM parser, we produce a new VFF database with broader verb coverage, finer-grained…
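To make the quantity concrete, here is a minimal Python sketch of how VFFs could be tallied once each sentence has been labeled with its verb and syntactic frame. It is not the authors' implementation: the function name `verb_frame_frequencies`, the frame labels, and the example pairs are hypothetical, and normalizing counts to per-verb proportions is an assumption about how the frequencies are reported.

```python
from collections import Counter, defaultdict


def verb_frame_frequencies(labeled):
    """Return {verb: {frame: proportion}} from (verb, frame) pairs."""
    # Count how often each verb occurs in each frame.
    counts = defaultdict(Counter)
    for verb, frame in labeled:
        counts[verb][frame] += 1
    # Normalize each verb's counts so its frame proportions sum to 1.
    return {
        verb: {frame: n / sum(frames.values()) for frame, n in frames.items()}
        for verb, frames in counts.items()
    }


# Hypothetical parser output: one (verb, frame) pair per sentence.
labeled = [
    ("give", "ditransitive"),
    ("give", "ditransitive"),
    ("give", "transitive"),
    ("give", "ditransitive"),
    ("sleep", "intransitive"),
]

vffs = verb_frame_frequencies(labeled)
print(vffs["give"])
```

In this toy input, "give" is labeled four times, so each of its frame proportions is a count divided by four. In a real pipeline the pairs would come from whichever parser labels the sentences.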
