A Scalable Pipeline for Estimating Verb Frame Frequencies Using Large Language Models
Adam M. Morgan, Adeen Flinker

TL;DR
This paper introduces a scalable, automated pipeline leveraging large language models to estimate verb frame frequencies, improving accuracy and resource efficiency over traditional methods, and producing a comprehensive, customizable database for linguistic analysis.
Contribution
The authors develop a novel LLM-based pipeline for estimating verb frame frequencies that surpasses existing tools in scale, accuracy, and resource efficiency.
Findings
Outperforms two widely used syntactic parsers across multiple evaluation datasets.
Requires far fewer resources than manual parsing, the gold standard.
Produces a comprehensive, fine-grained VFF database.
Abstract
We present an automated pipeline for estimating Verb Frame Frequencies (VFFs), the frequency with which a verb appears in particular syntactic frames. VFFs provide a powerful window into syntax in both human and machine language systems, but existing tools for calculating them are limited in scale, accuracy, or accessibility. We use large language models (LLMs) to generate a corpus of sentences containing 476 English verbs. Next, by instructing an LLM to behave like an expert linguist, we have it analyze the syntactic structure of the sentences in this corpus. This pipeline outperforms two widely used syntactic parsers across multiple evaluation datasets. Furthermore, it requires far fewer resources than manual parsing (the gold standard), thereby enabling rapid, scalable VFF estimation. Using the LLM parser, we produce a new VFF database with broader verb coverage, finer-grained…
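To make the definition of a VFF concrete, the counting step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a parser has already assigned one frame label to each sentence, and the function name `verb_frame_frequencies`, the frame labels, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def verb_frame_frequencies(labeled):
    """Map each verb to {frame: relative frequency}.

    `labeled` is an iterable of (verb, frame) pairs, one pair per
    parsed sentence. The frame inventory is whatever the parser emits.
    """
    counts = defaultdict(Counter)
    for verb, frame in labeled:
        counts[verb][frame] += 1

    # Normalize each verb's frame counts so they sum to 1.
    return {
        verb: {frame: n / sum(c.values()) for frame, n in c.items()}
        for verb, c in counts.items()
    }


# Hypothetical parser output: one (verb, frame) pair per sentence.
labeled = [
    ("give", "ditransitive"),
    ("give", "ditransitive"),
    ("give", "transitive"),
    ("give", "prepositional dative"),
    ("sleep", "intransitive"),
]

vff = verb_frame_frequencies(labeled)
```

With this toy input, `vff["give"]` splits its probability mass across three frames while `vff["sleep"]` puts all of it on one. In the pipeline described above, the pairs would come from the LLM parser's analysis of the generated corpus rather than from a hand-written list.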
