Agentic Framework for Political Biography Extraction
Yifei Zhu, Songpo Yang, Jiangnan Zhu, Junyan Jiang

TL;DR
This paper introduces an agentic framework leveraging Large Language Models to automate the extraction of structured political biographies from unstructured web sources, improving accuracy and scalability over traditional methods.
Contribution
The paper presents a two-stage 'Synthesis-Coding' framework for political biography extraction built on recursive agentic LLMs; its LLM coders match or outperform human experts, and its synthesis stage reduces bias.
Findings
LLM coders match or outperform humans in accuracy
Agentic system synthesizes more information than Wikipedia
Synthesis stage reduces bias from long, multi-language texts
Abstract
The production of large-scale political datasets typically requires extracting structured facts from vast piles of unstructured documents or web sources, a task that traditionally relies on expensive human experts and remains prohibitively difficult to automate at scale. In this paper, we leverage Large Language Models (LLMs) to automate the extraction of multi-dimensional elite biographies, addressing a long-standing bottleneck in political science research. We propose a two-stage 'Synthesis-Coding' framework for complex extraction tasks: an upstream synthesis stage that uses recursive agentic LLMs to search, filter, and curate biographies from heterogeneous web sources, followed by a downstream coding stage that maps the curated biographies into structured dataframes. We validate this framework through three primary results. First, we demonstrate that, when given curated contexts, LLM coders…
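The two-stage design described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the functions `synthesize` and `code_biography`, the prompt wording, the `max_rounds` stopping rule, and the stand-in `fake_search` and `fake_llm` callables are all hypothetical. The sketch assumes any LLM can be wrapped as a function from prompt text to reply text, and that the coder returns JSON. The upstream stage loops over search, filter, and follow-up query; the downstream stage maps the curated text onto a fixed set of fields.

```python
import json
import re
from typing import Callable

# Hypothetical interfaces: an LLM is any prompt -> reply function,
# and a search tool is any query -> list-of-documents function.
LLM = Callable[[str], str]
Search = Callable[[str], list[str]]


def synthesize(name: str, search: Search, llm: LLM, max_rounds: int = 3) -> str:
    """Upstream stage (sketch): recursively search, filter, and curate text about `name`."""
    curated: list[str] = []
    seen: set[str] = set()
    query = name
    for _ in range(max_rounds):
        for doc in search(query):
            if doc in seen:
                continue
            seen.add(doc)
            # Filter: keep only documents the model judges to be about the target person.
            verdict = llm(f"Is this text about {name}? Answer yes or no.\n{doc}")
            if verdict.strip().lower().startswith("yes"):
                curated.append(doc)
        # Recurse: ask the model for a follow-up query; an empty reply ends the search.
        query = llm(
            f"Given these notes on {name}, what should be searched next? Reply empty if done.\n"
            + "\n".join(curated)
        ).strip()
        if not query:
            break
    return "\n".join(curated)


def code_biography(biography: str, fields: list[str], llm: LLM) -> dict:
    """Downstream stage (sketch): map a curated biography onto a fixed set of fields."""
    record = json.loads(llm(f"Return JSON with keys {fields} extracted from:\n{biography}"))
    # Fields the coder did not return are recorded as missing (None).
    return {f: record.get(f) for f in fields}


# Hypothetical stand-ins so the sketch runs without a real model or search engine.
def fake_search(query: str) -> list[str]:
    return {"Jane Doe": ["Jane Doe was born in 1950.", "Unrelated page about weather."]}.get(query, [])


def fake_llm(prompt: str) -> str:
    if prompt.startswith("Is this text about"):
        return "yes" if "Jane Doe" in prompt.split("\n", 1)[1] else "no"
    if prompt.startswith("Given these notes"):
        return ""
    year = re.search(r"born in (\d{4})", prompt)
    return json.dumps({"birth_year": int(year.group(1)) if year else None})


if __name__ == "__main__":
    bio = synthesize("Jane Doe", fake_search, fake_llm)
    print(code_biography(bio, ["birth_year", "birthplace"], fake_llm))
```

Each coded row is a plain dictionary, so a list of rows could be passed to `pandas.DataFrame` to obtain the structured dataframe the abstract mentions. Separating the two stages means the coder only ever sees short, curated text rather than the raw, heterogeneous search results.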
Taxonomy
Topics: Computational and Text Analysis Methods · Misinformation and Its Impacts · Wikis in Education and Collaboration
