NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction

Soyeon Kim; Namhee Kim; Yeonwoo Jeong

arXiv:2505.17125·cs.DB·May 26, 2025

TL;DR

This paper introduces a comprehensive evaluation framework for web data record extraction, enabling fair comparison of traditional algorithms and LLM-based methods across diverse datasets with improved metrics and input formats.

Contribution

It presents a new evaluation framework with dataset generation, annotation, and structure-aware metrics, along with preprocessing strategies and a synthetic dataset for benchmarking extraction methods.

Findings

01

An LLM given Flat JSON input achieves an F1 score of 0.9567

02

Flat JSON input reduces hallucination in LLM extractions

03

Benchmarking under the new framework shows LLMs outperforming traditional algorithms
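The listing does not define the Flat JSON input format. One plausible reading, offered here only as an illustrative assumption, is a single-level mapping from each text node's XPath to its text, so that the model can point at positions instead of regenerating content. The sketch below builds such a mapping with Python's standard `html.parser`; the class name `FlatJSONBuilder`, the function `to_flat_json`, and the indexed XPath style are hypothetical and not taken from the paper.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the path.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FlatJSONBuilder(HTMLParser):
    """Collects {xpath: text} for every non-empty text node (assumed format)."""

    def __init__(self):
        super().__init__()
        self.path = []        # stack of "tag[i]" steps for the open elements
        self.counters = [{}]  # per-depth count of child tags seen so far
        self.flat = {}

    def handle_starttag(self, tag, attrs):
        counts = self.counters[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.path.append(f"{tag}[{counts[tag]}]")
        self.counters.append({})

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Only pop when the closing tag matches the innermost open element.
        if self.path and self.path[-1].split("[")[0] == tag:
            self.path.pop()
            self.counters.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            key = "/" + "/".join(self.path)
            self.flat[key] = (self.flat.get(key, "") + " " + text).strip()


def to_flat_json(html: str) -> dict:
    """Return a flat XPath -> text dictionary for an HTML string."""
    builder = FlatJSONBuilder()
    builder.feed(html)
    builder.close()
    return builder.flat


if __name__ == "__main__":
    page = "<html><body><ul><li>Apple</li><li>Banana</li></ul></body></html>"
    print(json.dumps(to_flat_json(page), indent=2))
```

A model prompted with this mapping could answer with XPath keys alone, which is one way an input format might limit invented text; whether the paper's format works exactly this way is not stated in this listing.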

Abstract

Effective evaluation of web data record extraction methods is crucial, yet hampered by static, domain-specific benchmarks and opaque scoring practices. This makes fair comparison between traditional algorithmic techniques, which rely on structural heuristics, and Large Language Model (LLM)-based approaches, offering zero-shot extraction across diverse layouts, particularly challenging. To overcome these limitations, we introduce a concrete evaluation framework. Our framework systematically generates evaluation datasets from arbitrary MHTML snapshots, annotates XPath-based supervision labels, and employs structure-aware metrics for consistent scoring, specifically preventing text hallucination and allowing only for the assessment of positional hallucination. It also incorporates preprocessing strategies to optimize input for LLMs while preserving DOM semantics: HTML slimming,…
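The abstract describes XPath-based labels and structure-aware scoring without giving the formula in the text available here. As a rough sketch of how such scoring could work, the example below treats each record as a set of XPaths, matches predicted records to ground-truth records by Jaccard overlap, and reports precision, recall, and F1. The greedy matching, the `threshold` of 0.5, and the function names are assumptions made for illustration, not the paper's metric. Because only XPaths are compared, a prediction can be wrong about position but cannot introduce text that is absent from the page.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of XPath strings."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def record_f1(predicted, truth, threshold=0.5):
    """Greedy record-level precision, recall, and F1 over XPath sets.

    `threshold` is a hypothetical cut-off: a predicted record counts as a
    true positive when its best remaining ground-truth match overlaps by at
    least this much. Each ground-truth record can be matched only once.
    """
    unmatched = [set(r) for r in truth]
    true_positives = 0
    for pred in predicted:
        best = max(unmatched, key=lambda t: jaccard(pred, t), default=None)
        if best is not None and jaccard(pred, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = [
        {"/ul[1]/li[1]/a[1]", "/ul[1]/li[1]/span[1]"},
        {"/ul[1]/li[2]/a[1]", "/ul[1]/li[2]/span[1]"},
    ]
    predicted = [
        {"/ul[1]/li[1]/a[1]", "/ul[1]/li[1]/span[1]"},  # exact match
        {"/ul[1]/li[9]/a[1]"},                          # wrong position
    ]
    print(record_f1(predicted, truth))
```

In the usage example, one of two predicted records matches one of two ground-truth records, so precision, recall, and F1 all come out to 0.5.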
