Generating Titles for Web Tables
Braden Hancock, Hongrae Lee, Cong Yu

TL;DR
This paper introduces a neural sequence-to-sequence model with copy and generation mechanisms to produce high-quality, contextually relevant titles for web tables, outperforming previous selection-based methods.
Contribution
It presents the first application of text generation techniques for web table titles, improving quality and generalizability over prior selection-based approaches.
Findings
The model with both copy and generation mechanisms outperforms models that have only one of the two.
The model approaches the quality of crowdsourced titles while training on fewer than 10,000 examples.
The sequence-to-sequence model establishes a new state of the art for table title generation.
Abstract
Descriptive titles provide crucial context for interpreting tables that are extracted from web pages and are a key component of table-based web applications. Prior approaches have attempted to produce titles by selecting existing text snippets associated with the table. These approaches, however, are limited by their dependence on suitable titles existing a priori. In our user study, we observe that the relevant information for the title tends to be scattered across the page, and often--more than 80% of the time--does not appear verbatim anywhere in the page. We propose instead the application of a sequence-to-sequence neural network model as a more generalizable means of generating high-quality titles. This is accomplished by extracting many text snippets that have potentially relevant information to the table, encoding them into an input sequence, and using both copy and generation…
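To make the idea of combining copy and generation in a decoder concrete, here is a minimal sketch of one decoding step. It assumes a pointer-generator-style mixture, in which a scalar gate `p_gen` weights a distribution over a fixed vocabulary against an attention distribution over the input snippet tokens. The abstract does not confirm that this is the authors' exact formulation, and every name, score, and token below is hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_copy_and_generate(vocab, gen_scores, source_tokens, attn_scores, p_gen):
    """Mix a generation distribution with a copy distribution.

    vocab / gen_scores: fixed output vocabulary and the decoder's scores for it.
    source_tokens / attn_scores: input snippet tokens and attention scores.
    p_gen: assumed gate in [0, 1]; the weight given to generating from vocab.
    Returns a dict mapping each candidate token to its final probability.
    """
    gen_probs = softmax(gen_scores)
    copy_probs = softmax(attn_scores)
    # Generation part: probability mass over the fixed vocabulary.
    final = {tok: p_gen * p for tok, p in zip(vocab, gen_probs)}
    # Copy part: add attention mass to whichever source token it points at,
    # including tokens that are missing from the fixed vocabulary.
    for tok, p in zip(source_tokens, copy_probs):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * p
    return final


# Hypothetical inputs: a tiny vocabulary and one snippet from the page.
vocab = ["list", "of", "by", "<unk>"]
source_tokens = ["olympic", "medals", "by", "country"]
dist = mix_copy_and_generate(
    vocab, [1.0, 0.5, 0.2, 0.0], source_tokens, [2.0, 1.5, 0.1, 1.0], p_gen=0.4
)
best = max(dist, key=dist.get)
```

In this toy setup, a rare word such as "olympic" can be emitted even though it is absent from the fixed vocabulary, while common connective words remain available through generation. This is one plausible reading of how the two mechanisms balance relevance and fluency.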
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Web Data Mining and Analysis
