# Fonduer: Knowledge Base Construction from Richly Formatted Data

**Authors:** Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré

arXiv: 1703.05028 · 2018-03-05

## TL;DR

Fonduer is a machine-learning system for knowledge base construction (KBC) from richly formatted data. It extracts relations that are conveyed jointly through textual, structural, tabular, and visual cues, and it reports an average improvement of 41 F1 points in output quality across four domains.

## Contribution

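The paper introduces Fonduer, a KBC system for richly formatted data built on three components: a new data model that handles document-level relations, multimodality, and data variety; a deep-learning model that learns the features needed for relation extraction automatically; and a programming model that lets users turn multimodal domain expertise into supervision signals.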

## Key findings

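- Achieves an average improvement of 41 F1 points in knowledge base quality across four domains.
- In some cases produces up to 1.87x the number of correct entries found in expert-curated public knowledge bases.
- In a user study, non-domain experts who used Fonduer for only 30 minutes designed KBC systems that scored on average 23 F1 points higher than traditional machine-learning-based KBC approaches.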
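For readers less familiar with the metric, the findings above are stated in F1 points, that is, differences in F1 score on a 0 to 100 scale, where F1 is the harmonic mean of precision and recall. The sketch below only illustrates that arithmetic. The function name `f1_score` and all counts are hypothetical and are not taken from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1 (harmonic mean of precision and recall) on a 0-100 scale."""
    if true_positives == 0:
        # No correct extractions: precision and recall are both 0 (or undefined).
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to show how a gain in "F1 points" is computed.
baseline_f1 = f1_score(true_positives=30, false_positives=30, false_negatives=70)
improved_f1 = f1_score(true_positives=80, false_positives=20, false_negatives=20)

# The gain is a plain difference of the two scores (about 42.5 points here).
gain_in_f1_points = improved_f1 - baseline_f1
print(f"baseline={baseline_f1:.1f} improved={improved_f1:.1f} gain={gain_in_f1_points:.1f}")
```

Because F1 is a harmonic mean, a system needs both precision and recall to be high to score well, so a large gain in F1 points cannot come from improving only one of the two.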

## Abstract

We focus on knowledge base construction (KBC) from richly formatted data. In contrast to KBC from text or tabular data, KBC from richly formatted data aims to extract relations conveyed jointly via textual, structural, tabular, and visual expressions. We introduce Fonduer, a machine-learning-based KBC system for richly formatted data. Fonduer presents a new data model that accounts for three challenging characteristics of richly formatted data: (1) prevalent document-level relations, (2) multimodality, and (3) data variety. Fonduer uses a new deep-learning model to automatically capture the representation (i.e., features) needed to learn how to extract relations from richly formatted data. Finally, Fonduer provides a new programming model that enables users to convert domain expertise, based on multiple modalities of information, to meaningful signals of supervision for training a KBC system. Fonduer-based KBC systems are in production for a range of use cases, including at a major online retailer. We compare Fonduer against state-of-the-art KBC approaches in four different domains. We show that Fonduer achieves an average improvement of 41 F1 points on the quality of the output knowledge base (and in some cases produces up to 1.87x the number of correct entries) compared to expert-curated public knowledge bases. We also conduct a user study to assess the usability of Fonduer's new programming model. We show that after using Fonduer for only 30 minutes, non-domain experts are able to design KBC systems that achieve on average 23 F1 points higher quality than traditional machine-learning-based KBC approaches.
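To make the idea of converting multimodal domain expertise into supervision signals more concrete, here is a minimal sketch of rule-style labeling functions that each inspect a different modality of a candidate relation. Everything in it is an assumption made for illustration: the `Candidate` class, its fields, the three rules, and the electronics-style example values are hypothetical and are not Fonduer's API. The majority vote is a simple stand-in, since this summary does not describe how Fonduer combines such signals.

```python
from dataclasses import dataclass

# Label values a rule can emit for a candidate relation.
POSITIVE, NEGATIVE, ABSTAIN = 1, -1, 0


@dataclass
class Candidate:
    """Hypothetical candidate relation (part, value) with multimodal context."""
    part_text: str
    value_text: str
    row_header: str           # textual cue: header of the value's table row
    same_table_row: bool      # structural/tabular cue
    vertically_aligned: bool  # visual cue from the rendered page


def lf_header_mentions_current(c: Candidate) -> int:
    # Textual rule: a row header mentioning "current" suggests a true relation.
    return POSITIVE if "current" in c.row_header.lower() else ABSTAIN


def lf_same_row(c: Candidate) -> int:
    # Structural rule: values outside the part's table row are unlikely matches.
    return POSITIVE if c.same_table_row else NEGATIVE


def lf_not_aligned(c: Candidate) -> int:
    # Visual rule: mentions that are not aligned on the page are unlikely matches.
    return ABSTAIN if c.vertically_aligned else NEGATIVE


LABELING_FUNCTIONS = [lf_header_mentions_current, lf_same_row, lf_not_aligned]


def majority_vote(c: Candidate) -> int:
    """Combine rule outputs by the sign of their sum (a stand-in combiner)."""
    total = sum(lf(c) for lf in LABELING_FUNCTIONS)
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return ABSTAIN


example = Candidate("BC546", "100 mA", "Collector Current", True, True)
print(majority_vote(example))
```

Each rule may abstain, so a user can encode partial knowledge about one modality without having to label every candidate. The resulting noisy labels would then serve as training signal for a learned extraction model rather than as final output.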

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.05028/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1703.05028/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/1703.05028/full.md

---
Source: https://tomesphere.com/paper/1703.05028