The Case for a Structured Approach to Managing Unstructured Data
AnHai Doan (Univ of Wisconsin), Jeff Naughton (Wisconsin), Akanksha Baid (Wisconsin), Xiaoyong Chai (Wisconsin), Fei Chen (Wisconsin), Ting Chen (Wisconsin), Eric Chu (Wisconsin), Pedro DeRose (Wisconsin), Byron Gao (Wisconsin), Chaitanya Gokhale (Wisconsin)

TL;DR
This paper argues for a structured approach to managing unstructured data, drawing on lessons learned from relational data management and considering how the approach might extend to other kinds of non-relational data.
Contribution
It outlines a structured approach to unstructured data management, drawn from lessons learned in managing relational data, so that the data management community does not cede this opportunity to other research communities and industrial players.
Findings
A structured approach can improve the management of unstructured data
Lessons learned from managing relational data apply to unstructured data
The approach may also inform the management of other kinds of non-relational data
Abstract
The challenge of managing unstructured data represents perhaps the largest data management opportunity for our community since managing relational data. And yet we are risking letting this opportunity go by, ceding the playing field to other players, ranging from communities such as AI, KDD, IR, Web, and Semantic Web, to industrial players such as Google, Yahoo, and Microsoft. In this essay we explore what we can do to improve upon this situation. Drawing on the lessons learned while managing relational data, we outline a structured approach to managing unstructured data. We conclude by discussing the potential implications of this approach to managing other kinds of non-relational data, and to the identity of our field.
Taxonomy
Topics: Big Data and Business Intelligence
