Making problems tractable on big data via preprocessing with polylog-size output
Jiannan Yang, Hanpin Wang, and Yongzhi Cao

TL;DR
This paper introduces a new tractability framework, $\boxplus'$-tractability, for big data queries with polylog-size preprocessing outputs, refining previous theories to better align with practical data processing constraints.
Contribution
It proposes a novel $\boxplus'$-tractability concept that restricts preprocessing to produce polylog-size outputs, and analyzes its theoretical properties and relationship to existing complexity classes.
Findings
All PTIME Boolean queries can be made $\boxplus'$-tractable.
The set of $\boxplus'$-tractable queries is strictly smaller than the set of $\boxplus$-tractable queries.
$\boxplus'$-tractability defines a new complexity class within PTIME.
Abstract
To provide a dichotomy between those queries that can be made feasible on big data after appropriate preprocessing and those for which preprocessing does not help, Fan et al. developed the $\boxplus$-tractability theory. This theory provides a formal foundation for understanding the tractability of query classes in the context of big data. Along this line, we introduce a novel notion of $\boxplus'$-tractability in this paper. Inspired by some technologies used to deal with big data, we place a restriction on the preprocessing function, which limits the function to produce a relatively small database as output, at most polylog-size of the input database. At the same time, we bound the redundant information when re-factorizing data and queries for preprocessing. These changes aim to make our theory more closely linked to practice. We set two complexity classes to denote the classes of Boolean queries…
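As a rough illustration of the size restriction described in the abstract, write $\Pi$ for the preprocessing function and $D$ for the input database. These symbols, and the constants $c$ and $k$, are notation assumed here, not taken from the abstract. Under that assumption, "polylog-size output" would mean approximately:

$$|\Pi(D)| \le c \cdot (\log |D|)^{k} \quad \text{for some constants } c, k > 0 \text{ and all sufficiently large } D.$$

The paper's formal definition may differ in detail, for example in how the size of a database is measured.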
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Quality and Management
