Incrementally Maintaining Classification using an RDBMS
Mehmet Levent Koc (University of Wisconsin-Madison), Christopher Ré (University of Wisconsin-Madison)

TL;DR
This paper introduces an incremental classification algorithm integrated into RDBMSs, enabling efficient updates and significantly improving performance over traditional non-incremental methods across various datasets.
Contribution
The paper presents an incremental classification algorithm for RDBMSs, analyzes its optimality, and develops an index structure, built on the same ideas, that keeps only a fraction of the entities in memory.
Findings
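The incremental algorithms outperform non-incremental approaches by several orders of magnitude.
The proposed algorithm is optimal among deterministic algorithms and asymptotically within a factor of 2 of a nondeterministic optimal.
The techniques are effective for text processing and a range of real-world datasets.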
Abstract
The proliferation of imprecise data has motivated both researchers and the database industry to push statistical techniques into relational database management systems (RDBMSs). We study algorithms to maintain model-based views for a popular statistical technique, classification, inside an RDBMS in the presence of updates to the training examples. We make three technical contributions: (1) An algorithm that incrementally maintains classification inside an RDBMS. (2) An analysis of the above algorithm that shows that our algorithm is optimal among all deterministic algorithms (and asymptotically within a factor of 2 of a nondeterministic optimal). (3) An index structure based on the technical ideas that underlie the above algorithm which allows us to store only a fraction of the entities in memory. We apply our techniques to text processing, and we demonstrate that our algorithms provide…
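To make the idea of incrementally maintaining a classification view more concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a linear classifier (label = sign of w·x − b), and the class `IncrementalLinearView` and all of its fields are hypothetical names chosen for illustration. Entities are kept sorted by their margin under a stored model; when the model changes, the Cauchy-Schwarz inequality bounds how far any margin can have moved, so only entities whose stored margin falls inside that band need to be reclassified.

```python
import bisect
import math


def margin(x, w, b):
    """Signed score of entity x under the linear model (w, b)."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


class IncrementalLinearView:
    """Hypothetical view that maintains label = sign(w.x - b) per entity."""

    def __init__(self, entities, w, b):
        self.entities = entities  # entity id -> feature vector
        # Largest feature-vector norm, used in the drift bound below.
        self.max_norm = max(
            math.sqrt(sum(v * v for v in x)) for x in entities.values()
        )
        # Stored model: margins are computed once against it and kept sorted.
        self.w0, self.b0 = list(w), b
        scored = sorted((margin(x, w, b), eid) for eid, x in entities.items())
        self.margins = [m for m, _ in scored]
        self.ids = [eid for _, eid in scored]
        self.labels = {eid: (1 if m >= 0 else -1) for m, eid in scored}
        self.prev_drift = 0.0

    def update(self, w, b):
        """Switch to model (w, b); return how many entities were re-scored."""
        # |new margin - stored margin| <= ||w - w0|| * ||x|| + |b - b0|
        dw = math.sqrt(sum((a - c) ** 2 for a, c in zip(w, self.w0)))
        drift = dw * self.max_norm + abs(b - self.b0)
        # Also cover the previous band, so labels flipped by the last
        # update are re-checked if the model moved back.
        band = max(drift, self.prev_drift)
        lo = bisect.bisect_left(self.margins, -band)
        hi = bisect.bisect_right(self.margins, band)
        for eid in self.ids[lo:hi]:
            x = self.entities[eid]
            self.labels[eid] = 1 if margin(x, w, b) >= 0 else -1
        self.prev_drift = drift
        return hi - lo


# Usage: a small change to the bias only touches entities near the boundary.
entities = {"a": [2.0, 0.0], "b": [0.1, 0.0], "c": [-0.1, 0.0], "d": [-2.0, 0.0]}
view = IncrementalLinearView(entities, [1.0, 0.0], 0.0)
touched = view.update([1.0, 0.0], 0.15)  # re-scores only "b" and "c"
```

In this sketch, entities far from the decision boundary are never touched by small model changes, which is one plausible way an index could avoid holding every entity in memory. A fuller version would periodically re-sort against a fresh stored model once the band grows too wide.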
Taxonomy
Topics: Machine Learning and Algorithms · Data Quality and Management · Advanced Database Systems and Queries
