Divide and Predict: An Architecture for Input Space Partitioning and Enhanced Accuracy

Fenix W. Huang, Henning S. Mortveit, and Christian M. Reidys

arXiv:2603.08649 · cs.LG · March 10, 2026

Open Access

TL;DR

This paper introduces a variance-based measure of heterogeneity in training data and uses it to partition the data into blocks. Training over those blocks can improve supervised learning accuracy, as shown in proof-of-concept experiments on EMNIST image data and synthetic data.

Contribution

The paper presents a novel variance measure for heterogeneity in training data and shows that partitioning the data according to this measure can improve model accuracy.

Findings

01

The variance measure captures data heterogeneity and can be used to assess whether a sample is a mixture of distributions.

02

Variance-based data purification followed by conventional training over blocks can lead to significant increases in test accuracy.

03

Variance is maximal for equal mixes of distributions.

Abstract

In this article, the authors develop an intrinsic measure for quantifying heterogeneity in training data for supervised learning. This measure is the variance of a random variable which factors through the influences of pairs of training points. The variance is shown to capture data heterogeneity and can thus be used to assess whether a sample is a mixture of distributions. The authors prove that the data itself contains key information that supports a partitioning into blocks. Several proof-of-concept studies are provided that quantify the connection between variance and heterogeneity for EMNIST image data and synthetic data. The authors establish that variance is maximal for equal mixes of distributions, and detail how variance-based data purification followed by conventional training over blocks can lead to significant increases in test accuracy.
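
The claim that variance is maximal for equal mixes has a familiar analogue in elementary statistics, sketched below. This is only a toy illustration and not the paper's measure: the paper's random variable is built from influences of pairs of training points, which this summary does not define. The sketch instead assumes a scalar sample drawn from two Gaussians with equal spread and applies the law of total variance; the function name `mixture_variance` and the parameters `mu_a`, `mu_b`, and `sigma` are hypothetical choices made for the example.

```python
def mixture_variance(p, mu_a=0.0, mu_b=3.0, sigma=1.0):
    """Variance of the mixture p*N(mu_a, sigma^2) + (1-p)*N(mu_b, sigma^2).

    Toy analogy only (assumed setup, not the paper's influence-based measure).
    By the law of total variance, the result splits into a within-component
    term and a between-component term.
    """
    within = sigma ** 2                            # spread inside each component
    between = p * (1.0 - p) * (mu_a - mu_b) ** 2   # spread caused by mixing
    return within + between


# Sweep the mixing fraction from 0 (pure B) to 1 (pure A) in steps of 0.1.
mix_fractions = [i / 10 for i in range(11)]
variances = [mixture_variance(p) for p in mix_fractions]

# The mixing fraction at which the variance is largest.
peak_fraction = max(mix_fractions, key=mixture_variance)

for p, v in zip(mix_fractions, variances):
    print(f"mix fraction {p:.1f}: variance {v:.2f}")
print("variance is largest at mix fraction", peak_fraction)
```

In this toy setting the between-component term p(1 - p)(mu_a - mu_b)^2 vanishes for a pure sample and is largest at p = 0.5, which mirrors the qualitative behaviour the abstract reports for the authors' measure.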

Taxonomy

Topics: Medical Image Segmentation Techniques · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques