# How many qubits does a machine learning problem require?

**Authors:** Sydney Leither, Michael Kubal, Sonika Johri

arXiv: 2508.20992 · 2026-01-30

## TL;DR

This paper shows that the recently proposed bit-bit data encoding scheme realizes universal approximation, and uses it to build a resource estimation framework for variational quantum machine learning. Applied to a number of medium-sized classical benchmark datasets, the scheme requires only 27 qubits on average for encoding.

## Contribution

It presents the first resource estimation framework for variational quantum machine learning, built on the bit-bit encoding scheme, and shows that the scheme is both universally expressive and practical to apply.

## Key findings

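- Bit-bit encoding realizes universal approximation constructively and efficiently.
- Medium-sized classical benchmark datasets require only 27 qubits on average for encoding.
- An extended variant of the encoding efficiently supports batched processing of large datasets, demonstrated on subsets of a giga-scale transcriptomic dataset.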

## Abstract

For a machine learning paradigm to be generally applicable, it should have the property of universal approximation, that is, it should be able to approximate any target function to any desired degree of accuracy. In variational quantum machine learning, the class of functions that can be learned depends on both the data encoding scheme and the architecture of the optimizable part of the model. Here, we show that the property of universal approximation is constructively and efficiently realized by the recently proposed bit-bit data encoding scheme. Further, we show that this construction allows us to calculate the number of qubits required to solve a learning problem on a dataset to a target accuracy, giving rise to the first resource estimation framework for variational quantum machine learning. We apply bit-bit encoding to a number of medium-sized classical benchmark datasets and find that they require only 27 qubits on average for encoding. We extend the basic bit-bit encoding scheme to a variant that efficiently supports batched processing of large datasets. As a demonstration, we apply this new scheme to subsets of a giga-scale transcriptomic dataset. This work establishes bit-bit encoding not only as a universally expressive quantum data representation, but also as a practical foundation for resource estimation and benchmarking in quantum machine learning.
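To make the idea of counting qubits for a dataset more concrete, here is a minimal Python sketch. It is not the paper's procedure. It assumes a simplified model in which each input feature is uniformly discretized into a fixed number of bits, each class label is written as a binary string, and one qubit is used per bit. The helper names (`discretize`, `output_bits`, `qubit_count`) and the uniform binning are hypothetical choices made for illustration only; the paper's actual construction and its accuracy-dependent estimate are in the full text.

```python
def discretize(value, lo, hi, bits):
    """Map a value in [lo, hi] to a `bits`-long binary string by uniform binning.

    Assumption: uniform bins; the paper may discretize differently.
    """
    levels = 2 ** bits
    # Clamp to the range, then scale to an integer bin index in [0, levels - 1].
    clamped = min(max(value, lo), hi)
    index = min(int((clamped - lo) / (hi - lo) * levels), levels - 1)
    return format(index, f"0{bits}b")


def output_bits(num_classes):
    """Bits needed to give each class label a distinct binary string."""
    return max(1, (num_classes - 1).bit_length())


def qubit_count(bits_per_feature, num_classes):
    """Total qubits under the one-qubit-per-bit assumption (inputs plus outputs)."""
    return sum(bits_per_feature) + output_bits(num_classes)


if __name__ == "__main__":
    # Hypothetical dataset: three features at 2, 2 and 3 bits, four classes.
    bits_per_feature = [2, 2, 3]
    sample = [0.5, 0.1, 0.9]  # feature values assumed to lie in [0, 1]
    encoded = "".join(
        discretize(x, 0.0, 1.0, b) for x, b in zip(sample, bits_per_feature)
    )
    print("input bitstring:", encoded)
    print("qubits needed:", qubit_count(bits_per_feature, num_classes=4))
```

In this toy model the qubit count grows linearly with the number of bits kept per feature, so the resolution of the discretization is the main lever on the resource estimate.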

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20992/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20992/full.md

---
Source: https://tomesphere.com/paper/2508.20992