Geometric Characterisation and Structured Trajectory Surrogates for Clinical Dataset Condensation
Pafue Christy Nganjimi, Andrew Soltan, Danielle Belgrave, Lei Clifton, David Clifton, Anshul Thakur

TL;DR
This paper introduces Bezier Trajectory Matching (BTM), a dataset condensation method that supervises synthetic data with structured Bezier surrogates instead of the raw trajectory-matching signal, with the aim of improving the training utility of condensed clinical datasets.
Contribution
It provides a geometric analysis of trajectory matching and proposes Bezier surrogates to enhance supervision structure and reduce storage in clinical dataset condensation.
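To make the storage argument concrete, the sketch below evaluates a Bezier curve over flattened parameter vectors using the Bernstein basis. It is a generic illustration, not the paper's implementation: the function name `bezier_point`, the quadratic degree, and the toy control points are all assumptions, and how BTM actually chooses or fits its control points is not stated in the text available here. The point is only that a curve through parameter space can be regenerated at any position from a handful of stored control points, rather than from every intermediate checkpoint.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] using the Bernstein basis.

    control_points: list of equal-length lists of floats, standing in for
    flattened model parameter vectors (hypothetical values in this sketch).
    """
    n = len(control_points) - 1  # curve degree
    dim = len(control_points[0])
    point = [0.0] * dim
    for k, cp in enumerate(control_points):
        # Bernstein weight: C(n, k) * (1 - t)^(n - k) * t^k
        weight = comb(n, k) * (1 - t) ** (n - k) * t ** k
        for i in range(dim):
            point[i] += weight * cp[i]
    return point


# Toy two-dimensional "parameter vectors" (illustrative only).
theta_start = [0.0, 0.0]
theta_ctrl = [1.0, 2.0]
theta_end = [2.0, 0.0]
controls = [theta_start, theta_ctrl, theta_end]

# Any intermediate point is recomputed on demand from three stored vectors.
midpoint = bezier_point(controls, 0.5)
```

In this toy setting the curve starts at `theta_start`, ends at `theta_end`, and is pulled towards `theta_ctrl` in between; only the three control vectors need to be kept.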
Findings
Bezier Trajectory Matching (BTM) outperforms standard trajectory matching on clinical datasets.
BTM is especially effective in low-prevalence and low-budget scenarios.
Structured supervision signals improve dataset condensation efficiency.
Abstract
Dataset condensation constructs compact synthetic datasets that retain the training utility of large real-world datasets, enabling efficient model development and potentially supporting downstream research in governed domains such as healthcare. Trajectory matching (TM) is a widely used condensation approach that supervises synthetic data using changes in model parameters observed during training on real data, yet the structure of this supervision signal remains poorly understood. In this paper, we provide a geometric characterisation of trajectory matching, showing that a fixed synthetic dataset can only reproduce a limited span of such training-induced parameter changes. When the resulting supervision signal is spectrally broad, this creates a conditional representability bottleneck. Motivated by this mismatch, we propose Bezier Trajectory Matching (BTM), which replaces SGD…
