SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning
Nikola Jovišić, Milica Škipina, Vanja Švenda

TL;DR
SetFlow is a generative model that produces semantically coherent sets of representations for Multiple Instance Learning (MIL) data while respecting permutation invariance, enabling data augmentation and improving classification in data-scarce scenarios.
Contribution
The paper introduces SetFlow, a flow matching architecture that models entire MIL bags in the representation space, capturing intra-bag dependencies and enabling effective data augmentation.
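To make the idea of a flow model over whole bags more concrete, here is a minimal sketch of a one-sample flow matching loss on a single bag. It is not the authors' architecture: the velocity field uses simple mean pooling in place of Set Transformer attention, it ignores class and scale conditioning, and the scalars `a`, `b`, `c` and the helper names `mean_pool`, `velocity`, and `flow_matching_loss` are hypothetical stand-ins chosen for illustration. The straight-line path between noise and data is one common flow matching choice, assumed here.

```python
import random

def mean_pool(bag):
    """Average the instance vectors of a bag; the result ignores instance order."""
    n, d = len(bag), len(bag[0])
    return [sum(x[j] for x in bag) / n for j in range(d)]

def velocity(bag_t, t, a, b, c):
    """Toy permutation-equivariant velocity field (assumption, not the paper's model).

    Each instance receives a * (itself) + b * (bag mean) + c * t, so instances
    interact only through the pooled bag summary.
    """
    pooled = mean_pool(bag_t)
    return [[a * xj + b * pj + c * t for xj, pj in zip(x, pooled)] for x in bag_t]

def flow_matching_loss(bag, a, b, c, rng):
    """One-sample flow matching loss on a straight-line noise-to-data path."""
    noise = [[rng.gauss(0.0, 1.0) for _ in x] for x in bag]
    t = rng.random()
    # Point on the line between noise (t = 0) and data (t = 1).
    bag_t = [[(1 - t) * n0 + t * x1 for n0, x1 in zip(nz, x)]
             for nz, x in zip(noise, bag)]
    # Target velocity along that line: data minus noise.
    target = [[x1 - n0 for n0, x1 in zip(nz, x)] for nz, x in zip(noise, bag)]
    pred = velocity(bag_t, t, a, b, c)
    sq = [(p - g) ** 2 for pr, tg in zip(pred, target) for p, g in zip(pr, tg)]
    return sum(sq) / len(sq)

rng = random.Random(0)
# A toy bag: 5 instances, each a 4-dimensional representation.
bag = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
loss = flow_matching_loss(bag, 0.5, 0.2, 0.1, rng)
print(f"toy flow matching loss: {loss:.4f}")
```

Because every instance sees the other instances only through an order-independent pooled summary, reordering the bag reorders the predicted velocities in the same way. That is the property a set-level generator needs, and attention-based pooling would preserve it while allowing richer interactions between instances.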
Findings
Generated samples closely match the original data distribution.
Training on synthetic data yields competitive classification results.
SetFlow improves downstream MIL classification performance.
Abstract
Data scarcity and weak supervision continue to limit the performance of machine learning models in many real-world applications, such as mammography, where Multiple Instance Learning (MIL) often offers the best formulation. While recent foundation models provide strong semantic representations out of the box, effective augmentation of such representations of MIL data remains limited, as existing methods operate at the instance level and fail to capture intra-bag dependencies. In this work, we introduce SetFlow, a generative architecture that models entire MIL bags (i.e., sets) directly in the representation space. Our approach leverages the flow matching paradigm combined with a Set Transformer-inspired design, enabling it to handle permutation-invariant inputs while capturing interactions between instances within each bag. The model is conditioned on both class labels and input scale,…
