TL;DR
This paper introduces a self-supervised pretraining approach using synthetic fractal images to improve stellar mass inference in dense gas regions, reducing the need for extensive labeled data.
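The page does not say how the fractal images are generated. A common way to produce fractal pretraining data (used, for example, in FractalDB) is to sample a random iterated function system (IFS) and render its attractor with the chaos game. The minimal NumPy sketch below assumes that approach, with hypothetical helpers `random_ifs` and `render_fractal`, uniform selection of maps, and binary 64×64 output. It illustrates the idea only and is not the authors' generator.

```python
import numpy as np

def random_ifs(n_maps, rng):
    """ASSUMPTION: sample n_maps random 2D affine maps x -> A @ x + b, each made contractive."""
    maps = []
    for _ in range(n_maps):
        A = rng.uniform(-1.0, 1.0, size=(2, 2))
        # Rescale so the largest singular value is below 1, guaranteeing a contraction.
        A = A * (rng.uniform(0.3, 0.9) / np.linalg.norm(A, ord=2))
        b = rng.uniform(-1.0, 1.0, size=2)
        maps.append((A, b))
    return maps

def render_fractal(maps, size=64, n_points=20000, rng=None):
    """Render the IFS attractor via the chaos game into a size x size binary image."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(2)
    pts = np.empty((n_points, 2))
    for i in range(n_points):
        # ASSUMPTION: maps are picked uniformly at random (no probability weights).
        A, b = maps[rng.integers(len(maps))]
        x = A @ x + b
        pts[i] = x
    pts = pts[100:]  # drop burn-in iterations before the orbit settles on the attractor
    # Normalize the point cloud into pixel coordinates.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    scaled = (pts - lo) / np.maximum(hi - lo, 1e-12)
    idx = np.clip((scaled * (size - 1)).astype(int), 0, size - 1)
    img = np.zeros((size, size), dtype=np.uint8)
    img[idx[:, 1], idx[:, 0]] = 1
    return img

rng = np.random.default_rng(0)
img = render_fractal(random_ifs(3, rng), size=64, rng=rng)
print(img.shape, int(img.sum()))
```

Because each image costs only a few thousand affine updates, generating a million of them is cheap compared with running high-resolution MHD simulations.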
Contribution
It presents a novel synthetic pretraining method with vision transformers that enhances stellar mass predictions and enables unsupervised segmentation in star-forming regions.
Findings
Pretraining on synthetic images improves mass prediction accuracy.
The pretrained model outperforms supervised models trained on limited data.
The learned features enable unsupervised segmentation of star-forming regions.
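The findings do not state which clustering method produces the segmentation. One plausible sketch, assuming k-means on L2-normalized patch embeddings from the frozen ViT, is shown below. Synthetic stand-in features on an 8×8 patch grid replace real ViT outputs, `segment_patches` is a hypothetical helper, and k-means with farthest-first initialization is a swapped-in choice, not necessarily the paper's method.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-first initialization; returns a label per row of X."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def segment_patches(patch_feats, k=2):
    """ASSUMPTION: cluster (h, w, d) patch embeddings into k regions; returns an (h, w) label map."""
    h, w, d = patch_feats.shape
    X = patch_feats.reshape(-1, d)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # compare patches by direction only
    return kmeans(X, k).reshape(h, w)

# Stand-in features: left and right halves of the grid point in different directions.
rng = np.random.default_rng(0)
h, w, d = 8, 8, 16
feats = 0.05 * rng.standard_normal((h, w, d))
feats[:, : w // 2, 0] += 1.0
feats[:, w // 2 :, 1] += 1.0
seg = segment_patches(feats, k=2)
print(seg)
```

No labels enter this step, so any region structure it recovers comes from the frozen features alone.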
Abstract
Stellar mass is a fundamental quantity that determines the properties and evolution of stars. However, estimating stellar masses in star-forming regions is challenging because young stars are obscured by dense gas and the regions are highly inhomogeneous, making spherical dynamical estimates unreliable. Supervised machine learning could link such complex structures to stellar mass, but it requires large, high-quality labeled datasets from high-resolution magneto-hydrodynamical (MHD) simulations, which are computationally expensive. We address this by pretraining a vision transformer on one million synthetic fractal images using the self-supervised framework DINOv2, and then applying the frozen model to limited high-resolution MHD simulations. Our results demonstrate that synthetic pretraining improves stellar mass predictions from frozen-feature regression, with the pretrained model…
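The abstract does not spell out the regression head; one reviewer notes that the pipeline relies on PCA and kNN. A minimal NumPy sketch of frozen-feature regression in that style follows, assuming each image has already been embedded by the frozen ViT into a fixed-length vector. Random arrays stand in for the embeddings and the stellar-mass targets, and `fit_pca`, `pca_transform`, and `knn_regress` are hypothetical helpers.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA via SVD of the centered data; returns the mean and top principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project rows of X onto the principal directions."""
    return (X - mean) @ components.T

def knn_regress(Z_train, y_train, Z_query, k=5):
    """Predict each query target as the mean target of its k nearest training points."""
    d2 = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    return y_train[nn].mean(axis=1)

rng = np.random.default_rng(0)
# ASSUMPTION: stand-ins for frozen ViT embeddings (200 train, 20 test, 32 dims).
X_train = rng.standard_normal((200, 32))
X_test = rng.standard_normal((20, 32))
# ASSUMPTION: stand-in target (e.g. log stellar mass) tied to one embedding direction.
y_train = X_train[:, 0] + 0.1 * rng.standard_normal(200)

mean, comps = fit_pca(X_train, n_components=8)
Z_train = pca_transform(X_train, mean, comps)
y_pred = knn_regress(Z_train, y_train, pca_transform(X_test, mean, comps), k=5)
print(y_pred.shape)
```

Because the backbone stays frozen, only the PCA basis and the stored training embeddings depend on the labeled simulations, which is what makes this setup attractive when high-resolution MHD runs are scarce.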
Peer Reviews
Decision · Submitted to ICLR 2026
**1. This is a well-motivated problem:** This is a well-motivated, common, and significant problem in the field of computational astrophysics. Often, the only available link between latent parameters of interest and observations is simulation data, which is too expensive to generate in large quantities. Using ML in this low-data regime is a well-justified goal. **2. The approach is well-motivated and sensible:** The core idea of the paper, that is, to leverage large, cheap-to-generate synthetic
**1. The work lacks methodological novelty for the ICLR venue:** The primary weakness is the paper's limited methodological contribution to the machine learning community. The work appears to be a direct application of an existing, off-the-shelf SSL framework (DINOv2) combined with an existing pretraining data concept (fractals). There are no apparent modifications or novel insights into the DINOv2 algorithm, the ViT architecture, or the learning process itself. The downstream tasks are handled by s
- The description of the problem provides a compelling justification for adopting a self-supervised approach, effectively highlighting the limitations of traditional supervised methods in this context. - The figures are well-designed and informative, contributing to the clarity of the presentation.
- **Lack of Justification for Pretraining Strategy:** The primary shortcoming of the paper is that it does not adequately address why the original pretrained DINOv2 features cannot be used directly. The necessity of pretraining the entire model is not clearly justified, especially given that pretraining typically requires substantial computational resources and large datasets, neither of which is discussed in detail. - **Unconvincing Results:** The results presented in Table 1 are not comp
- The paper is well motivated and aligns well with current trends toward developing low-cost, easily deployable methods. - The idea of training on easily generated fractals is both novel and clever. I find it genuinely interesting and promising for this application. - The experimental pipeline is clearly described and relies on well-established techniques such as PCA and kNN. - The proposed method demonstrates improvement over a simple supervised baseline.
1. The main weakness I find is the lack of a clear baseline that aligns with the paper’s “limited data” motivation. The supervised ResNet-18 trained from scratch seems to me like a rather naive baseline. I understand that the goal was to demonstrate that synthetic-pretrained, frozen ViT features generalize better than a fully supervised model trained with limited data. However, the presented experiment does not seem to reflect a truly limited-data setting. The 24k–8k split they use already falls
