TL;DR
This empirical study compares three configuration encoding schemes (label, scaled label, and one-hot) for predicting software performance, revealing trade-offs between accuracy and training time and offering practical guidance on choosing an encoding.
Contribution
The study systematically evaluates label, scaled label, and one-hot encoding across five systems and seven models, offering insights into the effectiveness and efficiency of each scheme.
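To make the three schemes concrete, here is a minimal sketch of how one configuration could be turned into a feature vector under each of them. The option names and value lists in `OPTIONS` are hypothetical, and the definitions follow one common reading (label = integer index of the chosen value, scaled label = that index normalized to [0, 1], one-hot = one binary dimension per possible value); the paper's exact definitions may differ, for example by using raw option values instead of indices.

```python
from typing import Dict, List

# Hypothetical configuration space (illustrative only, not taken from the paper):
# option name -> ordered list of allowed values.
OPTIONS: Dict[str, List[int]] = {
    "cache_size": [16, 64, 256],
    "threads": [1, 2, 4, 8],
    "compression": [0, 1],
}


def label_encode(config: Dict[str, int]) -> List[float]:
    """One dimension per option: the integer index of the chosen value."""
    return [float(OPTIONS[name].index(config[name])) for name in OPTIONS]


def scaled_label_encode(config: Dict[str, int]) -> List[float]:
    """One dimension per option: the index rescaled to the range [0, 1]."""
    encoded = []
    for name, values in OPTIONS.items():
        index = values.index(config[name])
        span = len(values) - 1
        encoded.append(index / span if span > 0 else 0.0)
    return encoded


def one_hot_encode(config: Dict[str, int]) -> List[float]:
    """One binary dimension per (option, value) pair."""
    encoded = []
    for name, values in OPTIONS.items():
        encoded.extend(1.0 if value == config[name] else 0.0 for value in values)
    return encoded


example = {"cache_size": 64, "threads": 8, "compression": 0}
print(label_encode(example))         # 3 dimensions
print(scaled_label_encode(example))  # 3 dimensions, values in [0, 1]
print(one_hot_encode(example))       # 3 + 4 + 2 = 9 dimensions
```

Note how one-hot encoding grows the input dimensionality with the number of possible values, while the two label variants keep one dimension per option. That difference is one plausible source of the accuracy and training-time trade-offs the study reports.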
Findings
One-hot encoding often yields the highest accuracy.
Scaled label encoding generally results in faster training.
Trial-and-error encoding selection can be very time-consuming.
Abstract
Learning and predicting the performance of a configurable software system helps to provide better quality assurance. One important engineering decision therein is how to encode the configuration for the model being built. Despite the availability of different encoding schemes, there is still little understanding of which is better and under what circumstances, as the community often relies on general beliefs that inform the decision in an ad hoc manner. To bridge this gap, in this paper, we empirically compare the widely used encoding schemes for software performance learning, namely label, scaled label, and one-hot encoding. The study covers five systems, seven models, and three encoding schemes, leading to 105 cases of investigation. Our key findings reveal that: (1) conducting trial-and-error to find the best encoding scheme in a case-by-case manner can be rather expensive, requiring…
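The figure of 105 cases follows from crossing the three factors: 5 systems × 7 models × 3 encoding schemes. The sketch below enumerates that grid; the system and model names are placeholders, since the abstract as shown here does not list them.

```python
from itertools import product

# Placeholder identifiers; the actual systems and models are not named in this excerpt.
systems = [f"system_{i}" for i in range(1, 6)]   # five systems
models = [f"model_{i}" for i in range(1, 8)]     # seven models
encodings = ["label", "scaled_label", "one_hot"]  # three encoding schemes

# Every (system, model, encoding) combination is one case of investigation.
cases = list(product(systems, models, encodings))
print(len(cases))  # 105
```

A trial-and-error search for the best encoding would need to train and evaluate a model for each encoding in every (system, model) pair, which is why the grid grows quickly.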
