Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization
Thien Le, Melanie Weber

TL;DR
This paper investigates the conditions under which knowledge distillation from large models to graph neural networks succeeds in combinatorial optimization, emphasizing the role of algorithmic alignment with dynamic programming.
Contribution
The paper gives a formal analysis of when distillation succeeds for GNNs aligned with DP algorithms, extending recent learning-theoretic results on decision-tree distillation to combinatorial optimization tasks.
Findings
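Assuming the source model satisfies the linear representation hypothesis (LRH), distillation can be solved efficiently in the complexity parameters of the DP algorithm.
A formal sufficient condition for successful distillation is established.
The analysis carries decision-tree distillation results over to combinatorial optimization with DP-aligned graph neural networks.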
Abstract
Distillation transfers knowledge from a large model trained on broad data to a smaller, more efficient model suitable for deployment. In structured prediction settings, prior knowledge about the task can guide the choice of a target architecture that is algorithmically aligned with the underlying problem. Building on recent learning-theoretic analyses of decision-tree (DT) distillation (Boix-Adsera, 2024), we study when distillation succeeds for combinatorial optimization tasks. We focus on the case where the target model is a graph neural network whose architecture is aligned with a dynamic programming (DP) algorithm for the task. Assuming that the source model is sufficiently rich, formalized through the linear representation hypothesis (LRH) (Elhage et al., 2022; Park et al., 2024), we show that the distillation problem can be solved efficiently in the complexity parameters of the DP…
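To make "algorithmically aligned with a DP algorithm" concrete, the sketch below uses shortest paths as a stand-in task. This is an illustrative assumption: the abstract does not name a specific DP or graph problem. The Bellman-Ford recurrence is written once as a plain DP and once as a generic message-passing layer with a sum message and a min aggregator, showing that each DP relaxation round maps onto one GNN layer. All function names (`bellman_ford`, `message_passing_layer`, `aligned_gnn`) are hypothetical and not taken from the paper, and nothing here is learned; it only illustrates the structural correspondence.

```python
import math


def bellman_ford(n, edges, source):
    """Plain DP: dist_k[v] = min(dist_{k-1}[v], min over (u, v, w) of dist_{k-1}[u] + w)."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):
        new = dist[:]  # synchronous update: read from the previous round only
        for u, v, w in edges:
            if dist[u] + w < new[v]:
                new[v] = dist[u] + w
        dist = new
    return dist


def message_passing_layer(h, edges, message, aggregate, update):
    """One generic GNN layer: per-edge messages, per-node aggregation, node update."""
    inbox = [[] for _ in h]
    for u, v, w in edges:
        inbox[v].append(message(h[u], w))
    return [update(h[v], aggregate(inbox[v])) for v in range(len(h))]


def aligned_gnn(n, edges, source):
    """Stack n - 1 layers whose message/aggregate/update mirror the DP recurrence."""
    h = [math.inf] * n
    h[source] = 0.0
    for _ in range(n - 1):
        h = message_passing_layer(
            h,
            edges,
            message=lambda hu, w: hu + w,                        # candidate path length
            aggregate=lambda msgs: min(msgs, default=math.inf),  # min over in-neighbours
            update=min,                                          # keep the better value
        )
    return h


# Small directed graph as (u, v, weight) triples; chosen only for illustration.
EDGES = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
DP_DIST = bellman_ford(4, EDGES, 0)
GNN_DIST = aligned_gnn(4, EDGES, 0)
print(DP_DIST, GNN_DIST)
```

In a trained GNN the hand-written `message` and `update` functions would be replaced by small learned modules. The intuition behind alignment is that those modules only need to fit simple per-step functions, while the layer structure already supplies the DP's iteration pattern.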
