Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization

Thien Le; Melanie Weber

arXiv:2605.20074 · cs.LG · May 20, 2026

TL;DR

This paper investigates the conditions under which knowledge distillation from large models to graph neural networks succeeds in combinatorial optimization, emphasizing the role of algorithmic alignment with dynamic programming.

Contribution

It provides a formal analysis of the conditions under which distillation succeeds for GNNs aligned with DP algorithms, extending recent learning-theoretic results on decision-tree distillation to combinatorial tasks.

Findings

01

Distillation can be solved efficiently in the complexity parameters of the DP algorithm, provided the source model satisfies the linear representation hypothesis.

02

A formal sufficient condition for successful distillation is established.

03

The analysis extends to combinatorial optimization with graph neural networks.

Abstract

Distillation transfers knowledge from a large model trained on broad data to a smaller, more efficient model suitable for deployment. In structured prediction settings, prior knowledge about the task can guide the choice of a target architecture that is algorithmically aligned with the underlying problem. Building on recent learning-theoretic analyses of decision-tree (DT) distillation (Boix-Adsera, 2024), we study when distillation succeeds for combinatorial optimization tasks. We focus on the case where the target model is a graph neural network whose architecture is aligned with a dynamic programming (DP) algorithm for the task. Assuming that the source model is sufficiently rich, a property we formalize through the linear representation hypothesis (LRH) (Elhage et al., 2022; Park et al., 2024), we show that the distillation problem can be solved efficiently in the complexity parameters of the DP…
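To make "algorithmic alignment" concrete, here is a minimal sketch using a standard textbook example that is not taken from this paper: the Bellman-Ford shortest-path DP. Its update d[v] ← min(d[v], min over u of d[u] + w(u, v)) has the same shape as one round of message passing with min-aggregation, so a GNN with that structure is aligned with the DP. The function name and edge format are illustrative assumptions; in a learned GNN the message and update steps would be neural modules trained to approximate these exact operations.

```python
import math
from typing import List, Tuple

# Hypothetical edge format for illustration: (source u, target v, weight w).
Edge = Tuple[int, int, float]


def bellman_ford_message_passing(num_nodes: int, edges: List[Edge], source: int) -> List[float]:
    """Run Bellman-Ford written as synchronous message passing.

    Each outer round plays the role of one GNN layer: every edge sends a
    message d[u] + w, and each node aggregates incoming messages with min.
    """
    # Node state: current shortest-distance estimate from the source.
    dist = [math.inf] * num_nodes
    dist[source] = 0.0

    # num_nodes - 1 rounds suffice when there are no negative cycles.
    for _ in range(num_nodes - 1):
        # Start from each node's own state (a "self-loop" in GNN terms).
        new_dist = dist[:]
        for u, v, w in edges:
            # Message computed from the previous round's state only.
            msg = dist[u] + w
            # Min-aggregation over incoming messages.
            if msg < new_dist[v]:
                new_dist[v] = msg
        # Stop early once the DP has reached a fixed point.
        if new_dist == dist:
            break
        dist = new_dist
    return dist


if __name__ == "__main__":
    example_edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
    print(bellman_ford_message_passing(5, example_edges, source=0))
```

On the example graph, node 4 has no incoming edges and keeps distance `inf`, while the other nodes converge to their shortest distances within a few rounds. The number of rounds needed is one of the DP's complexity parameters, which is the kind of quantity the abstract's efficiency statement is expressed in.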