Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback
Haoran Zhang, Seohyeon Cha, Hasan Burhan Beytur, Kevin S Chan, Gustavo de Veciana, Haris Vikalo

TL;DR
This paper introduces a variance-reduced online learning algorithm for multi-layer hierarchical inference systems with partial, policy-dependent feedback, improving stability and performance in resource-constrained environments.
Contribution
It develops a novel variance-reduction technique integrated with Lyapunov optimization for stable online routing under sparse feedback and resource constraints.
Findings
The proposed method achieves lower regret than standard importance-weighted algorithms.
It maintains stability and unbiased loss estimation despite decaying feedback probabilities.
Experimental results show improved performance on large-scale multi-task workloads.
Abstract
Hierarchical inference systems route tasks across multiple computational layers, where each node may either finalize a prediction locally or offload the task to a node in the next layer for further processing. Learning optimal routing policies in such systems is challenging: inference loss is defined recursively across layers, while feedback on prediction error is revealed only at a terminal oracle layer. This induces a partial, policy-dependent feedback structure in which observability probabilities decay with depth, causing importance-weighted estimators to suffer from amplified variance. We study online routing for multi-layer hierarchical inference under long-term resource constraints and terminal-only feedback. We formalize the recursive loss structure and show that naive importance-weighted contextual bandit methods become unstable as feedback probability decays along the…
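The variance amplification described in the abstract can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not taken from the paper: each layer offloads independently with a fixed probability, the loss is a constant, and feedback is observed only when the task reaches the terminal oracle layer. The function names (`reach_probability`, `iw_estimate`, `iw_variance`, `simulate`) are hypothetical, and the code shows only the naive importance-weighted estimator, not the paper's variance-reduced algorithm.

```python
import random


def reach_probability(offload_probs):
    """Probability that a task is offloaded at every layer and reaches the oracle."""
    p = 1.0
    for q in offload_probs:
        p *= q
    return p


def iw_estimate(loss, offload_probs, rng):
    """One naive importance-weighted sample: loss / p if feedback is observed, else 0."""
    reached = all(rng.random() < q for q in offload_probs)
    p = reach_probability(offload_probs)
    return loss / p if reached else 0.0


def iw_variance(loss, offload_probs):
    """Closed-form variance of the estimator for a constant loss: loss^2 * (1 - p) / p."""
    p = reach_probability(offload_probs)
    return loss ** 2 * (1.0 - p) / p


def simulate(loss, offload_probs, n, seed=0):
    """Empirical mean and variance of n importance-weighted samples."""
    rng = random.Random(seed)
    samples = [iw_estimate(loss, offload_probs, rng) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var


if __name__ == "__main__":
    # Assumed setting: constant loss 1.0, offload probability 0.5 at every layer.
    for depth in (1, 2, 3, 5):
        probs = [0.5] * depth
        mean, var = simulate(1.0, probs, n=200_000)
        print(depth, reach_probability(probs), round(mean, 3), round(var, 2),
              iw_variance(1.0, probs))
```

In this toy setting the estimator stays unbiased at every depth (the empirical mean stays near the true loss), while its variance grows roughly as 1/p as the reach probability p shrinks geometrically with depth. That growth is the instability the abstract attributes to naive importance weighting.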
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
