Dynamic and Distributed Routing in IoT Networks based on Multi-Objective Q-Learning
Shubham Vaishnav, Praveen Kumar Donta, Sindri Magnússon

TL;DR
This paper introduces a distributed multi-objective Q-learning routing algorithm for IoT networks that adaptively balances conflicting goals like energy efficiency and reliability in real time, outperforming existing methods.
Contribution
It presents a novel distributed multi-objective Q-learning approach with a greedy interpolation policy, enabling real-time adaptation to dynamic routing preferences without central coordination.
Findings
Achieves 80-90% lower energy consumption than existing methods
Provides 2-5x higher cumulative rewards and packet delivery than existing methods
Demonstrates robustness across varying operating conditions
Abstract
IoT networks often face conflicting routing goals such as maximizing packet delivery, minimizing delay, and conserving limited battery energy. These priorities can also change dynamically: for example, an emergency alert requires high reliability, while routine monitoring prioritizes energy efficiency to prolong network lifetime. Existing works, including many deep reinforcement learning approaches, are typically centralized and assume static objectives, making them slow to adapt when preferences shift. We propose a dynamic and fully distributed multi-objective Q-learning routing algorithm that learns multiple per-preference Q-tables in parallel and introduces a novel greedy interpolation policy to act near-optimally for unseen preferences without retraining or central coordination. A theoretical analysis further shows that the optimal value function is Lipschitz-continuous in the…
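The idea of keeping one Q-table per preference and acting greedily for an unseen preference can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name PreferenceQRouter, the scalar preference weight w in [0, 1], the two-objective linear scalarisation (energy versus reliability), and the use of linear interpolation between trained preferences are all assumptions made here, since the abstract does not spell out the exact interpolation rule. Each node would hold its own instance, with actions standing for candidate next-hop neighbours.

```python
import numpy as np


class PreferenceQRouter:
    """Hypothetical per-node agent: one Q-table per trained preference weight."""

    def __init__(self, n_states, n_actions, preferences, alpha=0.1, gamma=0.9):
        # Sorted, de-duplicated preference weights (assumed to be scalars in [0, 1]).
        self.prefs = np.unique(np.asarray(preferences, dtype=float))
        # q[k, s, a]: value of action a in state s under preference prefs[k].
        self.q = np.zeros((len(self.prefs), n_states, n_actions))
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor

    def update(self, s, a, r_energy, r_reliability, s_next):
        """Update every per-preference table from one observed transition."""
        for k, w in enumerate(self.prefs):
            # Assumed linear scalarisation: w weights energy, (1 - w) reliability.
            r = w * r_energy + (1.0 - w) * r_reliability
            target = r + self.gamma * self.q[k, s_next].max()
            self.q[k, s, a] += self.alpha * (target - self.q[k, s, a])

    def act(self, s, w):
        """Greedy action for a possibly unseen preference w."""
        # Interpolate each action's value across the trained preferences;
        # np.interp clamps w to the trained range.
        q_w = np.array([
            np.interp(w, self.prefs, self.q[:, s, a])
            for a in range(self.q.shape[2])
        ])
        return int(np.argmax(q_w))
```

Because Q-learning is off-policy, a single transition can update all tables in the same step, which is what lets the tables be learned in parallel in this sketch. A node trained on preferences [0.0, 0.5, 1.0] could then call act(s, 0.3) when the application shifts its priorities, without retraining.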
Taxonomy
Methods: Q-Learning
