# Online bipartite matching methodology for anti-epidemic resources allocation: an adaptive time window based on reinforcement learning

**Authors:** Zhiyong Wu, Sulin Pang, Suyan He

PMC · DOI: 10.3389/fpubh.2025.1644499 · 2026-01-08

## TL;DR

This paper proposes a reinforcement learning-based method to dynamically allocate anti-epidemic resources among suppliers and recipients during outbreaks.

## Contribution

The novel contribution is an adaptive time window bipartite matching algorithm using reinforcement learning for dynamic anti-epidemic resource allocation.

## Key findings

- Adaptive time window strategies adapt better to dynamic epidemic scenarios than fixed-window approaches, although a fixed window at its optimal size achieves better results.
- As the matching time window grows, the matching rate rises consistently, while the waiting time first decreases and then increases.
- Health managers should adjust time windows based on epidemic dynamics and resource availability.

## Abstract

This study investigates the online matching problem for anti-epidemic resources among multiple suppliers and recipients in the Internet of Healthcare System during a major outbreak, accounting for the heterogeneity of supply and demand.

A multi-stage online dynamic bipartite matching model based on time windows is developed and reformulated as a Markov decision process. An adaptive time window batch bipartite matching algorithm based on reinforcement learning is then proposed; it uses a nearest-neighbor-first heuristic strategy to allocate anti-epidemic resources.
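To make the batch step concrete, the following is a minimal sketch of one possible reading of a nearest-neighbor-first heuristic: within a single time window, the closest remaining supplier-recipient pair is matched first, and the process repeats until one side is exhausted. The `Node` type, the Euclidean distance, and the function name `nearest_neighbor_first` are assumptions made for illustration; the summary does not specify the paper's actual distance measure, tie-breaking rule, or handling of heterogeneous supply and demand.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical supplier or recipient with a 2-D location."""
    name: str
    x: float
    y: float


def nearest_neighbor_first(suppliers, recipients):
    """Greedily match one batch: closest remaining pair first.

    Returns a list of (supplier_name, recipient_name, distance) tuples.
    Anyone left unmatched would stay in the pool for the next window.
    """
    # Enumerate every supplier-recipient pair, sorted by distance
    # (ties broken by name, so the result is deterministic).
    pairs = sorted(
        (math.hypot(s.x - r.x, s.y - r.y), s.name, r.name)
        for s in suppliers
        for r in recipients
    )
    used_suppliers, used_recipients, matches = set(), set(), []
    for dist, s_name, r_name in pairs:
        if s_name not in used_suppliers and r_name not in used_recipients:
            used_suppliers.add(s_name)
            used_recipients.add(r_name)
            matches.append((s_name, r_name, dist))
    return matches


if __name__ == "__main__":
    batch_suppliers = [Node("S1", 0, 0), Node("S2", 10, 0)]
    batch_recipients = [Node("R1", 1, 0), Node("R2", 9, 0), Node("R3", 5, 5)]
    for match in nearest_neighbor_first(batch_suppliers, batch_recipients):
        print(match)
```

A greedy rule like this is cheap to run at the end of every window but is not guaranteed to minimize total distance; an exact assignment solver could be substituted if optimality within a batch matters more than speed.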

At its optimal window size, the fixed time window batch matching strategy (FTWBM) outperforms the adaptive time window batch matching strategy (ATWBM). However, the ATWBM strategy adapts more effectively to the dynamic changes in epidemic prevention and control, particularly in partially optimistic scenarios.

The results revealed that, as the matching time window expands, the average matching rate increases consistently, whereas the average waiting time first decreases and then rises again. This finding implies that health operations managers should modify the matching time window in response to changing epidemic dynamics and resource availability.
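The two metrics discussed above can be computed for a fixed window with a small simulation. The sketch below is a toy under stated assumptions, not the paper's experimental setup: arrivals are hypothetical `(time, side, position)` tuples on a line, matching happens at the end of each window using a greedy closest-pair rule, unmatched participants carry over, and waiting time is measured only for participants who end up matched. The function name `simulate_fixed_window` and all inputs are illustrative.

```python
def simulate_fixed_window(arrivals, window, horizon):
    """Return (matching_rate, average_wait) for one fixed window size.

    arrivals: list of (time, side, position), side is "S" (supplier)
              or "R" (recipient). All values here are illustrative.
    window:   length of the matching time window.
    horizon:  last time at which a batch may be matched.
    """
    pending = sorted(arrivals)
    pool = {"S": [], "R": []}
    waits = []
    i = 0
    for k in range(1, int(horizon // window) + 1):
        t = k * window  # end of the k-th window
        # Admit everyone who arrived up to the end of this window.
        while i < len(pending) and pending[i][0] <= t:
            arrival_time, side, position = pending[i]
            pool[side].append((arrival_time, position))
            i += 1
        # Greedy closest-pair-first matching inside the batch.
        candidates = sorted(
            (abs(s[1] - r[1]), si, ri)
            for si, s in enumerate(pool["S"])
            for ri, r in enumerate(pool["R"])
        )
        used_s, used_r = set(), set()
        for _, si, ri in candidates:
            if si not in used_s and ri not in used_r:
                used_s.add(si)
                used_r.add(ri)
                waits.append(t - pool["S"][si][0])
                waits.append(t - pool["R"][ri][0])
        # Unmatched participants carry over to the next window.
        pool["S"] = [s for j, s in enumerate(pool["S"]) if j not in used_s]
        pool["R"] = [r for j, r in enumerate(pool["R"]) if j not in used_r]
    rate = len(waits) / len(arrivals) if arrivals else 0.0
    average_wait = sum(waits) / len(waits) if waits else 0.0
    return rate, average_wait


if __name__ == "__main__":
    demo = [(0, "S", 0.0), (1, "R", 1.0), (3, "S", 5.0), (5, "R", 6.0)]
    for w in (2, 4, 6):
        print(w, simulate_fixed_window(demo, w, horizon=6))
```

Sweeping `window` over a range of values on realistic arrival data is one way a manager could look for the trade-off described above; this toy example only shows how the metrics are computed and is not expected to reproduce the paper's curves.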

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12823865/full.md

---
Source: https://tomesphere.com/paper/PMC12823865