# QRBT: Quantum Driven Reinforcement Learning for Scalable Blockchain Transaction Processing

**Authors:** Kranthi Kumar Lella, Shiva Rama Krishna Mallu

PMC · DOI: 10.1371/journal.pone.0342689 · PLOS One · 2026-02-19

## TL;DR

QRBT uses quantum computing and reinforcement learning to improve blockchain transaction processing speed and security against quantum threats.

## Contribution

QRBT introduces a quantum-driven reinforcement learning framework for blockchain that improves scalability, reduces transaction latency, and strengthens quantum-resistant security.

## Key findings

- QRBT reduces transaction latency by up to 91.264% and improves throughput by up to 92.635% (both at Level-1).
- Cryptographic security strength under approximated quantum attacks is 96.152% at Level-1 and 83.728% at Level-5.
- QRBT outperforms the QAOA, QAOA-RL, QSVT, QPSO, and AQO baselines in consensus energy consumption and reinforcement learning convergence.
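
To make the latency percentages concrete, the short sketch below applies the Level-1 and Level-5 reductions reported in the abstract (91.264% and 76.298%) to a baseline latency. The baseline of 1000 ms and the function name `apply_reduction` are hypothetical, chosen only for illustration; the paper's actual baseline latencies are not stated in this summary.

```python
def apply_reduction(baseline, percent_reduction):
    """Return the value left after reducing `baseline` by `percent_reduction` percent."""
    return baseline * (1.0 - percent_reduction / 100.0)


# Hypothetical baseline latency in milliseconds (an assumption, not from the paper).
BASELINE_MS = 1000.0

level1_ms = apply_reduction(BASELINE_MS, 91.264)  # Level-1 reduction from the abstract
level5_ms = apply_reduction(BASELINE_MS, 76.298)  # Level-5 reduction from the abstract

print(f"Level-1 latency: {level1_ms:.2f} ms")
print(f"Level-5 latency: {level5_ms:.2f} ms")
```

Under that assumed baseline, a 91.264% reduction leaves about 87.36 ms and a 76.298% reduction leaves about 237.02 ms.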

## Abstract

Blockchain transaction processing faces challenges in throughput and latency, and it must also remain resilient against quantum adversaries. To address these challenges, this research presents QRBT (Quantum Driven Reinforcement Learning for Scalable Blockchain Transaction Processing), a quantum-based reinforcement learning framework in which variational quantum circuits and quantum key distribution (QKD) are combined with an actor-critic RL paradigm to improve consensus strategies and transaction validation. The system adopts a four-tier architecture (quantum computation layer, reinforcement learning layer, blockchain security layer, and transaction processing layer) to support adaptive policy optimization driven by quantum-enhanced state encoding and circuit refinement. Experimental results on Ethereum mainnet traces and synthetic workloads show significant improvements in system performance: transaction latency is reduced by 91.264% at Level-1 and 76.298% at Level-5. Cryptographic security under quantum attack approximation remains the strongest, at 96.152% at Level-1 and 83.728% at Level-5. Scalability improves, with a 92.635% throughput improvement at Level-1 and 79.512 contested TPS of the block stage within each mining epoch. Consensus energy consumption is the lowest among all systems, at 69.957 kWh at Level-1 and 84.963 kWh of computational cost at Level-5. Reinforcement learning convergence stabilizes efficiently, rising from 81.937% at Level-1 to 92.746% at Level-5, and outperforms the QAOA, QAOA-RL, QSVT, QPSO, and AQO baselines. The results show that QRBT simultaneously maintains high throughput, security strength, and energy-efficient consensus while resisting quantum attacks, and that quantum-assisted reinforcement learning is a scalable and secure approach for the next generation of blockchain systems.
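
The actor-critic paradigm mentioned in the abstract can be illustrated with a minimal classical sketch. This is not the authors' method: it is a tabular one-step actor-critic on a toy problem, with no variational quantum circuit, QKD, or blockchain component. The states (low or high load), actions (small or large batch), reward rule, and all names such as `train_actor_critic` are assumptions introduced purely for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(episodes=5000, alpha_actor=0.1, alpha_critic=0.1, seed=0):
    """Tabular one-step actor-critic on a hypothetical two-state, two-action task."""
    rng = random.Random(seed)
    n_states, n_actions = 2, 2  # assumed: state = low/high load, action = small/large batch
    prefs = [[0.0] * n_actions for _ in range(n_states)]  # actor parameters
    values = [0.0] * n_states                             # critic value estimates

    def reward(state, action):
        # Toy reward (an assumption): matching batch size to load pays 1, otherwise 0.
        return 1.0 if action == state else 0.0

    for _ in range(episodes):
        s = rng.randrange(n_states)
        probs = softmax(prefs[s])
        a = 0 if rng.random() < probs[0] else 1
        r = reward(s, a)

        # Episodes last one step, so the TD error is reward minus the critic's estimate.
        delta = r - values[s]
        values[s] += alpha_critic * delta  # critic update

        # Actor update: softmax policy gradient scaled by the TD error.
        for b in range(n_actions):
            grad = (1.0 if b == a else 0.0) - probs[b]
            prefs[s][b] += alpha_actor * delta * grad

    return [softmax(p) for p in prefs], values


policy, values = train_actor_critic()
print("policy per state:", policy)
print("critic values:", values)
```

The critic's estimate serves as a baseline for the actor's gradient step, which is the core of the actor-critic idea. In a quantum-enhanced variant such as the one the abstract describes, the state encoding or policy would presumably come from a parameterized quantum circuit rather than a lookup table, but that part is not modeled here.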

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12919817/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12919817/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12919817/full.md

---
Source: https://tomesphere.com/paper/PMC12919817