Learning with Delayed Rewards -- A case study on inverse defect design in 2D materials
Suvo Banik, Troy D Loeffler, Rohit Batra, Harpal Singh, Mathew Cherukara, and Subramanian KRS Sankaranarayanan

TL;DR
This paper introduces a reinforcement learning approach with delayed rewards to efficiently explore defect configurations in 2D materials, enabling better inverse defect design and understanding of defect-driven phase transitions.
Contribution
The study presents a novel RL method using delayed rewards for defect design in materials, outperforming traditional optimization techniques in efficiency and solution quality.
Findings
Delayed rewards improve defect configuration sampling.
MCTS with delayed rewards finds better defect arrangements.
The method requires fewer evaluations than genetic algorithms.
Abstract
Defect dynamics in materials are of central importance to a broad range of technologies, from catalysis to energy storage systems to microelectronics. Material functionality depends strongly on the nature and organization of defects, whose arrangements often involve intermediate or transient states that present a high barrier for transformation. The lack of knowledge of these intermediate states and the presence of this energy barrier present a serious challenge for inverse defect design, especially for gradient-based approaches. Here, we present a reinforcement learning approach (Monte Carlo Tree Search) based on delayed rewards that allows for efficient search of the defect configurational space and lets us identify optimal defect arrangements in low-dimensional materials. Using a representative case of 2D MoS2, we demonstrate that the use of delayed rewards allows us to efficiently sample…
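As a rough illustration of what "delayed rewards" means in a tree search, the sketch below runs a minimal Monte Carlo Tree Search on a toy problem: vacancies are placed one at a time on a ring of sites, and a reward is computed only once the configuration is complete. Everything here is an assumption made for illustration and is not taken from the paper: the ring lattice, the `toy_reward` function (which simply counts adjacent vacancy pairs as a stand-in for an energy model), the exploration constant, and all names.

```python
import math
import random

N_SITES = 12      # hypothetical ring of lattice sites
N_VACANCIES = 4   # number of defects to place, one per move


def toy_reward(sites):
    """Hypothetical stand-in for an energy model: adjacent vacancy pairs on the ring."""
    occupied = set(sites)
    return sum(1 for s in occupied if (s + 1) % N_SITES in occupied)


class Node:
    def __init__(self, state, parent=None):
        self.state = state        # tuple of vacancy sites placed so far
        self.parent = parent
        self.children = {}        # site index -> child Node
        self.visits = 0
        self.total = 0.0          # sum of terminal rewards seen through this node

    def is_terminal(self):
        return len(self.state) == N_VACANCIES

    def untried(self):
        return [s for s in range(N_SITES)
                if s not in self.state and s not in self.children]


def uct_child(node, c=1.4):
    """Pick the child maximizing mean reward plus an exploration bonus (UCT)."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(state, rng):
    """Complete the configuration at random; the reward is evaluated only at the end."""
    free = [s for s in range(N_SITES) if s not in state]
    final = state + tuple(rng.sample(free, N_VACANCIES - len(state)))
    return final, toy_reward(final)


def mcts(n_iter=2000, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_state, best_reward = None, -1
    for _ in range(n_iter):
        node = root
        # Selection: descend while the node is fully expanded and not terminal.
        while not node.is_terminal() and not node.untried():
            node = uct_child(node)
        # Expansion: add one new child unless the node is already terminal.
        if not node.is_terminal():
            site = rng.choice(node.untried())
            child = Node(node.state + (site,), parent=node)
            node.children[site] = child
            node = child
        # Simulation: no intermediate reward, only the final configuration is scored.
        final, reward = rollout(node.state, rng)
        if reward > best_reward:
            best_state, best_reward = tuple(sorted(final)), reward
        # Backpropagation: credit the delayed reward to every node on the path.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best_state, best_reward


if __name__ == "__main__":
    state, reward = mcts()
    print("best configuration:", state, "reward:", reward)
```

Because partial configurations are never scored, a move that looks unfavorable on its own can still be credited if it leads to a good final arrangement, which is the property the abstract attributes to delayed rewards. In a realistic setting `toy_reward` would be replaced by an energy evaluation of the completed defect structure.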
