Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization
Shixuan Liu, Yanghe Feng, Keyu Wu, Guangquan Cheng, Jincai Huang, Zhong Liu

TL;DR
This paper introduces a novel reinforcement learning approach with trust region navigation and a refined graph attention encoder for more robust and efficient causal structure discovery, outperforming previous methods on synthetic and benchmark datasets.
Contribution
It proposes a trust region-navigated clipping policy optimization method and a new SDGAT encoder to improve the performance and stability of causal discovery.
Findings
Outperforms previous RL methods on synthetic datasets.
Achieves better robustness and efficiency in causal structure learning.
Demonstrates superior results on benchmark datasets.
Abstract
In many domains of empirical sciences, discovering the causal structure among variables remains an indispensable task. Recently, to tackle the unoriented edges or violations of latent assumptions suffered by conventional methods, researchers formulated a reinforcement learning (RL) procedure for causal discovery and equipped it with the REINFORCE algorithm to search for the best-rewarded directed acyclic graph. The two keys to the overall performance of the procedure are the robustness of RL methods and the efficient encoding of variables. However, on the one hand, REINFORCE is prone to local convergence and unstable performance during training. Neither trust region policy optimization, being computationally expensive, nor proximal policy optimization (PPO), suffering from aggregate constraint deviation, is a decent alternative for combinatorial optimization problems with considerable individual…
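The abstract contrasts REINFORCE with PPO-style clipping. As a rough illustration only, the sketch below computes the standard PPO clipped surrogate objective for a batch of samples. It is not the paper's trust region-navigated variant, whose details are not given in this summary. The function name `ppo_clipped_objective`, its parameters, and the default `eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean standard PPO clipped surrogate over a batch (illustrative sketch).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the previous policy. advantages: advantage estimates.
    All names and the eps default are hypothetical, not taken from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies agree, every ratio equals 1 and the objective reduces to the mean advantage. Clipping only changes the value once a sample's ratio leaves the interval in the direction that would otherwise increase the objective.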
Taxonomy
Methods: Softmax · Attention Is All You Need · REINFORCE · Entropy Regularization · Proximal Policy Optimization
