TL;DR
This paper introduces a novel method for causal inference between discrete sequences using compression-based measures, demonstrating competitive performance and applications in viral genome analysis.
Contribution
It proposes a new framework leveraging lossless compression to infer causal directions from symbolic sequences, including three models based on CCMs and applications to SARS-CoV-2 genomes.
Findings
The models perform competitively with state-of-the-art methods.
They infer causal directions without requiring temporal information.
They are applied to SARS-CoV-2 genome sequences for viral analysis.
Abstract
Causal inference is one of the most fundamental problems across all domains of science. We address the problem of inferring a causal direction from two observed discrete symbolic sequences X and Y. We present a framework which relies on lossless compressors for inferring context-free grammars (CFGs) from sequence pairs and quantifies the extent to which the grammar inferred from one sequence compresses the other sequence. We infer that X causes Y if the grammar inferred from X compresses Y better than in the other direction. To put this notion into practice, we propose three models that use the Compression-Complexity Measures (CCMs), namely Lempel-Ziv (LZ) complexity and Effort-To-Compress (ETC), to infer CFGs and discover causal directions without demanding temporal structures. We evaluate these models on synthetic and real-world benchmarks and empirically observe performances…
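The decision rule in the abstract (compare how well a model learned from one sequence compresses the other, in both directions) can be illustrated with a small sketch. This is not one of the paper's three models: it assumes an LZ78-style phrase dictionary as a stand-in for the inferred grammar, and the function names `lz78_phrases`, `cross_cost`, and `infer_direction` are hypothetical names chosen for this illustration.

```python
def lz78_phrases(seq):
    """Return the set of phrases from an LZ78-style parse of seq.

    Assumption: this phrase set stands in for the 'grammar' inferred
    from a sequence; the paper's models use LZ complexity and ETC.
    """
    phrases = set()
    current = ""
    for symbol in seq:
        current += symbol
        if current not in phrases:
            phrases.add(current)
            current = ""
    return phrases


def cross_cost(source, target):
    """Count the phrases needed to cover target using phrases learned
    from source. Unknown symbols fall back to one phrase per symbol."""
    phrases = lz78_phrases(source)
    max_len = max((len(p) for p in phrases), default=1)
    i = 0
    cost = 0
    while i < len(target):
        step = 1  # fallback: emit a single symbol
        # Greedy longest match against the source's phrase set.
        for length in range(min(max_len, len(target) - i), 1, -1):
            if target[i:i + length] in phrases:
                step = length
                break
        i += step
        cost += 1
    return cost


def infer_direction(x, y):
    """Hypothetical decision rule: the sequence whose phrases compress
    the other one better (lower cost) is taken as the cause."""
    cost_xy = cross_cost(x, y)  # model from X applied to Y
    cost_yx = cross_cost(y, x)  # model from Y applied to X
    if cost_xy < cost_yx:
        return "X -> Y"
    if cost_yx < cost_xy:
        return "Y -> X"
    return "undecided"


if __name__ == "__main__":
    print(infer_direction("aabb", "aaaa"))
```

The sketch compares raw phrase counts, so it is only meaningful for sequences of equal length; sequences of different lengths would need some form of normalization, which is left out here.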
Taxonomy
Methods: Causal inference
