ARTreeFormer: A Faster Attention-based Autoregressive Model for Phylogenetic Inference

Tianyu Xie; Yicong Mao; Cheng Zhang

arXiv:2507.18380·q-bio.PE·July 25, 2025·PLoS Comput. Biol.

Open Access · 3 Reviews

TL;DR

ARTreeFormer is an attention-based autoregressive model over tree topologies that accelerates its predecessor, ARTree, by computing topological node embeddings with a fixed-point iteration and by using attention mechanisms, making phylogenetic inference more scalable on large datasets.

Contribution

It introduces ARTreeFormer, combining fixed-point iteration and attention to improve speed and scalability over previous autoregressive models in phylogenetics.

Findings

01. Achieves faster computation on CUDA devices.

02. Maintains high approximation accuracy.

03. Effective on challenging real data phylogenetic problems.

Abstract

Probabilistic modeling over the combinatorially large space of tree topologies remains a central challenge in phylogenetic inference. Previous approaches often necessitate pre-sampled tree topologies, limiting their modeling capability to a subset of the entire tree space. A recent advancement is ARTree, a deep autoregressive model that offers unrestricted distributions for tree topologies. However, its reliance on repetitive tree traversals and inefficient local message passing for computing topological node representations may hamper the scalability to large datasets. This paper proposes ARTreeFormer, a novel approach that harnesses fixed-point iteration and attention mechanisms to accelerate ARTree. By introducing a fixed-point iteration algorithm for computing the topological node embeddings, ARTreeFormer allows fast vectorized computation, especially on CUDA devices. This, together…
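The fixed-point idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes (this is an assumption, not a detail confirmed by the abstract) that leaf embeddings are fixed one-hot vectors and that each interior node's embedding should equal the mean of its neighbours' embeddings, which is the stationarity condition of Dirichlet energy minimization. The function name `fixed_point_embeddings`, its parameters, and the toy tree are hypothetical and are not taken from the paper's code. Plain Python is used for portability; the same update can be written as one matrix product per iteration, which is what makes a batched GPU version natural.

```python
def fixed_point_embeddings(edges, n_leaves, n_nodes, n_iters=100, tol=1e-10):
    """Jacobi-style fixed-point iteration (illustrative sketch only).

    Nodes 0..n_leaves-1 are leaves with fixed one-hot embeddings.
    Nodes n_leaves..n_nodes-1 are interior nodes, repeatedly updated
    to the mean of their neighbours until the change falls below tol.
    """
    # Build adjacency lists for the unrooted tree.
    neighbours = [[] for _ in range(n_nodes)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    # One-hot rows for leaves, zero rows for interior nodes.
    emb = []
    for i in range(n_nodes):
        row = [0.0] * n_leaves
        if i < n_leaves:
            row[i] = 1.0
        emb.append(row)

    for _ in range(n_iters):
        new = [row[:] for row in emb]
        # Update every interior node from the previous iterate (Jacobi update).
        for i in range(n_leaves, n_nodes):
            deg = len(neighbours[i])
            new[i] = [sum(emb[k][j] for k in neighbours[i]) / deg
                      for j in range(n_leaves)]
        # Largest coordinate change in this sweep, used as a stopping criterion.
        delta = max(abs(a - b)
                    for r_new, r_old in zip(new, emb)
                    for a, b in zip(r_new, r_old))
        emb = new
        if delta < tol:
            break
    return emb


# Hypothetical toy tree: leaves 0..3, interior nodes 4 and 5.
toy_edges = [(0, 4), (1, 4), (4, 5), (2, 5), (3, 5)]
toy_emb = fixed_point_embeddings(toy_edges, n_leaves=4, n_nodes=6)
print(toy_emb[4])
print(toy_emb[5])
```

On this toy tree the interior embeddings can be checked by hand: solving the two averaging equations gives weights 3/8 on each adjacent leaf and 1/8 on each distant leaf, so every interior row sums to one.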

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 8 · Confidence 2

Strengths

(1) ARTreeFormer improves significantly over ARTree by identifying several design bottlenecks and the running-time complexity of ARTree. ARTreeFormer proposes novel attention and GNN mechanisms tailored to phylogenetic inference. (2) ARTreeFormer is evaluated on several standard phylogenetic inference benchmarks, and the authors provide several useful metrics for analyzing the performance of the model. The authors explain in detail what each metric stands for, and make

Weaknesses

(1) Table 1 reveals that ARTreeFormer consistently lags behind ARTree in terms of KL divergence. Why does ARTreeFormer not achieve competitive performance with ARTree, which it was designed to improve? What training components or procedures might contribute to ARTreeFormer’s reduced performance? (2) It seems that the recurrent node embedding with simplified attention mechanism and local message passing updates module are built upon two separate components of ARTree, respectively. The authors sh

Reviewer 02 · Rating 8 · Confidence 2

Strengths

1. Overall, the paper is fairly well written and flows quite well.
2. The learnable node embeddings using attention are a simple but effective way to **(1)** make the embeddings and the model more flexible and **(2)** improve the runtime.
3. There is a significant improvement in runtime, shaving *hours* of CPU runtime off (e.g., for the maximum parsimony application). I believe this makes the method more applicable to modern phylogenetic datasets with several hundred or thousands of taxa.

Weaknesses

The paper, I believe, has no real "deal-breaking" weakness but would benefit from addressing the following points:
1. In the VBPI experiment, **ARTreeFormer** is compared to other methods, $\phi$-CSMC and GeoPhy, in terms of approximation accuracy. However, in terms of runtime, **ARTreeFormer** results are only compared to methods upon which it was built *(i.e., SBNs and ARTree)*. Since the focus of this paper is the fast runtime of **ARTreeFormer**, I think a computation-speed comparison to all

Reviewer 03 · Rating 3 · Confidence 4

Strengths

This paper demonstrates that topological node embeddings derived from Dirichlet energy minimization are not needed to achieve near-SoTA results. Instead, learnt attention-based recurrent node embeddings can provide similar, if slightly worse, performance (see Table 1). The use of attention-based edge decisions and the local message passing scheme are also new ideas relative to ARTree, and they provide speedups to the model.

Weaknesses

For a work whose main contribution is a more computationally efficient model (in the authors' words, line 338: "It should be emphasized that we mainly pay attention to the computational efficiency improvement of ARTreeFormer and only expect it to attain similar accuracy with baseline methods"), it is surprising that there is no analysis whatsoever of the computational complexity of ARTree and ARTreeFormer. My expectation coming into the paper was that (1) the paper would discuss the computationa

Taxonomy

Topics: Genomics and Phylogenetic Studies · Genetics, Bioinformatics, and Biomedical Research · Fractal and DNA sequence analysis