OncotreeVIS—an interactive graphical user interface for visualizing mutation tree cohorts
Monica-Andreea Baciu-Drăgan, Niko Beerenwinkel

TL;DR
OncotreeVIS is a tool for visualizing and analyzing mutation trees in tumors, helping researchers explore genetic events and drug interactions in cancer cohorts.
Contribution
OncotreeVIS introduces an interactive GUI for visualizing mutation tree cohorts and posterior distributions, enabling detailed inspection and comparison.
Findings
OncotreeVIS supports visualization of mutation trees with single or joint genetic events.
The tool highlights subclones, conserved trajectories, and drug-gene interactions at the cohort level.
It is available as a JavaScript library and web application with default datasets for immediate use.
Abstract
In recent years, developments in single-cell next-generation sequencing technology and computational methodology have made it possible to reconstruct, with increasing precision, the evolutionary history of tumors and their cell phylogenies, represented as mutation trees. Many mutation tree inference tools exist, but they do not support detailed visual tree inspection, nor tree comparisons or analysis at the cohort level, an important task in computational oncology. We developed oncotreeVIS, an interactive graphical user interface for visualizing mutation tree cohorts and tree posterior distributions obtained from mutation tree inference tools. OncotreeVIS can display mutation trees that encode single or joint genetic events, such as point mutations and copy number changes, and highlight matching subclones, conserved trajectories and drug-gene interactions at the cohort level.…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1- —Horizon 2020 project OLISSIPO
- —Swiss National Science Foundation10.13039/501100001711
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCRISPR and Genetic Engineering · Evolution and Genetic Dynamics · Genetics, Bioinformatics, and Biomedical Research
1 Introduction
Tumor progression is a process driven by the accumulation of genomic mutations, such as point mutations or copy number alterations (CNA), and shaped by environmental selection pressures, including those exhibited by immune responses and drug treatments (Yates and Campbell 2012). This somatic evolutionary process can lead to a high level of heterogeneity in tumors both in space and time. Furthermore, it has been shown that the temporal order in which mutations occur can have a significant impact on clinical patient outcome (Pronier et al. 2011).
However, tracking the evolution of tumors over time is impossible before the point when the patient is being diagnosed and usually also not possible for solid tumors, where samples are available only through biopsy, when the tumor is removed from the patient. In these cases when the evolution of the tumor cannot be directly observed, it can be inferred computationally from the genetic profiles detected at the time of the biopsy in different cell populations in the tumor (subclones), using mutation tree inference tools for point mutations, copy number events, or joint copy number and point mutations (Table 1, available as supplementary data at Bioinformatics Advances online).
The mutation tree inference tools process the genomic sequencing data and reconstruct its phylogeny by a probabilistic approach using as taxa either the individual cells (for single-cell data) or the clones inferred based on variant allele frequencies, for bulk data (Kuipers et al. 2017). Depending on the statistical models used, their assumptions and their capability of handling errors in the real data, the results of different mutation tree inference tools do not always agree. They also depend on the resolution at which the input data is sequenced (bulk, single-cell), on the amount of cells per sample, and on the number of samples extracted from different sites of the biopsy.
The mutational history of a tumor can be represented by a mutation tree, also referred to as event tree, where the nodes correspond to different genetic events (event nodes) and the tree structure encodes the partial temporal order in which they occurred. The tumor cells are then attached to the mutation nodes, and their genetic profile is obtained by tracking all the events from the attachment node back to the root.
Once reconstructed, the mutation trees can be either visually inspected in order to understand the intra-tumor heterogeneity and potentially guide clinical decisions in tumor boards, or they can be analyzed in a comparative fashion as tree cohorts, in order to compute conserved evolutionary trajectories, or pairwise tree distance scores (Baciu-Drăgan and Beerenwinkel 2024).
The data formats used to represent mutation trees vary across different mutation tree inference tools, as described in Table 3, available as supplementary data at Bioinformatics Advances online, and tree visualizations must either be generated in a postprocessing step, e.g. using Graphviz (Ellson et al. 2002), or rendered as image files through functionality incorporated within the model—which is typically not reusable outside that specific scope. More complex visualization tools for clonal evolution take into account longitudinal input data (i.e. data sampled at successive time points in the life of the tumor) and treatment information, and are able to generate fish plot views of the tumors (Miller et al. 2016, Sandmann et al. 2022).
However, the existing mutation tree visualization options are limited to individual trees and display a fixed set of subclonal genomic event annotations in the mutation tree nodes, typically none or all of them, that cannot be adjusted interactively. This makes the visualization less informative when the annotations are discarded and overcrowded when the set of affected genes per subclone is large.
Also, there is currently no consistent way to visualize, in a unified manner, mutation trees reconstructed by different mutation tree inference tools. No tools for comparative tree visualization of cohorts exist that can facilitate the navigation through the different features of the cohort (e.g. conserved evolutionary trajectories) in an interactive fashion.
To address these limitations, we developed oncotreeVIS, an interactive web-based graphical user interface for visualizing mutation trees from large cohorts, which helps assess heterogeneity among tumors. The trees can be displayed side by side, grouped according to a given clustering, or projected into a 2D latent space given the pairwise tree distances. Additional information such as clinical data, drug-gene interactions, matching mutation patterns and conserved trajectories between groups of trees, and k-nearest neighbor trees, is displayed in an interactive fashion.
OncotreeVIS can be effectively used to visually inspect the results of mutation tree clustering methods, the tree posterior distributions of mutation tree inference tools, or different versions of mutation trees for the same patient, e.g. mutation trees reconstructed with different mutation tree inference tools, or longitudinal samples.
OncotreeVIS is available both as a JavaScript library and as an online web application, which makes it easy to use both for computational scientists and clinicians. By default, oncotreeVIS provides visualizations of seven publicly available mutation tree cohorts reconstructed with existing mutation tree inference tools from both bulk and single-cell data which highlight some of the use cases: visualizing matching patterns within clustered point mutation and CN trees for different cancer types, visualizing tree structures corresponding to different modes of spatial tumor evolution (linear, branching, punctuated), inspecting conserved trajectories among alternative trees sampled from the posterior tree distribution, and interpreting the results of algorithms that identify conserved evolutionary trajectories (see Table 1, available as supplementary data at Bioinformatics Advances online and Fig. 1, available as supplementary data at Bioinformatics Advances online for details). New tree cohort datasets can be visualized easily by providing the data in a predefined JSON format.
2 Usage and implementation details
OncotreeVIS is implemented in HTML5 and JavaScript, and uses the D3.js library for data visualization (Bostock et al. 2011). To render large amounts of trees without blocking the browser it performs on the fly computations asynchronously and uses lazy loading of the HTML elements that are offscreen. The online version of the tool is implemented as a web application hosted on GitHub Pages, a static site hosting service which does not support data storage or backend computations on the server side. All user data loaded into the application is processed entirely in the browser, on the client side, ensuring that no information is transmitted to a server—this makes the application suitable for handling sensitive patient data by effectively addressing critical privacy concerns.
To unify the various data formats used in the literature for encoding mutation trees, the mutation tree cohort input for oncotreeVIS is structured as a nested JSON object which encodes the tree structures, with node labels representing single or joint genomic events acquired by each subclone at different times in the inferred evolution of the tumor. The mutation events in each node are encoded by a dictionary where the keys are the type of event (point mutation, CNA, etc.) and the value stores additional information, e.g. the point mutation variant, or the copy number amplification/deletion amounts w.r.t. the diploid state. Additional node annotations can be provided by the user: matching label IDs, which uniquely identify the set of mutation events and are used to facilitate the identification of matching subclones, subclone sizes, and clinical information if available, as described in Table 2, available as supplementary data at Bioinformatics Advances online. The node sizes reflect the relative size of the subclonal cell populations. To improve the user experience we chose to use equal branch lengths for all the trees and display the number of mutations and their type as branch labels.
In our code repository we provide scripts to convert the output formats of four common mutation tree inference tools (listed in Table 3, available as supplementary data at Bioinformatics Advances online) into the JSON input format used by oncotreeVIS. In addition to individual mutation trees, oncotreeVIS can incorporate a given tree clustering and a precomputed pairwise tree distance matrix, which can be provided as additional key value pairs in the JSON input file. For the predefined tree cohorts, we used oncotree2vec (Baciu-Drăgan and Beerenwinkel 2024) to compute the pairwise distances and the clustering of the mutation trees based on the matching pairwise relations between the clones.
3 Features
In order to explore the heterogeneity of mutation tree cohorts, oncotreeVIS provides an interactive interface where the user can navigate between different views of the data both at the tree and cohort level. In addition, one can visualize subclones with matching genetic events, conserved evolutionary trajectories, and the impact of subclonal target gene events on drug efficacy (Fig. 1, Table 4, available as supplementary data at Bioinformatics Advances online).
Screenshots of different mutation tree views provided by oncotreeVIS. Some interactive elements are indicated by the arrow pointer. (a) Mutation tree with events acquired in each subclone node labeled on the incoming edge (e.g. “3snv, 2amp, 4del” stands for three point mutations, two copy number amplifications, and four deletion w.r.t. the diploid state). (b–d): Three different tree cohort views: (b) Mutation trees displayed side by side, grouped by a given clustering; matching subclones and conserved trajectories are highlighted. (c) Heatmap visualization of the pairwise distances between the mutation trees. (d) 2D visualization of the mutation trees based on the pairwise distances, using Multidimensional scaling (MDS); each tree is a dot in the latent space and can be visualized when the dot is hovered by the mouse pointer. (e–h): Additional information at mutation tree level: (e) Highlighting subclones with an affected target gene using red color for CN amplification, blue for CN deletion and violet for any other mutation event. (f) The subclones affected by a selected target drug are indicated by squares at the center of the nodes; there is one clone (next to the root) which does not interact with the selected drug. (g) Cluster summaries (clinical data and matching subclonal events). (h) K-nearest neighbors of a target mutation tree (placed at the center of the panel) among other trees in the cohort.
3.1 Tree cohort views
OncotreeVIS provides different ways of visualizing event trees from large mutation tree cohorts. For each tree, we provide text labels summarizing the mutation events or (for more than two events) the number of mutation events of each type acquired in each subclone node (Fig. 1a). By default, the individual trees are displayed side by side. If a clustering is provided, the trees are ordered according to the given cluster groups, which are indicated by different background colors (Fig. 1b). The interface provides the option to sort the trees alphabetically or randomly (see Table 4, available as supplementary data at Bioinformatics Advances online).
The user can choose whether to highlight matching subclones (i.e. the nodes with the same matching label ID) or trajectories (i.e. tree branches) conserved between the trees from the same cluster (see Table 4, available as supplementary data at Bioinformatics Advances online for details). If no clustering is provided, the matching subclones across the entire cohort will be highlighted by node colors. Detailed information is displayed by clicking on the individual mutation trees (see Section 3.2 below).
Two additional cohort views are available if the pairwise distances between the mutation trees are provided by the user: a heatmap view of the tree distances and a 2D representation of the trees in latent space, computed on the fly using Multidimensional scaling (Fig. 1c and d). Similarly to the tree view, the different tree clusters are highlighted if a clustering is provided.
By clicking on the clusters, additional details are displayed: a summary of the clinical data and information on the mutation events shared between the subclones with the same matching label ID in the cluster trees (Fig. 1g). Also, we offer users the option to export each tree cohort view, as well as individual trees, as high-quality, camera-ready PDF figures.
3.2 Mutation tree information
In order to facilitate the visual inspection of the mutation trees for guiding clinical decisions, oncotreeVIS provides an interactive user interface where the user can track the impact of certain genes and drugs of interest on the evolution of the tumors (Fig. 1e and f). For each selected tree, the user can choose from the list of genes affected by mutation events in the tree to visualize the subclone (i.e. tree node) which first acquired that mutation and consult the list of drugs which interact with the target gene according to the Drug Gene Interaction database [DGIdb, Cannon et al. (2024)—we used a frozen version from December 2024]. Some of these mutated genes can be targeted by specific drugs and one can select a drug and visualize the subclones which contain affected genes which interact with the target drug according to DGIdb. For this task, we compute the mutation profile of each subclone by collecting all mutation events along the path from the root to any node in the mutation tree.
Tracking the mutation events which affect a specific target gene in the mutation tree can provide insights into its role in tumor growth and therapeutic resistance. Also, by selecting a target drug the user can make sense of the fraction of the tumor that is potentially impacted by the drug and can help find new treatment strategies.
3.3 K-nearest tree neighbors
A key question in cancer genomics is to identify recurring patterns of tumor evolution. Mutation tree cohorts can reveal such patterns in the order in which mutations accumulate in different trees, especially co-occurring and mutually exclusive mutation events, which have an essential role in shaping tumor progression. The problem of finding such patterns has been addressed by several existing tree distance metrics, which focus on handcrafted tree features, or infer pairwise tree distance scores by learning mutation tree embeddings (Baciu-Drăgan and Beerenwinkel 2024). OncotreeVIS incorporates the information about the pairwise distances between the mutation trees, if provided by the user, to project the trees in a 2D latent space. In the absence of a precomputed tree distance matrix we propose a greedy heuristic (Fig. 2, available as supplementary data at Bioinformatics Advances online, Algorithm 1, available as supplementary data at Bioinformatics Advances online) that approximates the problem of maximum matching with ordering constraints, which is NP-hard (Ritt 2009), and computes on the fly the nearest neighbors based on the provided matching labels. The matching trees are then displayed as an extension of the mutation tree information box (Fig. 1h).
4 Conclusions
OncotreeVIS is an interactive user interface to visualize mutation tree cohorts in order to assess the heterogeneity within and among tumors, tree posterior distributions computed by different tree inference tools, pairwise tree distances, predicted drug-tumor interactions, and different modes of evolution. Our visualization can support clinical decision making and enhance research on the genetic basis of tumor progression. OncotreeVIS is freely available under MIT license and can be used locally and integrated into more complex pipelines using the JavaScript library. Furthermore, we provide an online web application where everyone with a browser can visualize seven predefined public mutation tree cohorts or load their own data on the fly, while ensuring that all user-uploaded data remains on the client side without transmitting any information to a server.
Supplementary Material
vbaf247_Supplementary_Data
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Baciu-Drăgan MA , Beerenwinkel N. Oncotree 2vec—a method for embedding and clustering of tumor mutation trees. Bioinformatics 2024;40:i 180–8.10.1093/bioinformatics/btae 214PMC 1121181738940124 · doi ↗ · pubmed ↗
- 2Bostock M , Ogievetsky V, Heer J. D 3: data-driven documents. IEEE Trans Vis Comput Graph 2011;17:2301–9.10.1109/TVCG.2011.18522034350 · doi ↗ · pubmed ↗
- 3Cannon M , Stevenson J, Stahl K et al DG Idb 5.0: rebuilding the drug-gene interaction database for precision medicine and drug discovery platforms. Nucleic Acids Res 2024;52:D 1227–35.10.1093/nar/gkad 1040 PMC 1076798237953380 · doi ↗ · pubmed ↗
- 4Ellson J , Gansner E, Koutsofios L, et al Graphviz—Open source graph drawing tools. In: Mutzel P, Jünger M, Leipert S et al (eds), Graph Drawing, Volume 2265 of Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2002, 483–4.
- 5Kuipers J , Jahn K, Raphael BJ et al Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors. Genome Res 2017;27:1885–94.29030470 10.1101/gr.220707.117PMC 5668945 · doi ↗ · pubmed ↗
- 6Miller CA , Mc Michael J, Dang HX et al Visualizing tumor evolution with the fishplot package for R. BMC Genomics 2016;17:880.10.1186/s 12864-016-3195-z PMC 510018227821060 · doi ↗ · pubmed ↗
- 7Pronier E , Quivoron C, Bernard OA et al JAK 2V 617F/TET 2 mutations: does the order matter? Haematologica 2011;96:638–40.21531945 10.3324/haematol.2011.042846 PMC 3084907 · doi ↗ · pubmed ↗
- 8Ritt M. Maximum matching with ordering constraints is NP-complete Ph.D. thesis, Universidade Federal do Rio Grande do Sul, 2009.
