Exploring data subsets with vtree
Nick Barrowman, Richard J. Webster

TL;DR
Variable trees are a new visualization method, implemented in the vtree R package, that facilitates exploring nested data subsets and missing data and generating study flow diagrams directly from data, in support of reproducible research and open science.
Contribution
Introduces variable trees and the vtree R package as a tool for exploring discrete multivariate data, and compares them with existing methods (contingency tables, mosaic plots, Venn/Euler diagrams, and UpSet).
Findings
Variable trees effectively reveal patterns in nested data subsets.
They assist in exploring missing data.
They can generate study flow diagrams directly from data.
Abstract
Variable trees are a new method for the exploration of discrete multivariate data. They display nested subsets and corresponding frequencies and percentages. Manual calculation of these quantities can be laborious, especially when there are many multi-level factors and missing data. Here we introduce variable trees and their implementation in the vtree R package, draw comparisons with existing methods (contingency tables, mosaic plots, Venn/Euler diagrams, and UpSet), and illustrate their utility using two case studies. Variable trees can be used to (1) reveal patterns in nested subsets, (2) explore missing data, and (3) generate study flow diagrams (e.g., CONSORT diagrams) directly from data frames, to support reproducible research and open science.
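To make concrete what "nested subsets and corresponding frequencies and percentages" means, the sketch below computes those quantities by hand for a tiny made-up dataset. It is a minimal illustration in plain Python, not the vtree package or its interface. The records, the variable names (severity, sex), and the helper `nested_counts` are all hypothetical. Two conventions here are assumptions of this sketch and may differ from what vtree does: percentages are taken relative to the parent subset, and missing values are counted as their own "NA" category.

```python
from collections import Counter

# Hypothetical records: (severity, sex); None marks a missing value.
records = [
    ("Mild", "F"), ("Mild", "M"), ("Mild", "F"),
    ("Severe", "M"), ("Severe", None),
    (None, "F"),
]

def label(value):
    """Show missing values as their own category rather than dropping them."""
    return "NA" if value is None else value

def nested_counts(rows, depth=0, path=()):
    """Return (path, count, percent of parent subset) for every nested subset."""
    out = []
    if not rows or depth == len(rows[0]):
        return out
    counts = Counter(label(r[depth]) for r in rows)
    for value in sorted(counts):
        subset = [r for r in rows if label(r[depth]) == value]
        pct = 100 * len(subset) / len(rows)  # assumption: percent of parent
        out.append((path + (value,), len(subset), round(pct, 1)))
        # Recurse into the next variable within this subset.
        out.extend(nested_counts(subset, depth + 1, path + (value,)))
    return out

tree = nested_counts(records)
for path, n, pct in tree:
    # Indent each node by its depth to mimic a tree layout.
    print("  " * (len(path) - 1) + f"{path[-1]}: {n} ({pct}%)")
```

Even with two variables and six records, the bookkeeping involves eight subsets, which suggests why doing this manually becomes laborious with many multi-level factors and missing data.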
Taxonomy
Topics: Data Analysis with R · Data Visualization and Analytics · Forest ecology and management
