Sustainable software development in science – insights from 20 years of Vanted
Falk Schreiber, Tobias Czauderna, Dimitar Garkov, Niklas Gröne, Karsten Klein, Matthias Lange, Uwe Scholz, Björn Sommer

TL;DR
This paper explores how to sustainably develop scientific software, using Vanted as an example of a tool maintained for 20 years.
Contribution
The paper presents five key insights for long-term maintenance of scientific software, based on a case study and 15-year analysis.
Findings
Fewer than one third of biological network tools are maintained for 15 years.
Vanted has been actively supported for 20 years, making it a rare example of sustainable software.
Five key insights are identified for long-term software development in science.
Abstract
Sustainable software development requires the software to remain accessible and maintainable over a long time. This is particularly challenging in a scientific context. For example, fewer than one third of tools and platforms for biological network representation, analysis, and visualisation have been available and supported over a period of 15 years. One of those tools is Vanted, which has been developed and actively supported over the past 20 years. In this work, we discuss sustainable software development in science and investigate which software tools for biological network representation, analysis, and visualisation are maintained over a period of at least 15 years. With Vanted as a case study, we highlight five key insights that we consider crucial for sustainable, long-term software development and software maintenance in science.
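Table 1: Tools identified in the 2009 collection and their availability in 2024 († = obsolete, followed by the year of the last update).

| Tool name | Year of first publication | Ref. | Still available? |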
|---|---|---|---|
| A-cell | 2001 | [ | No |
| aiSee | n/a | [ | No |
| Advanced pathway painter | 2003 | [ | Yes |
| Arcadia | 2008 | [ | Yes (†2014) |
| ArrayXPath | 2004 | [ | No |
| Atlas | 2005 | [ | Yes |
| AVIS | 2007 | [ | No |
| Biological concept diagram editor | 2008 | [ | No |
| BINViz | 2008 | [ | Yes (†2013) |
| Bio sketch pad | 2001 | [ | No |
| BioBiblioMetrics | 2000 | [ | No |
| BioCarta | n/a | [ | No |
| Biogranat | 2008 | [ | Yes (†2014) |
| BioGraphNet | 2004 | [ | Yes (†2013) |
| BioJAKE | 1999 | [ | No |
| BioLayout | 2000 | [ | Yes |
| BiologicalNetworks | 2005 | [ | No |
| BioMiner | 2002 | [ | No |
| BioPath | 2002 | [ | No |
| BioPathwise | 2007 | [ | No |
| BioPax | 2003 | [ | Yes |
| BioPP | 2007 | [ | No |
| BioSpice | 2002 | [ | Yes (†2012) |
| BioTapestry | 2005 | [ | Yes |
| BioUML | 2002 | [ | Yes |
| BNArray | 2006 | [ | Yes (†2006) |
| CADLIVE | 2003 | [ | Yes (†2010) |
| Cell illustrator | 2004 | [ | Yes |
| CellDesigner | 2003 | [ | Yes |
| CellNetAnalyzer | 2003 | [ | Yes |
| CentiBin | 2006 | [ | Yes (†2011) |
| Pathway tool software | 2005 | [ | No |
| COB editor | 2005 | [ | Yes (†2005) |
| CPN tools | 1999 | [ | Yes |
| Cell system markup language | 2000 | [ | No |
| CUtenet | 2000 | [ | No |
| Cyclone | 2005 | [ | Yes (†2013) |
| Cytoscape | 2003 | [ | Yes |
| DAVID | 2003 | [ | Yes |
| DBmcmc | 2003 | [ | No |
| Dynamic signaling maps | n/a | [ | No |
| E-cell | 1999 | [ | Yes |
| EMMA2 | 2008 | [ | Yes (†2013) |
| Edinburgh pathway editor | 2006 | [ | Yes (†2013) |
| ExPASy | 2003 | [ | Yes |
| ExPlain | 2006 | [ | Yes |
| GENAW | n/a | [ | No |
| GeneGobi | 2004 | [ | No |
| GenePath | 2003 | [ | Yes |
| GeneScene visualizer | 2005 | [ | No |
| GeneSpring GX | 2008 | [ | Yes |
| Genetic network analyzer | 2002 | [ | Yes |
| GeneVestigator | 2004 | [ | Yes |
| GeneView | 2007 | [ | No |
| GeneWays | 2003 | [ | No |
| GenMAPP | 2005 | [ | No |
| GenoLink | 2006 | [ | No |
| Genome3DExplorer | 2005 | [ | No |
| Genomic object net | 2003 | [ | No |
| GEOMI | 2006 | [ | No |
| Gepasi | 1989 | [ | Yes (†2002) |
| GEPAT | 2007 | [ | Yes (†2013) |
| GeXpert | 2006 | [ | Yes (†2013) |
| GlycoBrowser | 2008 | [ | No |
| GoMiner | 2003 | [ | No |
| Graphlet | 1999 | [ | No |
| GraphViz | 2003 | [ | Yes |
| Gravisto | 2004 | [ | Yes |
| Grid cellware | 2004 | [ | No |
| GridLayout | 2005 | [ | No |
| GSCope | 2006 | [ | No |
| IM browser | 2006 | [ | Yes (†2012) |
| InNetics PathwayLab | n/a | [ | No |
| INOH: pathways and ontologies | 2003 | [ | No |
| IntAct | 2004 | [ | Yes |
| Interviewer | 2005 | [ | Yes (†2004) |
| Ingenuity pathways analysis | 2004 | [ | Yes |
| J2dPathway | 2008 | [ | Yes (†2016) |
| Jarnac | 2000 | [ | Yes (†2012) |
| JDesigner | 2002 | [ | Yes (†2012) |
| jSquid | 2008 | [ | Yes (†2008) |
| KappaView | 2005 | [ | Yes (†2016) |
| KEGGanim | 2007 | [ | Yes |
| KGML-ED | 2007 | [ | No |
| KGraphViewer | 2007 | [ | Yes |
| Kinetikit | 2003 | [ | Yes (†2005) |
| KnowledgeEditor | 2002 | [ | No |
| MapMan | 2004 | [ | Yes (†2013) |
| MAPPFinder | 2003 | [ | No |
| MARGBench | 1999 | [ | No |
| Mavisto | 2005 | [ | Yes |
| megNet | 2005 | [ | No |
| Metabolic IsaViz | 2005 | [ | Yes (†2007) |
| MetaCore | n/a | [ | Yes |
| EcoCyc | 2002 | [ | Yes |
| MetaNetter | 2008 | [ | Yes |
| MetaReg | 2008 | [ | Yes (†2008) |
| metaSHARK | 2006 | [ | No |
| MetNet3D | 2005 | [ | No |
| MicroarrayDB | 2005 | [ | No |
| Metabolic network visualizer | 2003 | [ | No |
| MOVE | 2006 | [ | No |
| Narrator – a graph-based modelling tool | 2007 | [ | Yes (†2016) |
| NeAT | 2008 | [ | No |
| NetBuilder | 2002 | [ | No |
| Nodes3D | 2006 | [ | No |
| ONDEX | 2005 | [ | Yes (†2019) |
| ontoTools | 2007 | [ | Yes (†2012) |
| Osprey | 2003 | [ | No |
| Pajek | 1998 | [ | Yes |
| PathArt | n/a | [ | No |
| PathBank | 2006 | [ | No |
| PathBuilder | 2007 | [ | No |
| PathCase | 2003 | [ | No |
| PathDB | 2000 | [ | No |
| PathFinder | 2002 | [ | No |
| PathMAPA | 2003 | [ | No |
| pathSCOUT | 2003 | [ | No |
| PathVisio | 2008 | [ | Yes |
| Pathway analytics | 2007 | [ | No |
| Pathway assist | 2003 | [ | No |
| Pathway builder 2.0 | 2005 | [ | Yes |
| Pathway builder | 2005 | [ | No |
| Pathway processor | 2002 | [ | No |
| Pathway studio | 2003 | [ | No |
| Pathway tools | 2002 | [ | Yes |
| PathwayLab | 2009 | [ | No |
| PathwayLogic | 2002 | [ | Yes |
| Patika | 2002 | [ | No |
| PaVESy | 2004 | [ | No |
| Pathway hunter tool | 2005 | [ | Yes |
| PhyloGrapher | 2001 | [ | Yes (†2003) |
| PIMWalker | 2005 | [ | No |
| PIVOT | 2003 | [ | Yes (†2003) |
| PNE – pathway network editor | 2007 | [ | Yes (†2014) |
| Prefuse | 2005 | [ | Yes |
| ProcessDB | 2001 | [ | Yes |
| ProMoT | 2003 | [ | No |
| ProteoLens | 2008 | [ | Yes |
| PROTON | n/a | [ | No |
| ProViz | 2005 | [ | No |
| PubGene | 2000 | [ | Yes |
| PWComp | 2002 | [ | No |
| pyNetConv | 2005 | [ | Yes (†2013) |
| QPACA | 2006 | [ | No |
| Reactome | 2005 | [ | Yes |
| ROSPath | 2004 | [ | No |
| SBW | 2003 | [ | Yes |
| SHARKview | 2007 | [ | No |
| SHriMP | 2004 | [ | No |
| simBio | 2006 | [ | No |
| SimWiz | 2004 | [ | No |
| Snoopy | 2000 | [ | Yes |
| Sofia | 2008 | [ | No |
| Spike | 2008 | [ | Yes |
| SYCAMORE | 2008 | [ | No |
| Teranode design suite | 2008 | [ | No |
| BioPath | 2001 | [ | No |
| Tom sawyer | 2009 | [ | Yes |
| ToPNet | 2004 | [ | No |
| Pathway editor | 2005 | [ | No |
| Unipath | 2003 | [ | No |
| Vanted | 2006 | [ | Yes |
| Vector PathBlazer | n/a | [ | No |
| VisANT | 2004 | [ | No |
| VitaPad | 2005 | [ | Yes (†2013) |
| WebGestalt | 2005 | [ | Yes |
| WebInterViewer | 2003 | [ | No |
| WikiPathways | 2008 | [ | Yes |
| WilmaScope | 2002 | [ | Yes (†2013) |
| YANAsquare | 2007 | [ | Yes |
| yFiles | 2001 | [ | Yes |
Funding: Deutsche Forschungsgemeinschaft
1 Introduction
Sustainable software development in science, that is, the long-term development and maintenance of research software, is a challenging endeavour. It typically involves creating tools for data representation, integration, analysis, simulation, visualisation, and the communication of results for specific projects or research questions. At the same time, those tools should maintain accuracy, ensure reproducibility, be easy to use, and support long-term usability, including proper licenses. For example, scientific results published in papers can often only be reproduced if the software used is still available and usable.
Many software packages and workflows have been developed to solve scientific questions, but few have achieved long-term sustainability. There are several reasons for this. Scientific projects and the related software development are often funded only through short-term research grants, and software maintenance does not significantly contribute to the career advancement of those involved. The emphasis on publishing research papers remains strong and research software development is still often undervalued as a research activity, although this issue was even more problematic in the past. In addition, there is often insufficient institutional support for long-term software maintenance, and software is frequently developed by different research groups independently, without joining forces. As a result, it is often difficult to create, maintain, improve, and use software effectively over a span of at least 15–20 years.
As examples of sustainable and FAIR research software development, we provide an overview of software tools in the field of biological network representation, analysis, and visualisation that are still maintained after 15 years. Further, we use Vanted (Visualisation and Analysis of Networks comprising Experimental Data) as a case study of software developed over the past 20 years. Vanted was originally designed for the analysis and visualisation of biological networks and related data. Over the years, it has developed into a software resource used in various fields, particularly in systems biology, bioinformatics, and related disciplines. We explore how Vanted serves as a model of sustainable software development in science, and investigate its design principles, development, community involvement, and adaptability to the changing needs of its user base. We share the lessons learned and offer guidelines for sustainable software development.
The topic of this paper aligns very well with the 20th anniversary of the Journal of Integrative Bioinformatics (and the special issue dedicated to this event), in particular because: (1) the journal has promoted software accessibility since its inception and provides an environment for this with JIB.tools [1, 2], (2) integrative bioinformatics is an important aspect of the Vanted development, for example, through novel approaches to data integration, and (3) Vanted itself is as old as the journal.
2 Evolution of software development
Vanted incorporates key concepts of modern software development, as outlined at the end of this section. Here we give a very brief overview of the evolution of software development, which has been driven by technological advances, growing complexity in hardware and software, and changing user needs. Scientific software development progressed in parallel to, and was influenced by, the transition from pen-and-paper science to initial implementations, and then to tools and libraries that provide software for reproduction and replicability [3].
Software development has changed over time, and each phase can be characterised by specific challenges and methodologies:
The 1940s: Software development started when the first programmable computers were invented. At that time, software and hardware were not really considered distinct parts, as programming was closely tied to the physical design of the computers. Early universally programmable computers such as Zuse Z3 (1941) [4], ENIAC (1943–1946) [5], and UNIVAC I (1951) [6] were programmed at a low level using machine-specific code, so programs could not easily be transferred between different machines.
The 1950s: The first high-level programming languages were developed, such as Fortran for scientific and engineering calculations (started 1953, first implementation 1957) and LISP for symbolic computation and (early) artificial intelligence research (started 1958, first implementation 1960). Those languages helped to separate hardware from software, enabling programs to be written independently of the underlying computer architecture. This made software development more general and improved portability across different computer systems.
The 1960s and 1970s: Due to increasing software complexity, the focus shifted towards improving software design and development methodologies. This included structured programming, which emphasised creating programs in ways that were easier to write, understand, debug, and maintain. Moreover, object-oriented programming emerged as a paradigm that organises code into objects, which encapsulate both data and behaviour. The object-oriented approach has had a significant impact on the development of modern software, improving modularity, code reusability, and maintainability. The term software engineering was already coined in the 1960s [7], but it was only later that this field gained wider influence.
The 1980s and 1990s: There was an increasing need for more formal software engineering practices to deal with issues such as software project management, platform independence, quality assurance, and scalability. Therefore, the concept of the software development lifecycle was developed. The waterfall model, one of the first formalised models, was introduced by Winston Royce in 1970 [8]. Although even Royce himself commented that it had major problems as testing only happened at the end of the process, it became a standard approach for many software projects. The problems of the waterfall model later led to the development of iterative and incremental models of software development such as Agile software development [9]. While approaches and frameworks such as Agile and Scrum gained widespread adoption, the “Agile is dead” discussion during the 2020s highlights ongoing debates about their relevance and effectiveness. More generally, many practices designed for commercial software development cannot be directly transferred to academic environments, which face unique challenges and aims.
The 2000s: The new millennium brought widespread availability of open-source software. This changed the landscape of software development, enabling easier collaboration, reuse of code, and joint software development. Linux, the computer operating system created by Linus Torvalds and first released in 1991, is a prominent success story of the open-source movement. An important aspect of open-source software development is community-driven development, where users and developers actively contribute to improving the software.
As we will discuss in more detail later, Vanted incorporates key concepts of modern software development:
- Vanted is built on an object-oriented approach (using Java) that promotes code reusability, modularity, and maintainability.
- Its development follows an iterative and incremental software engineering model that makes it easy to adapt to new projects and challenges.
- The software is open-source to promote free use and community-driven development.
3 Rules and recommendations for good software development in science
What defines good modern software development, in particular the development of research software in science? How can flexibility, collaboration, and continuous improvement be fostered, and how can proper, maintainable code and a user-centric design be promoted? A wealth of literature exists on this topic, including guidelines for software development in science, such as the Ten Simple Rules for the Open Development of Scientific Software [10] which, in short, state:
- Rule 1: Don’t Reinvent the Wheel
- Rule 2: Code Well
- Rule 3: Be Your Own User
- Rule 4: Be Transparent
- Rule 5: Be Simple
- Rule 6: Don’t Be a Perfectionist
- Rule 7: Nurture and Grow Your Community
- Rule 8: Promote Your Project
- Rule 9: Find Sponsors
- Rule 10: Science Counts
Recently, the FAIR4RS initiative has introduced the FAIR principles for research software [11] in an attempt to tackle some of the challenges underlying current research software. In the context of research software FAIRification, the R (reusable) set of principles advises that software and its components be both usable and reusable. Research software management also differs from research data management [11]. For research software to remain (re-)usable over time, it should be designed, implemented, and maintained with this goal in mind, continuously and from the start.
While we concur with those guidelines, there are still crucial aspects that require greater attention, particularly concerning software sustainability over extended periods (15–20 years). Based on our two decades of experience developing Vanted, we would like to highlight the following additional recommendations:
- Recommendation 1: Long-Term Design and Adaptability
- Recommendation 2: Continuous Contribution to Science: Research in the Application Domain(s)
- Recommendation 3: Continuous Contribution to Science: Research in the Computer Science Domain
- Recommendation 4: Open-Source Nature and Community Contribution
- Recommendation 5: User-Centric Development and Support
Before discussing the details of our five recommendations, we first provide an example of software sustainability in practice by examining tools for biological network representation, analysis, and visualisation over the past 15 years in Section 4. Next, in Section 5, we give an overview of the Vanted system, its architecture, and its usage. We present the five recommendations in Sections 6–10, supported by examples from the development of the Vanted software. We conclude in Section 11 by discussing our findings, outlining further Vanted developments, and presenting more details of the connection between the 20 years of Vanted and the 20 years of the Journal of Integrative Bioinformatics.
4 The reality of software sustainability – an example from biological network analysis and visualisation
In 2009, some of the authors conducted an unpublished study of available tools for biological network representation, analysis, and visualisation based on an extensive literature and web search. This collection was based on aspects such as data exchange (importing/exporting networks), dynamic visualisation (layout of nodes and edges, editing capabilities), visualisation elements (data representation using colour-coding, variations in node sizes or edge thickness), and data analysis (network analysis, statistics). The aim in 2009 was to extend the scope of earlier work by Pavlopoulos et al. [12] which tried to cover a broad range of use cases and focused on a few tools, a study by Saraiya et al. from 2005 [13], as well as a comparative study published by Suderman and Hallett in 2007 [14] which studied nearly 40 network visualisation tools. Please note that for certain tools, references to papers published after 2009 have been included when the tool was available online but no publication existed in 2009. Additionally, this collection reflects our knowledge as of 2009; a contemporary collection would include some different tools while omitting others.
During this study, 174 relevant tools were identified and investigated further. Of these 174 tools, 51 were open-source, 24 were commercial, 13 were web-based, and 36 tools mentioned in the literature were already no longer available on their respective websites in 2009. The tools with the most citations at that time were DAVID [15] (818 citations), GeneVestigator [16] (738 citations), and Cytoscape [17] (618 citations). DAVID and GeneVestigator were well established in the scientific community and widely used for retrieval of information about microarray data, which explains the high citation count at the time. Cytoscape was primarily used for network analysis and visualisation, gaining traction as the field of systems biology expanded.
After 15 years, in 2024, we reviewed the status of the initially catalogued tools, see Table 1. The result of this investigation is as follows: 89 of the tools identified in the 2009 data collection (approximately 51 %) are no longer available. We categorise a tool as “no longer available” if it is no longer accessible as described in its first publication, or if its source code repository or download URL can no longer be resolved. Specifically, for none of these tools do even outdated or unsupported versions remain accessible from their developers. It should be noted that 36 tools (around 21 %) were already unavailable in 2009 (published before 2009 but not available in 2009). Therefore, 53 of the tools that were listed and still available in 2009 have become inaccessible over the last 15 years. Furthermore, 35 tools (around 20 % of all tools) can be classified as obsolete, as they are available but have not received any updates in the last five years. In Table 1, we indicate obsolete tools with a dagger symbol followed by the year in which the last update was carried out. Of these 35 tools, 30 have remained without updates for (more than) a decade. This classification is based on evidence from release histories, website activity, copyright notices, and associated publications.
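The percentages quoted above can be retraced with a few lines of arithmetic. The following is a minimal sketch that uses only the counts stated in the text; the variable and function names are chosen for illustration.

```python
# Counts quoted in the text for the 2009 collection and its 2024 review.
TOTAL_TOOLS = 174
unavailable_2024 = 89   # no longer available in 2024
unavailable_2009 = 36   # already unavailable in 2009
obsolete = 35           # available, but without updates in the last five years


def share(count, total=TOTAL_TOOLS):
    """Return count as a percentage of total, rounded to the nearest integer."""
    return round(100 * count / total)


# Tools that were still available in 2009 but have disappeared since then.
lost_since_2009 = unavailable_2024 - unavailable_2009

print(f"no longer available in 2024: {share(unavailable_2024)} %")
print(f"already unavailable in 2009: {share(unavailable_2009)} %")
print(f"obsolete: {share(obsolete)} %")
print(f"became inaccessible between 2009 and 2024: {lost_since_2009} tools")
```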
Importantly, only 48 tools, fewer than a third of all initial tools (about 28 %), are actively available or show no clear evidence of being unsupported. Here, “actively available” is defined as having undergone at least one update within the past five years. “No clear evidence of being unsupported” indicates that, while recent updates could not be confirmed due to limited information, a functional download link is accessible. This means that the majority of tools are no longer available. This has a profound impact, for example, on the reproducibility of scientific studies conducted using those tools.
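The categories used in this review can be written down as a small decision rule. The sketch below is an illustration under stated assumptions, not the procedure used by the authors: the record fields are hypothetical, and the boundary (an update exactly five years before the review counts as obsolete) is inferred from the ONDEX row of Table 1, which is marked †2019.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_YEAR = 2024        # year of the follow-up review described in the text
ACTIVE_WINDOW_YEARS = 5   # "at least one update within the past five years"


@dataclass
class ToolRecord:
    """Hypothetical record of the evidence collected for one tool."""
    name: str
    download_resolvable: bool        # does a download link or repository still resolve?
    last_update_year: Optional[int]  # None when no update date could be confirmed


def classify(tool: ToolRecord, review_year: int = REVIEW_YEAR) -> str:
    """Map the collected evidence to one of the categories named in the text."""
    if not tool.download_resolvable:
        return "no longer available"
    if tool.last_update_year is None:
        return "no clear evidence of being unsupported"
    # Assumption: an update exactly five years back already counts as obsolete,
    # matching the ONDEX row of Table 1 (last update 2019, marked with a dagger).
    if review_year - tool.last_update_year < ACTIVE_WINDOW_YEARS:
        return "actively available"
    return "obsolete"


records = [
    ToolRecord("A-cell", False, None),           # listed as "No" in Table 1
    ToolRecord("ONDEX", True, 2019),             # listed as "Yes (†2019)" in Table 1
    ToolRecord("HypotheticalTool", True, 2023),  # invented entry for illustration
    ToolRecord("UndatedTool", True, None),       # invented entry for illustration
]
for record in records:
    print(f"{record.name}: {classify(record)}")
```

In practice the evidence itself (release histories, website activity, copyright notices, publications) has to be gathered by hand; the rule only makes the final categorisation explicit.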
5 Vanted – a brief overview
As mentioned, Vanted is a tool for the exploration, analysis, and visualisation of biological networks and related experimental data. The development of Vanted started in 2003/2004, with its first publication in 2006 [174]; Vanted v2 [184], with significant changes to core plug-ins, was released in 2012. It should be noted that the underlying system is even older: Vanted is based on Gravisto [82], a general graph editing and layout system, which was already in development at the end of the 1990s. From Gravisto, Vanted inherited a flexible and modular plug-in concept. Internally, every major part of Vanted is a plug-in, and as of Vanted v2.8.8 there are 91 plug-ins in the core. Besides user interface and core framework functionality, plug-ins are organised into several sub-domains: (1) specialised import and export support, (2) database access, (3) network layout algorithms, and (4) other algorithms. Plug-ins for import and export include support for SBML [185], BioPAX [38], SBGN-ML [186], KGML (the exchange format for KEGG pathway maps) [187], GraphML [188], and GML [189] (Vanted’s default network format). Among the extensions for database access are plug-ins for KEGG [187], BioModels [190], MetaCrop [191], and RIMAS [192].
See Figure 1 for an overview of Vanted’s architecture. Vanted has a modular architecture that is capable of supporting a range of different use cases and scenarios. In addition to the core’s internal plug-ins, external plug-ins, so-called add-ons, are also part of the architecture. Vanted’s add-ons provide the necessary degree of customisation and extensibility of the core functionality, while ensuring a proper separation of concerns. This concept has been important for the long-term maintenance of Vanted: the core includes only the functionality that should be actively maintained and kept compatible with past and new versions of the core, the add-ons, and, of course, Java itself.
Figure 1: The architecture of Vanted consists of several classes of interfacing components. The core (shaded area) is responsible for the overall architecture (class managers and interfaces), for network data structures, for diagram drawing, and for the user interface. The core also contains a number of plug-ins providing dedicated functionality. On top of the core, users write add-ons to extend it with specific functionality. Java libraries provide external tools and interfaces to formats and standards. The core communicates with Vanted’s webserver to find updates. Any updates are deployed to the webserver through the deployment and update pipeline.
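The separation between a core with built-in plug-ins and externally supplied add-ons can be illustrated with a small registry. This is a minimal sketch in Python rather than Java, and every name in it (`Plugin`, `Core`, `NodeCount`, `EdgeListExport`) is hypothetical; it does not reproduce Vanted’s actual classes or interfaces.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical plug-in contract; not Vanted's actual Java interface."""

    name: str
    category: str  # e.g. "import/export", "database", "layout", "algorithm"

    @abstractmethod
    def run(self, network: dict):
        """Operate on a network given as {"nodes": [...], "edges": [...]}."""


class NodeCount(Plugin):
    name = "node-count"
    category = "algorithm"

    def run(self, network):
        return len(network["nodes"])


class EdgeListExport(Plugin):
    name = "edge-list-export"
    category = "import/export"

    def run(self, network):
        return "\n".join(f"{source}\t{target}" for source, target in network["edges"])


class Core:
    """Keeps a registry of plug-ins and tracks which ones are external add-ons."""

    def __init__(self):
        self._plugins = {}
        self._addons = set()

    def register(self, plugin, addon=False):
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin
        if addon:
            self._addons.add(plugin.name)

    def run(self, name, network):
        return self._plugins[name].run(network)

    def addon_names(self):
        return sorted(self._addons)


core = Core()
core.register(NodeCount())                   # shipped with the core
core.register(EdgeListExport(), addon=True)  # supplied from outside the core

network = {"nodes": ["A", "B", "C"], "edges": [("A", "B"), ("B", "C")]}
print(core.run("node-count", network))
print(core.run("edge-list-export", network))
```

Because the core only knows the plug-in contract, an add-on can be added or retired without touching core code, which is the separation of concerns described above.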
Developed using Java, Vanted embodies Java’s “write once, run anywhere” philosophy, enabling it to run on all three major operating systems with minimal additional setup. To install and run Vanted on a desktop, users need to have the Java Runtime Environment (JRE, available at www.java.com/download) pre-installed. They can then download the latest version of Vanted from the Vanted website [193].
Vanted is fully open-source, available on GitHub under the GPL-2.0 license. On GitHub, developers can also find a wiki and users can report issues. Add-on developers can use a so-called example add-on as a template. Tutorials, examples and further documentation are also available on the Vanted website.
6 Recommendation 1: long-term design and adaptability
Make the software modular and continuously adaptable from the beginning.
An important aspect of sustainable software development in science is creating systems that are flexible and adaptable enough to meet the (changing) demands of users while maintaining stability, reliability, and performance. A system should be modular to enable updates, adaptations, and additions of new functionality without disrupting existing components. Vanted fulfils these principles through several design features:
- Modular architecture: Internally, every major part of Vanted is a plug-in, which can be dynamically changed or replaced by new code. For example, different views of a network, such as a “node-link diagram” view or a “statistical properties of the network” view, are implemented as plug-ins and are easily exchangeable. Other examples are the different algorithms, which are also implemented as separate plug-ins.
- Extensible architecture: Vanted provides the concept of add-ons, external “plug-ins” which can provide new functionality without the need to change internal (core) functions. They allow new algorithms, data types, visualisation techniques, and similar to be added without disrupting the core functionality.
- Scalable performance: Leveraging the concept of plug-ins allows the core functionality to remain efficient even as datasets grow. For instance, visualising a network as a node-link diagram can be resource-intensive when dealing with networks (graphs) containing tens of thousands of nodes and edges with related data shown. The plug-in approach enables switching to a less resource-demanding view, for example, displaying only key information for large networks (such as statistical parameters) instead of all elements of the entire network.
- Object-oriented approach: Using Java, Vanted is based on an object-oriented approach that promotes code reusability, modularity, and maintainability in the long run.
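The idea of switching to a less resource-demanding view for large networks can be sketched as follows. This is an illustration only: the function names and the threshold of 10,000 elements are assumptions, and the real system draws diagrams rather than returning dictionaries.

```python
def node_link_view(network):
    """Stand-in for a full node-link drawing: keeps every node and edge."""
    return {
        "view": "node-link",
        "nodes": list(network["nodes"]),
        "edges": list(network["edges"]),
    }


def summary_view(network):
    """Cheaper view: only statistical parameters of the network."""
    node_count = len(network["nodes"])
    edge_count = len(network["edges"])
    # Mean degree of an undirected graph is 2 * |E| / |V|.
    mean_degree = 2 * edge_count / node_count if node_count else 0.0
    return {
        "view": "summary",
        "node_count": node_count,
        "edge_count": edge_count,
        "mean_degree": mean_degree,
    }


def choose_view(network, max_elements=10_000):
    """Pick a view by network size; the threshold is a hypothetical default."""
    size = len(network["nodes"]) + len(network["edges"])
    return node_link_view if size <= max_elements else summary_view


small = {"nodes": ["A", "B", "C"], "edges": [("A", "B"), ("B", "C")]}
large = {
    "nodes": list(range(20_000)),
    "edges": [(i, i + 1) for i in range(19_999)],  # a long path graph
}

print(choose_view(small)(small)["view"])
print(choose_view(large)(large)["view"])
```

Because both views accept the same network structure, either one can be exchanged for the other without changes elsewhere, which mirrors the exchangeable view plug-ins described in the list above.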
This flexibility ensures that Vanted remains relevant as new scientific challenges and data types emerge. Several add-ons have been developed to extend the functionality, for example, FluxMap [194] for flux visualisation, PetriNets [195] for the simulation of Petri nets, CentiBiN [196] for investigating different centrality measures in networks, FBA-SimViz [197] for interactive visualisation of constraint-based metabolic models, and SBGN-ED [198] for editing, validating, and translating SBGN maps. Add-ons also offer a great way to encourage external (community) participation and enable the software to be used in teaching, for instance by allowing students to develop new functionality as part of their projects.
However, we also learned that modifying the core system, as was done for various reasons from version 1 to version 2 in 2012, introduced challenges. Due to changes in both the underlying Java system and related external libraries in version 2, not all add-ons remained compatible with the new Vanted version. Nevertheless, separating functionality into core and add-ons allows us to prioritise resources for maintaining the core, while less relevant or outdated add-ons can be phased out over time.
7 Recommendation 2: continuous contribution to science: research in the application domain(s)
Make the software contribute to science – in the application area(s).
A key aspect of Vanted’s sustainability is its continued impact on scientific research. Over the past 20 years, the software has made significant contributions to the field of life sciences, enabling the visualisation and analysis of complex biological networks with related experimental data that were otherwise difficult to interpret. By facilitating the exploration of these networks, Vanted has helped researchers uncover new insights into cellular processes, disease mechanisms, and other important biological phenomena.
From our experience, it is crucial that scientific software is developed in continuous and close collaboration with partners from the application domain(s). This approach has been central to Vanted’s development, ensuring that it meets the needs of its diverse user base. Examples of such joint projects come from plant sciences, where Vanted was initially developed in collaboration with plant scientists, e.g., [199, 200], animal sciences [201], medicine [202, 203], and microbiology [204, 205]. Additionally, Vanted has been used to build biological databases and enhance them with interactive visualisations, such as MetaCrop [206], RIMAS [192], and QSDB [207]. These collaborations have consistently brought new ideas and methods for the ongoing development of Vanted.
Vanted has been utilised in numerous external projects and publications, where it has often been explicitly cited; however, there are also instances where it is merely acknowledged or not mentioned at all despite its use. Examples from biology include its use in detailing the lipid composition in pollen [208], where it was applied to analyse and visualise the synthesis and breakdown of lipids, its contribution by means of statistical and visual analyses to the investigation of combined abiotic and biotic stress factors in plants [209], its use to analyse the central metabolism in developing oilseeds [210], and for the visualisation of flux and transcript data [211]. Examples from medicine, just one of the other application areas, include its use in the visual analysis of metabolites for their role in PD-1 blockade therapy in cancer research [212], to explore gut microbe metabolites in Alzheimer’s disease research [213], and to identify drug targets in Covid-19 disease mechanisms [214]. In visualising omics data, Vanted has been known to support “interactive editing particularly well” [215].
8 Recommendation 3: continuous contribution to science: research in the computer science domain
Make the software contribute to science – in computer science.
In addition to the application areas, Vanted has been used to drive novel methodological developments in computer science, in particular in the fields of network visualisation and topological network analysis.
Network layout methods are important for visual network exploration, and Vanted provides several well-known methods such as force-directed, stress minimisation, multi-level, grid and tree layout. However, existing network layout methods are often insufficient [216]. For example, hierarchical network representations (such as clustered graphs or networks of pathways) and their interactive exploration are often relevant for biological networks. Several novel approaches have been developed and implemented in the Vanted framework, starting as early as 2007 with the dynamic exploration and editing of KEGG pathway diagrams [97]. To help preserve the mental map [217] during the exploration of networks with clusters or sub-networks, Vanted provides methods that have been developed over the years. Examples are NetPartVis, which visualises non-overlapping clusters or partitions of graphs by laying out the overview graph and the sub-graphs (partitions) in a coordinated, mental-map-preserving way [218]; an approach that uses glyphs, brushing, and topological information of the related pathways for interactive visualisation [219]; and a decomposition method that is part of the LMME add-on [220]. We also investigated group-based visual transformations, such as de-emphasising groups by opacity, position or size, aggregating groups, and hiding groups, to find the most suitable approach for the exploration of networks with clusters or sub-networks [221].
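To give a rough idea of what a force-directed layout computes, the following Python sketch runs a basic spring embedder: all node pairs repel each other, connected nodes attract each other, and a decreasing "temperature" caps how far a node may move per iteration. This is a minimal illustration and not Vanted's implementation; the function name `force_directed_layout`, its parameters, and all constants are assumptions made for this example.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Return {node: (x, y)} from a basic spring embedder (illustrative only).

    `nodes` is a list of hashable ids, `edges` a list of (u, v) pairs.
    """
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}
    k = width / math.sqrt(max(len(nodes), 1))  # assumed ideal edge length
    temperature = width / 10  # upper bound on movement per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsive forces between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attractive forces along edges.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node along its displacement, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[v][0] += dx / length * step
            pos[v][1] += dy / length * step

        temperature *= 0.95  # cool down so the layout settles

    return {v: (p[0], p[1]) for v, p in pos.items()}


layout = force_directed_layout(
    ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]
)
```

A layout of this kind starts from random positions, so two runs with different seeds can look quite different. That is one reason why mental-map preservation needs dedicated methods, such as those cited above, when a network is explored interactively.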
For the analysis of biological networks, novel algorithms have been developed and, in most cases, implemented in Vanted, such as flux-based centrality analysis [222], motif-based centrality analysis [223], and pattern detection under different frequency concepts for the analysis of motifs in networks [224].
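As a toy illustration of the general idea of ranking network elements by the substructures they take part in, the following Python sketch counts, for each vertex of a directed network, how many feed-forward loops (a→b, b→c, a→c) it belongs to. It is a simplified stand-in and not the algorithm of [222], [223] or [224]; the function name `feed_forward_loop_counts` and the choice of the feed-forward loop as the motif are assumptions made for this example.

```python
from collections import defaultdict


def feed_forward_loop_counts(edges):
    """Count, per vertex, the feed-forward loops (a->b, b->c, a->c) it is part of.

    `edges` is an iterable of directed (source, target) pairs.
    Vertices that occur in no feed-forward loop are omitted from the result.
    """
    successors = defaultdict(set)
    for source, target in edges:
        if source != target:  # self-loops cannot be part of this motif
            successors[source].add(target)

    counts = defaultdict(int)
    for a in list(successors):
        for b in successors[a]:
            # .get avoids inserting new keys while we iterate
            for c in successors.get(b, ()):
                if c != a and c in successors[a]:
                    # (a, b, c) is one occurrence of the motif
                    for vertex in (a, b, c):
                        counts[vertex] += 1
    return dict(counts)


example = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w"), ("x", "w")]
ranking = sorted(feed_forward_loop_counts(example).items(),
                 key=lambda item: item[1], reverse=True)
```

In this example, the vertices `x` and `z` each take part in two feed-forward loops, whereas `y` and `w` take part in one, so a ranking by this count places `x` and `z` first.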
More recently, Vanted has also been used to analyse networks in other domains, for example, to investigate the relationship between Celtic knots and specific graphs [225].
These examples underscore the critical role of scientific contributions, such as new network analysis algorithms and novel visualisation methods, in pushing the boundaries of computer science during sustainable research software development. There has to be scientific progress in the application domain(s), but also in the computer science domain.
Recommendation 4: open-source nature and community contribution
Make the software open-source and support community involvement.
There are several advantages of open-source software and community involvement, and the open-source nature of Vanted has been important for its longevity:
- By making the software freely available and open for modification, researchers around the world can contribute improvements and bug fixes. This has helped build a diverse community of users and developers who contribute to the software’s enhancement. Examples include usage and extension independent of our team, as done for the visualisations presented by Yugi et al. [226], as well as joint development of add-ons, for example, with the group of Andreas Kerren for glyph-based navigation of metabolic networks [219].
- Our group has moved several times over the past 20 years, including between different countries, while Vanted’s development was sustained. Having the code open source has made it easier to transfer it between different institutions or universities without facing licensing issues.
- The open-source model ensures that Vanted can be integrated into a variety of workflows and customised to meet the specific needs of different research fields. An example is the inclusion of Vanted (and its SBGN-ED add-on [198]) in the ecosystem of the Covid-19 disease map community [227].
The ability to customise and adapt the software freely is a major advantage for long-term usability and maintainability. Vanted has embraced the concept of community-driven development, where users not only benefit from the software, but also contribute to its evolution, providing a sustainable cycle of usage, feedback, and improvement. A community-driven approach is crucial for the long-term support of software; it is a process that needs a low entry barrier to build an initial user base and then create further momentum, such as through events or other incentives.
Recommendation 5: user-centric development and support
Make the software user-friendly, also for non-expert users.
The development of Vanted has been strongly driven by user needs, with the software adapted based on feedback from both researchers and developers. This user-centric approach has helped Vanted remain a relevant tool as new scientific challenges emerge. In particular, its use in collaborative projects has ensured that Vanted evolved in a way that addresses the real-world problems faced by its users. Early interdisciplinary work on the representation of experimental biological data in metabolic networks [228] and on analysing the topology of such networks led to prototypes called DBE-Gravisto [229], PatternGravisto [230] and MAVisto [104]. These early prototypes were instrumental for the successful development of Vanted, which was directly based on experience with them and their early application to scientific questions such as those presented by Rolletschek et al. [231]. Functionality of these early prototypes proved so useful in collaborative projects that it was later included in Vanted as plug-ins and add-ons. This user-focused collaboration has continued over the years. Recent examples of add-ons from interdisciplinary collaborations are PathwayNexus [232] for interactive metabolic data analysis, developed together with the group of Marcel Leist, and an add-on for the layout of anatomical structures and blood vessels based on the foundational model of anatomy [233], developed together with Bernard de Bono.
Sustainability in scientific software development is also supported by comprehensive documentation. The Vanted developers have provided user guides, (video) tutorials and documentation that help users understand how to use the software effectively, even if they are new to bioinformatics and biological data analysis. There are also protocols and tutorials (e.g., [234, 235]) that provide guidelines and step-by-step training, and we have held several workshops and in-class tutorials to support users in taking their first steps with the tool and analysing their own data. Producing documentation and tutorials is time-intensive, but it is important for onboarding new users and ensuring that users can take full advantage of the software.
In addition, it is often necessary to provide not only a tool but also a broader ecosystem offering additional functionality for data integration and analysis. For Vanted, this has included
- ways to store experimental data such as DBE2 for the management of experimental data [236],
- databases providing biological pathways such as MetaCrop [237] and RIMAS [192],
- connections to other databases with experimental data such as OPTIMAS-DW [238], and
- services to predict database links in biomedical databases [239, 240].
Conclusions
The development of Vanted over the past two decades is an example of how research software can be developed to achieve long-term sustainability. Its modular architecture, adaptability, open-source nature and community-driven development have allowed it to thrive and remain an important tool for researchers. This led to five insights that we consider crucial for sustainable, long-term software development and software maintenance in science:
- Recommendation 1: Long-Term Design and Adaptability
- Recommendation 2: Continuous Contribution to Science: Research in the Application Domain(s)
- Recommendation 3: Continuous Contribution to Science: Research in the Computer Science Domain
- Recommendation 4: Open-Source Nature and Community Contribution
- Recommendation 5: User-Centric Development and Support
Vanted is one of the tools for biological network analysis and visualisation that are still available and maintained after 15 years; fewer than one third of the initially developed and published tools fall into this category.
We plan to develop Vanted further in the future. We would like to improve the inclusion of three-dimensional (3D) data, both 3D network data and spatial information. The importance of 3D in network visualisation was already discussed many years ago [241, 242], and a Vanted-based version for early integration of 3D data, which allowed creating views on integrated multi-domain data including 2D, 3D and network data, was made available in 2011 [243]. Newer developments based on the rise of Immersive Analytics [244, 245] include the integration and virtual reality exploration of biomedical data [246] and the support of different display modalities such as transitional or hybrid interfaces in Vanted [247, 248]. We plan to extend this further, for example, by including network-based information (such as signal transduction and gene regulatory networks) in our 3D-based work for spatial transcriptomics [249]. We have also started to investigate suitable network visualisation approaches in 2D, 2.5D (see, e.g., [250]) and 3D [251], and we plan to continue this investigation and include useful approaches, for example, as Vanted add-ons. Another direction of future development is better support for constraint-based layout algorithms to provide more precise and customisable visual representations of complex networks while preserving structural and relational context (i.e., the user’s mental map).
The focus of the Journal of Integrative Bioinformatics over the last 20 years is also related to the development of the Vanted framework. Since its inception, the journal has supported software accessibility, offering an environment for this purpose through JIB.tools, and Vanted provides free accessibility and supports long-term maintenance. In addition, integrative bioinformatics, the key topic of the journal, is an important aspect of the Vanted development, for example, through novel ways of data integration. The Journal of Integrative Bioinformatics is also strongly committed to supporting the development of standards in systems and synthetic biology; special issues focusing on COMBINE [252] standards have been released regularly, offering updates since 2015 [253]. We believe that standards are very important. Vanted not only supports several COMBINE standards such as SBGN, SBML and BioPAX, but is also a reference implementation for the SBGN [254] standard, supporting all three graphical languages (Process Description [255], Entity Relationship [256] and Activity Flow [257]) and providing additional functionality such as editing and validating SBGN maps [198], conversion of KEGG metabolic pathways to SBGN maps including automatic layout [258], and the translation of SBGN maps from Process Description to Activity Flow [259].
Sustainable software development in science, as demonstrated by the Vanted system, ensures tools not only remain relevant and functional over decades, but also continue to evolve. This enables researchers to address new challenges and fosters enduring contributions to scientific discovery.
References
- 1. Hofestädt R, Kormeier B, Lange M, Schreiber F, Sommer B, Weise S. JIBtools: a strategy to reduce the bioinformatics analysis gap. J Integr Bioinform 2013;10. doi:10.1515/jib-2013-226. PMID: 29220913.
- 2. Friedrichs M, Shoshi A, Chmura P, Ison JC, Schwämmle V, Schreiber F. JIB.tools 2.0 – a bioinformatics registry for journal published tools with interoperability to bio.tools. J Integr Bioinform 2019;16:59. doi:10.1515/jib-2019-0059. PMCID: PMC7074141. PMID: 31913853.
- 3. Chimani M, Klein K. Algorithm engineering: concepts and practice. In: Experimental methods for the analysis of optimization algorithms. Berlin Heidelberg: Springer; 2010:131-58.
- 4. Weiss E. Konrad Zuse obituary. IEEE Ann Hist Comput 1996;18:3-5. doi:10.1109/mahc.1996.489747.
- 5. Burks AW, Burks AR. First general-purpose electronic computer. Ann Hist Comput 1981;3:310-89. doi:10.1109/mahc.1981.10043. PMID: 7596251.
- 6. Walker VC. Pioneer day 1981: UNIVAC I. Ann Hist Comput 1981;3:400-7.
- 7. Wirth N. A brief history of software engineering. IEEE Ann Hist Comput 2008;30:32-9. doi:10.1109/mahc.2008.33.
- 8. Royce WW. Managing the development of large software systems. Proceedings of IEEE WESCON 2008:1-9.
