COTS software in science operations, is it worth it?
William O'Mullane, Nana Bach, Jose Hernandez, Alexander, Hutton, Rosario Messineo

TL;DR
This paper evaluates the effectiveness and decision-making process behind using Commercial Off-The-Shelf (COTS) software in scientific systems, highlighting considerations and assessing whether these choices are justified.
Contribution
It provides a case-based analysis of COTS software selection in scientific operations, offering insights into decision criteria and effectiveness.
Findings
COTS software can be cost-effective and efficient for scientific systems.
Decision processes for COTS selection vary based on project needs.
Assessing effectiveness of COTS use remains challenging.
Abstract
Often, perhaps not often enough, we choose Common Off the Shelf (COTS) software for integration in our systems. These range from repositories to databases and tools we use on a daily basis. It is very hard to assess the effectiveness of these choices. While none of us would consider a project specific word processing solution when LaTeX (or even Word) many will consider writing their own data management systems. We will look at some of the COTS we have used and attempt to explain how we came to the decision and if it was worth it.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsScientific Computing and Data Management · Distributed and Parallel Computing Systems · Research Data Management Practices
COTS software in science operations, is it worth it?
William O’Mullane1 and Nana Bach2 and Jose Hernandez1 and Alexander Hutton 2 and Rosario Messineo3 ,
1* European Space Astronomy Centre, P.O. Box 78, 28691 Villanueva de la Cañada, Spain*
2* Aurora for ESA at ESAC*
3* Altec Turin, Italy*
1 Introduction
The Gaia astrometric satellite is now in operations for more than two years. The first data release (Gaia Collaboration et al. 2016) was highly successful. Behind the operations there is a lot of software and collaboration. The Data Processing Ground Segment is jointly operated by ESAC and the Gaia DPAC. This is comprised of the Science Operations Centre (SOC) operated by ESA and a set of Data Processing Centres (DPCs). The SOC is also a DPC (DPCE for daily and astrometric processing), SOC also acts as the interface between the Mission Operations Centre (MOC) and the DPAC (See Figure 1). Gaia produces an impressive volume of raw data with about 50GB of uncompressed science data per day, yielding at mission completion a telemetry data volume of roughly 500TB. Transforming the data into scientifically meaningful quantities is the task of the Data Processing Analysis Consortium (DPAC). DPAC is comprised of a number of Coordination Units (CUs). Each one is responsible for a well-defined part of the Gaia data processing (Mignard et al. 2008). For each CU there is at least one Data Processing Center (DPC) with dedicated resources for the data processing of the CU.
2 The Main Database
The MDB (Main Database) is the central repository of all the data produced by Gaia Scientific Processing and DPAC. The MDB Dictionary, running at DPCE, is used by all to define the Datamodel of the MDB. There are actually a bunch of software systems incorporated in the MDB:
- •
MDB Database
- •
MDB Schema Creator
- •
MDB Dictionary Tool
- •
MDB Extractor
- •
MDB Ingestor
- •
MDB Integrator
- •
MDB Explorer
- •
MDB Data Manager
There are documents listing MDB Requirements and they may be summarised as follows:
- •
14 functional requirements
- •
**1 performance requirement **
- –
MDB Schema Creator: 1 functional requirement
- –
MDB Dictionary Tool: 24 functional requirements
- –
MDB Extractor: 12 functional requirements, 1 performance requirement
- –
MDB Ingestor: 10 functional requirements, 2 performance requirements
- –
MDB Integrator: 9 functional requirements
- –
MDB Explorer: 16 functional requirements
- –
Organically added functionality
MDB DataManager: no written requirements
3 Gaia Transfer System (GTS)
GTS (Gaia Transfer System) is the service that permits data exchange between the Data Processing Centres (DPCs). This is composed of Aspera (commercial) and the DTSTool (add on built within DPAC by Altec). Aspera is a COTS for fast data transfer, it provides the technical platform for the data transfers. We have found this an extremely useful product with very good support.
The DTSTool is responsible for building an interface between Aspera and the data processing software systems at each DPC.
Again we have documents containing GTS Requirements which may be summarised as:
- •
13 functional requirements
- •
**2 performance requirements **
- •
29 other requirements:
- –
Portability: 1
- –
User Interface: 5
- –
Execution and activation: 1
- –
File naming convention: 3
- –
Interface: 5
- –
Safety and Security: 12
- –
Performance: 3
4 Cost-effectiveness analysis - factors
For serious implementations, the real cost of software goes far beyond the license itself, and includes (see also et Al. (2017)):
- •
Time for implementation, evaluation, and integration, heavily influenced by:
The quality of documentation 2. 2.
The responsiveness of support staff 3. 3.
The availability of code examples and other learning aides
- •
Resources consumed by the use of the software.
- •
Reliability of the software across all possible use-cases (often the most expensive aspect of cheaper software).
- •
Flexibility and adaptability of the software relative to the competition.
To achieve real cost-effectiveness we recommend seeking the following qualities in server software for high-demand workflows:
- •
Memory-allocation independent (application terminates after execution).
- •
Fully separable from other functions.
- •
Availability of well defined and stable API in order to decouple the COTS from SW customisation.
- •
Proven track-record of reliability in high-volume production environments.
4.1 MDB Cost
We have a reasonable account of effort booked to the MDB work package 2005-2016:
- •
Effort person years incl. testing and documentation/support.
- •
Depending on our cost model that is a cost of €2.8M to €4.5M Bear in mind this must include all consumables, office space, phones, travel etc.. It also contains management overhead not just an individuals salary.
- •
Lets call it €3M.
4.2 GTS Cost
For Aspera we have a mix of licenses and development:
- •
Aspera licenses and support to date €158K
- •
DTSTool, customisation, automation etc. .. effort person years incl. testing documentation.
- •
Effort cost (using a similar rate to MDB but actually probably lower) €560k
- •
Total cost €
4.3 Cost effectiveness
There is No good way to do this! Constructive Cost Model (COCOMO) good BEFORE you build perhaps. Though similar in requirement the MDB feels much more complex than GTS if we use the functional requirements and say all were met that gives MDB a much higher complexity with 86 v 29 requirements.
- •
UNIT COST = Cost/Function
- •
MDB UC= 3M/86 = 34883
- •
GTS UC= 720K/29 = 24827
COST EFFECTIVENESS could be considered as the Ratio of in house to COTS software unit development cost. This would make COTS 1.4 times more cost effective. Of course we can only use COTS where appropriate, we could not find a tool or set of tools to do the job of the MDB.
5 Conclusion
To do this more effectively we should try harder to scope functional requirements in comparable way. The COTS items we can use in space science is fairly limited (DBMS, xfer other generic stuff ). But we should choose carefully - that small development can cost a lot cumulatively over our very long projects.
Open source is ok but can also die out over our long projects. In Gaia we supported Apache Common Math - now ONLY we support it as the community seems to have disappeared. Java as a sort of open platform worked out ok for Gaia do far - who knows what will happen in the next decade. The Eclipse IDE , MANTIS (now we use Jira) are all good there are peripheral tools which work and save a lot.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1et Al. (2017) et Al., R. G. 2017, in ADASS XXVI, edited by TBD (San Francisco: ASP), vol. TBD of ASP Con f. Ser., TBD
- 2Gaia Collaboration et al. (2016) Gaia Collaboration, Brown, A. G. A., Vallenari, A., Prusti, T., de Bruijne, J., Mignard, F., Drimmel, R., & co-authors, . 2016, Ar Xiv e-prints. 1609.04172
- 3Mignard et al. (2008) Mignard, F., Bailer-Jones, C., Bastian, U., Drimmel, R., Eyer, L., Katz, D., van Leeuwen, F., Luri, X., O’Mullane, W., Passot, X., Pourbaix, D., & Prusti, T. 2008, in A Giant Step: from Milli- to Micro-arcsecond Astrometry, edited by W. J. Jin, I. Platais, & M. A. C. Perryman, vol. 248 of IAU Symposium, 224. 0712.0889
