Morphometric dataset of Varanus salvator for non-invasive sex identification using machine learning
Ariff Azlan Alymann, Imann Azlan Alymann, Song-Quan Ong, Mohd Uzair Rusli, Abu Hassan Ahmad, Hasber Salim

TL;DR
This paper introduces a dataset of morphometric measurements for Varanus salvator to enable non-invasive sex identification using machine learning.
Contribution
The paper presents a new dataset and demonstrates machine learning models for non-invasive sex determination in Varanus salvator.
Findings
Six machine learning models were trained and tested to validate the dataset's utility for sex prediction.
The dataset includes morphometric measurements like weight, skull size, and tail length from confirmed-sex individuals.
The dataset offers a non-invasive alternative to traditional invasive methods for sex identification in V. salvator.
Abstract
Reliable sex identification in Varanus salvator has traditionally relied on invasive methods such as genetic analysis or dissection, because less invasive techniques such as hemipenes inversion are unreliable. Given the ecological importance of this species and skewed sex ratios in disturbed habitats, a dataset that allows ecologists or zoologists to study the sex determination of the lizard is crucial. We present a new dataset containing morphometric measurements of V. salvator individuals from the skin trade, with sex confirmed by dissection post-measurement. The dataset consists of a mixture of primary and secondary data such as weight, skull size, tail length, and body condition, and can be used in modelling studies for ecological and conservation research to monitor the sex ratio of this species. Validity was demonstrated by training and testing six machine learning models. This dataset has the…
Taxonomy
Topics: Amphibian and Reptile Biology · Animal Behavior and Reproduction · Wildlife Ecology and Conservation
Background & Summary
The study of morphological differences between the sexes in Varanus salvator is ecologically important, especially considering that the species is extensively used for the skin trade and that anthropogenic habitat disturbance is thought to influence the sex ratio^1,2^. Surprisingly, little attention has been paid to sex determination in V. salvator based on morphometric proportions of the body, although the general body morphology of the species has been extensively studied^3–5^. Within varanids, species show considerable variation in body size, with larger species often exhibiting more pronounced sexual dimorphism^6^, which is consistent with Rensch’s rule^7^. Specific features such as variations at the base of the tail (where the male hemipenes are located) and the proportions of the head shape have also been reported^8,9^.
Reliable sex determination of V. salvator in the field would facilitate the measurement of sex ratios, which is crucial for drawing conclusions about population dynamics in disturbed habitats. Currently, unambiguous sexing requires invasive methods: either dissection, in which the reproductive organs are measured, or genetic analysis^9,10^. Less invasive methods, such as hemipenes inversion, are unreliable because partially elongated male hemipenes can be confused with female hemiclitori, but they are still used in ecological studies^1^. Previous studies suggest that tail-to-body ratio, eye-to-ear length, and the extent of the tail base are potential features for sex determination in this species^4,9,11^. Many research questions about the relationship between the sex of V. salvator and its morphology therefore remain open, and a dataset that allows statistical or machine learning modelling is crucial for answering them.
We present a morphometric dataset that supports non-invasive sex prediction and can potentially improve the accuracy of sex determination in the field when used alongside the commonly practised hemipenes inversion. The dataset is useful to practitioners in various fields, including machine learning engineers, app developers, data scientists, ecologists, herpetologists, and conservationists.
Methods
Sampling
This study sampled a total of 146 individual V. salvator (83 females, 63 males). Lizards were sampled at a skin factory in Johor (location provided by the Department of Wildlife and National Parks Peninsular Malaysia (PERHILITAN)). All lizards were sourced from oil palm plantations in Perak. The sample size was determined by the allocation provided to the researchers by the skin factory.
Dataset formation
The following morphological features were measured: thigh width (TW), base tail circumference (BTC), skull length (SL), skull width (SW), eye to ear length (EEL), snout-vent length (SVL), snout-tail length (STL), tail length (TL), and weight^1,3,5,9,11,12^. Lengths were measured with a flexible measuring tape and weight with a handheld weighing scale. TW, BTC, SL, SW, EEL, and TL were each divided by STL to derive relative body proportions. Similarly, SW and EEL were divided by SL to derive relative head proportions. These ratios were included in the analysis because some of the literature suggests that relative proportions of the body and head may differ between the sexes^3,4,8^. Body condition was calculated by dividing weight by STL, analogous to a body mass index^1^. Body size was assessed with a principal components analysis (PCA) of eight morphometric variables, namely TW, BTC, SL, SW, EEL, SVL, STL, and weight (similar to^13^). The first principal component was then used as the body size variable (Tables 1 and 2).
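The derived variables described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `RawMeasurements` class, the `derive_variables` function, the dictionary keys, and the example values are all hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class RawMeasurements:
    """Raw measurements for one lizard (lengths in cm, weight in kg).

    Field names are hypothetical; they mirror the abbreviations in the text.
    """
    tw: float      # thigh width
    btc: float     # base tail circumference
    sl: float      # skull length
    sw: float      # skull width
    eel: float     # eye to ear length
    svl: float     # snout-vent length
    stl: float     # snout-tail length
    tl: float      # tail length
    weight: float  # body weight


def derive_variables(m: RawMeasurements) -> dict[str, float]:
    """Compute the relative proportions and body condition described in the text."""
    if m.stl <= 0 or m.sl <= 0:
        raise ValueError("STL and SL must be positive to form ratios")
    return {
        # Body proportions: each measurement divided by snout-tail length.
        "TW:STL": m.tw / m.stl,
        "BTC:STL": m.btc / m.stl,
        "SL:STL": m.sl / m.stl,
        "SW:STL": m.sw / m.stl,
        "EEL:STL": m.eel / m.stl,
        "TL:STL": m.tl / m.stl,
        # Head proportions: skull width and eye-ear length divided by skull length.
        "SW:SL": m.sw / m.sl,
        "EEL:SL": m.eel / m.sl,
        # Body condition: weight divided by snout-tail length (kg/cm).
        "body_condition": m.weight / m.stl,
    }


# Made-up values for illustration only; not a record from the dataset.
example = RawMeasurements(tw=15.0, btc=20.0, sl=8.0, sw=4.0, eel=3.0,
                          svl=50.0, stl=125.0, tl=75.0, weight=2.5)
derived = derive_variables(example)
for name, value in derived.items():
    print(f"{name}: {value:.4f}")
```

Body size is deliberately left out of this sketch, because it comes from a PCA fitted across all individuals and cannot be computed from a single record.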
Definitions of morphometric variables used for sex prediction are provided in Table 3.

Table 1. KMO and Bartlett’s test results indicating variables are suitable for PCA.

| Test | Result |
| --- | --- |
| KMO | 0.951 |
| Bartlett’s test of sphericity | χ^2^(15) = 1225.159, p < 0.001 |

Table 2. Eigenvalue and percentage of variance explained for all components.

| Component Number | Eigenvalue | Percentage of Variance Explained |
| --- | --- | --- |
| 1 | 6.288 | 78.599 |
| 2 | 0.428 | 5.346 |
| 3 | 0.322 | 4.020 |
| 4 | 0.270 | 3.369 |
| 5 | 0.245 | 3.059 |
| 6 | 0.201 | 2.518 |
| 7 | 0.136 | 1.699 |
| 8 | 0.111 | 1.389 |
| Total Percentage | | 100 |

Table 3. Definition of morphometric variables used for sex prediction.

| No. | Variable | Definition | Type |
| --- | --- | --- | --- |
| 1 | Body condition | Weight divided by STL (kg/cm) | Continuous |
| 2 | Base tail circumference (BTC) | Circumference of the tail after the cloaca (cm) | Continuous |
| 3 | BTC: STL | BTC divided by STL | Continuous |
| 4 | Body size | PCA output of 8 morphometric variables | Continuous |
| 5 | Eye to ear length (EEL) | Length from the anterior tip of the eye to the posterior tip of the ear (cm) | Continuous |
| 6 | EEL: Skull Length | EEL divided by Skull Length | Continuous |
| 7 | EEL: STL | EEL divided by STL | Continuous |
| 8 | Skull length (SL) | Length from the tip of the snout to the base of the skull (cm) | Continuous |
| 9 | SL: STL | SL divided by STL | Continuous |
| 10 | Skull width (SW) | Width of the broadest part of the skull (cm) | Continuous |
| 11 | SW: STL | SW divided by STL | Continuous |
| 12 | SW: SL | SW divided by SL | Continuous |
| 13 | Snout-tail length (STL) | Length from tip of snout to end of tail (cm) | Continuous |
| 14 | Snout-vent length (SVL) | Length from tip of snout to vent (cloaca) (cm) | Continuous |
| 15 | Tail length (TL) | Length of tail from cloaca to tip (cm) | Continuous |
| 16 | TL: STL | TL divided by STL | Continuous |
| 17 | Thigh width (TW) | Circumference of thigh at middle of thigh (cm) | Continuous |
| 18 | TW: STL | TW divided by STL | Continuous |
| 19 | Weight | Weight of animal (kg) | Continuous |
| 20 | Sex | Male = 0, Female = 1 | Categorical |
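The percentages in Table 2 follow from the eigenvalues: each component's share of variance is its eigenvalue divided by the sum of all eigenvalues. The sketch below recomputes them from the rounded eigenvalues reported in the table. It assumes the PCA was run on standardised variables (a correlation matrix), in which case the eigenvalues sum to the number of variables, here 8. The source does not state this explicitly, and the helper name `percent_variance` is hypothetical.

```python
# Eigenvalues as reported in Table 2 (rounded to three decimals).
eigenvalues = [6.288, 0.428, 0.322, 0.270, 0.245, 0.201, 0.136, 0.111]


def percent_variance(eigs):
    """Return each eigenvalue's share of the total variance, in percent."""
    total = sum(eigs)
    return [100.0 * e / total for e in eigs]


pct = percent_variance(eigenvalues)
for number, (eig, share) in enumerate(zip(eigenvalues, pct), start=1):
    print(f"Component {number}: eigenvalue {eig:.3f}, {share:.2f}% of variance")

# Under the correlation-matrix assumption the total should be close to 8.
print(f"Sum of eigenvalues: {sum(eigenvalues):.3f}")
```

Because the published eigenvalues are rounded, the recomputed percentages can differ from Table 2 in the second decimal place.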
Ethics statements
All authors confirm that we have complied with all relevant ethical regulations. A permit to conduct research on this species was obtained from PERHILITAN (license number P-00003-15-19), and animal ethics approval was obtained from Universiti Sains Malaysia (approval number USM/IACUC/2020/(123)(1064)).
Data Records
The dataset is publicly available on Figshare under the DOI 10.6084/m9.figshare.24558595^14^. Morphometric measurements were categorised according to sex (83 females, 63 males). The raw data were recorded on a physical data sheet with predefined attributes, then digitised into an Excel file and saved in CSV format. The data were checked, cleaned, and processed into independent variables that can serve as predictors, with the lizards’ sex as the dependent variable.
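A reader working with the CSV file might separate predictors from the sex label along the following lines. This is a sketch under stated assumptions: the column names (`Sex`, `Weight`, `STL`, `SVL`), the inline sample rows, and the `load_records` helper are hypothetical stand-ins, and the actual header of the Figshare file may differ. Only the coding Male = 0, Female = 1 comes from Table 3.

```python
import csv
import io

# Hypothetical three-row sample; values and column names are illustrative only.
SAMPLE_CSV = """Sex,Weight,STL,SVL
0,2.5,125.0,50.0
1,1.8,110.0,45.0
1,2.1,118.0,47.5
"""


def load_records(handle, target="Sex"):
    """Split CSV rows into predictor dictionaries and integer sex labels."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        labels.append(int(row.pop(target)))  # Male = 0, Female = 1 (Table 3)
        features.append({name: float(value) for name, value in row.items()})
    return features, labels


X, y = load_records(io.StringIO(SAMPLE_CSV))
n_female = sum(y)
n_male = len(y) - n_female
print(f"{len(y)} records: {n_female} female, {n_male} male")
```

To read a downloaded file instead of the inline sample, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.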
Technical Validation
Pilot testing with basic model construction
A pilot study was conducted to validate the suitability of the dataset for predicting the sex of V. salvator. Six machine learning models were used: logistic regression, random forest, support vector machine, extreme gradient boosting, adaptive boosting, and Gaussian naïve Bayes. The data were split into 70% for training and 30% for validation. Model construction, training, and validation were conducted in Python in a Google Colab notebook. The resulting confusion matrices and model performances are summarized in Supplementary Table 1.
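The split-and-evaluate step can be sketched with the standard library alone. This is not the authors' pipeline. Stratifying the 70/30 split by sex, the random seed, the choice of metrics, and all function names are assumptions made for illustration. In place of the six models, the sketch scores a trivial baseline that predicts "female" for every lizard.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary label."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(t == positive and p == positive for t, p in pairs),
        "tn": sum(t != positive and p != positive for t, p in pairs),
        "fp": sum(t != positive and p == positive for t, p in pairs),
        "fn": sum(t == positive and p != positive for t, p in pairs),
    }


def ratio(numerator, denominator):
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summarise(c):
    """Derive accuracy, sensitivity and specificity from confusion counts."""
    return {
        "accuracy": ratio(c["tp"] + c["tn"], sum(c.values())),
        "sensitivity": ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": ratio(c["tn"], c["tn"] + c["fp"]),
    }


# Labels with the dataset's class sizes: 83 females (1) and 63 males (0).
labels = [1] * 83 + [0] * 63
train_idx, test_idx = stratified_split(labels)
y_test = [labels[i] for i in test_idx]

# Baseline that always predicts "female"; a real model would replace this.
baseline_pred = [1] * len(y_test)
counts = confusion_counts(y_test, baseline_pred)
metrics = summarise(counts)
print(len(train_idx), len(test_idx))
print(counts)
print(metrics)
```

With these class sizes the stratified split yields 102 training and 44 validation records. The always-female baseline identifies every female correctly but has zero specificity, which is why a confusion matrix is more informative than accuracy alone when classes are unequal.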
Usage Notes
This dataset contains morphological measurements from 83 female and 63 male V. salvator. However, it is important to acknowledge several limitations inherent to the dataset. Firstly, the data predominantly represent smaller individuals, because they were collected from individuals captured for the skin trade. Skin factories typically accept individuals weighing ≤5 kg, which contributes to this size bias. Additionally, individuals from other habitats such as forests and urban areas are absent from this dataset, because data collection was restricted to animals sourced from oil palm plantations. To enhance the applicability of morphological data analysis, we recommend including individuals from wild populations in future model training and validation. This could lead to the development of an app in which the relevant morphological variables are entered to predict the sex of wild individuals, allowing for easy sex identification in the field. Furthermore, future work could explore image-based means of sex identification, which could prove more time- and cost-efficient.
Supplementary information
Supplementary Table 1
References
- 1. Twining JP, Bernard H, Ewers RM. Increasing land-use intensity reverses the relative occupancy of two quadrupedal scavengers. PLoS One 2017;12:1–13. doi:10.1371/journal.pone.0177143. PMCID: PMC5426707. PMID: 28494004.
- 2. Khadiejah S, Razak N, Ward-Fear G, Shine R, Natusch DJD. Asian Water Monitors (Varanus salvator) remain common in Peninsular Malaysia, despite intense harvesting. Wildl. Res. 2019;46:265–275. doi:10.1071/WR18166.
- 3. Shine R, Harlow PS, Keogh JS. Commercial harvesting of giant lizards: the biology of water monitors Varanus salvator in southern Sumatra. Biol. Conserv. 1996;77:125–134. doi:10.1016/0006-3207(96)00008-0.
- 4. Shine R, Ambariyanto, Harlow PS, Mumpuni. Ecological traits of commercially harvested water monitors, Varanus salvator, in northern Sumatra. Wildl. Res. 1998;25:437–447. doi:10.1071/WR97118.
- 5. Koch A, Auliya M, Schmitz A, Kuch U, Böhme W. Morphological studies on the systematics of South East Asian water monitors (Varanus salvator complex): nominotypic populations and taxonomic overview. Mertensiella 2007;16:109–180.
- 6. Frýdlová P, Frynta D. A test of Rensch’s rule in varanid lizards. Biol. J. Linn. Soc. 2010;100:293–306. doi:10.1111/j.1095-8312.2010.01430.x.
- 7. Abouheif E, Fairbairn DJ. A comparative analysis of allometry for sexual size dimorphism: assessing Rensch’s rule. Am. Nat. 1997;149:540–562. doi:10.1086/286004.
- 8. Smith JG, Brook BW, Griffiths AD, Thompson GG. Can morphometrics predict sex in varanids? J. Herpetology 2007;41:133–140. doi:10.1670/0022-1511(2007)41[133:CMPSIV]2.0.CO;2.
