Assessing the Feasibility of Early Cancer Detection Using Routine Laboratory Data: An Evaluation of Machine Learning Approaches on an Imbalanced Dataset
Shumin Li

TL;DR
This study evaluates machine learning methods for early cancer detection in dogs using routine laboratory data. It finds limited clinical utility because the signals are weak and confounded, and it argues that multi-modal data integration is needed.
Contribution
The paper provides a comprehensive benchmark of 126 ML pipelines for cancer risk classification in dogs using real-world, imbalanced data, and it highlights the current limitations of this approach.
Findings
The optimal model achieved an AUROC of 0.815 but a low F1-score of 0.25.
A high negative predictive value (NPV) of 0.98 indicates good rule-out potential.
Weak and confounded signals limit clinical reliability.
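The combination of a low F1-score and a high NPV is typical of low-prevalence screening data, and a short calculation makes the arithmetic concrete. The sketch below is illustrative only: the function name `screening_metrics` and the confusion-matrix counts are hypothetical, chosen so the results land near the reported figures. They are not the study's actual counts.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute common screening metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)          # positive predictive value
    recall = tp / (tp + fn)             # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    npv = tn / (tn + fn)                # negative predictive value
    prevalence = (tp + fn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "npv": npv,
        "prevalence": prevalence,
    }


# Hypothetical counts for 1,000 samples at 5% prevalence (NOT from the paper).
metrics = screening_metrics(tp=30, fp=160, fn=20, tn=790)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, F1 is 0.25 and NPV is about 0.975. Because negatives dominate the population, most negative predictions are correct even for a weak classifier, while the many false positives pull precision, and therefore F1, down.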
Abstract
The development of accessible screening tools for early cancer detection in dogs represents a significant challenge in veterinary medicine. Routine laboratory data offer a promising, low-cost source for such tools, but their utility is hampered by the non-specificity of individual biomarkers and the severe class imbalance inherent in screening populations. This study assesses the feasibility of cancer risk classification using the Golden Retriever Lifetime Study (GRLS) cohort under real-world constraints, including the grouping of diverse cancer types and the inclusion of post-diagnosis samples. A comprehensive benchmark evaluation was conducted, systematically comparing 126 analytical pipelines that comprised various machine learning models, feature selection methods, and data balancing techniques. Data were partitioned at the patient level to prevent leakage. The optimal model, a…
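The abstract states that data were partitioned at the patient level to prevent leakage. A minimal sketch of that idea follows, assuming each record carries a patient identifier. The function `split_by_patient`, the `dog_id` field, and the 20% test fraction are hypothetical and are not taken from the paper.

```python
import random


def split_by_patient(records, id_key="dog_id", test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both train and test sets."""
    # Shuffle unique patient IDs rather than individual records.
    ids = sorted({r[id_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(ids)

    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])

    train = [r for r in records if r[id_key] not in test_ids]
    test = [r for r in records if r[id_key] in test_ids]
    return train, test


# Toy data: 10 hypothetical dogs with 3 visits each.
records = [{"dog_id": d, "visit": v} for d in range(10) for v in range(3)]
train, test = split_by_patient(records)

train_ids = {r["dog_id"] for r in train}
test_ids = {r["dog_id"] for r in test}
print(len(train), len(test), train_ids & test_ids)
```

Splitting on identifiers instead of rows keeps repeated visits from the same animal on one side of the split, so a model cannot score well simply by recognizing an individual it saw during training.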
Taxonomy
Topics: Veterinary Oncology Research · AI in cancer detection · Inflammatory Biomarkers in Disease Prognosis
