Prediction by Machine Learning Analysis of Genomic Data Phenotypic Frost Tolerance in Perccottus glenii
Lilin Fan, Xuqing Chai, Zhixiong Tian, Yihang Qiao, Zhen Wang, and Yifan Zhang

TL;DR
This study employs machine learning techniques to analyze genomic data of Perccottus glenii, a fish with freeze tolerance, achieving high accuracy in identifying genes linked to this trait and demonstrating the effectiveness of ML over traditional methods.
Contribution
The paper introduces novel gene sequence vectorization methods, compares multiple classification models, and applies interpretability techniques to identify key genetic features for freeze tolerance.
Findings
Random Forest achieved 99.98% accuracy
K-mer encoding was optimal for sequence vectorization
Top features identified by SHAP relate to freeze tolerance
Abstract
Analysis of the genome sequence of Perccottus glenii, the only fish known to possess freeze tolerance, holds significant importance for understanding how organisms adapt to extreme environments. Traditional biological analysis methods are time-consuming and have limited accuracy. To address these issues, we employ machine learning techniques to analyze the gene sequences of Perccottus glenii, with Neodontobutis hainanens as a comparative group. First, we proposed five gene sequence vectorization methods and a method for handling ultra-long gene sequences. We conducted a comparative study of three vectorization methods (ordinal encoding, One-Hot encoding, and K-mer encoding) to identify the optimal encoding method. Second, we constructed four classification models: Random Forest, LightGBM, XGBoost, and Decision Tree. The dataset used by these classification models was…
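The three vectorization methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `ACGT` alphabet ordering, the integer mapping, the handling of unknown bases, and the choice of `k=3` are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed base ordering; the paper's ordering is not given here


def ordinal_encode(seq):
    """Map each base to an integer (A=1, C=2, G=3, T=4); unknown bases map to 0."""
    table = {base: i + 1 for i, base in enumerate(ALPHABET)}
    return [table.get(base, 0) for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector; unknown bases become all zeros."""
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq.upper()]


def kmer_encode(seq, k=3):
    """Count overlapping k-mers and return a fixed-length vector of 4**k counts."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocab]


example = "ACGTAC"  # hypothetical toy sequence
print(ordinal_encode(example))
print(one_hot_encode(example)[:2])
print(len(kmer_encode(example, k=3)))
```

Ordinal and One-Hot encodings grow with the sequence length, whereas the k-mer count vector always has 4**k entries regardless of how long the input is, so it can be passed to tree-based classifiers without padding or truncation.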
Taxonomy
Topics: Food Quality and Safety Studies · Ecology and Conservation Studies · Climate change and permafrost
Methods: Shapley Additive Explanations
