Taec: a manually annotated text dataset for trait and phenotype extraction and entity linking in wheat breeding literature
Claire Nédellec, Clara Sauvion, Robert Bossy, Mariya Borovikova, Louise Deléger

TL;DR
This paper introduces the Triticum aestivum trait Corpus (Taec), a new annotated dataset for extracting and linking wheat trait and phenotype mentions in the scientific literature, intended to support text mining in plant breeding research.
Contribution
It provides the first comprehensive annotated corpus for wheat trait and phenotype extraction, enabling better training and evaluation of text mining tools in plant genomics.
Findings
The corpus contains 540 annotated PubMed references.
Tools trained on this corpus perform well in entity recognition and linking.
The dataset supports advancing text mining in wheat breeding literature.
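To illustrate what entity recognition and linking mean for this task, here is a minimal sketch of dictionary-based linking: trait mentions are located in a sentence and each one is mapped to a concept identifier. The lexicon, the `TRAIT:000n` identifiers, and the `link_mentions` function are hypothetical placeholders, not Taec annotations or real ontology IDs, and simple lookup is not presented as the approach used with the corpus.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: surface forms -> placeholder concept IDs.
# Synonyms ("drought tolerance", "drought resistance") share one concept,
# which is the point of linking rather than just recognizing mentions.
LEXICON = {
    "drought tolerance": "TRAIT:0001",
    "drought resistance": "TRAIT:0001",
    "heat tolerance": "TRAIT:0002",
    "grain gluten content": "TRAIT:0003",
    "leaf rust resistance": "TRAIT:0004",
}


@dataclass
class Mention:
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention
    text: str         # mention exactly as written in the input
    concept_id: str   # identifier of the linked concept


def link_mentions(text: str, lexicon: dict[str, str]) -> list[Mention]:
    """Find lexicon entries in `text` (case-insensitive) and link them to concept IDs."""
    mentions: list[Mention] = []
    # Try longer surface forms first so a long match blocks overlapping shorter ones.
    for form in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(form) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            overlaps = any(m.start() < x.end and x.start < m.end() for x in mentions)
            if not overlaps:
                mentions.append(Mention(m.start(), m.end(), m.group(0), lexicon[form]))
    # Return mentions in reading order.
    return sorted(mentions, key=lambda x: x.start)


if __name__ == "__main__":
    sentence = "Drought tolerance and leaf rust resistance were mapped to QTLs on 3B."
    for mention in link_mentions(sentence, LEXICON):
        print(mention)
```

Real trait mentions vary far more than a fixed lexicon can capture (paraphrases, abbreviations, discontinuous phrases), which is why a manually annotated corpus is useful for training and evaluating learned recognition and linking tools.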
Abstract
Wheat varieties show a large diversity of traits and phenotypes. Linking them to genetic variability is essential for shorter and more efficient wheat breeding programs. Newly desirable traits for wheat varieties include disease resistance to reduce pesticide use, adaptation to climate change, resistance to heat and drought stress, and low grain gluten content. Wheat breeding experiments are documented by a large body of scientific literature and by observational data obtained in the field and under controlled conditions. Cross-referencing the complementary information from the literature and the observational data is essential to the study of the genotype-phenotype relationship and to the improvement of wheat selection. The scientific literature on genetic marker-assisted selection contains much information about the genotype-phenotype relationship. However, the variety of expressions used to…
Taxonomy
Topics: Wheat and Barley Genetics and Pathology · Horticultural and Viticultural Research · Agricultural Productivity and Crop Improvement
Methods: Ontology
