Czech J. Anim. Sci., 2021, 66(1):1-12 | DOI: 10.17221/120/2020-CJAS
The assignment success for 22 horse breeds registered in the Czech Republic: The machine learning perspective
Original Paper
- 1 Department of Animal Morphology, Physiology and Genetics, Faculty of AgriScience, Mendel University in Brno, Brno, Czech Republic
- 2 Department of Control and Instrumentation, Faculty of Electrical Engineering and Communication, University of Technology, Brno, Czech Republic
The paper demonstrates the dependability of assignment testing for identifying the appropriate breed, using genetic information from molecular markers to analyse a collection of real population data covering 22 horse breeds registered in the Czech Republic, including native breeds and genetic resources. With 17 microsatellites, the mean number of alleles per locus is 10.4. The number of alleles at the individual loci ranges from five (HTG07) to 17 (ASB17). The loci ASB02, ASB23, HMS03, HTG10, and VHL20 exhibit the highest gene diversity and observed heterozygosity (both above 80%), with mean values of 0.77 and 0.73, respectively. A moderate total inbreeding coefficient (5.2%) is estimated across all the loci and breeds. The levels of apparent breed differentiation range from zero between the Czech Warmblood and Slovak Warmblood to 0.15 between the Shetland Pony and Standardbred. The phylogenetic breed relationships are revealed via the NeighbourNet dendrogram constructed from Reynolds' genetic distances, which clearly separates the Coldblood draught, Hot/Warmblood, and Pony horses. Our results reveal that the Bayesian approach (the Rannala and Mountain technique) provides the highest prediction power (83.6%) among the GeneClass tools and that the Bayes Net algorithm exhibits the best efficiency (78.4%) among the WEKA machine learning workbench options when five-fold cross-validation is used. The algorithms can be trained on large real reference data sets, which opens another viable perspective for machine learning in horse ancestry testing. In this context, it is also important to stress that innovative computational tools could lead to a novel web server for identifying horse breeds.
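To illustrate the kind of evaluation the abstract describes, the following is a minimal sketch of likelihood-based breed assignment scored with five-fold cross-validation. It is not the GeneClass or WEKA procedure used in the paper: the function names, the toy genotypes, the breed labels, and the pseudo-count smoothing for unseen alleles are all assumptions made for illustration, and the smoothing is only a rough stand-in for the prior used in the Rannala and Mountain method.

```python
import math
from collections import Counter, defaultdict


def allele_counts(genotypes):
    """Count alleles per locus for one breed.

    genotypes: list of individuals, each a list of (allele_a, allele_b)
    tuples, one tuple per microsatellite locus.
    """
    counts = [Counter() for _ in genotypes[0]]
    for individual in genotypes:
        for locus, (a, b) in enumerate(individual):
            counts[locus][a] += 1
            counts[locus][b] += 1
    return counts


def log_likelihood(individual, counts, pseudo=0.5):
    """Log-likelihood of a genotype given one breed's allele counts.

    Assumes Hardy-Weinberg proportions and independent loci. The
    pseudo-count (an illustrative assumption) keeps unseen alleles from
    producing a zero probability.
    """
    total = 0.0
    for locus, (a, b) in enumerate(individual):
        c = counts[locus]
        n = sum(c.values())
        k = len(c) + 1  # observed alleles plus one slot for unseen alleles
        pa = (c[a] + pseudo) / (n + pseudo * k)
        pb = (c[b] + pseudo) / (n + pseudo * k)
        total += math.log(pa * pb if a == b else 2 * pa * pb)
    return total


def assign(individual, reference):
    """Return the breed whose reference sample gives the highest likelihood."""
    return max(reference, key=lambda breed: log_likelihood(individual, reference[breed]))


def cross_validate(samples, k=5):
    """k-fold cross-validation; samples is a list of (breed, individual)."""
    correct = 0
    for fold in range(k):
        train, test = defaultdict(list), []
        for i, (breed, individual) in enumerate(samples):
            if i % k == fold:
                test.append((breed, individual))
            else:
                train[breed].append(individual)
        reference = {breed: allele_counts(g) for breed, g in train.items()}
        correct += sum(assign(ind, reference) == breed for breed, ind in test)
    return correct / len(samples)


# Toy data: two hypothetical breeds typed at two loci (allele sizes in bp).
toy = []
for i in range(10):
    toy.append(("breed_A", [(100, 100 + 2 * (i % 2)), (200, 200)]))
    toy.append(("breed_B", [(110, 110 + 2 * (i % 2)), (210, 212)]))

accuracy = cross_validate(toy, k=5)
print(f"assignment success: {accuracy:.1%}")
```

In the toy data the two breeds share no alleles, so every held-out individual is assigned correctly. Real breeds overlap heavily in allele frequencies, which is why the success rates reported above stay below 100%.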
Keywords: accuracy; GeneClass analyses; individual breed assignment; DNA markers; WEKA algorithms
Published: January 29, 2021
Supplementary files: 120-2020 final ESM.pdf (307.27 kB)
References
- Baudouin L, Lebrun P. An operational Bayesian approach for the identification of sexually reproduced cross-fertilized populations using molecular markers. Acta Hortic. 2001;546:81-93.
- Bjornstad G, Roed KH. Evaluation of factors affecting individual assignment precision using microsatellite data from horse breeds and simulated breed crosses. Anim Genet. 2002 Aug;33(4):264-70.
- Cavalli-Sforza LL, Edwards AWF. Phylogenetic analysis: Models and estimation procedures. Am J Hum Genet. 1967 May;19(3 Pt 1):233-57.
- Cornuet JM, Piry S, Luikart G, Estoup A, Solignac M. New methods employing multilocus genotypes to select or exclude populations as origins of individuals. Genetics. 1999 Dec;153(4):1989-2000.
- Fages A, Hanghoj K, Khan N, Gaunitz C, Seguin-Orlando A, Leonardi M, McCrory Constantz C, Gamba C, Al-Rasheid KAS, Albizuri S, Alfarhan AH, Allentoft M, Alquraishi S, Anthony D, Baimukhanov N, Barrett JH, Bayarsaikhan J, Benecke N, Bernaldez-Sanchez E, Berrocal-Rangel L, Biglari F, Boessenkool S, Boldgiv B, Brem G, Brown D, Burger J, Crubezy E, Daugnora L, Davoudi H, de Barros Damgaard P, Chorro y de Villa-Ceballos MA, Deschler-Erb S, Detry C, Dill N, do Mar Oom M, Dohr A, Ellingvag S, Erdenebaatar D, Fathi H, Felkel S, Fernandez-Rodriguez C, Garcia-Vinas E, Germonpre M, Granado JD, Hallsson JH, Hemmer H, Hofreiter M, Kasparov A, Khasanov M, Khazaeli R, Kosintsev P, Kristiansen K, Kubatbek T, Kuderna L, Kuznetsov P, Laleh H, Leonard JA, Lhuillier J, von Lettow-Vorbeck CL, Logvin A, Lougas L, Ludwig A, Luis C, Arruda AM, Marques-Bonet T, Silva RM, Merz V, Mijiddorj E, Miller BK, Monchalov O, Mohaseb FA, Morales A, Nieto-Espinet A, Nistelberger H, Onar V, Palsdottir AH, Pitulko V, Pitskhelauri K, Pruvost M, Sikanjic PR, Papesa AR, Roslyakova N, Sardari A, Sauer E, Schafberg R, Scheu A, Schibler J, Schlumbaum A, Serrand N, Serres-Armero A, Shapiro B, Seno SS, Shevnina I, Shidrang S, Southon J, Star B, Sykes N, Taheri K, Taylor W, Teegen WR, Trbojevic Vukicevic T, Trixl S, Tumen D, Undrakhbold S, Usmanova E, Vahdati A, Valenzuela-Lamas S, Viegas C, Wallner B, Weinstock J, Zaibert V, Clavel B, Lepetz S, Mashkour M, Helgason A, Stefansson K, Barrey E, Willerslev E, Outram AK, Librado P, Orlando L. Tracking five millennia of horse management with extensive ancient genome time series. Cell. 2019 May 30;177(6):1419-35.
- Fan B, Chen YZ, Moran C, Zhao SH, Liu B, Zhu MJ, Xiong TA, Li K. Individual-breed assignment analysis in swine populations by using microsatellite markers. Asian-Aust J Anim Sci. 2005 Dec 2;18(11):1529-34.
- Funk SM, Guedaoura S, Juras R, Raziq A, Landolsi F, Luis C, Martinez AM, Musa Mayaki A, Mujica F, Oom MD, Ouragh L. Major inconsistencies of inferred population genetic structure estimated in a large set of domestic horse breeds using microsatellites. Ecol Evol. 2020 May;10(10):4261-79.
- Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL, Feldman MW. Genetic absolute dating based on microsatellites and the origin of modern humans. PNAS. 1995 Jul 18;92(15):6723-7.
- Iquebal MA, Ansari MS, Dixit SP, Verma NK, Aggarwal RA, Jayakumar S, Rai A, Kumar D. Locus minimization in breed prediction using artificial neural network approach. Anim Genet. 2014 Dec;45(6):898-902.
- Jakobsson M, Rosenberg NA. CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics. 2007 Jul 15;23(14):1801-6.
- Koskinen M. Individual assignment using microsatellite DNA reveals unambiguous breed identification in the domestic dog. Anim Genet. 2003 Aug;34(4):297-301.
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: An information aesthetic for comparative genomics. Genome Res. 2009 Sep 1;19(9):1639-45.
- Librado P, Fages A, Gaunitz C, Leonardi M, Wagner S, Khan N, Hanghoj K, Alquraishi SA, Alfarhan AH, Al-Rasheid KA, Der Sarkissian C, Schubert M, Orlando L. The evolutionary origin and genetic makeup of domestic horses. Genetics. 2016 Oct 1;204(2):423-34.
- Morota G, Ventura RV, Silva FF, Koyama M, Fernando SC. Machine learning and data mining advance predictive big data analysis in precision animal agriculture. J Anim Sci. 2018 Apr;96(4):1540-50.
- Nei M. Genetic distance between populations. Am Nat. 1972 May-Jun;106(949):283-91.
- Nei M. The theory and estimation of genetic distance. In: Morton NE, editor. Genetic structure of populations. Honolulu: University of Hawaii Press; 1973. p. 45-51.
- Nei M, Tajima F, Tateno Y. Accuracy of estimated phylogenetic trees from molecular data. J Mol Evol. 1983 Mar;19(2):153-70.
- Orlando L, Librado P. Origin and evolution of deleterious mutations in horses. Genes. 2019 Aug 28;10(9): [16 p.].
- Paetkau D, Calvert W, Stirling I, Strobeck C. Microsatellite analysis of population structure in Canadian polar bears. Mol Ecol. 1995 Jun;4(3):347-54.
- Perez-Enciso M. Animal breeding learning from machine learning. J Anim Breed Genet. 2017 Apr;134(2):85-6.
- Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000 Jun 1;155(2):945-59.
- Putnova L, Stohl R. Comparing assignment-based approaches to breed identification within a large set of horses. J Appl Genet. 2019 Apr 8;60(2):187-98.
- Putnova L, Stohl R, Vrtkova I. Using nuclear microsatellite data to trace the gene flow and population structure in Czech horses. Czech J Anim Sci. 2019 Feb;64(2):67-77.
- Rannala B, Mountain JL. Detecting immigration by using multilocus genotypes. PNAS. 1997 Aug 19;94(17):9197-201.
- Talle SB, Fimland E, Syrstad O, Meuwissen T, Klungland H. Comparison of individual assignment methods and factors affecting assignment success in cattle breeds using microsatellites. Acta Agric Scand. 2005 Aug;55(2-3):74-9.
- Van de Goor LH, van Haeringen WA, Lenstra JA. Population studies of 17 equine STR for forensic and phylogenetic analysis. Anim Genet. 2011 Dec;42(6):627-33.
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.