Evaluation of random forest regression for prediction of breeding value from genomewide SNPs

Keywords: genomewide SNPs; penalized regression; prediction of breeding value; machine learning methods.

Genomic prediction estimates breeding values from molecular marker data and has become a powerful tool for the efficient use of germplasm resources and the rapid improvement of cultivars. Model-based techniques have been widely used to predict the breeding values of genotypes from genomewide association studies. However, the random forest (RF), a model-free ensemble learning method, has not been widely applied to this task. In this study, the optimum values of the RF tuning parameters were identified and used to predict the breeding value of genotypes from genomewide single-nucleotide polymorphisms (SNPs), where the number of SNPs ($P$ variables) is much higher than the number of genotypes ($n$ observations), i.e. $P \gg n$. RF was then compared with the model-based genomic prediction methods, namely least absolute shrinkage and selection operator (LASSO), ridge regression (RR) and elastic net (EN), under $P \gg n$. The correlations between the predicted and observed trait response were 0.591, 0.539, 0.431 and 0.587 for RF, LASSO, RR and EN, respectively, which implies superiority of RF over the model-based techniques in genomic prediction. Hence, we suggest that the RF methodology can be used as an alternative to the model-based techniques for predicting breeding value at the genome level with higher accuracy.
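To make the evaluation criterion concrete, the following is a minimal sketch, not the authors' pipeline, of one of the compared baselines (ridge regression) fitted with more markers than genotypes and scored by the correlation between predicted and observed trait values. Everything in it is an assumption made for illustration: the data are synthetic (0/1/2 SNP codes with ten simulated causal markers), the sizes `n`, `p`, `n_train` and the penalty `lam` are arbitrary, and the helper names `pearson`, `solve` and `ridge_fit` are hypothetical. The ridge solution is computed in its dual form, $\hat\beta = X^\top (XX^\top + \lambda I_n)^{-1} y$, so only an $n \times n$ system is solved even though $P \gg n$.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge coefficients via the dual form beta = X^T (X X^T + lam I)^-1 y."""
    n, p = len(X), len(X[0])
    K = [[sum(X[i][k] * X[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return [sum(alpha[i] * X[i][j] for i in range(n)) for j in range(p)]


# Synthetic data (assumption): 60 genotypes, 300 SNPs coded 0/1/2.
random.seed(0)
n, p, n_train = 60, 300, 40
X = [[random.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
effects = {j: random.gauss(0, 1) for j in random.sample(range(p), 10)}
y = [sum(b * row[j] for j, b in effects.items()) + random.gauss(0, 1)
     for row in X]

# Split into training and test genotypes; centre with training statistics only.
X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
col_mean = [sum(row[j] for row in X_tr) / n_train for j in range(p)]
y_mean = sum(y_tr) / n_train
Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X_tr]
yc = [v - y_mean for v in y_tr]

beta = ridge_fit(Xc, yc, lam=10.0)
pred = [y_mean + sum(beta[j] * (row[j] - col_mean[j]) for j in range(p))
        for row in X_te]
r = pearson(pred, y_te)
print(f"correlation between predicted and observed: {r:.3f}")
```

The printed correlation describes this toy data set only and is not comparable to the values reported above. In practice the penalty would be chosen by cross-validation, and the same `pearson` score could be applied to predictions from any of the four methods.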

1. ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
2. ICAR-Indian Agricultural Research Institute, New Delhi 110 012, India
3. ICAR-Central Rice Research Institute, Cuttack 753 006, India

Posted on July 25, 2019