Dairy farm records are a crucial component of effective livestock business management.
Record analysis allows a farm’s owner to make informed decisions. Incomplete records are
less useful for data analysis, so it is important to handle missing values correctly. This study
compares different imputation methods for handling missing values in a dataset of dairy records
comprising 997 records collected from 234 cows between 2012 and 2022. Records with missing
values were identified and deleted, leaving 858 complete observations from 200 animals. Two
variables had missing values, days in milk (DIM) and total milk yield (TOTM), with a missing
percentage of 13.9%. Known values of DIM and TOTM were then randomly excluded from the complete
cases at the same percentages of missing data as in the original dataset. Five imputation
techniques were compared to identify the best one: mean imputation, median imputation, power
regression imputation, multiple regression imputation, and the expectation-maximization method
(EM). The results showed that the EM method was the best imputation
method for the data under study. It had the lowest mean absolute deviation (MAD, 37.54), the
lowest mean square error (MSE, 15425.07), the highest Spearman’s correlation coefficient
(0.967), and the second lowest mean absolute percentage error (MAPE, 5.27) for predicting the
missing values of the two variables. Power regression imputation ranked second, performing
better than the other imputation methods but worse than EM.
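
The evaluation design described above (hide known values at the original missing rate, impute them, then score the imputed values against the hidden truth) can be illustrated with a minimal sketch. It is not the study's code. The data are synthetic stand-ins for TOTM, the function names `mask_values`, `impute_constant`, and `error_metrics` are hypothetical, only mean and median imputation are shown, and MAD is assumed here to mean the average absolute difference between imputed and true values.

```python
import random
import statistics


def mask_values(values, fraction, seed=0):
    """Randomly hide `fraction` of the known values.

    Returns the masked list (None marks a hidden value) and the hidden indices.
    """
    rng = random.Random(seed)
    n_hidden = round(len(values) * fraction)
    hidden = sorted(rng.sample(range(len(values)), n_hidden))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def impute_constant(masked, stat):
    """Fill each missing entry with a statistic (mean or median) of the observed entries."""
    observed = [v for v in masked if v is not None]
    fill = stat(observed)
    return [fill if v is None else v for v in masked]


def error_metrics(true, imputed, hidden):
    """MAD, MSE and MAPE (in percent), computed over the hidden positions only."""
    errors = [imputed[i] - true[i] for i in hidden]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100 * sum(abs(e) / abs(true[i]) for e, i in zip(errors, hidden)) / n
    return {"MAD": mad, "MSE": mse, "MAPE": mape}


# Synthetic stand-in for total milk yield (assumption: NOT the study's data).
rng = random.Random(42)
totm = [rng.gauss(6000, 1200) for _ in range(858)]

# Hide 13.9% of the known values, matching the missing rate reported in the abstract.
masked, hidden = mask_values(totm, 0.139)

for name, stat in [("mean", statistics.mean), ("median", statistics.median)]:
    imputed = impute_constant(masked, stat)
    print(name, error_metrics(totm, imputed, hidden))
```

Because the hidden values are known, every imputation method can be scored on the same positions, which is what makes the comparison between methods fair. Regression-based and EM imputation would plug into the same loop in place of `impute_constant`.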