Missing data are a common challenge for researchers and data scientists, and multiple imputation by chained equations is widely used to address them in epidemiologic research. The method is favored for its practicality and its ability to produce unbiased effect estimates and valid inferences. When using multiple imputation by chained equations, researchers can choose among various imputation methods, both parametric and nonparametric. Recent studies indicate that nonparametric tree-based methods may outperform parametric approaches, especially when predictor variables have interactions or nonlinear effects. Yet these comparisons can be misleading if the parametric imputation model omits effects that appear in the final analysis model, including interactions. Simulation results show that including interactions in the parametric imputation model improves its performance in handling missing binary outcomes. Parametric imputation generally yields lower bias and slightly higher coverage probability for interaction effects, but it tends to produce wider confidence intervals than tree-based methods such as classification and regression trees. Furthermore, parametric imputation requires careful specification of the imputation model. Epidemiologists must be diligent in defining their
imputation models when using multiple imputation by chained equations. This study contributes a balanced comparison of parametric and tree-based imputation methods for datasets with binary outcomes.