- Title
Imputation of missing dependent variable in binary logistic regression.
- Authors
Thammachoto, Tidarat; Samart, Klairung
- Abstract
Missing data are an important issue affecting data analysis. This study develops and compares methods of imputing missing data in binary logistic regression. Seven imputation methods are applied: mode imputation, hot deck imputation, multiple imputation (MI), k-nearest neighbour imputation, random forest imputation, logistic regression imputation (LR), and modified logistic regression imputation (MLR). Missing data are simulated in three conditions: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The simulations were run using sample sizes of 20, 50, 100, 150, 200, 500 and 1,000 and missing percentages of 10%, 20%, 30% and 40%. The simulated missing data in the three conditions were applied to real-life heart disease data and the obtained data sets were analysed using the seven imputation methods. Performance was compared by estimating the mean square error of each analysis. The results reveal that when the missing data condition is either MCAR or MAR, the MLR method gives the best performance with small sample sizes (n = 50) at most levels of missing data, while the MI method gives the best performance with large sample sizes. For the MNAR condition, the LR method gives the best performance with small sample sizes for all levels of missing data.
- Subjects
MISSING data (Statistics); LOGISTIC regression analysis; DEPENDENT variables; RANDOM forest algorithms; SAMPLE size (Statistics); HEART diseases
- Publication
Maejo International Journal of Science & Technology, 2024, Vol 18, Issue 1, p61
- ISSN
1905-7873
- Publication type
Article
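
The simulation design summarised in the abstract (delete part of a binary dependent variable under MCAR, MAR or MNAR, impute it, then score the imputations by mean square error) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the one-covariate data-generating model, the missingness weights, the gradient-ascent fit, the 0.5 cut-off for logistic regression imputation, and the function names. Only mode imputation and a plain logistic regression imputation are shown; the article's MLR method and its exact error measure are not described in this record, so the error below is simply the mean squared difference between imputed and true outcomes.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_data(n, rng):
    # Hypothetical model: one standard-normal covariate, logistic outcome.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(0.5 + x) else 0 for x in xs]
    return xs, ys


def missing_mask(xs, ys, mechanism, rate, rng):
    # The weights decide what the deletion probability depends on.
    if mechanism == "MCAR":      # depends on nothing
        weights = [1.0] * len(ys)
    elif mechanism == "MAR":     # depends on the observed covariate
        weights = [sigmoid(x) for x in xs]
    elif mechanism == "MNAR":    # depends on the outcome itself
        weights = [1.5 if y == 1 else 0.5 for y in ys]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    mean_w = sum(weights) / len(weights)
    # Rescale so the expected missing fraction is roughly `rate`.
    return [rng.random() < min(1.0, rate * w / mean_w) for w in weights]


def fit_logistic(xs, ys, steps=500, lr=0.1):
    # Plain gradient ascent on the average log-likelihood (intercept and slope).
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        resid = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(resid) / n
        b1 += lr * sum(r * x for r, x in zip(resid, xs)) / n
    return b0, b1


def compare_imputations(n=200, rate=0.2, mechanism="MCAR", seed=0):
    rng = random.Random(seed)
    xs, ys = simulate_data(n, rng)
    mask = missing_mask(xs, ys, mechanism, rate, rng)
    obs = [(x, y) for x, y, m in zip(xs, ys, mask) if not m]
    mis = [(x, y) for x, y, m in zip(xs, ys, mask) if m]
    if not obs or not mis:
        raise ValueError("need at least one observed and one missing outcome")

    obs_x = [x for x, _ in obs]
    obs_y = [y for _, y in obs]

    # Mode imputation: every missing outcome gets the most common observed value.
    mode = 1 if 2 * sum(obs_y) >= len(obs_y) else 0
    mode_err = sum((mode - y) ** 2 for _, y in mis) / len(mis)

    # Logistic regression imputation: predict from x, impute 1 if p >= 0.5.
    b0, b1 = fit_logistic(obs_x, obs_y)
    lr_err = sum(
        ((1 if sigmoid(b0 + b1 * x) >= 0.5 else 0) - y) ** 2 for x, y in mis
    ) / len(mis)

    return {"n_missing": len(mis), "mode": mode_err, "lr": lr_err}


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        print(mech, compare_imputations(n=200, rate=0.2, mechanism=mech, seed=1))
```

A single run like this says little on its own; a study of the kind described would repeat it over many seeds, sample sizes and missing percentages and average the errors before comparing methods.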