Title

Missing Data Imputation for Categorical Variables.

Authors

Horníček, Jaroslav; Řezanková, Hana

Abstract

Dealing with missing data is a crucial part of everyday data analysis. The IMIC algorithm is a missing data imputation method that can handle mixed numerical and categorical datasets. This work, however, focuses on the categorical data. The paper proposes a new improvement of the IMIC algorithm in the form of two modifications, both of which consider the number of categories in each categorical variable. Based on this information, a factor is computed that modifies the original measure. The factor equation is inspired by the Eskin similarity measure, which is known from the hierarchical clustering of categorical data. The results show that as the missing value ratio in the dataset grows, the second modification achieves better results. The paper also briefly analyzes the advantages and disadvantages of using the IMIC algorithm.
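
For orientation, the Eskin similarity measure mentioned in the abstract can be sketched as follows. This is the standard textbook form of the measure (a match scores 1, a mismatch on a variable with n categories scores n^2 / (n^2 + 2), averaged over variables), not the factor equation proposed in the paper, which the abstract does not state. The function name `eskin_similarity` and its parameters are illustrative assumptions.

```python
def eskin_similarity(x, y, n_categories):
    """Eskin similarity between two categorical records.

    x, y          -- sequences of category labels, one per variable
    n_categories  -- number of distinct categories of each variable

    Illustrative sketch of the standard Eskin measure only; it is not
    the modified IMIC measure from the paper.
    """
    if not (len(x) == len(y) == len(n_categories)):
        raise ValueError("x, y and n_categories must have the same length")
    if len(x) == 0:
        raise ValueError("at least one variable is required")

    per_variable = []
    for a, b, n in zip(x, y, n_categories):
        if a == b:
            # Matching categories are fully similar.
            per_variable.append(1.0)
        else:
            # A mismatch is penalised less when the variable has many categories.
            per_variable.append(n ** 2 / (n ** 2 + 2))

    # Equal weights across variables.
    return sum(per_variable) / len(per_variable)
```

Under this measure, a mismatch on a binary variable contributes 4/6, while a mismatch on a ten-category variable contributes 100/102, so variables with more categories are treated as less informative when they disagree.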

Subjects

HIERARCHICAL clustering (Cluster analysis); MULTIPLE imputation (Statistics); MISSING data (Statistics); DATA analysis

Publication

Statistika: Statistics & Economy Journal, 2022, Vol 102, Issue 3, p249

ISSN

0322-788X

Publication type

Academic Journal

DOI

10.54694/stat.2022.3
