- Title
From Theory to Practice: A Data Quality Framework for Classification Tasks
- Authors
Corrales, David Camilo; Ledezma, Agapito; Corrales, Juan Carlos
- Abstract
Data preprocessing is an essential step in knowledge discovery projects. Experts state that preprocessing tasks take between 50% and 70% of the total time of the knowledge discovery process. Accordingly, several authors consider data cleaning to be one of the most cumbersome and critical of these tasks. Failure to ensure high data quality in the preprocessing stage will significantly reduce the accuracy of any data analytics project. In this paper, we propose DQF4CT, a framework to address data quality issues in classification tasks. Our approach consists of: (i) a conceptual framework that guides the user in dealing with data problems in classification tasks; and (ii) an ontology that represents knowledge about data cleaning and suggests appropriate data cleaning approaches. We present two case studies using real datasets: physical activity monitoring (PAM) and occupancy detection of an office room (OD). To evaluate our proposal, the datasets cleaned by DQF4CT were used to train the same classification algorithms used by the authors of PAM and OD. Additionally, we evaluated DQF4CT on datasets from the Repository of Machine Learning Databases of the University of California, Irvine (UCI). Finally, 84% of the results achieved by the models trained on the datasets cleaned by DQF4CT are better than those of the models built by the dataset authors.
- Subjects
Data quality; Data mining; Ontologies (Information retrieval); Machine learning; Data scrubbing
- Publication
Symmetry, 2018, Vol. 10, Issue 7, Article 248
- ISSN
2073-8994
- Publication type
Article
- DOI
10.3390/sym10070248