- Title
An oversampling algorithm of multi-label data based on cluster-specific samples and fuzzy rough set theory.
- Authors
Liu, Jinming; Huang, Kai; Chen, Chen; Mao, Jian
- Abstract
Imbalanced class distributions are common in real-world scenarios, including datasets with multiple labels. One widely acknowledged approach to addressing imbalanced distributions is through oversampling, a technique that both balances the class distribution and improves the effectiveness of classification models. However, when generating synthetic data for multi-label datasets, complexities arise due to the presence of multiple-label sets, which require careful placement and labeling. We propose MLCSMOTE-FRST, an algorithm for synthetic data generation based on label-specific clustering and fuzzy rough set theory. Generation ratios and dependency samples are provided by clusters specific to each label, with a focus on the overall label distribution and the distribution within each cluster. The labels are supported by intra-cluster positive samples, determined using fuzzy rough set theory, which helps to capture the consensus label set. Experimental results on multi-label datasets using four classifiers demonstrate the effectiveness of the proposed method in terms of macro-F1 and micro-F1 scores.
- Publication
Complex & Intelligent Systems, 2024, Vol 10, Issue 5, p6267
- ISSN
2199-4536
- Publication type
Article
- DOI
10.1007/s40747-024-01498-w
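
The abstract reports results as macro-F1 and micro-F1 scores. The sketch below uses the standard definitions of these two metrics to show why they can diverge on an imbalanced multi-label dataset. It is not the paper's evaluation code. The toy labels and the helper names (`f1`, `label_counts`, `macro_f1`, `micro_f1`) are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_counts(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> List[Tuple[int, int, int]]:
    """Return (tp, fp, fn) for each label column of a binary indicator matrix."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    return counts


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-label F1: every label counts equally."""
    counts = label_counts(y_true, y_pred)
    return sum(f1(*c) for c in counts) / len(counts)


def micro_f1(y_true, y_pred) -> float:
    """F1 over counts pooled across labels: frequent labels dominate."""
    counts = label_counts(y_true, y_pred)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)


# Hypothetical toy data: label 0 is frequent, label 1 is a minority label
# that the classifier misses entirely.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]

print(macro_f1(y_true, y_pred))  # 0.5
print(micro_f1(y_true, y_pred))  # 6/7, about 0.857
```

On this toy data the missed minority label halves macro-F1 while micro-F1 stays high, which is one reason both scores are commonly reported when evaluating methods for imbalanced multi-label data.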