
Title: An oversampling algorithm of multi-label data based on cluster-specific samples and fuzzy rough set theory.
Authors: Liu, Jinming; Huang, Kai; Chen, Chen; Mao, Jian
Abstract: Imbalanced class distributions are common in real-world scenarios, including multi-label datasets. Oversampling is a widely acknowledged remedy: it balances the class distribution and improves the performance of classification models. For multi-label datasets, however, generating synthetic data is complicated because each sample carries a set of labels, so every synthetic sample must be placed carefully and given an appropriate label set. We propose MLCSMOTE-FRST, a synthetic data generation algorithm based on label-specific clustering and fuzzy rough set theory. Clusters specific to each label supply the generation ratios and the samples on which the synthetic data depend, taking into account both the overall label distribution and the distribution within each cluster. The labels of each synthetic sample are supported by intra-cluster positive samples identified with fuzzy rough set theory, which helps capture the consensus label set. Experiments on multi-label datasets with four classifiers demonstrate the effectiveness of the proposed method in terms of macro-F1 and micro-F1 scores.
Publication: Complex & Intelligent Systems, 2024, Vol 10, Issue 5, p6267
ISSN: 2199-4536
Publication type: Article
DOI: 10.1007/s40747-024-01498-w
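The abstract does not give the exact procedure, so the following is only a hypothetical Python sketch of the general idea it describes (generate inside a cluster, then let intra-cluster positive samples decide the labels via fuzzy rough set theory). It is not the authors' MLCSMOTE-FRST. Assumptions not taken from the record: the clusters are already formed; the fuzzy similarity is a Gaussian kernel; the lower approximation uses the Kleene-Dienes implicator; a label is kept when the new sample belongs more to the lower approximation of that label's positives than of its negatives; and the interpolation partner is a random cluster member rather than a nearest neighbor. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def similarity(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Assumed Gaussian fuzzy similarity R(x, y) in (0, 1]; equals 1 when x == y."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def lower_approx(x: Sequence[float], samples: Sequence[Sequence[float]],
                 in_set: Sequence[bool], sigma: float = 1.0) -> float:
    """Membership of x in the fuzzy-rough lower approximation of a crisp set.

    Kleene-Dienes implicator I(a, b) = max(1 - a, b), so
    mu(x) = min over y of max(1 - R(x, y), A(y)) with A(y) in {0, 1}.
    """
    return min(
        max(1.0 - similarity(x, y, sigma), 1.0 if member else 0.0)
        for y, member in zip(samples, in_set)
    )


def assign_labels(x: Sequence[float], cluster_X: Sequence[Vector],
                  cluster_Y: Sequence[List[int]], sigma: float = 1.0) -> List[int]:
    """Hypothetical rule: keep label j if x belongs more to the lower
    approximation of the label's positives than to that of its negatives."""
    labels = []
    for j in range(len(cluster_Y[0])):
        pos = [row[j] == 1 for row in cluster_Y]
        neg = [not p for p in pos]
        mu_pos = lower_approx(x, cluster_X, pos, sigma)
        mu_neg = lower_approx(x, cluster_X, neg, sigma)
        labels.append(1 if mu_pos > mu_neg else 0)  # ties go to "absent"
    return labels


def generate(cluster_X: Sequence[Vector], cluster_Y: Sequence[List[int]],
             n_new: int, sigma: float = 1.0,
             seed: int = 0) -> List[Tuple[Vector, List[int]]]:
    """SMOTE-style interpolation between two members of one cluster, then
    fuzzy-rough label assignment. The cluster needs at least 2 samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i, k = rng.sample(range(len(cluster_X)), 2)
        gap = rng.random()  # position along the segment, in [0, 1)
        x_new = [a + gap * (b - a) for a, b in zip(cluster_X[i], cluster_X[k])]
        synthetic.append((x_new, assign_labels(x_new, cluster_X, cluster_Y, sigma)))
    return synthetic


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
    Y = [[1, 0], [1, 0], [1, 1], [1, 1]]
    print(assign_labels([0.05, 0.0], X, Y))  # [1, 0]
    print(assign_labels([1.05, 1.0], X, Y))  # [1, 1]
```

In the toy cluster, label 0 is shared by every member, so every synthetic sample keeps it, while label 1 is kept only for points that sit closer to its positive samples. Comparing the two lower approximations reduces, for a crisp label set, to comparing the strongest similarity to a positive with the strongest similarity to a negative; other implicators or thresholds would give different decision rules.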
