
Detailed record

Feature reduction of unbalanced data classification based on density clustering

Article written by: Wang, Zhen-Fei ; Yuan, Pei-Yao ; Cao, Zhong-Ya ; Zhang, Li-Ying

Abstract: With the development of big data, the problem of imbalanced datasets is becoming increasingly serious. When dealing with high-dimensional imbalanced datasets, traditional classification algorithms tend to favor the majority class and ignore the minority class, which results in poor classification performance. In this paper, we study the classification of high-dimensional imbalanced datasets and propose a feature selection algorithm based on density clustering and importance measure (DBIM). DBIM first constructs multiple balanced subsets by randomly under-sampling the majority classes to the same number of samples as the minority classes, and uses DBSCAN as the base classifier. This process quickly discovers density-based characteristics of the feature distribution and generates the initial feature subspace. To select features that discriminate strongly between class labels, we rank and select features from the generated initial feature subspace according to their importance. To avoid redundancy between features and generate high-quality feature subsets, we further design a new class distribution-based weight index and combine it with the redundancy evaluation index in the DBIM algorithm to measure the redundancy between features. Experimental results on eight publicly available datasets show that DBIM generates feature subsets with high relevance and low redundancy, effectively reduces the dimensionality of high-dimensional imbalanced datasets, and improves classification performance.
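
The balanced-subset step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `balanced_subsets`, its parameters, and the toy labels are assumptions made for illustration, and the DBSCAN, importance-ranking, and redundancy steps are not covered. It only shows random under-sampling of each class down to the size of the smallest class, repeated to obtain several balanced subsets.

```python
import random
from collections import defaultdict


def balanced_subsets(labels, n_subsets, seed=0):
    """Return n_subsets lists of sample indices, each class-balanced.

    Hypothetical helper: every class is randomly under-sampled to the
    size of the smallest (minority) class.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Size of the smallest class sets the per-class sample count.
    n_min = min(len(indices) for indices in by_class.values())

    subsets = []
    for _ in range(n_subsets):
        subset = []
        for indices in by_class.values():
            # Sampling without replacement; the minority class is kept whole.
            subset.extend(rng.sample(indices, n_min))
        subsets.append(sorted(subset))
    return subsets


# Toy data (assumed): 8 majority samples (label 0), 2 minority samples (label 1).
labels = [0] * 8 + [1] * 2
subsets = balanced_subsets(labels, n_subsets=3)
```

Each returned subset contains both minority samples plus an equal number of randomly drawn majority samples, so a base learner trained on it sees a 1:1 class ratio.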


Language: English