Date posted
2024.09.10
Date modified
2024.12.31
Author
최용석

Jung, J. and Choi, Y.S. (2024). SMOTE by Mahalanobis distance using MCD in imbalanced data, The Korean Journal of Applied Statistics, 37(4), 455-465.

SMOTE (synthetic minority over-sampling technique) is the most widely used remedy for the problem of imbalanced data. SMOTE selects nearest neighbors based on Euclidean distance. However, Euclidean distance does not take the correlation between variables into account. The Mahalanobis distance, in contrast, accounts for the covariance of the variables, but its calculation is usually distorted when outliers are present. To address this, we compute the Mahalanobis distance with a covariance matrix estimated by MCD (minimum covariance determinant), and then apply this MCD-based Mahalanobis distance to SMOTE to generate new data. We show that, in most cases, this method yields high performance measures for classifying imbalanced data.

Keywords: imbalanced data, Mahalanobis distance, MCD, SMOTE
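The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the MCD is fitted on the minority class only, uses scikit-learn's `MinCovDet` for the robust covariance estimate, and keeps the standard SMOTE interpolation step unchanged. The function name `mcd_mahalanobis_smote` and its parameters `n_new`, `k`, and `seed` are hypothetical, chosen here for illustration.

```python
import numpy as np
from sklearn.covariance import MinCovDet


def mcd_mahalanobis_smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points (illustrative sketch).

    Neighbors are ranked by Mahalanobis distance, with the covariance
    matrix estimated robustly by MCD on the minority class (an assumption).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]

    # Robust covariance estimate; precision_ is its (pseudo-)inverse.
    mcd = MinCovDet(random_state=seed).fit(X_min)
    VI = mcd.precision_

    # Pairwise squared Mahalanobis distances:
    # d2[i, j] = (x_i - x_j)^T VI (x_i - x_j)
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest minority neighbors of each point.
    neighbors = np.argsort(d2, axis=1)[:, :k]

    # Standard SMOTE step: interpolate between a base point and one of
    # its k neighbors, at a uniformly random position on the segment.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Usage: correlated minority class with a few injected outliers.
rng = np.random.default_rng(0)
X_minority = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=60)
X_minority[:3] += 8.0  # outliers that would distort a classical covariance
X_synthetic = mcd_mahalanobis_smote(X_minority, n_new=100)
print(X_synthetic.shape)  # (100, 2)
```

Because the covariance comes from MCD rather than the sample covariance, the three shifted points have little influence on which neighbors are considered close. Replacing `mcd.precision_` with the identity matrix would recover ordinary Euclidean SMOTE, which makes the two variants easy to compare.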

Attachments