Posted
2020.11.12
Modified
2020.11.12
Author
최용석

정호영(2018). Comparison of term weighting techniques for text classification

 

 

Abstract
The document-term frequency matrix is a common data representation in text mining. In this study,
we introduce the traditional term weighting scheme TF-IDF (term frequency-inverse document
frequency), which is applied to the document-term frequency matrix and used for text
classification. We also introduce and compare two more recent schemes, TF-IDF-ICSDF and TF-IGM.
The study further provides a keyword extraction method that improves the quality of text
classification. Using the extracted keywords, we applied a support vector machine to classify the
texts. To compare the performance of the term weighting schemes, we used precision, recall, and
F1-score. The results show that the TF-IGM scheme achieved high scores on these metrics and was
the most suitable for text classification.

Keywords: term weighting, document classification, text mining, TF-IDF, keyword extraction
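
As a rough illustration of the weighting step described in the abstract, the sketch below applies TF-IDF to a small document-term frequency matrix. The abstract does not state which TF-IDF variant the paper uses, so the formula here (raw term frequency times the natural log of N / df) is an assumption, and the function name `tf_idf` and the toy matrix `counts` are hypothetical.

```python
import math


def tf_idf(dtm):
    """Weight a document-term frequency matrix with TF-IDF.

    dtm is a list of rows, one per document; dtm[i][j] is the raw
    count of term j in document i.
    Assumed variant: weight = tf * log(N / df), natural log.
    """
    n_docs = len(dtm)
    n_terms = len(dtm[0]) if dtm else 0
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in dtm if row[j] > 0) for j in range(n_terms)]
    # A term that never occurs gets idf 0 to avoid division by zero.
    idf = [math.log(n_docs / d) if d > 0 else 0.0 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in dtm]


# Toy matrix (hypothetical data): 3 documents x 3 terms.
counts = [
    [2, 1, 0],
    [0, 1, 0],
    [0, 1, 3],
]
weights = tf_idf(counts)
```

In this toy example the second term occurs in every document, so its weight is zero everywhere, while terms concentrated in a single document keep a positive weight. That is the property that makes such weights useful as classifier features.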

 
