Posted
2020.10.27
Modified
2020.11.19
Author
최용석

이수진 (2020). Document classification using a deep neural network (Korean Statistical Society poster paper and published paper)

 

Abstract

 

A document-term frequency matrix records how often each term extracted from a set of documents occurs in each document; in this text mining study, the documents carry group information. We generated a document-term frequency matrix for classifying documents by research field. We applied the traditional term weighting function TF-IDF to this matrix and, in addition, the more recently proposed TF-IGM. To improve classification accuracy, we also extracted keywords and generated a document-keyword weighted matrix. Based on the extracted keyword matrix, we classified the documents using a deep neural network. To find the optimal deep neural network model, we evaluated classification accuracy while varying the number of hidden layers and hidden nodes. The model with eight hidden layers showed the highest accuracy, and TF-IGM gave higher classification accuracy than TF-IDF under every parameter setting. The deep neural network was also more accurate than the SVM. Therefore, we propose applying TF-IGM and a deep neural network to document classification.
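The abstract names the two term weighting functions but gives no formulas. The sketch below illustrates how they might be applied to a document-term frequency matrix, assuming the commonly cited definitions: IDF as log(N / df), and IGM as the largest class frequency of a term divided by the rank-weighted sum of its class frequencies sorted in descending order. The function names, the use of class-level term frequency totals, and the coefficient `lam=7.0` are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def tf_idf(counts):
    """Weight a (n_docs, n_terms) raw frequency matrix by log(N / df)."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)       # documents containing each term
    idf = np.log(n_docs / np.maximum(df, 1))    # guard against terms with df = 0
    return counts * idf

def tf_igm(counts, labels, lam=7.0):
    """Weight the same matrix by 1 + lam * IGM (lam is an assumed setting)."""
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Total frequency of each term within each class: shape (n_classes, n_terms).
    class_freq = np.vstack([counts[labels == c].sum(axis=0) for c in classes])
    sorted_freq = -np.sort(-class_freq, axis=0)             # descending per term
    ranks = np.arange(1, len(classes) + 1)[:, None]         # 1, 2, ..., n_classes
    denom = (sorted_freq * ranks).sum(axis=0)
    igm = np.divide(sorted_freq[0], denom,
                    out=np.zeros_like(denom), where=denom > 0)
    return counts * (1.0 + lam * igm)

# Toy data: 4 documents, 2 terms, 2 research fields (all values hypothetical).
counts = [[2, 1], [1, 1], [0, 1], [0, 1]]
labels = [0, 0, 1, 1]
print(tf_idf(counts))
print(tf_igm(counts, labels))
```

In this toy example the first term occurs only in one class while the second is spread evenly across both. Unlike TF-IDF, TF-IGM uses the group labels, so it can reward terms that concentrate in a single research field.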


 

Keywords: document classification, deep neural network, term weighting, text mining, keyword extraction

 
