Date posted: 2020.10.27
Date modified: 2020.10.27
Author: 최용석

정민지 (2017). Creation and clustering of proximity data for text data analysis (Korean Statistical Society paper post)

Abstract


A document-term frequency matrix is a type of data used in text mining. This matrix is often built from the
various documents provided by the x-objects to be analyzed. When analyzing x-objects with this matrix,
researchers generally select as keywords only the terms that are common to the documents belonging to one
x-object, and then use those keywords to analyze the x-object. However, this approach loses the unique
information of individual documents and also removes potential keywords that occur frequently in a specific
document. In this study, we define data that can overcome this problem as proximity data. We introduce
twelve methods for generating proximity data and cluster the x-objects using two methods, multidimensional
scaling and k-means cluster analysis. Finally, we choose the method best suited to clustering the
x-objects.
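
To make the idea of proximity data concrete, the following is a minimal sketch of one generic way to turn tokenized documents into a dissimilarity matrix: weight term counts with TF-IDF, then take one minus the cosine similarity between documents. It is not one of the twelve methods from the paper, which the abstract does not spell out. The function names (`tfidf_vectors`, `cosine_dissimilarity`, `proximity_matrix`) and the toy documents are hypothetical, and each document is treated as its own item rather than being grouped by x-object.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Assumption: raw term count times log(N / document frequency).
    Other TF-IDF variants exist; this is only one common choice.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [counts[term] * math.log(n_docs / doc_freq[term]) for term in vocab]
        )
    return vocab, vectors


def cosine_dissimilarity(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def proximity_matrix(vectors):
    """Build a square dissimilarity matrix with zeros on the diagonal."""
    n = len(vectors)
    return [
        [0.0 if i == j else cosine_dissimilarity(vectors[i], vectors[j])
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy documents, already tokenized.
docs = [
    "text mining uses term frequency".split(),
    "term frequency matrix for text mining".split(),
    "k-means groups points into clusters".split(),
]

vocab, vecs = tfidf_vectors(docs)
D = proximity_matrix(vecs)

for row in D:
    print([round(x, 3) for x in row])
```

A matrix like `D` is the kind of input that multidimensional scaling takes directly, and the resulting low-dimensional coordinates could then be passed to k-means; both steps are left out here to keep the sketch dependency-free.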


Keywords: text mining, proximity data, TF-IDF, multidimensional scaling, cluster analysis
