Seol, J., Jung, J., Choi, Y. and Choi, Y. (2023). Supervised text data augmentation method for deep neural networks, Communications for Statistical Applications and Methods, 30(3), 343-354.
Recently, there have been many improvements in general language models using architectures such as GPT-3
proposed by Brown et al. (2020). Nevertheless, complex models are difficult to train when the amount of data
is very small. Data augmentation, which addresses this problem, has been highly successful for image data. Image
augmentation significantly improves model performance without any additional data or architectural
changes (Perez and Wang, 2017). However, applying this technique to textual data is challenging because
it is unclear what noise should be added. We therefore develop a novel method for performing data augmentation on
text data. We divide the data into signals, which carry positive or negative meaning, and noise, which carries
neither, and then apply k-doc augmentation, which randomly combines signals and noise drawn from all of the data to
generate new data.
Keywords: NLP, data augmentation
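
The signal/noise recombination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how signals are identified or what k denotes, so the sketch assumes a small hypothetical sentiment lexicon (POSITIVE, NEGATIVE) for detecting signal sentences and treats k as the number of fragments combined into each new document. The names split_signal_noise, k_doc_augment and DOCS are likewise illustrative.

```python
import random
import re

# Hypothetical lexicon: the abstract does not specify how signals are detected.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def split_signal_noise(sentences):
    """Split sentences into signal (contains a lexicon word) and noise (does not)."""
    signal, noise = [], []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & (POSITIVE | NEGATIVE):
            signal.append(sentence)
        else:
            noise.append(sentence)
    return signal, noise


def k_doc_augment(docs, k, n_new, seed=0):
    """Build n_new synthetic (sentences, label) pairs from labeled documents.

    Assumption: each new document combines k fragments, at least one of which
    is a signal taken from documents with the chosen label; the rest are noise
    pooled from all documents.
    """
    rng = random.Random(seed)
    signals = {}   # label -> signal sentences from documents with that label
    noises = []    # noise sentences pooled across all documents
    for sentences, label in docs:
        sig, noi = split_signal_noise(sentences)
        signals.setdefault(label, []).extend(sig)
        noises.extend(noi)

    labels = sorted(label for label, sig in signals.items() if sig)
    augmented = []
    for _ in range(n_new):
        label = rng.choice(labels)
        # With no noise available, fall back to using signals only.
        n_signal = rng.randint(1, k) if noises else k
        parts = [rng.choice(signals[label]) for _ in range(n_signal)]
        parts += [rng.choice(noises) for _ in range(k - n_signal)]
        rng.shuffle(parts)
        augmented.append((parts, label))
    return augmented


# Toy labeled documents, invented for illustration only.
DOCS = [
    (["I love this phone.", "It arrived on Tuesday."], "pos"),
    (["The battery is awful.", "I bought it online."], "neg"),
]

if __name__ == "__main__":
    for parts, label in k_doc_augment(DOCS, k=3, n_new=4):
        print(label, "|", " ".join(parts))
```

Because the label of each generated document follows the label of its signal fragments, the noise fragments can be shared freely across classes, which is the property the abstract relies on.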