Semantic indexing-based data augmentation for filtering undesired short text messages
Johannes V. Lochter, Renato M. Silva, Tiago A. Almeida, Akebo Yamakami
ARTICLE
English
Acknowledgments: We gratefully acknowledge the support of NVIDIA Corporation and the financial support provided by the São Paulo Research Foundation (FAPESP; grants #2017/09387-6, #2018/02146-6).
In recent years, spammers have taken advantage of the popularity of electronic media to spread undesired text messages. These messages may cause direct and indirect damage, such as user dissatisfaction and exposure to misleading information and malicious content that can result in significant financial losses. Automatically filtering short text messages is a challenging problem because labeled datasets generally contain few instances and messages may contain too few terms to be classified. In addition, the messages are rife with abbreviations, slang, and misspelled words, which makes it difficult to generate a good computational representation. In this study, we propose an automatic data augmentation technique to increase the number of labeled instances and to improve the quality of the computational representation of short and noisy text messages. We also propose an ensemble approach to combine the predictions obtained by the classifiers using the messages generated by this technique. Experiments with three text representation techniques demonstrated that the ensemble approach improves the results obtained in the detection of undesired short text messages when the number of training instances is small.
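The abstract does not spell out how the generated messages are produced or how the predictions are combined, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. The lexicons `NORMALIZE` and `SYNONYMS`, the functions `augment`, `majority_vote`, and `classify_with_ensemble`, the keyword-based `toy_classifier`, and the choice of majority voting are all hypothetical and introduced here for illustration.

```python
from collections import Counter

# Hypothetical toy lexicons (assumptions); a real system would rely on
# proper normalization dictionaries and semantic resources.
NORMALIZE = {"u": "you", "txt": "text", "2": "to", "pls": "please"}
SYNONYMS = {"free": "complimentary", "win": "gain", "prize": "award"}


def augment(message):
    """Return the original message plus a normalized and an expanded variant."""
    tokens = message.lower().split()
    # Variant 2: replace abbreviations and slang with standard forms.
    normalized = [NORMALIZE.get(t, t) for t in tokens]
    # Variant 3: append related terms so the short message carries more features.
    expanded = normalized + [SYNONYMS[t] for t in normalized if t in SYNONYMS]
    return [" ".join(tokens), " ".join(normalized), " ".join(expanded)]


def majority_vote(labels):
    """Combine per-variant predictions by simple majority (an assumed rule)."""
    return Counter(labels).most_common(1)[0][0]


def classify_with_ensemble(message, classifier):
    """Classify every generated variant and combine the predictions."""
    return majority_vote([classifier(variant) for variant in augment(message)])


def toy_classifier(text):
    """Stand-in classifier (assumption): flags text with two or more spam words."""
    spam_words = {"free", "prize", "complimentary", "award"}
    return "spam" if len(spam_words & set(text.split())) >= 2 else "ham"
```

In this sketch a single classifier is applied to every variant; training one classifier per variant and voting over those would be an equally plausible reading of the abstract.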
FUNDAÇÃO DE AMPARO À PESQUISA DO ESTADO DE SÃO PAULO - FAPESP
2017/09387-6; 2018/02146-6
Closed
DOI: https://doi.org/10.1109/ICMLA.2018.00169
Full text: https://ieeexplore.ieee.org/document/8614194
Sources
Machine learning and applications (Jan. 2019), art. no. 18410612