Please use this identifier to cite or link to this item:
Type: Artigo de evento
Title: On The Validity Of A New Sms Spam Collection
Author: Hidalgo J.M.G.
Almeida T.A.
Yamakami A.
Abstract: Mobile phones are becoming the latest target of electronic junk mail. Recent reports clearly indicate that the volume of SMS spam messages are dramatically increasing year by year. Probably, one of the major concerns in academic settings was the scarcity of public SMS spam datasets, that are sorely needed for validation and comparison of different classifiers. To address this issue, we have recently proposed a new SMS Spam Collection that, to the best of our knowledge, is the largest, public and real SMS dataset available for academic studies. However, as it has been created by augmenting a previously existing database built using roughly the same sources, it is sensible to certify that there are no duplicates coming from them. So, in this paper we offer a comprehensive analysis of the new SMS Spam Collection in order to ensure that this does not happen, since it may ease the task of learning SMS spam classifiers and, hence, it could compromise the evaluation of methods. The analysis of results indicate that the procedure followed does not lead to near-duplicates and, consequently, the proposed dataset is reliable to use for evaluating and comparing the performance achieved by different classifiers. © 2012 IEEE.
Rights: fechado
Identifier DOI: 10.1109/ICMLA.2012.211
Date Issue: 2012
Appears in Collections:Unicamp - Artigos e Outros Documentos

Files in This Item:
File Description SizeFormat 
2-s2.0-84873576061.pdf222.88 kBAdobe PDFView/Open

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.