Please use this identifier to cite or link to this item:
|Title:||Open-set web genre identification using distributional features and nearest neighbors distance ratio|
|Abstract:||Web genre identification can boost information retrieval systems by providing rich descriptions of documents and enabling more specialized queries. The open-set scenario is more realistic for this task as web genres evolve over time and it is not feasible to define a universally agreed genre palette. In this work, we bring to bear a novel approach to web genre identification underpinned by distributional features acquired by doc2vec and a recently-proposed open-set classification algorithm—the nearest neighbors distance ratio classifier. We present experimental results using a benchmark corpus and a strong baseline and demonstrate that the proposed approach is highly competitive, especially when emphasis is given on precision|
|Appears in Collections:||IC - Artigos e Outros Documentos|
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.