Zha, Daochen; Li, Chenliang

doi:10.1007/s10115-018-1280-0

Title: Multi-label dataless text classification with topic modeling.
Authors: Zha, Daochen; Li, Chenliang
Abstract: Manually labeling documents is tedious and expensive, but it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However, existing works mainly center on single-label classification problems, in which each document is restricted to a single category. In this paper, we propose a novel Seed-guided Multi-label Topic Model, named SMTM. With a few seed words relevant to each category, SMTM conducts multi-label classification for a collection of documents without any labeled documents. In SMTM, each category is associated with a single category-topic which covers the meaning of the category. To accommodate multi-label documents, we explicitly model category sparsity in SMTM by using a spike-and-slab prior and a weak smoothing prior. As a result, SMTM automatically selects the relevant categories for each document without any threshold tuning. To incorporate the supervision of the seed words, we propose a seed-guided biased GPU (i.e., generalized Pólya urn) sampling procedure to guide the topic inference of SMTM. Experiments on two public datasets show that SMTM achieves better classification accuracy than state-of-the-art alternatives and even outperforms supervised solutions in some scenarios.
Subjects: CLASSIFICATION; LABELS; BRAIN-computer interfaces; SEEDS; URNS
Publication: Knowledge & Information Systems, 2019, Vol 61, Issue 1, p137
ISSN: 0219-1377
Publication type: Article
DOI: 10.1007/s10115-018-1280-0
