- Title
Improving Similarity Measures for Publications with Special Focus on Author Name Disambiguation.
- Authors
Shoaib, Muhammad; Daud, Ali; Khiyal, Malik
- Abstract
In many real-life text mining applications, such as clustering academic documents, citation matching and author name disambiguation (AND), similar publications are grouped together by exploiting the similarity among them. Most AND approaches, especially unsupervised ones, focus on proposing new or alternative algorithms, on using new or alternative sources of information, or on both. Researchers in the digital library community pay little attention to similarity measures. Instead, they try ready-made alternative measures to estimate the optimum similarity among publications. These ready-made measures may not give a realistic picture of the similarity among publications. In this work, we propose four similarity measures specifically designed for author names, co-authors, and short and long text segments. Our proposed measures give a more realistic picture of the similarity among publications than previous measures. They can be applied in many real-life scenarios where named entities (not necessarily human names), text documents, or both are compared in a pairwise fashion. We compare our measures with the Jaccard coefficient and the state-of-the-art cosine measure based on the vector space model. Experiments on synthetic and real data show that our proposed measures are more logical and realistic than the baseline methods.
- Subjects
TEXT mining; ALGORITHMS; DIGITAL library software; PAIRED comparisons (Mathematics); COSINE function; VECTOR spaces
- Publication
Arabian Journal for Science & Engineering (Springer Science & Business Media B.V.), 2015, Vol 40, Issue 6, p1591
- ISSN
2193-567X
- Publication type
Article
- DOI
10.1007/s13369-015-1636-7
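
The abstract names two baseline measures, the Jaccard coefficient and the cosine measure over a vector space model, without defining them. The following is a minimal sketch of those two baselines only, not of the four measures the paper proposes. The whitespace tokenizer, the raw term-frequency weighting and the function names are assumptions made for illustration; the record does not say how the authors tokenized or weighted terms.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, chosen only for illustration.
    return text.lower().split()


def jaccard(a, b):
    # Jaccard coefficient: size of the shared token set over size of the union.
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a, b):
    # Cosine of the angle between raw term-frequency vectors (vector space model).
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    t1 = "author name disambiguation"
    t2 = "author name matching"
    print(jaccard(t1, t2))
    print(cosine(t1, t2))
```

Both functions compare two strings in a pairwise fashion, as the abstract describes, and both treat text as a bag of tokens, so neither accounts for name variants such as initials versus full forenames.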