- Title
Bilingual Document Clustering: Evaluating Cognates as Features.
- Authors
Denicia-Carral, Claudia; Montes-y-Gómez, Manuel; Villaseñor-Pineda, Luis; Pinto-Avendaño, David
- Abstract
This paper focuses on the task of bilingual clustering, which involves dividing a set of documents written in two different languages into groups so that documents with similar topics belong to the same group, regardless of their source language. It mainly considers a clustering approach that uses cognates as document features. In particular, it proposes two straightforward methods that extract cognates from the target document collection itself and do not require any external bilingual resource, such as parallel corpora or a bilingual dictionary. Experimental results on two bilingual collections of English and Spanish news reports are encouraging. They indicate that cognates are relevant features for bilingual clustering, outperforming the results of other known approaches by more than 10%.
- Subjects
COGNATE words; OBJECT-oriented databases; DOCUMENT clustering; ASSISTED searching (Information retrieval); INFORMATION retrieval software; INFORMATION services; INFORMATION services research
- Publication
Canadian Journal of Information & Library Sciences, 2011, Vol 35, Issue 3, p265
- ISSN
1195-096X
- Publication type
Article
- DOI
10.1353/ils.2011.0022
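The abstract does not describe the two cognate-extraction methods, so the following is only a rough illustration of the general idea of deriving cognate features from the collection itself, without a dictionary or parallel corpus. It is a minimal sketch under stated assumptions, not the authors' method: Spanish words are paired with the most similar English word in the collection using the longest common subsequence ratio (LCSR) after stripping accents, and each pair becomes one shared feature. The LCSR heuristic, the 0.75 threshold, the 4-character minimum word length, the toy sentences, and all function names (`tokens`, `lcsr`, `cognate_pairs`, `cognate_vector`, `cosine`) are hypothetical choices made for this example.

```python
import re
import unicodedata
from collections import Counter
from math import sqrt

def tokens(text, min_len=4):
    """Lowercase, strip accents, keep alphabetic words of at least min_len chars."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(c for c in text if not unicodedata.combining(c))
    return [w for w in re.findall(r"[a-z]+", text) if len(w) >= min_len]

def lcsr(a, b):
    """Longest common subsequence ratio: LCS length / length of the longer word."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))

def cognate_pairs(en_docs, es_docs, threshold=0.75):
    """Hypothetical heuristic: pair each Spanish word with its most similar English
    word from the same collection; keep the pair only if LCSR >= threshold."""
    en_vocab = sorted({w for d in en_docs for w in tokens(d)})
    es_vocab = sorted({w for d in es_docs for w in tokens(d)})
    pairs = {}
    for s in es_vocab:
        best = max(en_vocab, key=lambda e: lcsr(s, e), default=None)
        if best is not None and lcsr(s, best) >= threshold:
            pairs[s] = best
    return pairs

def cognate_vector(text, lang, pairs):
    """Bag of cognate features; Spanish words are mapped to their English partner."""
    if lang == "es":
        return Counter(pairs[w] for w in tokens(text) if w in pairs)
    shared = set(pairs.values())
    return Counter(w for w in tokens(text) if w in shared)

def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(n * v[k] for k, n in u.items())
    norm = sqrt(sum(n * n for n in u.values()) * sum(n * n for n in v.values()))
    return dot / norm if norm else 0.0

# Toy example (invented sentences, not data from the paper).
en = ["The government announced a new economic plan for the nation.",
      "The football team won the final match."]
es = ["El gobierno anunció un nuevo plan económico para la nación.",
      "El equipo de fútbol ganó el partido final."]
pairs = cognate_pairs(en, es)
print(pairs)
print(cosine(cognate_vector(en[0], "en", pairs), cognate_vector(es[0], "es", pairs)))
print(cosine(cognate_vector(en[0], "en", pairs), cognate_vector(es[1], "es", pairs)))
```

On these toy sentences the English and Spanish reports about the same topic share cognate features (for example "económico"/"economic" and "nación"/"nation"), while the unrelated pair shares none. A cross-language similarity of this kind could then be passed to any standard clustering algorithm. Pure string similarity also misses true cognates with divergent spellings (such as "fútbol"/"football") and can pair false friends, which is one reason the choice of extraction method matters.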