Title: Unsupervised Machine Learning based Documents Clustering in Urdu.
Authors: Rahman, Atta Ur; Khan, Khairullah; Khan, Wahab; Khan, Aurangzeb; Saqia, Bibi
Abstract: The volume of data on the web is growing rapidly owing to the proliferation of news sources, blogs, journals and other content. Like other languages, Urdu has seen tremendous growth on the internet. As the volume of data expands, information retrieval (IR) becomes more complicated. Document clustering is an unsupervised machine learning (ML) approach used to group a large number of dispersed documents into a small number of meaningful, coherent clusters, thus providing a basis for indexing, IR and browsing mechanisms. Document clustering has a long tradition in English and in English-like Western languages, but Urdu lags behind in terms of sophisticated natural language processing (NLP) tools and resources for document clustering. Document clustering is a challenging task in Urdu, which has a rich morphology, a particular structure, syntactic peculiarities and a cursive script. In this study, we developed a framework for document clustering and analysed various similarity measures for Urdu documents. We also examined the effect of stop-word removal on Urdu document clustering.
Subjects: MACHINE learning; DOCUMENT clustering; URDU language; INFORMATION retrieval; NATURAL language processing
Publication: EAI Endorsed Transactions on Scalable Information Systems, 2018, Vol 5, Issue 19, p1
ISSN: 2032-9407
Publication type: Article
DOI: 10.4108/eai.19-12-2018.156081
