- Title
A crowdsourcing approach to construct mono-lingual plagiarism detection corpus.
- Authors
Asghari, Habibollah; Fatemi, Omid; Mohtaj, Salar; Faili, Heshaam
- Abstract
Plagiarism detection deals with detecting plagiarized fragments among textual documents. The availability of digital documents in online libraries makes plagiarism easier to commit and, on the other hand, easier to detect with automatic plagiarism detection systems. Large-scale plagiarism corpora with a wide variety of plagiarism cases are needed to evaluate different detection methods in different languages. Plagiarism detection corpora play an important role in evaluating and tuning plagiarism detection systems. Despite their importance, few corpora have been developed for low-resource languages. In this paper, we propose HAMTA, a Persian plagiarism detection corpus. To simulate real cases of plagiarism, manually paraphrased texts are used to compile the corpus. To obtain the manual plagiarism cases, a crowdsourcing platform is developed and crowd workers are asked to paraphrase fragments of text. Moreover, artificial methods are used to scale up the proposed corpus by automatically generating cases of text re-use. The evaluation results indicate a high correlation between the proposed corpus and the PAN state-of-the-art English plagiarism detection corpus.
- Subjects
PLAGIARISM; CROWDSOURCING; CORPORA; ELECTRONIC records
- Publication
International Journal on Digital Libraries, 2021, Vol 22, Issue 1, p49
- ISSN
1432-5012
- Publication type
Article
- DOI
10.1007/s00799-020-00294-4