- Title
A Spanish dataset for reproducible benchmarked offline handwriting recognition.
- Authors
España-Boquera, Salvador; Castro-Bleda, Maria Jose
- Abstract
This paper presents a public dataset for Offline Handwriting Recognition, along with an appropriate evaluation method that provides benchmark indicators at sentence level. The dataset, called SPA-Sentences, consists of offline handwritten Spanish sentences extracted from 1617 forms produced by the same number of writers. The collection contains a total of 13,691 sentences comprising around 100,000 word instances from a vocabulary of 3288 words. Careful attention has been paid to making the baseline experiments both reproducible and competitive. To this end, the experiments are based on state-of-the-art recognition techniques that combine convolutional blocks with one-dimensional Bidirectional Long Short-Term Memory (LSTM) networks using Connectionist Temporal Classification (CTC) decoding. The scripts with the entire experimental setting have been made available. The SPA-Sentences dataset and its baseline evaluation are freely available for research purposes via the university's institutional repository. We expect the research community to include this corpus in their battery of experiments when reporting novel handwriting recognition techniques, as is usually done with the English IAM and French RIMES datasets.
- Subjects
INTERNATIONAL Association of Machinists & Aerospace Workers; SPANISH language; LONG-term memory; HANDWRITING
- Publication
Language Resources & Evaluation, 2022, Vol 56, Issue 3, p1009
- ISSN
1574-020X
- Publication type
Article
- DOI
10.1007/s10579-022-09587-3