- Title
Image Text Extraction and Natural Language Processing of Unstructured Data from Medical Reports.
- Authors
Malashin, Ivan; Masich, Igor; Tynchenko, Vadim; Gantimurov, Andrei; Nelyub, Vladimir; Borodulin, Aleksei
- Abstract
This study presents an integrated approach for automatically extracting and structuring information from medical reports captured as scanned documents or photographs, combining image recognition with natural language processing (NLP) techniques such as named entity recognition (NER). The primary aim was to develop an adaptive model for efficient text extraction from medical report images. A genetic algorithm (GA) fine-tunes optical character recognition (OCR) hyperparameters to maximize the length of the extracted text; NER then categorizes the extracted information into the required entities, and the parameters are adjusted when the entities do not match manual annotations. The dataset consists of medical report images in diverse formats, all in Russian, and the work serves as a conceptual example of information extraction (IE) that can be easily extended to other languages.
- Subjects
COMPUTATIONAL linguistics; OPTICAL character recognition; IMAGE recognition (Computer vision); NATURAL language processing; DATA mining; GENETIC algorithms
- Publication
Machine Learning & Knowledge Extraction, 2024, Vol 6, Issue 2, p1361
- ISSN
2504-4990
- Publication type
Article
- DOI
10.3390/make6020064
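
The GA-based OCR tuning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hyperparameter names (`scale`, `threshold`, `blur_kernel`), their value ranges, the GA settings, and the `mock_extracted_length` fitness function are all assumptions made for illustration. In a real pipeline the fitness would presumably be the length of the text returned by an OCR engine for a given image and parameter set; the NER validation step is not covered here.

```python
import random

# Hypothetical OCR hyperparameter search space (names and values are
# illustrative assumptions, not taken from the paper).
SEARCH_SPACE = {
    "scale": [1.0, 1.5, 2.0, 2.5, 3.0],
    "threshold": [96, 112, 128, 144, 160],
    "blur_kernel": [1, 3, 5],
}


def random_individual(rng):
    """Draw one candidate parameter set uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    """Resample each gene from its allowed values with probability `rate`."""
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in individual.items()
    }


def run_ga(fitness, generations=20, pop_size=12, elite=2, seed=0):
    """Maximize `fitness` with truncation selection and elitism."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the better half as parents
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = ranked[:elite] + children  # elites survive unchanged
    return max(population, key=fitness)


def mock_extracted_length(params):
    """Stand-in for len(ocr(image, **params)); an assumed toy function that
    peaks at scale=2.0, threshold=128, blur_kernel=3."""
    return (
        1000
        - 200 * abs(params["scale"] - 2.0)
        - abs(params["threshold"] - 128)
        - 20 * abs(params["blur_kernel"] - 3)
    )


best = run_ga(mock_extracted_length)
print(best, mock_extracted_length(best))
```

Because the elites are carried over unchanged, the best fitness never decreases between generations. Replacing `mock_extracted_length` with a function that runs an actual OCR engine and returns the extracted text length would turn the sketch into a real tuning loop, at the cost of one OCR call per fitness evaluation.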