- Title
Customized Information Extraction and Processing Pipeline for Commercial Invoices.
- Authors
Lai, Pierce; Mohan, Abhishek; Kim, Seok; Chu, Jung Soo Victor; Lee, Samuel; Kafle, Prabhakar; Wang, Patrick
- Abstract
Extracting information from scanned invoices and other commercial documents, a critical component of corporate function, typically requires significant manual processing. Much research has been conducted in the field of automated information extraction and document processing to reduce the manual resources used for document analysis, but the resulting literature and commercially available products have demonstrated limitations in customizability for identifying specific information. In this paper, we propose a customized machine learning-based pipeline for extracting and tabulating relevant key–value pairs from commercial invoice documents. Specifically, the pipeline combines general document understanding, OCR extraction, and key–value matching with custom rules pertaining to a provided invoice dataset. We then demonstrate that the pipeline greatly outperforms a commercially available product and can significantly reduce the amount of manual labor required to process invoice documents. Future work will focus on generalizing the pipeline so that it can be applied to more varied datasets.
- Subjects
DATA mining; INFORMATION processing; COMMERCIAL documents; INVOICES; LABOR process
- Publication
International Journal of Pattern Recognition & Artificial Intelligence, 2023, Vol 37, Issue 9, p1
- ISSN
0218-0014
- Publication type
Article
- DOI
10.1142/S0218001423540137