An OCR Engine for Printed Receipt Images using Deep Learning Techniques


Creative Commons License

Sayallar C., SAYAR A., Babalık N.

International Journal of Advanced Computer Science and Applications, vol.14, no.2, pp.833-840, 2023 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 14 Issue: 2
  • Publication Date: 2023
  • DOI: 10.14569/ijacsa.2023.0140295
  • Journal Name: International Journal of Advanced Computer Science and Applications
  • Journal Indexed In: Emerging Sources Citation Index (ESCI), Scopus, Compendex, Index Islamicus, INSPEC
  • Pages: pp.833-840
  • Keywords: benchmarking, deep learning, image processing, Optical Character Recognition (OCR), receipt
  • Kocaeli University Affiliation: Yes

Abstract

Digitizing receipts and invoices and recording expenses have come into use for finance tracking in industry and accounting. However, character recognition for document digitization has not yet reached 100% accuracy. In this study, a new Optical Character Recognition (OCR) engine called Nacsoft OCR was developed on Turkish receipt data using artificial intelligence methods. The proposed engine was compared with the widely used engines Easy OCR, Tesseract OCR, and the Google Vision API. Benchmarking was performed on English and Turkish receipts, and the character recognition accuracy and speed of each engine are reported. OCR engines are known to recognize words more accurately when they are given word position information. Therefore, the ability of Nacsoft OCR to determine word positions was also compared with that of the other engines, and the results are presented.
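The abstract reports character recognition accuracy and word-position performance but does not name the metrics used. As a hedged sketch only, assuming two common choices (character accuracy derived from Levenshtein edit distance, and intersection over union (IoU) for word bounding boxes), the functions below show how such scores are typically computed. The function names and the box format `(x_min, y_min, x_max, y_max)` are hypothetical and are not taken from the paper or from Nacsoft OCR.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, prediction: str) -> float:
    """Assumed metric: 1 - CER, where CER = edit distance / reference length.
    Clipped at 0 because CER can exceed 1 when the prediction is much longer."""
    if not reference:
        return 1.0 if not prediction else 0.0
    cer = levenshtein(reference, prediction) / len(reference)
    return max(0.0, 1.0 - cer)


def box_iou(a: tuple, b: tuple) -> float:
    """Assumed word-position metric: IoU of two axis-aligned boxes
    given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A receipt word misread with a zero in place of the letter O.
    print(character_accuracy("TOPLAM", "T0PLAM"))
    # A predicted word box shifted half a box width to the right.
    print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

In this sketch one substituted character in the six-letter word "TOPLAM" gives an accuracy of 5/6, and the half-width shift gives an IoU of 1/3. A real benchmark would also need to match predicted words to ground-truth words (for example, by highest IoU) before scoring them, a step omitted here.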