Multi-Class Document Image Classification using Deep Visual and Textual Features

Sevim, Semih; Ekinci, Ekin; İlhan Omurca, Sevinç; Edinc, Eren; Eken, Süleyman; Erdem, Turkucan; Sayar, Ahmet

doi:10.1142/s1469026822500134

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, vol. 21, no. 02, 2022 (ESCI)

Publication Type: Article / Full Article
Volume: 21, Issue: 02
Publication Date: 2022
DOI: 10.1142/s1469026822500134
Journal Name: INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS
Journal Indexed In: Emerging Sources Citation Index (ESCI), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, Metadex, zbMATH, Civil Engineering Abstracts
Keywords: Document analysis and recognition, document classification, text mining, deep learning, networks
Kocaeli University Affiliation: Yes

Abstract

The digital era has brought digital documents with it, and classifying document images has become as important a need as classifying classical text documents. Document images, in which text documents are stored as images, contain both textual and visual features, unlike ordinary images. Therefore, both kinds of features can be used to classify such data. With this in mind, the aim of this study is to classify document images using both textual and visual features and to determine which feature type is more successful for classification. In the text-based approach, each document class is labeled with the keywords associated with it, and a document is classified according to whether or not it contains the related keywords. For visual-based classification, we use four deep learning models: a CNN, NASNet-Large, InceptionV3, and EfficientNetB3. The experimental study is carried out on document images obtained from applicants to Kocaeli University. The results show that EfficientNetB3 performs best of all, with an F-score of 0.8987.
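The keyword-based text approach and the F-score evaluation can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: the text has already been extracted from each image (for example by OCR, not shown); the class names and keyword lists in `CLASS_KEYWORDS` are hypothetical; a document is assigned to the class with the most distinct matching keywords (ties go to the first listed class, no match yields `"unknown"`); and the F-score is macro-averaged over classes, which is one common choice but not necessarily the averaging the authors report. The helper names `tokenize`, `classify`, and `macro_f1` are likewise illustrative.

```python
from __future__ import annotations

import re

# Hypothetical class -> keyword mapping for illustration only; the paper's
# actual document classes and keyword lists are not given in the abstract.
CLASS_KEYWORDS: dict[str, set[str]] = {
    "transcript": {"transcript", "gpa", "course", "grade"},
    "diploma": {"diploma", "graduated", "degree"},
    "identity": {"identity", "nationality", "birth"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (\\w is Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def classify(text: str,
             class_keywords: dict[str, set[str]] = CLASS_KEYWORDS,
             default: str = "unknown") -> str:
    """Return the class with the most distinct keywords present in the text.

    Assumed rule: ties go to the class listed first in `class_keywords`;
    if no keyword of any class occurs, return `default`.
    """
    tokens = set(tokenize(text))
    hits = {label: len(tokens & kws) for label, kws in class_keywords.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else default


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, made-up inputs; not data from the study.
    docs = [
        "Official transcript: course grades and GPA",
        "This diploma certifies the degree",
        "Nationality and date of birth",
        "Random unrelated text",
    ]
    y_true = ["transcript", "diploma", "identity", "diploma"]
    y_pred = [classify(d) for d in docs]
    print(y_pred)  # ['transcript', 'diploma', 'identity', 'unknown']
    print(round(macro_f1(y_true, y_pred), 4))  # 0.6667
```

Counting distinct keyword hits rather than stopping at the first match is a design choice made here so that documents mentioning vocabulary from several classes still get a single label; a pure "contains any keyword" rule would need its own tie-breaking policy.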