International Conference on Advanced Engineering, Technology and Applications (ICAETA), Catania, Italy, 24 - 25 May 2024, pp.1, (Full Text Paper)
Document image classification has gained extensive attention
due to the rising number and variety of scanned documents. Multimodal
architectures, which process image and text simultaneously, leverage
the strengths of each modality. This study explores an efficient neural
architecture for classifying scanned documents in a private company.
The effectiveness of CNN-based deep learning and OCR algorithms in
extracting visual and textual features is investigated. In the next stage,
different feature fusion methods are applied to combine the extracted
features. A multimodal document image classifier is developed for companies
managing large volumes of scanned documents, delivering superior
performance even with a limited number of documents and faint, low-quality scans.
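
The sketch below illustrates the general idea of combining CNN-derived visual features with OCR-derived textual features through late fusion. It is a minimal illustration assuming PyTorch; the backbone (ResNet-18), the 300-dimensional text vectors, concatenation as the fusion method, and all layer sizes are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Minimal sketch of multimodal late fusion for document classification.
# Assumptions: PyTorch/torchvision, ResNet-18 visual backbone, and
# pre-computed OCR text vectors (e.g., averaged embeddings or TF-IDF).
import torch
import torch.nn as nn
from torchvision import models


class MultimodalDocumentClassifier(nn.Module):
    def __init__(self, text_dim: int = 300, num_classes: int = 10):
        super().__init__()
        # Visual branch: pretrained CNN with its classifier head removed,
        # yielding a 512-d image feature vector.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        img_dim = 512

        # Textual branch: small projection over OCR-derived text features.
        self.text_proj = nn.Sequential(nn.Linear(text_dim, 128), nn.ReLU())

        # Fusion by concatenation, followed by the classification head.
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + 128, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, image: torch.Tensor, text_feats: torch.Tensor):
        v = self.cnn(image).flatten(1)    # (B, 512) visual features
        t = self.text_proj(text_feats)    # (B, 128) textual features
        fused = torch.cat([v, t], dim=1)  # late fusion: concatenation
        return self.classifier(fused)


# Example forward pass with dummy inputs.
model = MultimodalDocumentClassifier(text_dim=300, num_classes=10)
images = torch.randn(4, 3, 224, 224)      # batch of scanned pages
text_feats = torch.randn(4, 300)          # OCR-derived text vectors
logits = model(images, text_feats)        # (4, 10) class scores
```

Other fusion strategies (e.g., element-wise sum or attention-based weighting) can be swapped in at the concatenation step without changing the rest of the pipeline.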