Figure search by text in large scale digital document collections

Yurtsever, Muhammet; Ozcan, Muhammet; Taruz, Zubeyir; Eken, Süleyman; Sayar, Ahmet

doi:10.1002/cpe.6529

Yurtsever M. M. E., Ozcan M., Taruz Z., EKEN S., SAYAR A.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, vol.34, no.1, 2022 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 34 Issue: 1
Publication Date: 2022
DOI: 10.1002/cpe.6529
Journal Name: CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
Keywords: Apache Solr, document digitization, Elasticsearch, figure search, full-text search, regular expressions, RETRIEVAL
Kocaeli University Affiliated: Yes

Abstract

Digital document collections have been created by transferring large numbers of documents to digital media, and these archives have brought many benefits to users. As the diversity and size of digital image collections have grown exponentially, obtaining the desired image from them has become both more important and more difficult. The images in a document may contain critical information about its subject. In this study, an architecture is developed that can work on large-scale data by using regular expressions together with full-text search approaches. The performance of the system was tested on different academic documents, and the insert times of Elasticsearch and Apache Solr were compared. Compared to Elasticsearch, Apache Solr achieved faster and more successful results.
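
The idea of pairing regular expressions with full-text search can be illustrated with a minimal Python sketch. It is not the architecture from the paper: the caption pattern, the function names (`extract_captions`, `build_index`, `search`), and the in-memory inverted index are all assumptions made for illustration, with the index standing in for a search engine such as Elasticsearch or Apache Solr.

```python
import re
from collections import defaultdict

# Assumed caption format: a line starting with "Figure 3: ..." or "Fig. 3. ...".
# Real documents vary, so this pattern is illustrative only.
CAPTION_RE = re.compile(
    r"^(?:Figure|Fig\.)\s+(\d+)\s*[:.]\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_captions(doc_id, text):
    """Return (doc_id, figure_number, caption) for each caption-like line."""
    return [
        (doc_id, int(m.group(1)), m.group(2).strip())
        for m in CAPTION_RE.finditer(text)
    ]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(captions):
    """Map each token to the positions of the captions that contain it."""
    index = defaultdict(set)
    for pos, (_, _, caption) in enumerate(captions):
        for token in tokenize(caption):
            index[token].add(pos)
    return index


def search(query, captions, index):
    """Return the captions that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [captions[i] for i in sorted(hits)]
```

In this sketch the regular expression isolates the text attached to each figure, and the token index answers text queries over those captions. In a large-scale setting the extracted captions would be sent to a search engine instead of a Python dictionary.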