A METHOD FOR DIGITAL RECOGNITION OF GEORGIAN TEXT FROM IMAGES

Julieta Tabeshadze

Julieta Tabeshadze, PhD Student in Informatics, Samtskhe-Javakheti State University, Rustaveli St. 124, Akhaltsikhe, Georgia. ORCID: http://orcid.org/0009-0008-7463-7801

Abstract

This article describes the implementation of an optical character recognition (OCR) algorithm for the Georgian alphabet in the MATLAB programming environment. The aim of the study is to develop an effective system for the digital recognition of Georgian text that copes with low-resolution images, font variations, uneven backgrounds, and noise.

The proposed algorithm consists of several stages of digital image processing: initial image filtering, conversion to grayscale and then to a binary image, text segmentation, and comparison with binary reference matrices. Character recognition uses correlation analysis: each character extracted from a new image is compared with pre-built templates and identified by the best match.
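The binarization and correlation-matching stages can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the MATLAB implementation discussed in the article. The function names (`binarize`, `corr2`, `recognize`), the fixed threshold of 128, and the 3x3 toy templates are assumptions made for illustration; the templates are placeholders and do not reproduce real Georgian glyph shapes. The `corr2` function here computes the two-dimensional Pearson correlation coefficient, which is one common way to realize correlation analysis between a segmented character and a reference matrix.

```python
from math import sqrt

def binarize(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 values) to 0/1.
    Dark pixels are treated as ink (1), light pixels as background (0).
    The fixed threshold is an illustrative assumption."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def corr2(a, b):
    """2-D Pearson correlation coefficient of two equal-size matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("matrices must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A constant matrix has zero variance; report no correlation in that case.
    return num / den if den else 0.0

def recognize(glyph, templates):
    """Return the label of the template that correlates best with the glyph."""
    return max(templates, key=lambda label: corr2(glyph, templates[label]))

# Hypothetical 3x3 reference matrices (placeholders, not real letter shapes).
TEMPLATES = {
    "ა": [[0, 1, 0],
          [0, 1, 0],
          [1, 1, 0]],
    "ბ": [[1, 1, 1],
          [1, 0, 1],
          [1, 1, 1]],
}

# A segmented character in grayscale, with one noisy pixel in the last row.
noisy_gray = [[250, 30, 240],
              [245, 20, 250],
              [40, 35, 90]]

glyph = binarize(noisy_gray)
print(recognize(glyph, TEMPLATES))
```

In this toy case the noisy glyph differs from the first template by a single pixel, so its correlation with that template stays high while its correlation with the second template is negative. A real system would first scale each segmented character to the size of the reference matrices before comparing them.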

The algorithm was tested on images of varying quality and structure. The results showed that, with appropriate pre-processing, Georgian text can be recognized with high accuracy.

This study highlights the potential of digital image processing technologies for digitizing the Georgian language and cultural heritage. The proposed method can also be adapted to other writing systems, which gives the research both theoretical and practical significance.

Keywords: Optical character recognition (OCR); Georgian alphabet; text digitization; text recognition from images; binarization; segmentation; correlation analysis; MATLAB; reference templates.