Correcting optical character recognition result via a novel approach

Otman Maarouf; Rachid El Ayachi; Mohamed Biniz

doi:10.11591/ijict.v11i1.pp8-19

Abstract

Optical character recognition (OCR) is a system used to recognize the content of a scanned image. Such a system produces erroneous results, which calls for a post-processing step that corrects the recognized sentences. In this paper, we propose a new method for the syntactic and semantic correction of sentences, based on the frequency of two correct words in the sentence and on a recursive technique. The approach starts by calculating the frequency of each pair of successive words in the corpora; the words with the greatest frequency form a correction center. We found 98% using our approach when we used the noisy channel. Further, we obtained 96% using the same corpus under the same conditions.
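The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions. The abstract does not say how candidate corrections are generated, so Python's `difflib.get_close_matches` stands in for that step. The "recursive technique" is read here as extending the correction outward from the center, one word at a time, to the left and to the right. All function names (`bigram_counts`, `candidates`, `find_center`, `extend_left`, `extend_right`, `correct`) and the toy corpus are hypothetical.

```python
from collections import Counter
from difflib import get_close_matches


def bigram_counts(corpus_sentences):
    """Count how often each pair of successive words occurs in the corpus."""
    counts = Counter()
    for sentence in corpus_sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def candidates(word, vocabulary, n=3):
    """Assumed candidate generator: close vocabulary matches, else the word itself."""
    close = get_close_matches(word, vocabulary, n=n, cutoff=0.6)
    return close or [word]


def find_center(cands, counts):
    """Return (index, left, right) of the adjacent candidate pair seen most often."""
    best = None
    for i in range(len(cands) - 1):
        for left in cands[i]:
            for right in cands[i + 1]:
                score = counts[(left, right)]
                if best is None or score > best[0]:
                    best = (score, i, left, right)
    return best[1], best[2], best[3]


def extend_left(i, fixed_right, cands, counts):
    """Recursively fix words to the left of the already corrected neighbour."""
    if i < 0:
        return []
    best = max(cands[i], key=lambda w: counts[(w, fixed_right)])
    return extend_left(i - 1, best, cands, counts) + [best]


def extend_right(i, fixed_left, cands, counts):
    """Recursively fix words to the right of the already corrected neighbour."""
    if i >= len(cands):
        return []
    best = max(cands[i], key=lambda w: counts[(fixed_left, w)])
    return [best] + extend_right(i + 1, best, cands, counts)


def correct(sentence, corpus_sentences):
    """Correct a noisy sentence starting from its most frequent word pair."""
    words = sentence.split()
    if len(words) < 2:
        return sentence  # no pair of successive words, so no center to build
    counts = bigram_counts(corpus_sentences)
    vocabulary = sorted({w for s in corpus_sentences for w in s.split()})
    cands = [candidates(w, vocabulary) for w in words]
    i, left, right = find_center(cands, counts)
    fixed = (
        extend_left(i - 1, left, cands, counts)
        + [left, right]
        + extend_right(i + 2, right, cands, counts)
    )
    return " ".join(fixed)


# Toy data, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "a cat sat on the rug",
]
noisy = "the cot sat on tha mat"
corrected = correct(noisy, corpus)
print(corrected)
```

The sketch picks each neighbour greedily from raw pair counts. A real system would need a much larger corpus and probably a similarity-aware tie-break, neither of which the abstract specifies.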

Keywords

Correction center; Natural language processing; Optical character recognition; Recursive technique; Sentences correction; Tifinagh

DOI: http://doi.org/10.11591/ijict.v11i1.pp8-19

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Intelektual Pustaka Media Utama (IPMU).
