Correcting Optical Character Recognition Result via a Novel Approach

Otman Maarouf, Rachid El Ayachi, Mohamed Biniz

Abstract


Optical Character Recognition (OCR) is a recognition system used to identify the contents of a scanned image. This system sometimes produces erroneous results, which makes a post-processing step based on Natural Language Processing (NLP) necessary to correct the sentences. In this paper, we propose a new method for the syntactic and semantic correction of sentences; it is based on the frequency of two correct words in the sentence and on a recursive technique. The approach starts by calculating the frequency of each pair of successive words in the corpora; the pair of words with the greatest frequency builds a correction center.
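
The first step described in the abstract, counting pairs of successive words in the corpora and taking the most frequent pair in a sentence as the correction center, might be sketched as follows. This is only an illustration under stated assumptions: the function names `bigram_counts` and `correction_center`, the whitespace tokenization, the tie-breaking rule, and the small English toy corpus are all hypothetical and not taken from the paper, and the recursive correction step is not shown because the abstract does not detail it.

```python
from collections import Counter


def bigram_counts(corpus_sentences):
    """Count how often each pair of successive words occurs in the corpus."""
    counts = Counter()
    for sentence in corpus_sentences:
        words = sentence.lower().split()  # assumption: whitespace tokenization
        counts.update(zip(words, words[1:]))
    return counts


def correction_center(sentence, counts):
    """Return (index, pair, frequency) for the most frequent successive pair.

    Returns None when the sentence has fewer than two words.
    Assumption: ties are broken in favor of the leftmost pair.
    """
    words = sentence.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return None
    best = max(range(len(pairs)), key=lambda i: counts[pairs[i]])
    return best, pairs[best], counts[pairs[best]]


# Hypothetical toy corpus (the paper's keywords suggest Tifinagh text instead).
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
counts = bigram_counts(corpus)

# Simulated OCR output in which "sat" was misread as "sot".
center = correction_center("the cat sot on the mat", counts)
print(center)
```

In this sketch, pairs that contain the misrecognized word have a corpus frequency of zero, so the selected pair consists of words that the corpus supports; a correction procedure could then work outward from that pair.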

Keywords


OCR; NLP; Tifinagh; Sentence correction; Recursive technique; Correction center



DOI: http://doi.org/10.11591/ijict.v11i1.pp%25p



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
