Angular Symmetric Axis Constellation Model for Off-line Odia Handwritten Characters Recognition

Received May 03, 2018; Revised Jun 26, 2018; Accepted Jul 10, 2018

Optical character recognition is an emerging research topic in the field of image processing, with extensive applications in pattern recognition. Odia handwritten script is an important research area because Odia is the oldest and most widely used language in the state of Odisha, India. Odia characters are usually handwritten and are generally captured by a scanner into machine-readable form. Several recognition techniques have been developed for various languages, but Odia characters have a curved appearance, which makes them more difficult to recognize. In this article we present a novel approach for Odia character recognition based on an angle-based symmetric axis feature extraction technique, which gives high recognition accuracy. This empirical model generates unique angle-based boundary points on every skeletonised character image. These points are interconnected to extract row and column symmetry axes. We extract a feature matrix consisting of the mean distance of row, mean angle of row, mean distance of column and mean angle of column, measured from the centre of the image to the midpoints of the respective symmetric axes. The system applies 10-fold cross-validation with the random forest (RF) and SVM classifiers to the feature matrix. For simulation we use standard databases containing 200 images for each of 47 Odia characters and 10 Odia numerals. The simulations show that SVM and RF yield accuracy rates of 96.3% and 98.2% on the NIT Rourkela Odia character database and 88.9% and 93.6% on the ISI Kolkata Odia numeral database.


INTRODUCTION
In the era of digital image processing, character recognition is one of the significant and useful emerging research topics in the area of pattern recognition. The main aim of character recognition is to translate human-readable characters into machine-readable codes so that a machine can efficiently recognize them. Character recognition systems fall into two broad categories: offline and online recognition. In online character recognition, the two-dimensional coordinates of successive points of the handwriting are stored in order as a function of time, as described in [1], whereas in offline recognition only the completed writing is available, as an image, as described in [2]. In this paper, our research is confined to offline handwritten character recognition. Our recognition system comprises three broad stages: acquisition, feature extraction and classification. A recognition system depends mostly on a well-defined feature extraction procedure together with a good classifier to achieve a high success rate [3]. Building a good recognition system for handwriting remains challenging because of variations in writing skill, shape and orientation. Approaches followed by different researchers for scripts such as Arabic, Chinese and English are reported in [4]. Odia is one of the regional languages of India, and its script, like Devanagari, is derived from the Brahmi script. It is spoken mainly in the eastern part of India (Odisha) and in some southern and northern parts. Achieving good recognition accuracy for handwritten Odia characters is still difficult: although a good number of works have been done on Indian regional languages, comparatively few address the Odia script.
In recent years, several authors have analysed the Odia script, as reported in [5]. Feature extraction for handwritten character recognition is a challenging task in pattern recognition, and a large number of feature extraction techniques and classification algorithms have been presented in recent years [6]. Character recognition techniques for several languages are found in the literature [7][8][9]. An extensive survey of character recognition based on different kinds of feature extraction techniques is reported in [10]. In that survey, the authors describe feature extraction techniques such as template matching, projection histograms, deformable templates, contour profiles, unitary image transforms, zoning, graph description, Zernike moments, spline curve approximation and Fourier descriptors, applied to gray-level characters, binary characters, character contours, character skeletons and character graph representations produced in the pre-processing steps. For Indian languages, optical character recognition plays a vital role nowadays. In this paper we design a novel approach that efficiently recognizes Odia characters using angular measurements and Euclidean distances from the centre of the image to the midpoints of the symmetric axes, where the row and column symmetric axes are generated by taking the midpoints between pairs of boundary edges. Odisha holds the distinction of having the largest number of palm-leaf manuscripts (over 20,000) in the world [11]. Millions of books are estimated to have been printed in Odia since the first printed Odia book, the New Testament, was published in 1809 [12]. Following approval by the Union Cabinet, Odia was granted classical language status on the basis of its literary heritage, joining five other Indian languages.

RELATED BACKGROUND WORK
The Odia script is derived from the Brahmi script, and Odia is one of the most ancient Indian regional languages, spoken mostly in the eastern part of India, chiefly in Odisha, and also in West Bengal, Gujarat and elsewhere. An important property of the script is that it has no upper-case and lower-case forms. Different researchers have adopted well-defined approaches to achieve a high recognition rate. Recognition is the process of taking unknown samples of handwritten character images or words and treating them as a pattern recognition problem for testing. Recognition can be achieved in three main ways: template matching, statistical techniques and neural network techniques. These approaches use either top-down or analytical strategies. Template matching is the simplest form of training and recognition: the idea is to match stored, predefined prototypes against the unknown handwritten characters. In this technique only selected pixels are compared with the data samples, and decisions are made by rule-based decision tree analysis. A rule-based decision technique was used by Chaudhuri et al. in 2002 [13]. Statistical techniques are considered more effective for recognizing Odia characters. In this regard, Obaidullah et al. [14] in 2014 used a linear logistic regression model together with a higher-order statistical decision model, which performed better than the linear model. In 2007, Pal et al. [15] used a quadratic classifier based on Bayesian estimation. In 2009 and 2005, similar pseudo-Bayesian estimation techniques were adopted by Wakabayashi et al. [15] and Roy et al. [16] for handwritten Odia numeral recognition; they used a conventional quadratic discriminant function. In 2006, a hidden Markov model (HMM) was proposed by Bhowmik et al. [17].
This model used a non-homogeneous quadratic method for training and recognition of handwritten numerals. In 2014, Dash et al. [18], [19] adopted a discriminative learning based quadratic discriminant classifier (DLQDF) and non-redundant Stockwell transform based feature extraction for handwritten digit recognition. A neural network is a parallel processing method built from interconnected neurons, and it performs computation at higher speed than statistical and template matching techniques. Neural networks can be used in two ways: as feed-forward networks (FFNN) or as back-propagation networks (BPNN). In 2013, Mishra et al. [20] performed classification with a BPNN and obtained a high accuracy of 90.44%. In 2011, Majhi et al. [21] proposed a nonlinear neural network classifier analogous to the functional link artificial neural network (FLANN) classifier. In 2012, Chanda et al. [22] proposed a method for writer identification from Odia handwriting that uses an SVM for classification. In 2015, Kalyan et al. [23] proposed the BESAC symmetric axis constellation model using SVM, nearest neighbour and random forest classifiers, with an accuracy of 98.90%.

PROPOSED HANDWRITTEN CHARACTER RECOGNITION SYSTEM
In this section, we present a novel technique that efficiently recognizes Odia characters. The complete proposed method is described graphically in Figure 1. The proposed system consists of the following steps: image acquisition, pre-processing, feature extraction and classification. Each step is discussed in detail in the subsections below.

Image Acquisition
Following the methodology described above, we use the standard Odia character database known as the NIT Rourkela Odia database, which was developed at NIT Rourkela by Mishra et al. [20]. This database contains 15,040 images of both characters and numerals. In this study we consider 47 characters with 200 samples each. The modern Odia script consists of 12 vowels, 3 vowel modifiers, 37 simple consonants, 10 numerical digits and about 159 composite characters (juktas). Odia script has a curved appearance of writing patterns on

Image Pre-processing
Pre-processing is an important step after image acquisition: it produces images that are free of noise and skew, which leads to higher recognition accuracy. Our pre-processing consists of several phases: noise reduction, normalization, skew or slant adjustment and segmentation. These steps are described in the following subsections.

Noise Reduction
Noise consists of unwanted variations in the pixel intensity values of the scanned document, and noise reduction is the process of eliminating spurious points caused by the poor sampling rate of the scanner.
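The paper does not say which filter removes spurious points. As one possible illustration, the following Python sketch (using NumPy) deletes isolated foreground pixels from a binary image by counting each pixel's 8-connected neighbours. The function name `remove_isolated_pixels` and the neighbour threshold are hypothetical choices, not the authors' method.

```python
import numpy as np

def remove_isolated_pixels(binary: np.ndarray, min_neighbours: int = 1) -> np.ndarray:
    """Clear foreground pixels (value 1) that have fewer than `min_neighbours`
    foreground pixels among their 8 neighbours (hypothetical helper)."""
    img = (binary > 0).astype(np.uint8)
    h, w = img.shape
    # Pad with background so border pixels also have a full 3x3 neighbourhood.
    padded = np.pad(img, 1, mode="constant")
    neighbours = np.zeros((h, w), dtype=np.int32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # Shifted view of the padded image: the neighbour at offset (dy, dx).
            neighbours += padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    cleaned = img.copy()
    cleaned[(img == 1) & (neighbours < min_neighbours)] = 0
    return cleaned

# Usage: a short horizontal stroke plus one isolated speck.
img = np.zeros((5, 5), dtype=np.uint8)
img[1, 1:4] = 1
img[4, 4] = 1
cleaned = remove_isolated_pixels(img)  # the speck at (4, 4) is removed
```

Raising `min_neighbours` removes more aggressively but risks eroding the thin strokes of a character.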

Normalization
Normalization is the process of converting the acquired data into the form that later stages require. We adopt binarization for intensity normalization in the pre-processing step, and we then resize each sample to 81 × 81 pixels for size normalization.
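Binarization and size normalization can be sketched as below. The fixed threshold of 128 and nearest-neighbour resampling are assumptions made for illustration; the paper specifies only binarization and the 81 × 81 target size. The helper names are hypothetical.

```python
import numpy as np

def binarize(gray: np.ndarray, threshold: float = 128) -> np.ndarray:
    """Map dark (ink) pixels to 1 and light (background) pixels to 0.
    The fixed threshold is an assumption; Otsu's method is a common alternative."""
    return (gray < threshold).astype(np.uint8)

def resize_nearest(binary: np.ndarray, size: int = 81) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D array to size x size pixels."""
    h, w = binary.shape
    # For each output row/column, pick the source index it falls into.
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return binary[rows[:, None], cols[None, :]]

# Usage: a 40 x 60 scan with a dark 20 x 20 block of "ink".
gray = np.full((40, 60), 255, dtype=np.uint8)
gray[10:30, 20:40] = 0
normalized = resize_nearest(binarize(gray))
```

Nearest-neighbour sampling keeps the image strictly binary, so no second thresholding pass is needed after resizing.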

Skew or Slant Adjustment
Skew is a rotation of the scanned image, and it is important to eliminate this rotation in the pre-processing step. Rotation can be eliminated by estimating the tilt angle and rotating the image by the same angle in the opposite direction.
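The paper does not name a tilt estimator. A common choice, swapped in here purely as an illustration, is the projection-profile method: rotate the ink coordinates by candidate angles and keep the angle whose row histogram is most sharply peaked. The sketch below returns the estimated tilt in degrees (image y axis pointing down); deskewing would then rotate the image by the negative of this angle. The function name and the ±15° search range are hypothetical.

```python
import numpy as np

def estimate_skew(binary: np.ndarray, max_angle: float = 15.0, step: float = 0.5) -> float:
    """Projection-profile skew estimate in degrees (y axis pointing down).
    Returns the candidate angle whose row histogram is most sharply peaked."""
    ys, xs = np.nonzero(binary)
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step / 2, step):
        t = np.deg2rad(angle)
        # Row coordinate of every ink pixel after undoing a tilt of `angle`.
        rows = np.round(ys * np.cos(t) - xs * np.sin(t)).astype(int)
        counts = np.bincount(rows - rows.min())
        # Sum of squared bin counts is largest when ink falls into few rows.
        score = float(np.sum(counts.astype(np.float64) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle

# Usage: a straight stroke drawn with a 5 degree tilt.
page = np.zeros((100, 100), dtype=np.uint8)
for x in range(10, 90):
    page[int(round(20 + x * np.tan(np.deg2rad(5.0)))), x] = 1
tilt = estimate_skew(page)  # expected to be close to 5.0
```

The resolution of the estimate is limited by `step`; a finer grid costs proportionally more histogram evaluations.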

Segmentation
Segmentation is the process of separating text and non-text areas in the scanned handwritten document, and it is a challenging part of pre-processing. There are two types of segmentation: external segmentation separates paragraphs, sentences or words in the scanned document, whereas internal segmentation separates the individual characters within each word.
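For internal segmentation, one simple technique (an assumption, since the paper does not describe its segmenter) is a vertical projection profile: columns that contain no ink separate adjacent characters. This sketch only handles characters separated by at least one blank column; touching characters would need a more elaborate method. `split_characters` is a hypothetical name.

```python
import numpy as np

def split_characters(word: np.ndarray) -> list:
    """Split a binary word image at blank columns (vertical projection profile)."""
    ink = (word > 0).any(axis=0).astype(np.int8)
    # Pad with zeros so every run of ink columns has a rising and a falling edge.
    edges = np.diff(np.concatenate(([0], ink, [0])))
    starts = np.flatnonzero(edges == 1)   # first ink column of each run
    ends = np.flatnonzero(edges == -1)    # one past the last ink column
    return [word[:, s:e] for s, e in zip(starts, ends)]

# Usage: three blobs separated by blank columns.
word = np.zeros((5, 12), dtype=np.uint8)
word[1:4, 1:3] = 1
word[:, 5] = 1
word[2, 8:11] = 1
parts = split_characters(word)  # widths 2, 1 and 3
```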

Feature Extraction
Feature extraction techniques capture the distinctive properties of each character image that distinguish it from the other character images. In this section we present a unique algorithm that evaluates a feature vector consisting of the mean distance of row, mean angle of row, mean distance of column and mean angle of column, measured from the centre of the image to the midpoints of the symmetric axes. All operations are performed on skeletonized images of the handwritten characters. Our feature extraction implementation consists of five steps, which produce the key feature values of the proposed system. For these empirical calculations we developed two algorithms, presented in Algorithm I and Algorithm II respectively.
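Once symmetric-axis midpoints are available, the four features reduce to simple geometry. The sketch below assumes each midpoint is given as a (row, column) pair, that the centre of an 81 × 81 image is (40, 40) in zero-based indices, and that the mean angle is an arithmetic mean of `atan2` angles in radians. The paper does not state its angle convention, and a circular mean would be an alternative. The function names are hypothetical.

```python
import math

def mean_distance_and_angle(points, centre):
    """Mean Euclidean distance and mean atan2 angle (radians) from centre to points."""
    if not points:
        return 0.0, 0.0  # assumption: no axis found -> zero features
    cy, cx = centre
    dists = [math.hypot(y - cy, x - cx) for y, x in points]
    angles = [math.atan2(y - cy, x - cx) for y, x in points]
    return sum(dists) / len(dists), sum(angles) / len(angles)

def axis_features(row_midpoints, col_midpoints, centre=(40.0, 40.0)):
    """Feature vector [mean row distance, mean row angle,
    mean column distance, mean column angle] (hypothetical helper)."""
    row_d, row_a = mean_distance_and_angle(row_midpoints, centre)
    col_d, col_a = mean_distance_and_angle(col_midpoints, centre)
    return [row_d, row_a, col_d, col_a]

# Usage: two row-axis midpoints and one column-axis midpoint.
features = axis_features([(40, 43), (44, 40)], [(37, 36)])
```

Here the row midpoints lie 3 and 4 pixels from the centre at angles 0 and π/2, so the first two features are 3.5 and π/4.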
The proposed method divides the operation on each image into two parts. In the first part, a chord is drawn from each boundary pixel to the opposite boundary pixel in the same row; in the second part, a chord is drawn from each boundary pixel to the opposite boundary pixel in the same column. Both parts are described in Algorithm 1 and Algorithm 2, respectively. For N boundary pixels and K boundaries, the number of available chords is (N/2) × K in both the row and the column direction. However, we discard chords that contain fewer than 3 pixels. The remaining chords are called row chords and column chords, because each lies within a single row or a single column. Row chords are parallel to each other, as shown in Figure 6, and column chords are vertical, as shown in Figure 9. In the next step we group the row chords and the column chords in order to find symmetry axes from the parallel row chords and the vertical column chords. The midpoints of these chords generate a number of row symmetry axes and column symmetry axes, shown in Figure 7 and Figure 10 respectively. To find accurate symmetry axes that represent the perceptual parts, we propose midpoint criteria for the respective chords, to be verified in the following method.
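Algorithms 1 and 2 are not reproduced in this section, so the following is only a simplified sketch of row and column chord extraction on a skeleton image. It assumes that the ink pixels of each row (or column) are paired in order, first with second and third with fourth, giving N/2 chords, and that a chord from column `left` to column `right` contains `right - left + 1` pixels; chords with fewer than 3 pixels are discarded as in the text. Grouping chords into symmetry axes and the midpoint criteria are omitted, and all names are hypothetical.

```python
import numpy as np

def row_chord_midpoints(skeleton: np.ndarray, min_len: int = 3) -> list:
    """Pair the ink pixels of each row into chords and return chord midpoints (row, col)."""
    midpoints = []
    for y in range(skeleton.shape[0]):
        xs = np.flatnonzero(skeleton[y] > 0)
        # Pair 1st with 2nd, 3rd with 4th, ... giving N/2 chords per row.
        for left, right in zip(xs[0::2], xs[1::2]):
            if right - left + 1 < min_len:  # discard chords of fewer than 3 pixels
                continue
            midpoints.append((float(y), float(left + right) / 2.0))
    return midpoints

def column_chord_midpoints(skeleton: np.ndarray, min_len: int = 3) -> list:
    """Column chords are row chords of the transposed image, with coordinates swapped back."""
    return [(y_mid, x_col) for x_col, y_mid in row_chord_midpoints(skeleton.T, min_len)]

# Usage on a tiny 5 x 7 skeleton.
sk = np.zeros((5, 7), dtype=np.uint8)
sk[2, 1] = sk[2, 5] = 1   # one horizontal chord of 5 pixels
sk[0, 3] = sk[4, 3] = 1   # one vertical chord of 5 pixels
sk[1, 0] = sk[1, 1] = 1   # 2-pixel chords that are discarded
rows = row_chord_midpoints(sk)     # [(2.0, 3.0)]
cols = column_chord_midpoints(sk)  # [(2.0, 3.0)]
```

Reusing the row routine on the transposed image keeps the row and column passes symmetric, mirroring the two-part structure described above.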

Classification
Classification is one of the important phases of any recognition model. In our implementation we adopt a two-way strategy for recognition, choosing two well-known classifiers, the support vector machine (SVM) [24] and the random forest tree (RFT) [25], to recognize handwritten characters. After evaluating the key feature values, we pass the feature vectors to each classifier separately and record the overall recognition accuracy. We first evaluate the SVM classifier [24], used here as a supervised multi-class classifier. Second, we evaluate the random forest tree [25], which works on the idea of bagging and random feature selection. The performance of each classifier is reported in terms of the mean square error, which indicates which classifier performs best.
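The two-classifier comparison with 10-fold cross-validation can be sketched in Python with scikit-learn rather than MATLAB. Several assumptions apply: the SVM uses an RBF kernel, the forest has 100 trees, features are arranged one row per image (the transpose of the 4 × 9400 layout in the text), and accuracy rather than mean square error is used as the score. The data below are synthetic stand-ins, not the Odia datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def compare_classifiers(X, y, seed=0):
    """Return per-fold accuracies of an SVM and a random forest under 10-fold CV."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    models = {
        "SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),  # kernel choice is an assumption
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {name: cross_val_score(m, X, y, cv=cv, scoring="accuracy")
            for name, m in models.items()}

# Synthetic stand-in data: 3 classes x 40 samples x 4 features (NOT the Odia datasets).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 40)
X = rng.normal(size=(120, 4)) + y[:, None] * 2.0
scores = compare_classifiers(X, y)
for name, s in scores.items():
    print(f"{name}: mean accuracy {s.mean():.3f} over {len(s)} folds")
```

Stratified folds keep the class proportions equal in every fold, which matters when all classes have the same number of samples, as in the character database.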

RESULT AND DISCUSSION
All experiments were carried out on a system running 64-bit Windows 8 with an Intel(R) Core i7-4770 CPU @ 3.40 GHz, and all simulations were performed in MATLAB R2014a on standard databases. We used the NIT Rourkela Odia database, containing 200 samples from each of 47 categories, and the ISI Kolkata numeral database, containing 16 samples from each of 10 categories. From each image we obtained four key feature values: the mean distance of row, mean angle of row, mean distance of column and mean angle of column, measured from the centre of the image to the midpoints of the symmetric axes. The total input size is therefore 4 × 9400 for the Odia characters and 4 × 9400 for the numerals. These inputs were given to the SVM and random forest classifiers, and the system was validated using 10-fold cross-validation. The observations were divided in a 75:25 ratio for training and testing. The SVM classifier was implemented first, followed by the random forest classifier. Comparing the two classifiers, we obtained recognition rates of 93.6% for SVM and 98.2% for random forest on the NIT Odia characters, and 88.91% and 96.3% respectively on the ISI numerals.
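As a quick check of the 75:25 split on the character database, 47 × 200 = 9400 feature rows give 7050 training rows and 2350 test rows. The sketch below shows a stratified split with scikit-learn; stratification and the random seed are assumptions, and the feature values are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_classes, per_class, n_features = 47, 200, 4
y = np.repeat(np.arange(n_classes), per_class)   # 9400 labels
X = np.zeros((y.size, n_features))              # placeholder feature rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)  # (7050, 4) (2350, 4)
```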

CONCLUSION
In this paper, we have presented an angular symmetric axis constellation technique for offline Odia character recognition. The system uses row and column symmetric axes to generate four key feature values for each image: the mean distance of row, mean angle of row, mean distance of column and mean angle of column, measured from the centre of the image to the midpoints of the symmetric axes. For classification, SVM and RF models are used. The experimental results show satisfactory recognition on the standard datasets, but the development is still in its infancy, and other techniques remain to be explored for better recognition accuracy.