Realization of an intelligent evaluation system

ABSTRACT


ISSN: 2252-8776 
Realization of an intelligent evaluation system (Otman Maarouf)

… variety of topics, including open-ended and programming-related questions, and automatically corrects answers based on teachers' responses, then gives the student a score. The paper is structured in two sections: the first presents the different methods for calculating the semantic and syntactic similarity between two sentences; the second gives the general architecture of the realized system and mentions all the tools used in this project. Finally, a conclusion is given.

CONCEPT AND TYPES OF SIMILARITIES

2.1. Introduction
Measuring the similarity between two sentences (or short texts) consists in evaluating how close their meanings are [2]. This task, semantic textual similarity (STS), is used in several important areas of natural language processing (NLP), including information retrieval, text categorization [3], text summarization [4], and machine translation [5]. Similarity is closely tied to classification and clustering: clustering denotes the partitioning of data, and a cluster is a set of data items or elements that are similar to one another. The language used to describe the objects of a database must therefore make it possible to define the distance between one object and the others.
In the field of artificial intelligence, similarity is one of the criteria for the automatic analysis of clusters and for data partitioning. This automatic classification step is necessary for implementing machine learning methods. Expert software also seeks to take the context into account, since similarity may vary with it [6]. The more useful and relevant the data attributes are in that context, the more relevant the software's results will be.

Syntactic similarity
The term "syntactic" denotes, in the linguistic sense, a method of classifying languages according to the order in which words appear in the sentence. Measuring the syntactic similarity between words and between texts plays an important role in the field of data mining [7].

Term frequency-inverse document frequency method
Term frequency-inverse document frequency (TF-IDF) is a well-established term-weighting approach that is often used in information retrieval and text mining [8]. The weight is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus [9]. Typically, the TF-IDF weight is composed of two terms: the first is the normalized TF, which is the number of times a word appears in a document divided by the total number of words in that document [10]. The second term is the IDF, calculated as the logarithm of the number of documents in the corpus divided by the number of documents containing the specific term. TF is defined as follows [11].

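− Term frequency
We denote by $f(w, d) = \{w' \in d : w' = w\}$ the occurrences of $w$ in $d$, where $w$ is a word and $d = \{w_1, \dots, w_n\}$ is a document, such that:

$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \quad (1)$$

where $n_{i,j}$ is the number of appearances of the word under consideration in the document, and $\sum_k n_{k,j}$ is the total number of words in the document after eliminating punctuation, spaces, and apostrophes.

− Inverse document frequency

$$\mathrm{idf}_i = \log \frac{|D|}{|\{d_j : t_i \in d_j\}|} \quad (2)$$

with $|D|$ the total number of documents in the corpus and $|\{d_j : t_i \in d_j\}|$ the number of documents in which the term appears (i.e. $n_{i,j} \neq 0$).

− Calculation of TF-IDF
Finally, to put it all together, the total TF-IDF weight of a token in a document is the product of its TF and IDF weights:

$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i \quad (3)$$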
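As an illustration, the TF and IDF weights described above can be sketched in a few lines of Python. This is a minimal sketch, not the system's implementation: documents are assumed to be already tokenized lists of words, the two-document corpus is made up, and returning 0 for a word absent from the corpus is a choice made here to avoid a division by zero.

```python
import math
from collections import Counter


def tf(word, doc):
    """Normalized term frequency: occurrences of `word` / number of tokens in `doc`."""
    return Counter(doc)[word] / len(doc)


def idf(word, corpus):
    """Logarithm of (number of documents / number of documents containing `word`)."""
    containing = sum(1 for doc in corpus if word in doc)
    if containing == 0:
        return 0.0  # assumption of this sketch: unseen words get weight 0
    return math.log(len(corpus) / containing)


def tf_idf(word, doc, corpus):
    """TF-IDF weight of `word` in `doc`: product of the TF and IDF weights."""
    return tf(word, doc) * idf(word, corpus)


# Made-up corpus of two tokenized documents (illustrative only).
corpus = [
    ["a", "class", "is", "a", "model"],
    ["an", "object", "is", "an", "instance"],
]
weight = tf_idf("class", corpus[0], corpus)  # (1/5) * log(2/1)
```

A word that occurs in every document, such as "is" here, receives an IDF of log(1) = 0 and therefore a TF-IDF weight of 0, which is the intended effect of the weighting.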

Cosine similarity
Cosine similarity is frequently used as a measure of similarity between two documents [12]. It can be used to compare texts from a corpus for classification or information retrieval purposes; in the latter case, a vectorized document is built from the words of the query and compared, by the cosine of the angle, with the vectors corresponding to all the documents in the corpus, in order to determine which ones are closest [13]. Because the angle between two vectors can only be measured on numerical values, the words of a document must first be converted into numbers. This is why we rely on the results of the TF-IDF method above: the weights of the words of a document can be treated as a vector. The cosine similarity between two documents $d_1$ and $d_2$ is the cosine of the angle between the vector representations of the documents to compare [14]:

$$\cos(d_1, d_2) = \frac{\vec{d_1} \cdot \vec{d_2}}{\|\vec{d_1}\| \, \|\vec{d_2}\|} \quad (4)$$
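The cosine of the angle between two weight vectors can be sketched as follows. This is a minimal sketch, assuming both documents have already been turned into vectors of equal length over the same vocabulary (for example, vectors of TF-IDF weights); the function name and the convention of returning 0 for an all-zero vector are choices made here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption of this sketch: an empty vector matches nothing
    return dot / (norm_u * norm_v)


# Two vectors pointing in the same direction have similarity close to 1,
# whereas two vectors with no shared non-zero coordinate have similarity 0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
no_overlap = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```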

Semantic similarity
Semantic similarity is a concept in which a set of documents or terms is given a metric based on the similarity of their meaning or semantic content [15]. Concretely, this can be done by defining a topological similarity, for example by using ontologies to define a distance between words, or by defining a statistical similarity, for example by using a vector space model to correlate terms [16] and contexts from an appropriate body of text (co-occurrence) [17]. Text similarity is a field of research in which two terms or expressions are assigned a score based on the likeness of their meaning. According to Kocoń and Maziarz [18], short-text similarity measures play an important role in many applications such as word sense disambiguation, synonymy detection, spell checking, thesauri generation, machine translation, information retrieval, and question answering [19].

Measure of Wu & Palmer [20]
In a domain of concepts, similarity is defined with respect to the distance between two concepts in the hierarchy and their position relative to the root. This similarity takes into account not only the length of the path between the two concepts but also the depth of their most specific common subsumer, i.e. the length of the path between that subsumer and the root [21]. The similarity between C1 and C2 is:

$$\mathrm{Sim}(C_1, C_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3} \quad (5)$$

where C3 is the most specific common subsumer of C1 and C2, N1 is the distance between the concept C1 and the concept C3, N2 is the distance between the concept C2 and the concept C3, and N3 is the distance between the concept C3 and the root.
This measure has the advantage of being simple to implement and of performing well compared with other similarity measures [22]. Figure 1 describes the relationships between the concepts C1, C2, C3, and the root.
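To make the roles of N1, N2, and N3 concrete, the following minimal sketch computes the usual Wu and Palmer ratio, 2·N3 / (N1 + N2 + 2·N3), on a tiny hand-made hierarchy. The taxonomy, the function names, and the convention of counting distances in edges are assumptions of this sketch; the system itself relies on a lexical resource rather than on a toy dictionary.

```python
# Hypothetical taxonomy: each concept maps to its parent (None marks the root).
TAXONOMY = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}


def path_to_root(concept, parent):
    """List of concepts from `concept` up to the root, both included."""
    path = [concept]
    while parent[concept] is not None:
        concept = parent[concept]
        path.append(concept)
    return path


def wu_palmer(c1, c2, parent):
    """Wu and Palmer similarity with distances counted in edges."""
    path1 = path_to_root(c1, parent)
    path2 = path_to_root(c2, parent)
    ancestors2 = set(path2)
    # C3: first concept on the way up from c1 that is also an ancestor of c2.
    c3 = next(c for c in path1 if c in ancestors2)
    n1 = path1.index(c3)                     # distance C1 -> C3
    n2 = path2.index(c3)                     # distance C2 -> C3
    n3 = len(path_to_root(c3, parent)) - 1   # distance C3 -> root
    denominator = n1 + n2 + 2 * n3
    if denominator == 0:
        return 1.0  # both concepts are the root itself
    return 2 * n3 / denominator


dog_cat = wu_palmer("dog", "cat", TAXONOMY)      # C3 = "animal": 2*1 / (1+1+2*1)
dog_plant = wu_palmer("dog", "plant", TAXONOMY)  # C3 = root, so N3 = 0
```

Two concepts whose only common subsumer is the root obtain a similarity of 0, and a concept compared with itself obtains 1, which matches the intuition behind the measure.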

Similarity of Mihalcea [23]
A simple lexical matching approach is described in [23]. Word-to-word similarity measures and a word specificity measure are used to estimate the semantic similarity of sentence pairs [24]. The following scoring function was used [25]:

$$\mathrm{sim}(T_1, T_2) = \frac{1}{2} \left( \frac{\sum_{w \in T_1} \mathrm{maxSim}(w, T_2) \, \mathrm{idf}(w)}{\sum_{w \in T_1} \mathrm{idf}(w)} + \frac{\sum_{w \in T_2} \mathrm{maxSim}(w, T_1) \, \mathrm{idf}(w)}{\sum_{w \in T_2} \mathrm{idf}(w)} \right) \quad (6)$$

where $\mathrm{maxSim}(w, T)$ is the maximum score between the word $w$ and the words in $T$ according to a word-to-word similarity measure, and $\mathrm{idf}(w)$ is the inverse document frequency of the word. A threshold of 0.5 was used for classification: a pair scoring above the threshold was classified as a paraphrase, and otherwise as a non-paraphrase.

The approach of Mihalcea et al. [23] takes into account the syntactic nature of terms and restricts similarity comparisons to terms of the same syntactic nature: verbs with verbs, nouns with nouns, and adjectives with adjectives. Several more or less binding restrictions related to the syntactic nature of the terms were tested, from nouns/nouns, adjectives/adjectives, and verbs/verbs down to proper nouns/proper nouns only. Contrary to what might have been expected, these tests all led to a very marked deterioration of the results. One might think, for example, that restricting a named entity to being comparable only to another named entity cannot damage the results, but experience has shown that this discrimination leads to a poor similarity between expressions such as "the Japanese president…" and "in Japan, the president…" [26]. The system whose results are given in the evaluations thus operates without any restriction as to the syntactic nature of the terms compared [27].

Below is an example of calculating the similarity between two sentences. Table 1 shows the similarity matrix of Wu and Palmer [20]. Let $S_1$ and $S_2$ be two sentences such that: i) $S_1$: eventually, a huge cyclone hit the entrance of my house and ii) $S_2$: finally, a massive hurricane attacked my home. From these results, we find that the two sentences $S_1$ and $S_2$ are similar.
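The scoring function of [23] averages two directed scores: for each word of one sentence, the best word-to-word match in the other sentence is weighted by the word's IDF, and the weighted sum is normalized by the sum of the IDF weights. The sketch below is a minimal illustration of that idea. The word-to-word measure, the 0.8 score in `TOY_SCORES`, and the uniform IDF are made-up placeholders (the system uses Wu and Palmer scores instead), and both sentences are assumed to be non-empty lists of tokens.

```python
def max_sim(word, sentence, word_sim):
    """Best word-to-word score between `word` and any word of `sentence`."""
    return max(word_sim(word, other) for other in sentence)


def directed_score(source, target, word_sim, idf):
    """IDF-weighted average of the best matches of `source` words in `target`."""
    weights = [idf(w) for w in source]
    total = sum(weights)
    if total == 0:
        return 0.0
    matched = sum(max_sim(w, target, word_sim) * wt
                  for w, wt in zip(source, weights))
    return matched / total


def mihalcea_similarity(t1, t2, word_sim, idf):
    """Mean of the two directed scores, so the measure is symmetric."""
    return 0.5 * (directed_score(t1, t2, word_sim, idf)
                  + directed_score(t2, t1, word_sim, idf))


def is_paraphrase(score, threshold=0.5):
    """Classification rule: strictly above the threshold means paraphrase."""
    return score > threshold


# Placeholder word-to-word measure: exact match, or a made-up table entry.
TOY_SCORES = {frozenset(("huge", "massive")): 0.8}


def toy_word_sim(a, b):
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def uniform_idf(word):
    return 1.0  # placeholder: every word is equally specific


score = mihalcea_similarity(["huge", "cyclone"], ["massive", "cyclone"],
                            toy_word_sim, uniform_idf)
```

With these placeholder values each directed score is (0.8 + 1.0) / 2, so the pair is classified as a paraphrase under the 0.5 threshold.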

DESCRIPTION OF THE REALIZED EVALUATION SYSTEM

3.1. Architecture of IES
The IES is a website for assessing learners' level of learning. It offers several features, namely: online assessment tests taken by students, the preparation of exams containing all kinds of questions (including programming questions and open questions), and automatic correction based on the answers provided by teachers. IES was developed to automate the process of assessing learners' competencies, with several parties communicating with each other. Figure 2 shows the architecture of IES. The IES system has the following components:
− Teacher space: this space allows teachers to register and authenticate on the platform in order to build an exam (questions, answers, and marking scheme) and to enter the codes of the students who are allowed to take the exam.
− Database: the database contains the information about the teachers and students, as well as the exams (questions and answers proposed by the teachers).
− Student area: this space allows students to register and authenticate on the platform in order to take exams.

Mechanism for correcting the open questions
To correct open questions (questions that call for a written answer), the IES system makes use of the notion of semantic similarity, calculated between the answer provided by the learner and the answer given by the teacher. The correction proceeds in several steps: i) syntax correction of the answers, ii) sentence segmentation, part-of-speech tagging, and extraction of named entities using OpenNLP, iii) elimination of stop words and punctuation, and iv) calculation of the semantic similarity using the approach of Mihalcea et al. [23].
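As a rough illustration of step iii), the following minimal sketch lowercases an answer, drops punctuation, and removes stop words. It is a stand-in written in plain Python, not the OpenNLP pipeline used by the system, and the short stop-word list is a hypothetical example rather than the list actually used.

```python
import re

# Hypothetical, deliberately short stop-word list (illustrative only).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "for", "with", "in"}


def clean(answer):
    """Lowercase the answer, keep alphanumeric tokens only, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", answer.lower())
    return [token for token in tokens if token not in STOP_WORDS]


content_words = clean("A class is a model, for objects.")
```

The remaining content words are the ones passed on to the similarity calculation in step iv).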

Syntax correction
Syntax correction is a technique that corrects syntax errors using a corpus and a dictionary. To correct a sentence syntactically, we first determine its correction center (the position of the word in the sentence that has the maximum frequency in the corpus) using (7) [28], which is based on the maximum frequency of the words of the sentence. The second step detects the erroneous words of the sentence by looking up and comparing the words of the sentence with the words of the dictionary. The distance between each erroneous word and the dictionary words is then calculated in order to retrieve all the words close to the erroneous word, using (8) and (9) [28]. Equation (8) calculates the distance between the erroneous word and all the words in the dictionary, and (9) retrieves all the words close to the erroneous word, where the bound in (9) is the maximum distance between the erroneous word and the dictionary words.
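The two steps described here, locating the correction center and retrieving the dictionary words close to an erroneous word, can be sketched as follows. The distance of [28] is not reproduced in this text, so the sketch substitutes the classical Levenshtein edit distance as a stand-in; the function names, the frequency table, and the `max_dist` parameter are likewise assumptions of this sketch.

```python
def correction_center(sentence, corpus_freq):
    """Index of the word of `sentence` with the highest corpus frequency."""
    return max(range(len(sentence)),
               key=lambda i: corpus_freq.get(sentence[i], 0))


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two words."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # deletion
                               current[j - 1] + 1,                     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def close_words(word, dictionary, max_dist):
    """Dictionary words within `max_dist` edits of the erroneous word."""
    return [entry for entry in dictionary if levenshtein(word, entry) <= max_dist]


# Illustrative data: "th" is not in the dictionary, so it is erroneous.
candidates = close_words("th", ["the", "to", "model"], max_dist=1)
center = correction_center(["a", "class", "is"], {"a": 10, "class": 3, "is": 7})
```

Any other word distance could be plugged into `close_words` in place of `levenshtein` without changing the rest of the procedure.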
Afterwards, the erroneous word is corrected using the correct words of the sentence, the correction center, and the list of words close to the erroneous word. This processing applies a recursive technique [12] and n-gram correction to the left and to the right in order to choose the correct word from the list: for each word in the list, we calculate the frequency with which it is followed or preceded by n correct words of the sentence [29]. The word chosen is the one that has a non-zero frequency for the largest n.
If the erroneous word is located after the correction center, (10) is used; otherwise, (11) is adopted. In these equations the erroneous word is marked with an overbar. Finally, the sentence is corrected using the correct words to the left or to the right and the recursive technique, by (12), where the terms subscripted with < and > represent the words of the sentence located before and after the word at the correction center, respectively.
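The selection of the replacement word can be illustrated with a simplified sketch. Equations (10) to (12) are not reproduced in this text, so the code below is only one possible reading of the description: it assumes that the words between the correction center and the erroneous word are correct, uses the left context when the erroneous word lies after the center and the right context otherwise, and keeps the candidate whose n-gram has a non-zero frequency for the largest n. The function name, the `ngram_freq` table, and `max_n` are hypothetical, and the recursive handling of several erroneous words is left out.

```python
def pick_candidate(sentence, pos, center, candidates, ngram_freq, max_n=3):
    """Choose a replacement for sentence[pos] from `candidates`.

    `ngram_freq` maps tuples of words to corpus frequencies. Returns None
    when no candidate forms an n-gram with a non-zero frequency.
    """
    for n in range(max_n, 0, -1):  # try the longest context first
        if pos > center:
            context = tuple(sentence[max(0, pos - n):pos])
            grams = {c: context + (c,) for c in candidates}
        else:
            context = tuple(sentence[pos + 1:pos + 1 + n])
            grams = {c: (c,) + context for c in candidates}
        if len(context) < n:
            continue  # not enough neighbouring words for this n
        best_freq, best = max((ngram_freq.get(g, 0), c) for c, g in grams.items())
        if best_freq > 0:
            return best
    return None


# Illustrative data: "th" (index 3) lies after the correction center (index 0).
sentence = ["objects", "to", "have", "th", "same"]
freq = {("have", "the"): 5, ("to", "have", "the"): 2}
chosen = pick_candidate(sentence, 3, 0, ["the", "to"], freq)
```

In this example no four-word sequence is known, so the choice falls back to the trigram "to have the", which selects "the".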

Segmentation of sentences and labeling of parts of speech
In this step, we used the natural language processing toolkit OpenNLP to segment the sentence (the answer to the question) and to tag each of its words with a part of speech; this is used to calculate the semantic similarity between the answer proposed by the teacher and the one given by the student, using the approach of Mihalcea et al. [23]. We generate the Wu & Palmer similarity matrix for the words of the sentences that have the same part of speech, in order to apply (6). If the similarity is greater than or equal to 0.75, we consider the sentences similar, that is to say, the answer given by the student is correct, so the student receives the full mark for the question. Otherwise, we calculate the question score with (13), by multiplying the similarity of the two answers by the full mark of the question:

$$\mathrm{Score}(R) = \mathrm{Sim}(R, R_p) \times N \quad (13)$$

where $R$ is the answer of the student, $R_p$ is the answer proposed by the teacher, and $N$ is the full mark of the question.
Figure 3 shows the different steps of the similarity calculation, starting with sentence cleaning, segmentation, generation of the similarity matrix and ending with the Mihalcea similarity calculation.

Application example
The statement of the question is as follows: "give the definition of java class". The answer proposed by the teacher is: "a class is a definition model for objects with the same set of attributes, and the same set of operations". The answer written by the student is: "a class is a model to generate and define the objects to have the same attributes and method". The mark reserved for this question is 3 points. The correction of this type of question starts with checking for spelling errors. In this example we found two errors, "defined" and "th"; after the syntax correction of this answer [28], the new answer is "a class is a model to generate and define the objects for the same attributes and method". Figure 4 shows an example of calculating the similarity between the teacher's answer and the student's answer. If the similarity between the proposed answer and the student's answer is less than 0.25, we consider the answer false, i.e., the grade given to the student for this question is 0. If the similarity lies in [0.25, 0.75), we consider the answer partially correct, that is to say, the student's mark for this question equals sim * mark. On the other hand, if the similarity is greater than or equal to 0.75, we consider the answer correct and the student receives the full mark. The similarity between the proposed answer and the student's answer in the example equals 0.71, so in this case the student's score is 0.71 * 3 = 2.13.
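The three-band marking rule of this example can be written as a short function. This is a minimal sketch: the function and parameter names are chosen here, and the thresholds 0.25 and 0.75 are exposed as parameters only for readability.

```python
def grade(similarity, full_mark, low=0.25, high=0.75):
    """Mark for an open question given the similarity between the two answers."""
    if similarity < low:
        return 0.0                 # answer considered false
    if similarity >= high:
        return float(full_mark)    # answer considered correct
    return similarity * full_mark  # partially correct: sim * mark


example_mark = grade(0.71, 3)  # the worked example: 0.71 * 3
```

For the worked example, the similarity of 0.71 falls in the middle band, so the student receives 0.71 * 3 = 2.13 of the 3 points.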

CONCLUSION
In this end-of-studies project, we have developed an intelligent evaluation system (IES) that makes it possible to assess learners effectively; this effectiveness lies in the diversity of the types of questions that can be proposed in an evaluation. The particularity of this system compared with existing ones is that it adds other types of questions, namely: i) programming questions, where the learner must answer the question with a program written in a programming language, and ii) open questions, where the learner has to write the answer to the question as text. The realization of this system required several steps: a study of existing systems, then a search for improvements and resolution approaches, and finally the choice of tools and the development of the system. Several perspectives are conceivable, namely: i) adapting this evaluation system to correct all types of questions in all subjects and ii) using multilingual dictionaries and corpora to correct answers to questions in different languages.

Figure 3. Flowchart that presents the semantic similarity calculation process

Figure 4. Example of calculating the semantic similarity between two sentences: the student's sentence and the teacher's sentence

Figure 5. Flow chart that explains the calculation process