Computational solution of networks versus cluster grouping for social network contact recommender system

Received Jan 26, 2020; Revised Feb 20, 2020; Accepted Mar 13, 2020

Graphs have become the dominant representation for many tasks, as they provide a structure that captures entities and their corresponding relations. A powerful role of networks/graphs is to bridge the local features residing in vertices as they blossom into global patterns, helping to explain how nodal relations and their edges produce complex effects that ripple through a graph. User clusters form as a result of interactions between entities, yet many users today hardly categorize their contacts into groups such as "family", "friends" or "colleagues". Analyzing a user's social graph via implicit clusters therefore enables dynamism in contact management. This study implements such dynamism via a comparative study of a deep neural network and the friend suggest algorithm. We analyze a user's implicit social graph and seek to automatically create custom contact groups, using metrics that classify contacts based on the user's affinity to them. Experimental results demonstrate the importance of both implicit group relationships and interaction-based affinity in suggesting friends.


INTRODUCTION
Group communication has brought about more effective interaction, usually conducted among a cluster of people. It broadcasts data over a communication channel and/or medium to a cluster of people rather than restricting exchanges to peer-to-peer communication. Email, text and social platforms support group conversations and, consequently, the sharing of data in formats such as photos, links and documents [1]. Despite the prevalence of such group communication, users spend little time creating and maintaining custom contact groups. Social platforms provide users with exclusive relationships and links with their corresponding contacts. Thus, it is common practice that some users identify a person as a friend even when they do not know him/her. Treating all contacts in the same manner has been the basis for fraud and other social engineering vices. Users therefore need to differentiate their personal contacts and classify them into groups, a safe and unrestrictive practice that allows them to curb the fears associated with sharing contact data and interacting on social networks. Many users have had to quit groups when close relations (family, friends and colleagues) are added to or removed from the platform [2].
Many cluster relationships are easily modeled as social graphs implemented via social networks. A graph is a symbolic representation of a network and of its connectivity, as a structure of linked nodes and their relationships. Mathematically, a graph is a set of vertices (nodes) V connected by edges E, denoted G = (V, E). A social graph interconnects users, showing their relationships on a social network, especially in relation to a user's egocentric personal social graph/network [3]. It denotes a discrete graph containing vertices linked together by edges that describe the nodal relationships between these entities [4, 5]. Thus, it describes a socially weighted network that analyzes the relationships or ties between users, entities or objects by means of their interactions as they share data.
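For concreteness, the adjacency structure below is a minimal sketch of such a weighted egocentric graph G = (V, E) in Python; the node names and weights are purely illustrative.

```python
from collections import defaultdict

graph = defaultdict(dict)  # graph[u][v] -> accumulated edge weight

def add_interaction(u, v, weight=1.0):
    """Record (or strengthen) an undirected weighted edge between u and v."""
    graph[u][v] = graph[u].get(v, 0.0) + weight
    graph[v][u] = graph[v].get(u, 0.0) + weight

add_interaction("user", "alice")
add_interaction("user", "bob", weight=2.0)  # a stronger tie
print(dict(graph["user"]))                  # {'alice': 1.0, 'bob': 2.0}
```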
Literature Review: Types of Social Graphs: There are two kinds of social graph: the explicit and the implicit graph relationship. The implicit social graph describes the interactions between users, their contacts and groups of contacts. It defines a graph whose vertices are not represented as explicit data objects in memory but rather are determined algorithmically from more concise inputs. It is also a graph whose edges are weighted by features such as the frequency, recency and direction of interaction between users, their contacts and their groups of contacts. These weights are used to identify clusters of contacts who form groups that are meaningful and useful to each user [1, 6].
Conversely, explicit graphs are those in which two individuals deliberately and mutually declare their connection with one another. Such graphs can be mined more easily, since they begin with hard data rather than algorithms that competitors would find hard to replicate; they are best understood as truly personal and social [7]. Explicit graphs are rare; examples include Facebook and LinkedIn. Groups change dynamically as new users are added to multi-party communication threads while others are removed. A person's individual relationships likewise evolve over time: a friend becomes family, a colleague becomes a friend, a friend becomes a colleague, and so on. Keeping all the relationships users have with their contacts up to date requires constant maintenance, which is tedious and time consuming [2].
Navigating Many Clustering Models: A graph attempts to bridge local features as they blossom into global patterns, to explain how nodal relationships produce complex effects that ripple through a population system. Each node shapes the graph's evolution as the need arises. Social graph studies pursue two goals: (a) to better understand how networks evolve, and (b) to study dependent social processes such as innovation diffusion and data retrieval, via models that specify how the local interactions of nodal features aggregate into a global pattern [8]. A graph binds nodes together via a predefined model so that we can effectively analyze its entities along theories that seek to explain the observed patterns [9, 10]. Thus, graphs propagate local features present in the nodes that eventually emerge as global patterns. They examine the dynamics of relationships between nodes and help to locate all the influential entities within such a network, as they theoretically allow the connection convergence of nodes [5].

Statement of problem
The following problems are to be addressed:
a. Manual creation of groups from user contacts is quite time consuming, as the user must deliberately identify clusters from his/her contact list in order to create the required groups.
b. Social groups are dynamic; with the addition, deletion and amending of relationships, users often have to handle such updates of custom groups manually.
This study seeks to tackle the above problems using an implicit social graph through a friend suggest algorithm.
The study seeks to implement a predictive, suggestive algorithm using the implicit social graph. This will automatically help users create custom contact groups in their phonebooks and intelligently eliminate the time penalty and cost associated with creating custom contact groups manually. The study's specific goals are: (a) to describe an interaction-based metric for estimating a user's affinity to his contact groups, (b) to implement the deep neural network and the friend suggest algorithm via a user's implicit social graph, (c) to compare the results of the generated groups against predefined labels (a seed-set dataset) of contacts already categorized as friends by the user, (d) to suggest contacts that can be used to expand the seed set of a group, (e) to demonstrate the importance of implicit group relationships and interaction-based affinity in suggesting contacts to add to/remove from groups, and (f) to compare both models and their effectiveness in these classifications.

Data gathering
The study employs the Enron Email Corpus, a large collection of employees' email messages from the Enron Corporation collected during the legal investigation of the Enron accounting fraud in December 2001. It contains over 600,000 messages from 150 users; for this study, we consider just a single user's (i.e. an employee's) email, because we seek a user's egocentric network. Each employee folder in the dataset has sub-folders of emails, such as incoming-message and outgoing-message folders. The employee email address considered is that of Shackleton Sara, chosen for its balance between the numbers of incoming and outgoing messages after a thorough check of all employees' email folders. The folders considered within the Shackleton Sara folder are the inbox, notes_inbox, sent and sent items folders. The rationale for adopting this dataset is that (a) it is a standard email corpus for social network research, (b) it exposes the benefits and tie strengths of users, and (c) it incorporates the various component metrics to be measured for the graph algorithm. The dataset is obtained from [web]: http://www.cs.cmu.edu.
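As an illustration of how such a user's folders can be traversed, the sketch below walks a hypothetical local copy of the corpus and extracts the sender/co-recipient group of each message; the ROOT path and folder names are assumptions about the local maildir layout, not part of the study.

```python
import os
from email import message_from_file
from email.utils import getaddresses

ROOT = "maildir/shackleton-s"  # hypothetical path to the local corpus copy
FOLDERS = ["inbox", "notes_inbox", "sent", "sent_items"]

def iter_messages(root, folders):
    """Yield parsed email messages from the given user folders."""
    for folder in folders:
        path = os.path.join(root, folder)
        if not os.path.isdir(path):
            continue
        for name in os.listdir(path):
            fpath = os.path.join(path, name)
            if os.path.isfile(fpath):
                with open(fpath, errors="ignore") as fh:
                    yield message_from_file(fh)

for msg in iter_messages(ROOT, FOLDERS):
    # Each message yields an implicit group: the sender plus all co-recipients.
    people = getaddresses((msg.get_all("From") or []) +
                          (msg.get_all("To") or []) +
                          (msg.get_all("Cc") or []))
    group = {addr.lower() for _, addr in people if addr}
    # ... feed `group` into the interaction graph described above
```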

EXPERIMENTAL FRAMEWORK
Machine learning seeks to develop models, algorithms and systems that mimic intelligence by allowing a system to evolve its behavior based on empirical data from sensors and databases. The system explores data mining methodologies, tools and techniques to capture the characteristics of interest and to estimate their underlying, unknown probability distribution, thus illustrating the relations between observed and historical data. The system thereby learns automatically to recognize complex patterns and make intelligent decisions [11-14].

Friend suggest algorithm (FSA)
Following the work of Roth et al. (2010), we further explore the friend suggest algorithm (FSA), which probes for implicit clustering in a user's egocentric network by observing groups of contacts who frequently appear as co-recipients in the same email threads. FSA operates within the egocentric network so that suggestions are based only on a user's local data, protecting user privacy and avoiding the exposure of connections between the user's contacts that might not otherwise have been known to him. The input to friend suggest is a seed: a small set of one or more contacts that belong to a group. The seed may be provided by the user picking a few contacts, e.g. as an initial list in the "To" field of an email. Given this seed, FSA finds the contacts in the user's egocentric network that are related to the seed (i.e. present in the same implicit clusters) and returns a score for each suggested contact indicating the goodness of its fit to the existing seed set. FSA is applicable to group-clustering problems on any interaction-based social graph [1, 6]. Figure 1 describes the components that make up the FSA architecture, explained below as in [6]:
a. The Interaction Rank. In the implicit social graph, weighted edges represent the tie-strength relationship between a user and his implicit groups, computed via the following criteria (a sketch of this computation follows below): (i) Frequency: groups with which a user interacts frequently are more important to the user than groups with which he interacts infrequently; (ii) Recency: recent interactions carry more weight than older ones, letting group importance change dynamically over time; (iii) Direction: interactions the user initiates are more significant, so receiving an email from a contact (a passive interaction) is a weaker signal of closeness than actively sending an email to that contact; and (iv) Contact importance: groups that include important contacts matter more to a user than other groups, so edge weights are adjusted by a contact-importance metric, and a group that includes one or more important contacts is assigned a greater edge weight than a group that does not. The contact-importance metric for a given contact may be determined by other data about the user's relationship with that contact, or by global information about the contact's position within the socio-centric graph.
b. The Core Routine Function, which expands (adds/removes contacts in) the seed set within the egocentric network.
c. The Scoring Function, which implements the various versions of the Update_Score algorithm and consists of: (i) the intersecting group score, (ii) the intersecting weighted score, (iii) the intersecting group count, (iv) the top_contact score, and (v) the suggest-contact-to-remove algorithm.
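The sketch below is a minimal rendering of the Interaction Rank criteria above, assuming an exponential recency decay and a constant boost for user-initiated interactions; the half-life, direction weight and importance parameters are illustrative choices, not values prescribed by [6].

```python
import time

def interaction_rank(interactions, now=None, half_life_days=30.0,
                     w_out=2.0, importance=1.0):
    """interactions: list of (timestamp_seconds, direction) pairs,
    with direction in {"in", "out"}."""
    now = now or time.time()
    rank = 0.0
    for ts, direction in interactions:
        age_days = (now - ts) / 86400.0
        decay = 0.5 ** (age_days / half_life_days)     # (ii) recency
        weight = w_out if direction == "out" else 1.0  # (iii) direction
        rank += weight * decay                         # (i) frequency, via the sum
    return rank * importance                           # (iv) contact importance

# Example: two recent outgoing mails outrank one very old incoming mail.
now = time.time()
print(interaction_rank([(now - 86400, "out"), (now - 3 * 86400, "out")], now))
print(interaction_rank([(now - 300 * 86400, "in")], now))
```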
Finally, extending [1], the study explores the FSA, taking advantage of its strengths on a custom dataset.

Spectral deep learning network
Deep neural networks (DNNs) have been successfully implemented in systems that learn useful features and construct multi-layer networks from vast amounts of training data. Forecast accuracy is improved using DNNs, allowing more information to be extracted from a raw dataset. A DNN is a deep architecture with multiple hidden layers, each of which performs a non-linear transformation from the previous layer to the next [15, 16]. Following the deep learning scheme proposed by [17], a DNN is trained in two stages: (a) a pre-training and (b) a fine-tuning procedure [18, 19].
Auto-Encoder: An auto-encoder [20] is a type of unsupervised three-layered neural network whose output target is its input data, as shown in Figure 2. It comprises an encoder network and a decoder network. The encoder network transforms the input data from a high-dimensional space into a low-dimensional space, while the decoder network reconstructs the input from the previous step. The encoder network is defined as an encoding function $f_{encoder}$, with the encoding process as in (1), where $x_m$ is a data point and $h_m$ is the encoding vector obtained from $x_m$:

$h_m = f_{encoder}(x_m)$ (1)

The decoder is a reconstruction function $f_{decoder}$ as in (2), where $\hat{x}_m$ is the decoding vector obtained from $h_m$:

$\hat{x}_m = f_{decoder}(h_m)$ (2)

Other algorithms for the encoding (reconstruction) function are given in (3)-(5) respectively.
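A minimal auto-encoder corresponding to (1) and (2) can be sketched in Keras as below; the layer sizes, activations and optimizer are illustrative assumptions rather than settings from the study.

```python
import numpy as np
from tensorflow.keras import layers, Model

dim_in, dim_hidden = 64, 16
x_in = layers.Input(shape=(dim_in,))
h = layers.Dense(dim_hidden, activation="sigmoid", name="encoder")(x_in)  # (1)
x_hat = layers.Dense(dim_in, activation="sigmoid", name="decoder")(h)     # (2)

autoencoder = Model(x_in, x_hat)
autoencoder.compile(optimizer="sgd", loss="mse")   # reconstruction error

X = np.random.rand(256, dim_in).astype("float32")  # dummy unlabeled data
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)  # target = input
```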
Pre-Training: N auto-encoders can be stacked to pre-train an N-hidden-layer DNN. Given an input dataset, the input layer and the first hidden layer of the DNN are treated as the encoder network of the first auto-encoder. The first auto-encoder is trained by minimizing its reconstruction error, and the trained parameter set of its encoder network is used to initialize the first hidden layer of the DNN. The first and second hidden layers of the DNN are then regarded as the encoder network of the second auto-encoder, so the second hidden layer of the DNN is initialized by the second trained auto-encoder. This continues until the Nth auto-encoder is trained to initialize the final hidden layer of the DNN [20]. Thus, all hidden layers of the DNN are pre-trained by stacking auto-encoders through N training rounds. This pre-training process has proven significantly better than random initialization of the DNN and is quite useful for achieving generalization in many classification tasks [17, 21]. Fine-Tuning is a supervised process that improves the performance of a DNN. The network is retrained with labelled training data, and the errors, calculated as the difference between real and predicted values, are backpropagated using stochastic gradient descent (SGD) through the multi-layer network. SGD randomly selects data samples and iteratively updates the weight parameters along the gradient direction; the best gradient direction is the one that minimizes the loss function. The merit of SGD is that it converges quickly and does not consider the entire dataset at each step, making it well suited to complex neural networks [20]. The loss function is given in (6), where E is the loss, y is the real label and t is the network output:

$E = \frac{1}{2}\sum_{j}(y_j - t_j)^2$ (6)

The gradient of a weight $w_{ij}$ is obtained as the derivative of this error, so the SGD update is defined by (7) [20]:

$w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E}{\partial w_{ij}}$ (7)

where $\eta > 0$ is the step size and $\omega$ is the number of hidden layers in the DNN [20]. The process is optimized and tuned using the weights and thresholds of the correctly labelled data, enabling the DNN to learn the knowledge important for its final output and to direct the parameters of the entire network toward correct classifications [14].
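The greedy layer-wise procedure and SGD fine-tuning can be sketched as below, again in Keras; the architecture (64-32-16), epoch counts and binary output are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras import layers, Model, Sequential

def pretrain_layer(X, n_hidden):
    """Train one auto-encoder on X; return the trained encoder layer
    and the encoded representation of X for the next stack level."""
    inp = layers.Input(shape=(X.shape[1],))
    enc = layers.Dense(n_hidden, activation="sigmoid")
    h = enc(inp)
    x_hat = layers.Dense(X.shape[1], activation="sigmoid")(h)
    ae = Model(inp, x_hat)
    ae.compile(optimizer="sgd", loss="mse")
    ae.fit(X, X, epochs=5, batch_size=32, verbose=0)  # minimize reconstruction error
    return enc, Model(inp, h).predict(X, verbose=0)

X = np.random.rand(256, 64).astype("float32")  # dummy unlabeled data
y = np.random.randint(0, 2, size=(256,))       # dummy labels for fine-tuning

# Greedy layer-wise pre-training: each hidden layer is initialized by one AE.
enc1, H1 = pretrain_layer(X, 32)
enc2, _ = pretrain_layer(H1, 16)

# Stack the pre-trained encoder layers, add an output layer, and fine-tune
# the whole network with SGD on the labelled data.
dnn = Sequential([layers.Input(shape=(64,)), enc1, enc2,
                  layers.Dense(1, activation="sigmoid")])
dnn.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
dnn.fit(X, y, epochs=5, batch_size=32, verbose=0)  # supervised fine-tuning
```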

K-nearest neighbourhood (KNN)
K-nearest neighbour is a well-known supervised learning model for pattern recognition, introduced by Fix and Hodges in 1951, and remains one of the most popular nonparametric models for classification problems [22]. KNN assumes that observations that are close together are likely to have the same classification. The probability that a point x belongs to a class is estimated by the proportion of training points in a specified neighbourhood of x that belong to that class. The point is then classified either by majority vote or by a similarity-degree sum over a specified number (k) of nearest points. In majority voting, the number of neighbourhood points belonging to each class is counted, and the class with the highest proportion is the most likely classification of x [13]. The similarity-degree sum instead calculates a similarity score for each class based on the k nearest points and classifies x into the class with the highest score. Majority voting's lower sensitivity to outliers leads us to use it rather than the similarity-degree sum [23].

We use majority voting; to determine which points belong to the neighbourhood, the distances from x to all points in the training set must be calculated. Any distance function that specifies which of two points is closer to the sample point could be used. The most common metric in K-nearest neighbour is the Euclidean distance [24], given by (8) as the distance between a test point $f_t$ and a training point $f_s$, each with n attributes:

$d(f_t, f_s) = \sqrt{\sum_{i=1}^{n}(f_{t,i} - f_{s,i})^2}$ (8)

In general, KNN performs the following steps: (a) choosing a value of k, (b) calculating the distances, (c) sorting the distances in ascending order, (d) finding the k class values, and (e) finding the dominant class [25].

A challenge with KNN is determining the optimal size of k, which acts as a smoothing parameter. A small k is not sufficient to accurately estimate the population proportions around the test point; a larger k yields less variance in the probability estimates, but at the risk of introducing more bias. k should be large enough to minimize the probability of a non-Bayes decision, and small enough that all the points included give an accurate estimate of the true class. [26], cited in [13], found the optimal value of k to depend on the sample size, the covariance structures in each population, and the proportions of each population in the total sample. Where the differences in covariance matrices and the differences between sample proportions are either both small or both large, the optimal k is $N^{3/8}$ (N being the number of samples in the training set). If there is a large difference between covariance matrices and a small difference between sample proportions (or vice versa), the optimal k is $N^{2/8}$.

The model presents some merits [27]: (a) its mathematical simplicity still achieves classification results as good as, or better than, more complex pattern recognition techniques; (b) it is free of statistical assumptions; (c) its effectiveness does not depend on the spatial distribution of the classes; and (d) when the boundaries between classes are not hyper-linear or hyper-conic, K-nearest neighbour performs better than LDA [13].
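A minimal NumPy sketch of the majority-voting classifier, with the Euclidean distance of (8), is given below; k and the toy data are illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Eq. (8): Euclidean distance from the test point to every training point.
    d = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]                  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]            # majority vote

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> "A"
```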
Its major demerit is that it does not work well when there are large differences in the number of samples in each class. KNN yields little information about the structure of the classes or about the relative importance of the variables in the classification. It does not allow a graphical representation of the results, and with a large number of samples its computation becomes excessively slow. In addition, KNN has greater processing and memory needs than other methods: all prototypes in the training set must be stored in memory and used to calculate the Euclidean distance to every test sample, so the computational cost grows rapidly as the number of prototypes increases [13, 28].

Linear discriminant analysis
LDA is an effective supervised classification method with a wide range of applications. Its principle is to classify compounds by rules dividing the n-dimensional descriptor space into two regions separated by a hyperplane defined by a linear discriminant function. Discriminant analysis transforms a classification task into the identification of a function that partitions the data into classes [13]; the focus is to determine this functional form (assumed to be linear) and to estimate its coefficients. Introduced in 1936 by Ronald Fisher, LDA takes the mean of a set of attributes for each class and uses the mean of these means as the boundary: it projects the attribute points onto the vector that maximally separates the class means while minimizing the within-class variance, as in (9) [29]:

$(\bar{X}_1 - \bar{X}_2)^{\top} S^{-1} \left[ X - \tfrac{1}{2}(\bar{X}_1 + \bar{X}_2) \right] \gtrless c$ (9)

where X is the vector of observed values, $\bar{X}_i$ (i = 1, 2, ...) is the mean of the values for each group, S is the sample covariance matrix of all variables, and c is the cost term. If the misclassification cost of each group is considered equal, then c = 0. A member is classified into one group if the result of the equation is greater than c, and into the other if it is less than c; a result equal to c indicates a sample that cannot be classified into either class on the basis of the features used. The LDA function distinguishes between two classes; if a data set has more than two classes, the process must be broken down into multiple two-class problems, finding an LDA function for each class versus all samples not of that class (one-versus-all). The final class membership of each sample is determined by the LDA function that produces the highest value. LDA is optimal when the variables are normally distributed with equal covariance matrices; in this case the LDA function points in the same direction as the Bayes optimal classifier [30], and it performs well on moderate datasets in comparison to more complex methods [13]. Its mathematical form is simple, requiring nothing more complicated than matrix arithmetic. Its assumption of a linear class boundary, however, limits its scope of application: when boundaries are nonlinear, the performance of the linear discriminant may be inferior to other classification methods. To curb this, we adopt a decimal encoding of the data to give us a semblance of linear, continuous boundaries [31].
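For illustration, a two-class LDA fit can be sketched with scikit-learn as below; the toy data are hypothetical, and the library handles the multi-class case internally, so the explicit one-versus-all decomposition above is only needed for a hand-rolled rule.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[1.0, 2.0], [1.2, 1.9], [3.0, 4.1], [3.2, 3.8]])
y = np.array([0, 0, 1, 1])                     # toy class labels

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)                                  # estimates means and pooled covariance
print(lda.predict([[1.1, 2.1], [3.1, 4.0]]))   # -> [0 1]
```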

Rationale for choice of algorithms
Stochastic models are inspired by biological populations and the laws of evolution. They search for an optimal solution via a hill-climbing method that is flexible, adaptive to changing states and suited to real-time tasks. They achieve strong convergence on multimodal tasks by initializing a random population and allocating increasing trials to regions found to have high fitness. Their demerit is inefficiency on linear systems: if the optimum lies in a small region surrounded by regions of low fitness, the function becomes difficult to optimize. The adoption of such stochastic graph-based models entails iterated hill-climbing: once a peak is located, the search restarts from another, randomly chosen starting point. Its merit is simplicity; but because each random trial is performed in isolation, no overall picture of the domain is obtained, and as the evolution progresses the method continues to allocate its trials evenly over the search space, evaluating as many points in regions found to be of low fitness as in regions of high fitness [11, 12].
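The iterated hill-climbing scheme described here can be sketched as below; the one-dimensional objective, step size and restart count are toy choices.

```python
import random

def hill_climb(f, x0, step=0.1, iters=200):
    """Local search: repeatedly propose a nearby point, keep improvements."""
    x, fx = x0, f(x0)
    for _ in range(iters):
        cand = x + random.uniform(-step, step)
        if f(cand) > fx:            # accept only improvements
            x, fx = cand, f(cand)
    return x, fx

def iterated_hill_climb(f, restarts=10):
    # Each restart begins from a fresh random point, in isolation.
    return max((hill_climb(f, random.uniform(-5, 5)) for _ in range(restarts)),
               key=lambda r: r[1])

peak, value = iterated_hill_climb(lambda x: -(x - 2.0) ** 2)  # maximum at x = 2
print(round(peak, 2), round(value, 3))
```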

RESULT, DISCUSSION AND FINDINGS

Model performance
Performance is evaluated using the mean square error (MSE), mean regression error (MRE), mean absolute error (MAE) and coefficient of efficiency (COE), as in Table 1 [13]. The DNN has an MSE of 0.73, an MRE of 0.79, an MAE of 0.75 and a COE of 0.581; the FSA has an MSE of 0.41, an MRE of 0.51, an MAE of 0.45 and a COE of 0.781; LDA has an MSE of 0.18, an MRE of 0.21, an MAE of 0.43 and a COE of 0.492; and KNN has an MSE of 0.36, an MRE of 0.21, an MAE of 0.23 and a COE of 0.853. Note that the closer the MSE, MRE and MAE values are to zero, the more accurate the adopted model's prediction, while the COE should conversely tend toward one (1). To assess each model's efficiency and accuracy, we compute the misclassification rate and the improvement percentage for both the training and testing datasets, summarized in Table 2 and Table 3 respectively via (10) and (11). Table 3 shows that LDA, KNN and FSA yield improvements of 41.1%, 43% and 69.9% respectively, while DNN yields 76%. Because KNN is quite sensitive to the relative magnitudes of the different attributes, all attributes were scaled by their z-scores before using KNN [32].
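For reference, the error metrics can be computed as sketched below, assuming MSE and MAE in their standard forms and COE as the Nash-Sutcliffe coefficient of efficiency; MRE is not defined in this excerpt, so it is omitted.

```python
import numpy as np

def mse(y, t):
    return float(np.mean((y - t) ** 2))

def mae(y, t):
    return float(np.mean(np.abs(y - t)))

def coe(y, t):
    # 1 - SSE / total variance of the observations (a perfect model gives 1).
    return float(1.0 - ((y - t) ** 2).sum() / ((y - y.mean()) ** 2).sum())

y = np.array([1.0, 0.0, 1.0, 1.0])   # observed labels
t = np.array([0.9, 0.2, 0.8, 0.6])   # model outputs
print(mse(y, t), mae(y, t), coe(y, t))
```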

Classification accuracy
Figure 3 shows the prediction accuracy of the various models: FSA achieves an accuracy of 92 percent, DNN 89 percent, KNN 74 percent and LDA 70 percent. It is also worth noting that in other forms of classification, deep neural networks (DNNs) have been found to outperform these models.

Processing speed
Figure 4 shows the mean prediction processing time for the various models: FSA has a mean processing time of 1.22 seconds, DNN 0.98 seconds, KNN 2.98 seconds and LDA 3.36 seconds. It thus becomes clear that DNN reaches an accuracy of 89 percent in 0.98 seconds and FSA reaches an accuracy of 92 percent in 1.22 seconds, while KNN reaches 74 percent in 2.98 seconds and LDA its 70 percent in 3.36 seconds.

Convergence time
The rationale for the choice of models is to compare and measure their convergence behavior, among other statistics, as seen in Figures 3-5. LDA and KNN converged after 405 and 387 iterations respectively, while FSA and DNN converged after 253 and 193 iterations respectively. DNN outperforms FSA for the task under consideration. We note that a model's speed is traded off for greater classification accuracy, a greater number of generated rule sets to update the knowledge database for optimality, and greater functionality.

CONCLUSION
There has to be effective communication between two entities for an effective relationship, which requires both persons to be in constant communication. One means through which this happens is social platforms, with features such as link editing, photo sharing and email communication. All the models used show that two or more users can connect using an implicit social graph. They all generate a friends group given a small seed set of contacts already categorized by a user into friends, colleagues and similar groups, and they suggest contacts from the user's egocentric network with which to expand the seed set. For the study, we used the Enron Email Corpus to determine the interaction rank (edge weight) between a user and his groups of contacts, and we detailed the comparative statistics of the various models based on efficiency, speed, convergence and other measures.