Inventory prediction and management in Nigeria using market basket analysis associative rule mining: memetic algorithm based approach

A key challenge for businesses today is determining the inventory level for each product sold to clients. Prior knowledge of demand helps avoid overstocking and the unnecessary demurrage it incurs; it also helps avoid stock-outs and the loss of clients to competitors. This study aims to uncover customers' purchasing behavior and thus predict their next purchase, as well as to serve as decision support for determining the required inventory of each good. The study is conducted for the Delta Mall department store (Asaba and Warri branches). We adapt the memetic algorithm to a market basket dataset to examine customers' buying behavior, their preferences, and the frequency at which goods are purchased together (in the same basket). Results show that some items placed in the basket lead customers to purchase items of similar value, or items best combined with those already selected, owing to shelf placement via the concept of feature drift. The model yields 21 rules for eight items, mined from the transaction dataset acquired from Delta Mall.


INTRODUCTION
The world over, businesses take stock and inventory of their daily production so as to account for goods and services rendered to their clients in exchange for money. Inventories have been viewed by many as raw materials, work in progress or finished products that are stored to meet the supply demands of consumers [1,2]. If the amount of inventory is less than the actual need, the business may lose the opportunity to maximize sales. It may also lose potential clients, customer loyalty and anticipated maximum profits. Conversely, if it stocks too much inventory, the cost of maintenance and storage increases and consequently reduces the profit margins [3,4].
Int J Inf & Commun Technol, ISSN: 2252-8776. Inventory prediction and management in Nigeria using market basket analysis … (Arnold Adimabua Ojugo)
Inventory supply-demand value chains and their management have riddled many businesses with a range of complications. Thus, the field has attracted the attention of many researchers and practitioners. [5] used a moving average model for a company with fluctuating demand and showed that the moving average is able to accommodate rapid changes in data, and is quite suitable for companies with a high variety of products and raw materials. This method is appropriate when used for long-term predictions. Thus, [5] explored other studies that employed the exponential smoothing model [6] and the Box-Jenkins auto-regressive integrated moving average [7]. The inherent limitations of each of these methods, and of others, accounted for the difficulty in applying them to a knowledge-base. However, [8] employed genetic algorithm to predict inventory stocks and showed that the memetic algorithm offers many benefits: it is computationally more efficient, more accurate and less time-consuming. Furthermore, [9] extended this work using deep learning and noted that there are inherent challenges in using genetic algorithm: while artificial neural networks (ANN) are suited for learning the underlying probabilities of feats of interest in market basket analysis, deep learning is best suited to predict the amount of product inventory needed, owing to other spatial data therein such as the prediction of concept evolution and concept drift, amongst others. Thus, the rationale and main focus for using the memetic algorithm here is to advance the use of ANN in the prediction of inventory, as proposed here. Market basket analysis (MBA) is a data mining method that focuses on identifying products that are purchased at the same time in each transaction [10]. The output of MBA is a set of rules that indicate the products purchased at the same time.
This output will be used as input for the prediction of inventory. The rules generated by MBA are association rules of the form: if antecedent (A), then consequent (B). Each rule is equipped with a support level, which indicates how many transactions contain both A and B, and a confidence level, which measures the accuracy of the rule [11,12]. Each rule is also equipped with an expected confidence and a lift, so that for each antecedent (A) and consequent (B), the support, confidence, expected confidence and lift are as in (1) to (4) and Fig. 1.

$$P(A \cap B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}} \tag{1}$$

Equation (1) is the prevalence or support, which yields how often the combination of A and B co-occurs.

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \tag{2}$$

Equation (2) is the confidence value, which yields the confidence that item B appears in the basket given that A is already in the basket. Thus, we use the rule A → B.

$$\text{expected confidence} = P(A)\,P(B) \tag{3}$$

Equation (3) is the expected confidence, which yields how frequently items A and B would be expected to co-occur given how often each is chosen and placed in the basket.

$$L(A,B) = \frac{P(A \cap B)}{P(A)\,P(B)} \tag{4}$$

Finally, in (4), the lift L(A,B) of the rule A → B yields a measure of how much more confident we are in item B given that we see item A in the basket.

MBA is a subset of market research that many researchers are currently paying special attention to, with more detail in [3]. Tang, K, et al [13] proposed an approach to perform market basket analysis in a multi-store, multi-period environment. Chen, Y, et al [14] noted that most models used in dealing with the market basket problem could not discover important purchasing patterns where multiple stores exist, and developed a method to overcome this weakness; while Yun, C, et al [15] clustered market basket data using a novel measurement they named category-based adherence. Cavique, L converted the market basket problem into a maximum-weighted clique problem for discovering large item set patterns [6]. [16] developed an optimization model for the shelf-space management problem in which products are grouped as families and the location of each family on the shelf is determined, much like cataloging. They considered the effect of shelf location on sales but did not address the cross-selling effect, and thus did not use the purchase data. Nierop, E, et al [17] proposed a two-part method for dealing with the shelf-space management problem. In the first part, a statistical model was provided to measure the impact of shelf layout on sales. In the second part, simulated annealing (SA) was used to maximize expected total profit. Like [16], they did not consider the association rules from customers' purchasing data, and thus did not use them to maximize the cross-selling effect.
Recent research considers other problems. Saraf, R. and Patil, S proposed a bottom-up hierarchical cluster model for clustering retail items [18]. To do this, they applied the concept of 'distance' between entities, or groups of entities, to achieve the purpose of market basket analysis. Market basket analysis is now applied by many researchers to other tasks. Shiokawa, Y, et al applied a market basket analysis framework to visualize transaction data in order to assess various human lifestyles [19]. Solnet, D, et al also studied the potential to grow hotel revenue by exploring the most attractive services and products that can attract and satisfy guests and encourage them to repeat their purchase [20]. Furthermore, [5] explored the cultural behaviour of consumers. Further studies can be found in [21][22][23][24][25][26]. This review of related research reveals that a main focus of market basket analysis and its application is geared towards creating a more efficient optimization algorithm for data mining. To this end, we can apply an evolutionary model together with association rule mining [27][28][29][30][31][32].
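The rule measures defined earlier in this section (support, confidence, expected confidence and lift) can be computed directly from a list of baskets. The following is a minimal sketch only: the function names and the toy baskets are hypothetical and are not taken from the Delta Mall dataset.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, expected confidence and lift for the rule A -> B."""
    p_a = support(transactions, antecedent)
    p_b = support(transactions, consequent)
    p_ab = support(transactions, set(antecedent) | set(consequent))
    confidence = p_ab / p_a if p_a else 0.0
    expected = p_a * p_b                      # co-occurrence expected if A and B were independent
    lift = p_ab / expected if expected else 0.0
    return {"support": p_ab, "confidence": confidence,
            "expected": expected, "lift": lift}


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
metrics = rule_metrics(baskets, {"bread"}, {"milk"})
print(metrics)
```

A lift above 1 suggests that the antecedent raises the chance of the consequent being in the basket, while a lift below 1 suggests the opposite.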

MEMETIC BAYESIAN NETWORK EXPERIMENTAL FRAMEWORK
An evolutionary algorithm seeks to exploit historic numeric data and explore human knowledge via mathematical models and symbolic reasoning to yield an output that is tolerant to imprecision, noise, uncertainty and partial truth in its input [33]. It evolves into meta-rules for constraint satisfaction tasks that use intelligent agents in a vector space to seek optimality. These algorithms/models are inspired by evolution, behavioral patterns in biological populations and the laws of nature to mimic agents seeking survival [34,35], and they have proven efficient in complex optimization. Simply put, an evolutionary model attempts to explore dynamic processes through the exploitation of observed data to yield an output that exhibits robustness, continuous adaptation and flexibility, while displaying the underlying probabilities of the data feats of interest. Thus, it seeks an output feat with uncontrollable constraints modeled within the model's input that may not be explicitly present in the search space, but are confined to real parameters and limited by boundary values [36][37][38].

Artificial neural network (ANN)
The ANN data processing model is inspired by neurons in the human brain. It thus consists of interconnected neurons (nodes) with the capability to learn by example, which makes them universal estimators. As it processes data, its nodes share data signals and adjust their weights and biases, which represent the synapse axons and dendrites and indicate the connection strength between synapses [38][39]. Signals are converted so that weights are adjusted as learning occurs, and the weighted signals are summed by an adder. Depending on the task, its activation function limits its output [40][41] to modulate the associated inputs and the nonlinear feats exhibited via the transfer (activation) function, as in (5). ANN attempts to translate the principles of biological processing into a mathematical model so as to generate, in the fastest time, implicit predictive outcomes of a task [42][43]. Its outcomes are derived from experience, and it is able to recognize feats and behaviours of interest from a historic dataset to yield an optimal solution of high quality that is void of over-fitting, irrespective of modification via other approximations with multiple agents. These modifications also constantly affect the quality of any solution [44]. Its configuration depends on the area of application, the captured data feats and the system requirements. Its connections are set as either explicit (a priori knowledge) and/or implicit (a posteriori knowledge) to allow learning, so that the net is trained to learn patterns that change its weights and biases based on a rule [40]. Its learning is grouped into one of: supervised, unsupervised and reinforcement [45,46].
The nature of market data is chaotic and requires previous knowledge. Thus, we adopt the recurrent (Jordan) network so that it incorporates the previous dataset and the previous output, which is fed back as input into the model's hidden units [47,48] to yield the next output. Its correlated weights (W_ij) between the input and hidden layers, the bias (W_oj) and the market basket analysis dataset (x_i) are summed and passed through the tangent/sigmoid transfer function to yield its output, as in (6) and (7) [49].
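The weighted summation and transfer step described above can be written compactly. What follows is a reconstruction under stated assumptions rather than the original equations (6) and (7): the symbols net_j and f are introduced here for illustration, and the hyperbolic tangent is assumed as the transfer function.

```latex
\begin{align}
  net_j &= W_{oj} + \sum_{i} W_{ij}\, x_i \\
  y_j   &= f(net_j) = \tanh(net_j) = \frac{e^{net_j} - e^{-net_j}}{e^{net_j} + e^{-net_j}}
\end{align}
```

If a logistic sigmoid were used instead, the second line would read f(net_j) = 1 / (1 + e^{-net_j}).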

We construct our Jordan net by modifying the multilayered feedforward network with the addition of a context layer that helps retain data between observations. With each move, new inputs are fed in and the previous contents of the hidden layer are passed into the context layer, and later fed back into the hidden layer at the next time-step. The context layer is initialized to zero at the start, so that the output from the hidden layer on the first iteration is the same as if there were no context layer [50].
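The context-layer mechanism described above can be sketched as a forward pass in plain Python. This is an illustrative sketch only: it assumes the context layer stores the previous hidden-layer activations, as the paragraph states, uses tanh as the transfer function, and all weight values and names (`step`, `run`, `w_ctx` and so on) are hypothetical.

```python
import math


def step(x, context, w_in, w_ctx, b_hid, w_out, b_out):
    """One time-step. Returns (output, hidden); hidden becomes the next context."""
    hidden = []
    for j in range(len(b_hid)):
        net = b_hid[j]
        net += sum(w_in[j][i] * x[i] for i in range(len(x)))
        net += sum(w_ctx[j][c] * context[c] for c in range(len(context)))
        hidden.append(math.tanh(net))
    output = [
        math.tanh(b_out[k] + sum(w_out[k][j] * hidden[j] for j in range(len(hidden))))
        for k in range(len(b_out))
    ]
    return output, hidden


def run(sequence, w_in, w_ctx, b_hid, w_out, b_out):
    """Feed a sequence through the net, starting from an all-zero context layer."""
    context = [0.0] * len(b_hid)
    outputs = []
    for x in sequence:
        y, context = step(x, context, w_in, w_ctx, b_hid, w_out, b_out)
        outputs.append(y)
    return outputs


# Hypothetical weights: 2 inputs, 2 hidden units, 1 output.
w_in = [[0.5, -0.3], [0.1, 0.8]]
w_ctx = [[0.2, 0.0], [0.0, 0.2]]
b_hid = [0.0, 0.1]
w_out = [[1.0, -1.0]]
b_out = [0.0]
seq = [[1.0, 0.0], [1.0, 0.0]]   # the same input presented twice

outs = run(seq, w_in, w_ctx, b_hid, w_out, b_out)
print(outs)
```

Because the context starts at zero, the first output matches that of a plain feedforward pass; the second output differs even though the input is identical, since the stored hidden state now contributes.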
The net resolves the structural dependencies imposed on it by the dataset and the hybrid heuristics used, via its ability to store earlier data generated from previous layer(s) [51]. Feedforward nets are expanded and extended to represent complex dynamic patterns (as our data is riddled with new and previous sets).
Feedforward nets treat all data as new, so that a previous dataset cannot help the model identify data feats even if such datasets exhibit temporal dependence; this causes practical difficulty as the network becomes larger. However, the Jordan network overcomes this difficulty through its internal feedbacks, making it suitable for dynamic, non-linear and complex tasks. Thus, output is fed back as input into the hidden layer with a time delay [52][53].
Our rationale for the Jordan network is that it is more plausible and computationally more powerful than others, owing to its use of backpropagation-in-time learning, so that its output at time t is used along with new input data to compute its output at time t+1, in response to the model's dynamic and non-linear feats [51]. Output is computed via the Tansig function y_k, which sums the input, receives the target value of the training pattern, computes the error and updates the weight c_jk and bias c_ok. The error is sent back in the next move from the output to the input nodes via error-backpropagation, to correct the weights and find those that approximate the target output with the selected accuracy. Weights are modified by minimizing the error between the target and computed outputs as the forward pass ends. If the error is higher than the selected value, the process continues with the reverse pass; else, training stops [54][55].
Its training aims at a best-fit set of weights, which assumes that the approximating influence of each data point is greatest at the center, so that the function decreases with distance from its center. Its Euclidean length (r_j) yields the distance between the datum vector y = (y_1, ..., y_m) and the center (w_1j, ..., w_mj), as in (8) [48,49]:

$$r_j = \sqrt{\sum_{i=1}^{m} (y_i - w_{ij})^2} \tag{8}$$

A suitable transfer function is then applied to r_j to yield (9). Finally, output k receives the weighted combination as in (10).
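The distance-based computation described above can be sketched in a few lines. This is a sketch under assumptions: the Gaussian is assumed as the transfer function because it decreases with distance from the center, the `width` parameter is hypothetical, and the output is assumed to be a bias plus a weighted sum over the centers.

```python
import math


def euclidean_distance(y, center):
    """r_j: distance between datum vector y and center (w_1j, ..., w_mj)."""
    return math.sqrt(sum((yi - wi) ** 2 for yi, wi in zip(y, center)))


def gaussian(r, width=1.0):
    """Assumed transfer function: equals 1 at the center and decays with distance."""
    return math.exp(-((r / width) ** 2))


def output_k(y, centers, weights_k, bias_k, width=1.0):
    """Assumed output: bias plus the weighted combination of transfer values."""
    return bias_k + sum(
        c_jk * gaussian(euclidean_distance(y, center), width)
        for c_jk, center in zip(weights_k, centers)
    )


# Hypothetical centers and weights, for illustration only.
centers = [[0.0, 0.0], [3.0, 4.0]]
value = output_k([0.0, 0.0], centers, weights_k=[2.0, 1.0], bias_k=0.5)
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]), value)
```

A datum that sits exactly on a center receives that center's full weight, while distant centers contribute almost nothing.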

Genetic algorithm (GA)
GA, inspired by Darwinian genetic evolution (survival of the fittest), consists of a population (data) of potential solutions to a specific task, from which individuals are chosen for selection. Each potential solution is an individual, and the optimal one is found using four operators: initialize, select, crossover and mutation [33,56]. An individual whose genes are close to the optimum is said to be fit. The fitness function determines how close an individual is to the optimal solution. Ojugo, A.A, et al [38] note the operators as:
a. Initialize: individual data are encoded into forms suitable for selection. Each encoding type has its merit. Binary encodings are computationally more expensive. Decimal encoding has greater diversity in chromosomes and greater variance in the pools generated; floating-point encoding, or a combination, is more efficient than binary. Thus, individuals are encoded as fixed-length vectors for one or more pools of different types. The fitness function evaluates how close a solution is to the optimum, after which individuals are chosen for reproduction. If a solution is found, the function is good; else, it is bad and not selected for crossover. The fitness function is the only part with knowledge of the task. The more solutions found, the higher the fitness value.
b. Selection: best-fit individuals close to the optimum are chosen to mate. The larger the number selected, the better the chances of yielding fitter individuals. This continues until one is chosen, from the last two or three remaining solutions, to become a selected parent of new offspring. Selection ensures the fittest individuals are chosen for mating, but also allows less fit individuals from the pool to be selected alongside the fittest. A selection that only mates the fittest is elitist and often leads to convergence at local optima.
c. Crossover: ensures the genes of best-fit individuals are exchanged to yield a new, fitter pool. There are two crossover types, depending on the encoding used: (a) simple crossover for a binary-encoded pool, which allows a single- or multi-point cross with all genes from a parent, and (b) arithmetic crossover, which allows a new pool to be created by adding a percentage of one individual to another.
d. Mutation: alters chromosomes by changing their genes or their sequence, to ensure the new pool converges to the global optimum (instead of a local optimum). The algorithm stops if the optimum is found, after a set number of runs in which new pools are created (though this is computationally expensive), or when no better solution is found. Genes may change based on the probability given by the mutation rate. Mutation provides the much-needed diversity in reproduction.
Cultural GA (CGA) is a variant of GA with a belief space defined as: (a) Normative (has specific value ranges to which an individual is bound), (b) Domain (has data about the task domain), (c) Temporal (has data about the available event space), and (d) Spatial (has topographical data). In addition, an influence function mediates between the belief space and the pool, altering individuals in the pool so that they conform to the belief space. CGA is chosen to yield a pool that does not violate its belief space, and it helps reduce the number of possible individuals GA generates until an optimum is found [34,37,38,56].
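The selection, crossover and mutation operators listed above can be sketched for a binary-encoded pool as follows. This is an illustration rather than the operators used in the study: the fitness function (a count of 1-bits) is a hypothetical stand-in, and tournament selection, single-point crossover and bit-flip mutation are assumed as the concrete variants.

```python
import random


def fitness(individual):
    """Hypothetical fitness: number of 1-bits (higher is fitter)."""
    return sum(individual)


def tournament_select(pool, k, rng):
    """Pick k random contenders and return the fittest of them.

    A small k lets less fit individuals win sometimes, which avoids
    the purely elitist selection that tends to converge at local optima.
    """
    contenders = rng.sample(pool, k)
    return max(contenders, key=fitness)


def single_point_crossover(parent_a, parent_b, rng):
    """Child takes genes before the cut from parent_a and the rest from parent_b."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]


rng = random.Random(42)
pool = [[0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0]]
parent_a = tournament_select(pool, 2, rng)
parent_b = tournament_select(pool, 2, rng)
child = mutate(single_point_crossover(parent_a, parent_b, rng), 0.1, rng)
print(parent_a, parent_b, child)
```

Raising the tournament size `k` towards the pool size makes selection increasingly elitist; lowering the mutation rate reduces the diversity that mutation is meant to supply.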

MATERIALS AND METHODOLOGY

Problem description and formulation
Consider market data logs that include items purchased by customers. The manager of a supermarket wants to maximize the interestingness of the product placement on shelves. The interestingness value is related to the mined association rules and each item's location on the shelves [57]. The rationale for interestingness maximization with location considerations is that association rule mining helps maximize the cross-selling effect of items [58]. It is also clear that the location of shelves has an undeniable impact on the selling rate of items: items placed near the entrance or exit doors have a greater chance of being purchased. So, the preference function of the store's manager depends on the following parameters: the selling benefit, the support and confidence of each pair of items, and the selling possibility of each item from each shelf [59]. These parameters are integrated into the preference function (pf) as in (11), where m is the number of items, p is the number of shelves, C_il is the confidence of the rule (item i → item l), b_i is the selling benefit of the ith item, v_ik is the selling possibility degree of item i when placed on the kth shelf, and x_ik is a binary decision variable that takes the value 1 when item i is allocated to shelf k and 0 otherwise. There are restrictions that limit the preference function value. First, the capacity limitation (cl) of each shelf must be considered as a constraint, where U_k is the capacity of the kth shelf. The second constraint is the association constraint: the support of the rule (item i → item l) must be greater than a minimum threshold determined by the decision maker. The objective function and constraints are non-linear functions in which the decision variables are binary. Thus, we deal with a rough feasible space that increases the probability of becoming trapped in a local optimum; hence our need for an evolutionary unsupervised model in the scenario presented.
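The constraints and parameters described above can be made concrete with a small sketch. The objective below is an assumed, illustrative form and is not equation (11): it adds each item's own expected benefit on its shelf to a cross-selling term for item pairs that share a shelf and pass the support threshold. The allocation vector (one shelf index per item) stands in for the binary variables x_ik, capacity is assumed to count items, and all names and numbers are hypothetical.

```python
def is_feasible(allocation, capacity):
    """Capacity constraint: items placed on shelf k must not exceed U_k."""
    load = [0] * len(capacity)
    for shelf in allocation:
        load[shelf] += 1
    return all(n <= u for n, u in zip(load, capacity))


def preference(allocation, benefit, sell_prob, confidence, support, min_support):
    """Assumed illustrative objective (NOT the paper's equation (11))."""
    total = 0.0
    m = len(allocation)
    for i in range(m):
        k = allocation[i]
        total += benefit[i] * sell_prob[i][k]          # item i sold from shelf k
        for l in range(m):
            same_shelf = l != i and allocation[l] == k
            if same_shelf and support[i][l] > min_support:
                # Cross-selling: rule (item i -> item l) on a shared shelf.
                total += confidence[i][l] * benefit[l] * sell_prob[l][k]
    return total


# Hypothetical data: 2 items, 2 shelves.
benefit = [10.0, 20.0]                       # b_i
sell_prob = [[0.9, 0.5], [0.8, 0.4]]         # v_ik
confidence = [[0.0, 0.5], [0.25, 0.0]]       # C_il
support = [[0.0, 0.3], [0.3, 0.0]]
capacity = [2, 1]                            # U_k

together = preference([0, 0], benefit, sell_prob, confidence, support, 0.2)
apart = preference([0, 1], benefit, sell_prob, confidence, support, 0.2)
print(together, apart, is_feasible([1, 1], capacity))
```

Encoding the allocation as one shelf index per item guarantees that every item sits on exactly one shelf, which leaves only the capacity and support constraints to check.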

Numeric example dataset
The dataset is retrieved from the Delta Mall (Shoprite) Asaba and Warri branches, as in Table 1. Table 1 shows the encoded market basket dataset of items as they are co-selected from the various shelves and placed in the basket at the same time. For example, S01 for Item 1 has a prevalence of 0.81. This implies that there is an 81% chance that items 1, 2, 6 and 8 are picked from shelf S01 and placed in the basket. The Delta Mall market basket dataset was employed to simulate the model and to describe the proposed model-based solution. The system comprises eight items that must be allocated to four shelves. Also, based on its position, each shelf has a different impact on the selling possibility of the goods allocated to it; these possibilities were determined by economists and experts, as presented in Table 1.

Model design problems
Issues to be resolved in the model design include:
a. Many studies rely on a single heuristic to globally classify data or rules into various classes. This has often yielded false-positive (classifying a rule as genuine when it is false) and false-negative (inability of the model to classify a rule) error results.
b. Such models employ hill-climbing methods that often get their solution trapped at local minima, because their speed shrinks as they approach the optimum.
c. Resolving conflicts in structured learning, and those arising from statistical dependencies imposed by the data and by the multiple methods adopted/adapted, is quite tedious.

Model design goals and objective
The proposed system aims to solve the existing problem of market basket analysis using the following properties, in line with [60][61][62]:
a. It embodies the knowledge of human experts with the help of special software tools, and manipulates data to solve problems and make decisions in that domain.
b. Processes are better formalized and defined on machines.
c. Knowledgebase update is automatic.

Experimental model / algorithm framework
The proposed model consists of the following parts:
a. Knowledgebase consists of the historic, observed and structured item co-occurrence dataset (feats) of the market basket for Delta Mall. These have been encoded as if-then rules, represented as optimized binary functions for the selected data feats.
b. Inference engine consists of the hybrid associative rules and the genetic algorithm trained neural network model. Thus, the inference engine seeks to infer consequents derived from antecedents that have been trained using the hybrid memetic algorithm. The rules represent selected data feats of interest, encoded as if-then (rule-based) conditions with possible outcomes and actions (classified into support, confidence and lift classes) once a criterion score is met. The Jordan network provides a self-learning machine, better tuned for robustness via the genetic algorithm optimizer, which yields greater flexibility for the rule-based data. Thus, it adapts the system to autonomously classify the market basket data into varying class types, and to yield a centralized, scaled boundary for determining a high or low degree of membership.
c. Decision support consists of the predicted output and the output database, which is updated automatically over time as the system encounters and reads in new data. The decision support predicts the system output based on the cognitive and emotional filters, as displayed by the output device. This is seen in Figure 2.
The model is first initialized with rules. Individual solutions are selected from the pool via the tournament method to determine the candidates that mate and yield the next generation. Crossover and mutation are applied, using a multi-point crossover, to help the network learn the dynamic and non-linear feats of interest in the dataset. With mutation, data are randomly generated using a Gaussian distribution corresponding to the crossover points (all genes are from a single parent). As new parents contribute to yield a new pool, mutation is applied to yield random genes, from which three candidates are selected (and allocated new random values that conform to the boundary limits) to undergo further mutation. The number of mutations applied depends on how far the GA has progressed on the network (how fit the fittest individual in the pool is), and equals the fitness of the fittest individual divided by 2. New individuals replace old ones with low fitness so as to create a new pool. The process continues until an individual with a fitness of 0 is found, at which point a solution has been reached. Initialization/selection with ANN ensures the first three beliefs are met, while mutation ensures the fourth belief is met. The influence function determines the number of mutations that take place, and knowledge of the solution (i.e. how close the solution is) has a direct impact on how the model is processed. The GANN model pseudocode is as thus:
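The loop described in the paragraph above can be sketched as follows. This is a sketch under stated assumptions and not the original GANN pseudocode: a single threshold unit on a toy logical-OR task stands in for the Jordan network, fitness is the number of misclassified patterns (so 0 means solved), the two-point crossover, the pool size, `sigma`, the generation cap and the floor of one mutation per child are all hypothetical choices.

```python
import random

# Hypothetical toy task (logical OR) standing in for the market basket data.
PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
LOW, HIGH = -1.0, 1.0   # boundary limits for every gene (weight)


def error_count(weights):
    """Fitness: number of misclassified patterns; 0 means a solution is reached."""
    w1, w2, bias = weights
    errors = 0
    for (x1, x2), target in PATTERNS:
        predicted = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
        errors += predicted != target
    return errors


def tournament(pool, rng, k=3):
    """Tournament selection: the contender with the lowest error wins."""
    return min(rng.sample(pool, k), key=error_count)


def two_point_crossover(a, b, rng):
    """Multi-point (here two-point) crossover between two parents."""
    p, q = sorted(rng.sample(range(1, len(a)), 2))
    return a[:p] + b[p:q] + a[q:]


def gaussian_mutate(individual, n_mutations, rng, sigma=0.3):
    """Apply n Gaussian perturbations, clamped to the boundary limits."""
    child = list(individual)
    for _ in range(n_mutations):
        g = rng.randrange(len(child))
        child[g] = min(HIGH, max(LOW, child[g] + rng.gauss(0.0, sigma)))
    return child


def evolve(pop_size=20, max_generations=200, seed=0):
    rng = random.Random(seed)
    pool = [[rng.uniform(LOW, HIGH) for _ in range(3)] for _ in range(pop_size)]
    history = []
    for _ in range(max_generations):
        pool.sort(key=error_count)
        best = error_count(pool[0])
        history.append(best)
        if best == 0:                      # stop once a fitness of 0 is found
            break
        n_mut = max(1, best // 2)          # mutations = fittest fitness / 2 (floor of 1 assumed)
        child = two_point_crossover(tournament(pool, rng), tournament(pool, rng), rng)
        pool[-1] = gaussian_mutate(child, n_mut, rng)   # replace the least fit individual
    return min(pool, key=error_count), history


best_weights, history = evolve()
print(best_weights, history)
```

Because only the least fit individual is replaced in each generation, the best error recorded in `history` can never increase from one generation to the next.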

Result findings and discussion
The simulated association result for the basket analysis dataset is shown in Table 2. Another feature of major concern and impact on the market basket data, in the allocation of items to shelves, is the selling benefit. It is logical that, in maximizing the expected selling benefit, the products with higher benefits must be allocated to shelves with higher selling possibilities. Table 3 shows the values of the products' benefits. Table 4 and Table 5 show the simulated support and confidence values respectively.

Figure 2. Evolution Convergence Time Using 4-Testbeds
Lastly, Figure 2 shows execution time versus convergence using four (4) separate test-beds to simulate the effectiveness and efficiency of the model. With data logs of items purchased by customers, the proposed model-based solution converges faster as items commonly placed in a basket are selected [63][64], and thus yields an effective means to maximize the interestingness of product placement on the shelves. These interestingness values are rules mined by association using the frequency growth path algorithm for item location on the shelves [65-69]. The rationale for interestingness maximization with location considerations is that association rule mining helps to maximize the cross-selling effect of items. It is also clear that the location of shelves has an undeniable impact on the selling rate of items: items placed near the entrance or exit doors have a greater chance of being purchased. So, the preference function of the store's manager depends on the following parameters: the selling benefit, the support and confidence of each pair of items, and the selling possibility of each item from each shelf.

CONCLUSION
Our memetic (genetic algorithm trained neural net) model, as used for the classification of market basket data, adapts GA to help speed up the final stages of the ANN and thus yield a robust optimum in the shortest amount of time for such a dynamic and complex task. The rule-based heuristics help better represent data values in the model [70][71][72]. Though hybrids are quite difficult to implement, exploit and explore, they yield better solutions with appropriate parameter selection, which must be encoded through the model's structured learning. This in turn helps address the issues of statistical dependencies imposed on the hybrid by the underlying stochastic heuristics adopted, and resolves conflicts imposed by the encoding of the dataset and data feats of interest [73]. It also highlights the implications of such a multi-agent model, as agents seek to create their own behavioural rules on the dataset used while the model proposes a solution that displays the underlying probabilities of the data feats of interest. GA helps to yield a better generation via its process of recombination and mutation.