A convolutional neural network for skin cancer classification

ABSTRACT


INTRODUCTION
Deep learning is a neural network approach that performs well on computational tasks [1]. The convolutional neural network (CNN) is one of the most widely used deep learning models. A CNN consists of convolution layers, activation layers, pooling layers, fully connected layers, and a softmax classification layer [2]. CNNs have been used to diagnose skin cancer [2]-[4] and to classify breast cancer with an accuracy of 91.3% [5]. They have also been used to classify cervical cancer into seven types of disease, with accuracies of 91.2% to 99.5% [6]. The CNN technique has also proved effective in dermoscopic melanoma classification, with a sensitivity of 95% [7].
Dermatologists need an effective and reliable system for diagnosing skin diseases. Previous systems for identifying skin diseases remain inadequate because their accuracy is below 90%. CNN is an accurate and efficient method for identifying skin diseases and can assist dermatologists [8]. Building on this previous research, we classify skin cancer using the CNN algorithm. The purpose of this research is to help identify skin diseases early, since early identification can help in treatment and reduce mortality rates. Earlier studies have identified skin diseases using either machine learning or CNN methods. Because the CNN method produced good accuracy in that work, we propose the CNN method to identify skin diseases. This research develops CNN architectures to identify skin diseases; the best architecture is used as the model for a skin disease identification website.

RESEARCH METHOD
Related work
Images of diseased skin have been used to train the AlexNet and VGG-16 architectures so that lesions can be classified as Benign or Malignant. Melanoma is a highly fatal type of skin cancer. Features have been extracted from skin images using principal component analysis (PCA) and the wavelet transform [9]. CNN is currently an algorithm that provides good and reliable accuracy. CNNs are excellent at analyzing images and classifying skin lesions, and computer-aided diagnosis with the CNN algorithm can support the work of doctors. One framework for the computer-based diagnosis of skin lesions combines segmentation of the skin lesion image with classification of the lesions into multiple classes [10]. Classification with CNNs can also use transfer learning, in which an existing pre-trained network architecture is reused. Transfer learning architectures include the pre-trained Inception-v3, ResNet-50, Inception-ResNet-v2, and DenseNet-201 [11]. Classification of skin lesions is made difficult by the limited characteristics captured in dermoscopic images during the capture or sampling process. There are several types of skin lesions, including malignant lesions such as melanoma, basal cell carcinoma (BCC), and squamous cell carcinoma (SCC), and benign lesions such as nevi [12]. CNNs can be used for the classification of skin lesions in dermatology, but image analysis, segmentation, and feature extraction of skin lesions must be handled carefully. A CNN with a transfer learning architecture was used to classify skin lesions [13]. CNN is an efficient and accurate method for the analysis of skin disorders, and dermatologists need an effective system with this capability to facilitate diagnosis [14].
Early detection of skin cancer, including several types of carcinoma and melanoma, is very important and can prevent death [15]. A reliable automatic melanoma screening (early detection) system performs diagnosis using a computer-based algorithm. The CNN algorithm can be used to screen for and detect malignant skin lesions early. As a machine learning method, CNN training requires an image dataset labeled with the skin lesion type. The types of skin lesions in Balazs's research include melanoma, nevus, and seborrheic keratosis [16]. The segmentation of skin lesions is an important step in the computer-aided analysis of dermoscopic images. There are many segmentation methods for extracting the features of skin lesions, one of which is the convolutional neural network. The CNN architectures often used for segmentation are FCN-8s and U-Net [16]. CNNs can differentiate melanoma from nevi based on dermoscopic images [17], [18]. In that work, 11,444 dermoscopic images were used as the CNN training dataset, and the CNN results can serve as an aid for dermatologists in classifying skin lesions in dermoscopic images [18]. Skin cancer is a type of cancer that often affects white people. A good algorithmic approach for the classification or diagnosis of skin lesions is a pre-trained CNN [19]. Automatic diagnostic systems for the early detection of skin cancer have had a very positive effect [19], [20], because patients whose cancer is detected early can be treated quickly. Building a computerized diagnostic system based on dermoscopic images requires several steps. The first step is to segment the skin lesion and extract the dermoscopic features. These features are then used as the reference for training the convolutional neural network [20]. Melanoma is a deadly type of skin cancer [21], [22], so a computer-based system with a good learning algorithm is needed. Image-based skin cancer detection consists of image preprocessing, segmentation, extraction of relevant features from the images, and classification of skin lesions. One good learning algorithm is the CNN, which can be used to identify malignant tumors on the skin surface with a sensitivity of 93.3% [22]. This research develops CNN architectures to identify skin diseases. The best architecture is used as the model for a skin disease identification website.

Convolutional neural network
This research classifies skin cancer as Benign or Malignant, as shown in Figure 1. The dataset images are used to train a CNN whose architecture consists of convolution layers, pooling layers, activation layers, and fully connected layers. Each layer has a different function. The convolution layer captures the most relevant image features, as in Figure 1. The pooling layer keeps the most prominent values of the convolutional features, and the activation layer modifies or normalizes the output. The result of CNN training is a model, or weight vector, which is saved to model.h5 and then used to test the classification of skin cancer types. The classification itself is carried out on an offline website built with the Django framework.
2D convolution multiplies the input image by a kernel, or filter. Each image pixel in the region covered by the filter is multiplied by the corresponding filter value, as illustrated in Figure 2(a) and Figure 2(b). The purpose of 2D convolution is to extract the most prominent features. In Figure 2(c), the input image is padded before it is multiplied by the filter, which changes the size of the input image from n × m to (n + 2) × (m + 2). The rows and columns added around the edge pixels are given a value of 0. Each pixel is then multiplied by the filter as in (1), where i and j are the row and column indexes of each image pixel.
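The zero padding and sliding multiplication described above can be sketched in plain Python. This is a minimal illustration rather than the implementation used in this research: the function names zero_pad and conv2d, the 3×3 example image, and the identity kernel are assumptions made for the example, and the kernel is applied without flipping, as is common in deep learning libraries.

```python
def zero_pad(image, pad=1):
    """Surround a 2D list with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def conv2d(image, kernel, pad=0):
    """Slide `kernel` over `image` and sum the element-wise products."""
    if pad:
        image = zero_pad(image, pad)
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):          # i: row index of the output pixel
        row = []
        for j in range(out_w):      # j: column index of the output pixel
            total = 0
            for u in range(kh):
                for v in range(kw):
                    total += image[i + u][j + v] * kernel[u][v]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x3 image and an identity kernel that copies the centre pixel.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]

same = conv2d(image, kernel, pad=1)   # padded to 5x5, output stays 3x3
valid = conv2d(image, kernel, pad=0)  # no padding, output shrinks to 1x1
```

With one ring of zeros the n × m input becomes (n + 2) × (m + 2), so a 3×3 filter returns an output of the original size; without padding the output is smaller than the input.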
Figure 1. Classification of skin cancer with CNN

Furthermore, the pooling layer selects the best feature values, as shown in Figure 3. From the feature extraction result, each 2×2 block is reduced to its maximum (Figure 3(a)) or its average (Figure 3(b)). The last layer classifies the type of cancer (Benign or Malignant) using the sigmoid function (1).
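As a small sketch of the pooling and sigmoid steps, the following plain-Python example reduces each 2×2 block to its maximum or its average and defines the sigmoid function. The function names pool2x2 and sigmoid and the 4×4 feature map are illustrative assumptions, not values from this research.

```python
import math


def pool2x2(feature_map, mode="max"):
    """Reduce each non-overlapping 2x2 block to its maximum or its average."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            block = [feature_map[i][j], feature_map[i][j + 1],
                     feature_map[i + 1][j], feature_map[i + 1][j + 1]]
            row.append(max(block) if mode == "max" else sum(block) / 4)
        pooled.append(row)
    return pooled


def sigmoid(x):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical 4x4 feature map.
feature_map = [[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]]

max_pooled = pool2x2(feature_map, "max")      # [[7, 8], [9, 6]]
avg_pooled = pool2x2(feature_map, "average")  # [[4.0, 5.0], [4.5, 3.0]]
```

Either pooling mode halves the height and width of the feature map; the sigmoid maps the final score to a value between 0 and 1 that can be compared with a class limit value.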

First CNN architecture
We classified skin cancer into two classes, Benign and Malignant [7], using two CNN architectural models. The first CNN model architecture has 6,427,745 parameters and is shown in Figure 5. It has two 2D convolution layers, two pooling layers (using max pooling), and the sigmoid activation function. The first model was trained for 10 epochs of 200 iterations each, with accuracy values ranging from 85% to 95%. Each iteration calculates the accuracy or error value in order to update the weight vector. The result of the training process is a model, or weight vector (h5), which is then used for classification trials, as in Figure 6. Figure 6 compares the performance on the training data with the performance on the validation data.

Figure 6. First CNN architectural trial results
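To illustrate how a parameter count arises from the layer arrangement, the sketch below uses the standard counting formulas for convolution and fully connected layers. The layer sizes are hypothetical and are not the architecture in Figure 5, so the total does not reproduce the 6,427,745 parameters reported above; the function names are also assumptions made for the example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for each output unit."""
    return (inputs + 1) * units


# Hypothetical layer sizes, chosen only for illustration.
conv1 = conv2d_params(3, 3, 3, 32)      # 3x3 filters on an RGB image
conv2 = conv2d_params(3, 3, 32, 64)     # 3x3 filters on 32 feature maps
dense1 = dense_params(64 * 6 * 6, 128)  # flattened 6x6x64 features
output = dense_params(128, 1)           # single sigmoid output unit

total_params = conv1 + conv2 + dense1 + output

# 10 epochs of 200 iterations each give the number of weight updates.
weight_updates = 10 * 200
```

Pooling layers add no parameters, so in a sketch like this most of the total comes from the first fully connected layer.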

Second CNN architecture
Next, we built the second CNN architecture model with a smaller number of parameters, 2,797,665, as shown in Figure 7. Figure 7 shows three convolution layers, three pooling layers with max pooling, and a dropout layer. The dropout layer randomly deactivates some units during training to reduce overfitting. The second model was trained repeatedly. Training is the process of learning a pattern, or model, from the images; it was carried out for 10 epochs, each consisting of 100 to 200 iterations. Each iteration calculates the accuracy or error in order to update the weights.
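As commonly implemented, a dropout layer sets a random subset of activations to zero during training and leaves them untouched at test time. The following is a minimal sketch of that behavior (the "inverted dropout" variant), assuming a dropout rate of 0.5; the function name, the rate, and the example activations are illustrative and are not taken from Figure 7.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` during training.

    Kept activations are scaled by 1 / (1 - rate) so that their expected
    value is unchanged; at test time the input is returned as is.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
activations = [1.0] * 10

dropped = dropout(activations, rate=0.5, rng=rng)                  # training
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)  # testing
```

Because a different subset of units is dropped at every iteration, the network cannot rely on any single unit, which tends to reduce overfitting.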
The training result is the weight file (h5), which is then used for testing or validation. The test or validation results show that the accuracy of the second model is lower, which we attribute to its smaller number of parameters, as shown in Figure 8. In Figure 8, the red and blue lines show the accuracy on the validation data and the training data.
The testing and training of the proposed CNN algorithm show that a higher number of parameters results in higher accuracy. Table 2 shows the testing accuracy of the proposed CNN models and of the pre-trained models. The pre-trained models used for training and testing were VGG16 [7], Inception-V3, and ResNet50 [9]. These models are used through transfer learning, in which the model has already been trained on data with 1000 classes.
Table 2 shows that the pre-trained models give, on average, lower accuracy than the proposed CNN model. We trained for 10 epochs, each with 100 to 200 iterations. The input images vary in size, but all are color images. The highest number of parameters is 122,223,521, for the pre-trained Inception-V3 [9] with an input size of 224×224, but its accuracy is almost the same as that of the second CNN model we proposed.

The index.html file listed in Figure 10 displays the main web page of the application, as shown in Figure 9. On the main page (index.html), a menu appears for selecting an image file to be tested from a folder or PC drive. The image then appears along with its file name, as in Figure 9. When the submit button is clicked, the image is processed and assigned a skin cancer class (Benign or Malignant), as depicted in Figure 9. Submitted files are stored in the media folder, as shown in Figure 10. The submit button in Figure 9 runs the classification based on model.h5 in the models folder; model.h5 is the CNN training result file. The class is determined using a limit value of 0.8: if the sigmoid output is <0.8 the image is classified as Benign, otherwise as Malignant. This limit value was obtained by repeating the testing process many times, observing the displayed sigmoid values, and taking the average of the sigmoid values of the two classes. The classification logic is in the views.py file. To run the web application: i) open a command prompt; ii) move to the application folder that was created with the command cd nama_folder_project; and iii) enter the command python manage.py runserver. This command calls the file manage.py and is run each time the web application is restarted, as shown in Figure 9.
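The limit-value rule described above can be written as a small function. This is a sketch only: the function name classify and the example scores are assumptions, and the actual logic in views.py is not shown in this paper.

```python
def classify(sigmoid_output, threshold=0.8):
    """Map a sigmoid score to a class label using the 0.8 limit value."""
    # Scores below the limit value are Benign; all others are Malignant.
    return "Benign" if sigmoid_output < threshold else "Malignant"


# Hypothetical sigmoid outputs for three submitted images.
labels = [classify(score) for score in (0.35, 0.80, 0.93)]
```

A score exactly equal to the limit value falls on the Malignant side, which follows from the rule "if <0.8 then Benign, otherwise Malignant".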

CONCLUSION
The trial results showed that the model with 6,427,745 parameters classified skin cancer with the highest accuracy of 93%, while the model with 2,797,665 parameters reached a highest accuracy of 73%. In these experiments, the number of parameters, which is determined by the arrangement of the CNN layers, affected the accuracy of classification into Benign and Malignant. The model with 6,427,745 parameters had the highest accuracy and was therefore saved. This model is used in a web-based skin disease identification application built with the Django framework. Future research is expected to achieve this skin cancer classification with a CNN architecture that has fewer parameters and high accuracy.

Figure 2. Input image of (a) convolution layer illustration, (b) output feature image of the same size as the input image, and (c) output feature image of smaller size

Figure 4. Images of skin cancer types: (a) Benign skin cancer and (b) Malignant skin cancer

Table 1. Skin cancer image dataset

Table 2. CNN testing results and transfer learning