Malware detection using Gini, Simpson diversity, and Shannon-Wiener indexes

Yeong Tyng Ling, Kang Leng Chiew, Piau Phang, Xiaowei Zhang

Abstract


The increasing number of malware attacks poses a significant challenge to cybersecurity. This paper proposes a methodology for static malware analysis that uses biodiversity-inspired metrics, namely the Gini coefficient, Simpson diversity, and the Shannon-Wiener index, for malware detection. These metrics are computed over the raw binary file to build a structural feature representation that serves as the feature space. The effectiveness of these metrics is evaluated using multilayer perceptron (MLP) neural network and extreme gradient boosting (XGBoost) models. A deterministic algorithm is used to generate these features, which represent the feature signature of the executable file. Additionally, we investigated the effectiveness of different chunk sizes (in bytes) for the input features of these two classifiers. According to the results, the Gini coefficient with a chunk size of 128 achieved an average F1 score of more than 98.7% using the XGBoost model.
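The abstract does not spell out how the three metrics are applied to a raw binary, so the following is only a minimal sketch under stated assumptions. It assumes the file is split into fixed-size chunks (for example 128 bytes) and that each metric is computed over the byte-value frequency distribution (256 possible values) inside each chunk. It uses the Gini-Simpson form of the Simpson index, 1 - Σp², and the natural-log Shannon-Wiener index, H = -Σ p ln p. The function names (`byte_probabilities`, `gini_coefficient`, `simpson_diversity`, `shannon_wiener`, `chunk_features`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List


def byte_probabilities(chunk: bytes) -> List[float]:
    """Relative frequency of each of the 256 possible byte values in a non-empty chunk."""
    counts = Counter(chunk)  # iterating over bytes yields ints 0..255
    total = len(chunk)
    return [counts.get(b, 0) / total for b in range(256)]


def gini_coefficient(values: List[float]) -> float:
    """Gini coefficient of non-negative values: 0 = perfectly even, near 1 = concentrated."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with 1-based rank i
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def simpson_diversity(probs: List[float]) -> float:
    """Gini-Simpson variant (assumed here): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probs)


def shannon_wiener(probs: List[float]) -> float:
    """Shannon-Wiener index H = -sum(p_i * ln p_i), skipping zero-probability terms."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def chunk_features(data: bytes, chunk_size: int = 128) -> Dict[str, List[float]]:
    """Hypothetical feature extractor: one value per metric for each chunk of the file.

    The final chunk may be shorter than chunk_size; it is scored as-is.
    """
    features: Dict[str, List[float]] = {"gini": [], "simpson": [], "shannon": []}
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        probs = byte_probabilities(chunk)
        features["gini"].append(gini_coefficient(probs))
        features["simpson"].append(simpson_diversity(probs))
        features["shannon"].append(shannon_wiener(probs))
    return features
```

For example, `chunk_features(open(path, "rb").read(), chunk_size=128)` returns three per-chunk sequences. The procedure is deterministic because it depends only on the file bytes and the chunk size. A chunk made of a single repeated byte scores a Gini coefficient of 255/256 and a Simpson and Shannon-Wiener value of 0, while a chunk with many distinct byte values scores lower on Gini and higher on the other two. How such variable-length sequences are turned into fixed-length inputs for the MLP and XGBoost models is not described in the abstract and is left open here.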


Keywords


Gini coefficient; Malware detection; MLP; Shannon-Wiener; Simpson diversity; XGBoost



DOI: http://doi.org/10.11591/ijict.v14i2.pp737-750



Copyright (c) 2025 Yeong Tyng Ling, Kang Leng Chiew, Piau Phang, Xiaowei Zhang

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
