Efficient email classification technique: a comparative study of header-only and full-content approaches

Worawit Kitikusoun; Nawaporn Wisitpongphan

doi:10.11591/ijict.v15i2.pp665-673

Abstract

The purpose of this research is to explore efficient techniques and sufficient feature sets for classifying organizational email, with a focus on identifying emails that are not useful for work and thereby reducing the burden of email management. This study proposes an approach that compares the performance of using email header features alone (Header-Only) against using the full email (Header + Body), evaluating both the accuracy and the processing time of widely used machine learning algorithms, including Random Forest, SVM, KNN, XGBoost, LightGBM, and ANN. The experiments were conducted on the Enron dataset, with features extracted from the email headers (such as sender and recipient addresses) and from the body content. The results show that header information alone yields classification performance comparable to that of the full email content. In particular, models such as Random Forest, XGBoost, and LightGBM achieved accuracy above 95%, and the header-only approach reduced processing time by up to 21.66% (for the Random Forest model). These results indicate that header-only classification can be both highly accurate and resource-efficient. This research offers practical guidance for organizations developing effective email filtering systems without compromising classification quality.
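The difference between the two configurations can be illustrated with a minimal sketch, assuming raw RFC 822 messages such as those in the Enron corpus. The feature names, the use of the From/To/Cc fields, the domain and recipient-count features, and the whitespace tokenizer for the body are illustrative assumptions, not the authors' feature set. The hypothetical helper `relative_reduction` shows one way a processing-time saving such as the reported 21.66% could be expressed as a percentage of the full-content time.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses, parseaddr


def header_features(msg: Message) -> dict[str, float]:
    """Illustrative header-only features (assumed, not the paper's exact set)."""
    feats: dict[str, float] = {}
    # Sender domain, taken from the From: header.
    _, sender = parseaddr(msg.get("From", ""))
    feats[f"from_domain={sender.lower().rpartition('@')[2]}"] = 1.0
    # Recipient count and recipient domains from To: and Cc:.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    feats["n_recipients"] = float(len(recipients))
    for _, addr in recipients:
        feats[f"to_domain={addr.lower().rpartition('@')[2]}"] = 1.0
    return feats


def body_features(msg: Message) -> dict[str, float]:
    """Bag-of-words counts over text/plain parts (a simple assumed tokenizer)."""
    feats: dict[str, float] = {}
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
            for token in text.lower().split():
                key = f"w={token}"
                feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def extract(raw: str, header_only: bool) -> dict[str, float]:
    """Header-Only features, or Header + Body features when header_only is False."""
    msg = message_from_string(raw)
    feats = header_features(msg)
    if not header_only:
        feats.update(body_features(msg))
    return feats


def relative_reduction(t_full: float, t_header: float) -> float:
    """Percentage of processing time saved by the header-only variant."""
    return 100.0 * (t_full - t_header) / t_full


# Hypothetical sample message for demonstration only.
SAMPLE_RAW = (
    "From: Alice <alice@enron.com>\n"
    "To: bob@enron.com, carol@example.org\n"
    "Subject: lunch\n"
    "\n"
    "Lunch at noon lunch\n"
)

if __name__ == "__main__":
    print(sorted(extract(SAMPLE_RAW, header_only=True)))
    print(extract(SAMPLE_RAW, header_only=False))
```

The header-only path never decodes or tokenizes the body, which is where the processing-time saving would come from. The resulting dicts could then be vectorized, for example with scikit-learn's `DictVectorizer`, before training any of the classifiers listed above.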

Keywords

Email classification; Email header; Email security; Machine learning; Spam email

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Intelektual Pustaka Media Utama (IPMU).
