Efficient email classification technique: a comparative study of header-only and full-content approaches
Abstract
This research explores efficient techniques and sufficient features for classifying organizational email, with a focus on identifying emails that are not beneficial for work in order to reduce the burden of email management. The study proposes a novel approach that compares the performance of email header features alone (Header-Only) against full email data (Header + Body), evaluating the accuracy and processing time of widely used machine learning algorithms, including Random Forest, SVM, KNN, XGBoost, and ANN. The experiments were conducted on the Enron dataset, with key features extracted from email headers, such as sender and recipient addresses, and from the body content. The results show that header information alone provides classification performance comparable to that of full email content. In particular, Random Forest, XGBoost, and LightGBM achieved accuracy exceeding 95%, and the header-only approach reduced processing time by up to 21.66% for the Random Forest model. Classifying emails using header-only features is therefore both highly accurate and resource-efficient. This research offers practical guidance for organizations developing effective email filtering systems without compromising classification quality.
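To make the Header-Only versus Header + Body distinction concrete, the following is a minimal sketch of how the two feature sets might be extracted from a raw message using Python's standard `email` package. The function names, the specific features (`sender_domain`, `n_recipients`, `all_internal`, `body_n_words`), and the sample message are illustrative assumptions, not the feature set used in the study.

```python
from email import message_from_string
from email.utils import getaddresses


def header_features(raw: str) -> dict:
    """Illustrative header-only features (hypothetical, not the paper's exact set)."""
    msg = message_from_string(raw)
    sender = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    sender_addr = sender[0][1].lower() if sender else ""
    sender_domain = sender_addr.rpartition("@")[2]
    rcpt_domains = {addr.rpartition("@")[2].lower() for _, addr in recipients if addr}
    return {
        "sender_domain": sender_domain,
        "n_recipients": len(recipients),
        # True only when every recipient shares the sender's domain.
        "all_internal": bool(rcpt_domains) and rcpt_domains == {sender_domain},
    }


def full_features(raw: str) -> dict:
    """Header features plus one simple body feature (assumed for illustration)."""
    msg = message_from_string(raw)
    feats = header_features(raw)
    # Only plain single-part bodies are handled in this sketch.
    body = msg.get_payload() if not msg.is_multipart() else ""
    feats["body_n_words"] = len(body.split())
    return feats


# Hypothetical sample message, not taken from the Enron dataset.
SAMPLE = (
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com, Carol <carol@example.com>\n"
    "Cc: dave@partner.example.org\n"
    "Subject: Q3 schedule\n"
    "\n"
    "Please review the attached schedule before Friday.\n"
)

if __name__ == "__main__":
    print(header_features(SAMPLE))
    print(full_features(SAMPLE))
```

The header-only path never touches the message body, which is one plausible reason such an approach could need less processing time. Either dictionary would still have to be encoded numerically before being passed to a classifier.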
DOI: http://doi.org/10.11591/ijict.v15i2.pp665-673
Copyright (c) 2026 Worawit Kitikusoun, Nawaporn Wisitpongphan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The International Journal of Informatics and Communication Technology (IJ-ICT)
p-ISSN 2252-8776, e-ISSN 2722-2616
This journal is published by the Intelektual Pustaka Media Utama (IPMU).