The International Arab Journal of Information Technology (IAJIT)



Bridging the Gap: Ensemble Learning-Based NLP Framework for AI-Generated Text Identification in Academia

Background: The advent of Large Language Models (LLMs), including Chat Generative Pre-trained Transformer (ChatGPT) and Bard, has revolutionised text generation while raising ethical concerns regarding academic integrity. Differentiating Artificial Intelligence-Generated Texts (AIGT) from human-written content is crucial to maintaining transparency and trust in scholarly communication.

Objective: This study aims to address the limitations of existing detection methods by introducing a Machine Learning (ML)-based Natural Language Processing (NLP) framework that effectively distinguishes between AI-generated and Human-Written academic texts (HWAI).

Methodology: The proposed framework integrates comprehensive preprocessing, Exploratory Data Analysis (EDA), linguistic analysis, and ensemble learning techniques. Text representation was achieved using Term Frequency-Inverse Document Frequency (TF-IDF) and word embeddings. We employed two diverse datasets, Artificial Intelligence-Generated Academic (AI-GA) and HWAI, to validate the framework’s efficacy and ensure robust classification performance.

Results: The ensemble model outperformed the individual classifiers. On the AI-GA dataset, it achieved state-of-the-art accuracy (98.67%) and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) (99.88%). On the HWAI dataset, it achieved 96.52% accuracy and 99.37% ROC-AUC. These results highlight the framework’s capability to identify unique linguistic patterns in AI-generated content.

Conclusion: The framework addresses key linguistic and computational challenges and provides a scalable and reliable solution for detecting AI-generated content in academic domains. Future work will explore hybrid human-AI authorship detection and real-time deployment to enhance its practical utility across disciplines.
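
As an illustration of the two building blocks named in the methodology, the following is a minimal sketch of TF-IDF weighting and of combining several classifiers' outputs. It is not the authors' implementation: the abstract does not specify the TF-IDF variant, the base classifiers, or the combination rule, so the smoothed IDF formula, whitespace tokenisation, the soft-voting rule, and the function names `tfidf` and `soft_vote` are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: relative term frequency times a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1. The paper may use another.
    """
    # Naive whitespace tokenisation (an assumption, not the paper's preprocessing).
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def soft_vote(probabilities, threshold=0.5):
    """Average each classifier's P(AI-generated) and threshold the mean.

    Soft voting is a hypothetical choice of ensemble rule for illustration.
    """
    mean = sum(probabilities) / len(probabilities)
    label = "ai" if mean >= threshold else "human"
    return label, mean


docs = ["the model writes text", "the student writes essays"]
vectors = tfidf(docs)
label, mean = soft_vote([0.9, 0.6, 0.3])
```

In this sketch, a term that occurs in every document (such as "the") receives a lower weight than a term specific to one document (such as "model"), which is the property that lets TF-IDF surface vocabulary characteristic of one class of text. The probabilities passed to `soft_vote` are made-up placeholder values standing in for the outputs of individual classifiers.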

 
