The International Arab Journal of Information Technology (IAJIT), Vol. 20, No. 3A, Special Issue 2023


New Model of Feature Selection based Chaotic Firefly Algorithm for Arabic Text Categorization

Dimensionality reduction is a problem that appears in most classification processes. The data contains a large number of features, some of which may carry unreliable information that leads the categorization process to unwanted results. Feature selection can be used to reduce the dimensionality of datasets and to find relevant information. For the Arabic language, the number of works that apply a meta-heuristic algorithm to feature selection is still limited, owing to the complex nature of Arabic inflectional and derivational rules as well as its intricate grammatical rules and rich morphology. This paper proposes a new model for Arabic feature selection that combines a chaotic method with the Firefly Algorithm, giving a Chaotic Firefly Algorithm (CFA). The chaotic method replaces the attractiveness coefficient in the firefly algorithm with the outputs of a chaotic application. The enhancement introduces a novel search strategy that is able to obtain a good balance between the exploitation and exploration abilities of the algorithm. In terms of performance, the proposed method is tested using three classifiers, namely Naive Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), and three evaluation measures: precision, recall, and F-measure. The experimental findings show that the combination of CFA and the SVM classifier outperforms the other combinations in terms of precision.
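The idea of replacing the attractiveness coefficient with a chaotic output can be illustrated with a minimal sketch. The abstract does not specify which chaotic map, position encoding, or fitness function the authors use, so everything below is an assumption made for illustration: the logistic map as the chaotic source, a real-valued position turned into a 0/1 feature mask by thresholding at zero, and the hypothetical names `logistic_map`, `move_firefly`, `to_feature_mask`, and `chaotic_firefly_step`. It is not the paper's implementation.

```python
import random


def logistic_map(c, mu=4.0):
    """One step of the logistic map (assumed chaotic source).

    For c in [0, 1] and mu <= 4 the output stays in [0, 1].
    """
    return mu * c * (1.0 - c)


def move_firefly(x_i, x_j, beta, alpha, rng):
    """Move firefly i toward a brighter firefly j.

    In the standard firefly algorithm beta is beta0 * exp(-gamma * r**2).
    Here beta is simply passed in, so the caller can supply a chaotic value
    instead (assumption about how the replacement is wired in).
    """
    return [
        xi + beta * (xj - xi) + alpha * (rng.random() - 0.5)
        for xi, xj in zip(x_i, x_j)
    ]


def to_feature_mask(position):
    """Turn a real-valued position into a 0/1 mask: keep feature k if position[k] > 0."""
    return [1 if p > 0 else 0 for p in position]


def chaotic_firefly_step(population, fitness, chaos, alpha, rng):
    """Run one generation and return (new_population, updated chaotic state).

    fitness takes a 0/1 feature mask and returns a score to maximize,
    for example a classifier's precision on the selected features.
    """
    scores = [fitness(to_feature_mask(p)) for p in population]
    new_population = [list(p) for p in population]
    for i in range(len(population)):
        for j in range(len(population)):
            if scores[j] > scores[i]:
                # The next chaotic value plays the role of the attractiveness.
                chaos = logistic_map(chaos)
                new_population[i] = move_firefly(
                    new_population[i], population[j], chaos, alpha, rng
                )
    return new_population, chaos
```

A caller would seed `chaos` with a value in (0, 1) that is not a fixed point of the map (for instance 0.7), build `fitness` around one of the classifiers named above, and call `chaotic_firefly_step` repeatedly, keeping the best mask seen so far.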
