The International Arab Journal of Information Technology (IAJIT)


An ML-Based Classification Scheme for Analyzing the Social Network Reviews of Yemeni People

Social networks allow individuals to create public and semi-public web-based profiles and to communicate with other users in the network and with online interaction sources. Social media sites such as Facebook and Twitter are prime examples of social networks that enable people to express their ideas, suggestions, views, and opinions about a particular product, service, political entity, or affair. This research introduces a Machine Learning-based (ML-based) classification scheme for analyzing the social network reviews of Yemeni people using data mining techniques. A constructed dataset of 2000 Modern Standard Arabic (MSA) and Yemeni dialect records was used for training and testing, and a separate test dataset of 300 MSA and Yemeni dialect records was used to demonstrate the capacity of the scheme. Four supervised machine learning algorithms were applied, and their performance was compared in terms of Accuracy, Recall, Precision, and F-measure. The results show that the Support Vector Machine algorithm outperformed the others in terms of Accuracy on both the training and testing datasets, with 90.65% and 90.00%, respectively. It is further noted that the accuracy of the selected algorithms was influenced by noisy and sarcastic opinions.
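As an illustration of the four evaluation measures named above, the following minimal sketch computes Accuracy, Precision, Recall, and F-measure from predicted and true labels for a two-class task. It is not the authors' implementation: the function name `binary_metrics`, the label strings `"pos"` and `"neg"`, and the toy label lists are assumptions made only for this example, and the abstract does not state which toolkit was used.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="pos"):
    """Return accuracy, precision, recall and F-measure for one positive class.

    Hypothetical helper for illustration; labels are plain strings.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: true positive or false positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: false negative or true negative.
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy labels, invented for this sketch (not taken from the paper's dataset).
y_true = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "pos", "pos", "pos", "neg"]
scores = binary_metrics(y_true, y_pred)
```

Running the same function on the predictions of each classifier would give one row of scores per algorithm, which is the kind of side-by-side comparison the abstract describes.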


[1] Abdulla N., Ahmed N., Shehab M., and Al-Ayyoub M., “Arabic Sentiment Analysis: Lexicon-Based and Corpus-Based,” in Proceeding of IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies, Amman, pp. 1-6, 2013.

[2] Aggarwal C. and Zhai C., Mining Text Data, Springer Science and Business Media, 2012.

[3] Alayba A., Palade V., England M., and Iqbal R., “Arabic Language Sentiment Analysis on Health Services,” in Proceeding of IEEE 1st International Workshop on Arabic Script Analysis and Recognition, Nancy, pp. 114-118, 2017.

[4] Al-Ayyoub M., Rihani M., Dalgamoni N., and Abdulla N., “Spoken Arabic Dialects Identification: The Case of Egyptian and Jordanian Dialects,” in Proceeding of 5th International Conference on Information and Communication Systems, Irbid, pp. 1-6, 2014.

[5] Al-Azani S. and El-Alfy E., “Using Word Embedding and Ensemble Learning for Highly Imbalanced Data Sentiment Analysis in Short Arabic Text,” in Proceeding of 8th International Conference on Ambient Systems, Networks and Technologies, ANT, Dhahran, pp. 359-366, 2017.

[6] Al-Harbi O., “Classifying Sentiment of Dialectal Arabic Reviews: A Semi-Supervised Approach,” The International Arab Journal of Information Technology, vol. 16, no. 6, pp. 995-1002, 2019.

[7] Al-Harbi O., “Using Objective Words in the Reviews to Improve the Colloquial Arabic Sentiment Analysis,” International Journal on Natural Language Computing, vol. 6, no. 3, pp. 01-14, 2017.

[8] Alhumoud S., Albuhairi T., and Altuwaijri M., “Arabic Sentiment Analysis Using WEKA A Hybrid Learning Approach,” in Proceeding of 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Lisbon, pp. 402-408, 2015.

[9] AL-Sukkar G., Aljarah I., and Alsawalqah H., “Enhancing the Arabic Sentiment Analysis Using Different Preprocessing Operators,” in Proceedings of the New Trends in Information Technology, Amman, pp. 113, 2017.

[10] Ashari A., Paryudi I., and Tjoa A., “Performance Comparison Between Naïve Bayes, Decision Tree and K-Nearest Neighbor in Searching Alternative Design in an Energy Simulation Tool,” International Journal of Advanced Computer Science and Applications, vol. 4, no. 11, 2013.

[11] Bilal M., Israr H., Shahid M., and Khan A., “Sentiment Classification of Roman-Urdu Opinions Using Naïve Bayesian, Decision Tree and KNN Classification Techniques,” Journal of King Saud University-Computer and Information Sciences, vol. 28, no. 3, pp. 330-344, 2016.

[12] Duwairi R., Marji R., Sha'ban N., and Rushaidat S., “Sentiment Analysis in Arabic Tweets,” in Proceeding of 5th International Conference on Information and Communication Systems, Irbid, pp. 1-6, 2014.

[13] Duwairi R. and Qarqaz I., “Arabic Sentiment Analysis Using Supervised Classification,” in Proceeding of International Conference on Future Internet of Things and Cloud, Barcelona, pp. 579-583, 2014.

[14] El-Masri M., Altrabsheh N., Mansour H., and Ramsay A., “A Web-Based Tool for Arabic Sentiment Analysis,” Procedia Computer Science, vol. 117, pp. 38-45, 2017.

[15] Elnagar A., “Investigation on Sentiment Analysis for Arabic Reviews,” in Proceeding of IEEE/ACS 13th International Conference of Computer Systems and Applications, Agadir, pp. 1-7, 2016.

[16] Han J., Kamber M., and Pei J., Data Mining: Concepts and Techniques, Morgan Kaufmann, 2011.

[17] Itani M., Roast C., and Al-Khayatt S., “Corpora for Sentiment Analysis of Arabic Text in Social Media,” in Proceeding of IEEE 8th International Conference on Information and Communication Systems, Irbid, pp. 64-69, 2017.

[18] Khan F., Arnold M., and Pottenger W., “Finite Precision Analysis of Support Vector Machine Classification in Logarithmic Number Systems,” in Proceeding of IEEE Euromicro Symposium on Digital System Design, Rennes, pp. 254-261, 2004.

[19] Liu B., Sentiment Analysis and Opinion Mining, Springer Link, 2012.

[20] Liu B., Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer Science and Business Media, 2007.

[21] Mahyoub F., Siddiqui M., and Dahab M., “Building an Arabic Sentiment Lexicon Using Semi-supervised Learning,” Journal of King Saud University-Computer and Information Sciences, vol. 26, no. 4, pp. 417-424, 2014.

[22] Mostafa A., “An Automatic Lexicon with Exceptional-Negation Algorithm for Arabic Sentiments Using Supervised Classification,” Journal of Theoretical and Applied Information Technology, vol. 95, no. 15, pp. 3662-3671, 2017.

[23] Mukhtar N. and Khan M., “Urdu Sentiment Analysis Using Supervised Machine Learning Approach,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 32, no. 02, pp. 1851001, 2018.

The International Arab Journal of Information Technology, Vol. 19, No. 6, November 2022

[24] Mustafa H., Mohamed A., and Elzanfaly D., “An Enhanced Approach for Arabic Sentiment Analysis,” International Journal of Artificial Intelligence and Applications, vol. 8, no. 5, pp. 01-14, 2017.

[25] Parveen S., Surnar A., and Sonawane S., “Mining in Twitter: How to make use of Sarcasm to Enhance Sentiment Analysis: A Review,” International Journal of Advanced Research in Computer Engineering and Technology, vol. 6, no. 6, 2017.

[26] Prabowo D., Setiawan N., and Nugroho H., “A Study of Data Randomization on A Computer Based Feature Selection for Diagnosing Coronary Artery Disease,” Advances in Intelligent Systems, vol. 53, pp. 237-248, 2014.

[27] Sati B., Ali M., and Abdou S., “Arabic Text Question Answering from an Answer Retrieval Point of View: A Survey,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 7, pp. 478-484, 2016.

[28] Sohail S., Siddiqui J., and Ali R., “Book Recommendation System Using Opinion Mining Technique,” in Proceeding of International Conference on Advances in Computing, Communications and Informatics, Mysore, pp. 1609-1614, 2013.

[29] Zubair M., “Survey of Data Mining Techniques for Social Network Analysis,” International Journal of Research in Computer Engineering and Electronics, vol. 6, no. 3, pp. 01-08, 2014.

Emran Al-Buraihy received the B.Sc. (Hons) degree in information technology from University of Science and Technology Taiz, Yemen, in 2014, and the M.S. degree in information technology from Institute of Business and Management Science (IBMS), The University of Agriculture Peshawar, Peshawar, Pakistan in 2018. He is currently pursuing the Ph.D. degree in Computer Science and Technology at Beijing University of Technology, Beijing, China.

Wang Dan received the B.S. degree in computer application, the M.S. degree in computer software and theory, and the Ph.D. degree in computer software and theory from Northeastern University, China, in 1991, 1996, and 2002, respectively. She is currently a Professor with the College of Computer Science, Beijing University of Technology. Her major areas of interests include trusted software, web security, and big data.

Rafi Ullah Khan received the B.S. degree in computer science from Islamia College Peshawar, Peshawar, Pakistan, in 2007, the M.S. degree in internetworking and digital communication from the Institute of Management Sciences (IMS), Peshawar, in 2010, and the Ph.D. degree in computer science from the Capital University of Science & Technology, Islamabad, Pakistan, in 2020. He has been working as a Senior Lecturer with the Institute of Computer Sciences and Information Technology, The University of Agriculture, Peshawar, Pakistan, since 2011. His research interests include data mining, machine learning, web user privacy, sentiment analysis, and computer networks.

Mohib Ullah received the M.S. degree from Birmingham City University, U.K., and the Ph.D. degree from the Capital University of Science and Technology, Islamabad, Pakistan. He is currently working as a Senior Lecturer with the Institute of Computer Sciences and Information Technology (ICS/IT), The University of Agriculture, Peshawar, Pakistan. He has published 15 research articles in well-reputed journals and international conferences. His research interests include the security and privacy issues associated with computer networks, WSN, and the IoT.