The International Arab Journal of Information Technology (IAJIT)


Arabic Language Characteristics that Make its Automatic Processing Challenging

Arabic Natural Language Processing (ANLP) is an area of Artificial Intelligence (AI) that enables computers to process and understand Arabic text and speech. It powers applications such as translation, sentiment analysis, and speech recognition. However, despite the worldwide popularity of Arabic, ANLP is lagging. The Arabic language possesses unique linguistic features that complicate computational processing. This article explores these challenges by examining Arabic’s inherent characteristics in orthography, morphology, grammar, syntax, and linguistic diversity, all of which contribute to the complexity of its Natural Language Processing (NLP). The main challenges are the diversity of dialects, the morphological and syntactic richness of Arabic, diglossia, and the absence of short vowels in written text. In addition, the scarcity of Arabic resources further complicates NLP efforts. This review can serve as a guide for practitioners in the field of Arabic NLP, whether they are computer scientists or linguists. It also calls on scientists in the Arab community to increase their efforts in the field and address these challenges in order to promote Arabic NLP.

The International Arab Journal of Information Technology, Vol. 22, No. 4, July 2025