
Hybrid Transformer Framework for Domain Generated Algorithms Detection: A Fusion of Textual and Numeric Features
A Domain Generation Algorithm (DGA) is a program that generates a large number of spam domain names. Cybercriminals use DGAs to initiate malware attacks, so cybersecurity teams need to identify DGA domains to strengthen their organization’s defenses. This paper designs a state-of-the-art artificial intelligence model for DGA domain detection, built on a fusion of semantic and statistical modalities. The textual features are processed using the BERT text transformer, while the numerical features are processed using a simple Multi-Layer Perceptron (MLP) network. The model is applied to a dataset of 160,000 Alexa domains labeled as DGA or Legit. The approach is evaluated using accuracy, precision, recall, F1-score, and the confusion matrix, and the results are promising for accurate detection of DGA domains: the model achieved an accuracy of 0.9932. These results demonstrate the effectiveness of the model in classifying domain names and indicate that it can generalize to unseen domains and other real-world scenarios.
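The evaluation measures named above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of that relationship, assuming DGA is treated as the positive class. The function names and the toy labels are hypothetical and invented for illustration; they are not the paper's code, data, or results.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="dga"):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted DGA: correct if the domain really is DGA
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted legitimate: a miss if the domain really is DGA
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix cells."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, invented for illustration only (not from the paper's dataset)
y_true = ["dga", "dga", "dga", "legit", "legit", "legit", "legit", "dga"]
y_pred = ["dga", "dga", "legit", "legit", "legit", "dga", "legit", "dga"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = binary_metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn)
print(metrics)
```

Accuracy alone can hide class-specific errors, which is why precision and recall for the DGA class are reported alongside it: a missed DGA domain (false negative) and a blocked legitimate domain (false positive) carry different operational costs.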
The International Arab Journal of Information Technology, Vol. 23, No. 1, January 2026