The International Arab Journal of Information Technology (IAJIT)



Swin Transformer-Enhanced UAV Surveillance: A Multi-Modal Feature Optimization for High-Precision Road Vehicles Detection

Unmanned Aerial Vehicles (UAVs) have emerged as powerful platforms for intelligent traffic monitoring owing to their high-resolution imaging and wide-area coverage. This paper introduces a robust vehicle detection and classification framework that employs a multi-modal feature optimization strategy to improve detection accuracy in aerial scenes. The proposed pipeline begins with Histogram Equalization for contrast enhancement, followed by semantic segmentation with DeepLabV3+ to isolate vehicle regions. YOLOv10, a state-of-the-art real-time object detector, is then applied to localize vehicles with high precision. For feature extraction, we integrate three complementary modalities: Wavelet Transform features, which capture multi-resolution frequency details; Gabor filters, which highlight directional textures; and Speeded-Up Robust Features (SURF), which detect keypoints and compute their descriptors. A Genetic Algorithm (GA) then optimizes the extracted features by selecting the most discriminative subset, thereby reducing redundancy. Final classification is performed by the Swin Transformer, a vision transformer that uses shifted-window self-attention to model long-range spatial dependencies. Experimental evaluations on two UAV benchmark datasets, Roundabout Aerial Images and VAID, demonstrate the superiority of our method, which achieves classification accuracies of 97.71% and 98.57%, respectively. These results indicate the effectiveness, scalability, and real-world applicability of the approach for UAV-based vehicle monitoring, contributing to autonomous aerial surveillance systems for intelligent transportation analytics and situational awareness in smart city applications.
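To make the contrast-enhancement step concrete, the following is a minimal sketch of classic global histogram equalization on an 8-bit grayscale image stored as nested lists. It is not the authors' implementation: the paper may rely on a library routine, and the function name `equalize_histogram` and the toy 2 x 2 image are hypothetical. Each intensity is remapped through the normalized cumulative histogram so that the output levels spread over the full [0, 255] range.

```python
from collections import Counter


def equalize_histogram(image, levels=256):
    """Global histogram equalization of a non-empty grayscale image.

    `image` is a list of rows of integer intensities in [0, levels - 1].
    (Hypothetical helper for illustration only.)
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    counts = Counter(pixels)

    # Cumulative distribution function (CDF) of the intensity histogram.
    cdf, running = [], 0
    for v in range(levels):
        running += counts.get(v, 0)
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)  # CDF value at the darkest occupied level
    if n == cdf_min:
        return [list(row) for row in image]  # constant image: nothing to stretch

    # Look-up table mapping each level so that the output CDF is roughly linear.
    lut = [max(0, round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1)))
           for v in range(levels)]
    return [[lut[p] for p in row] for row in image]


low_contrast = [[52, 55], [61, 59]]
print(equalize_histogram(low_contrast))  # [[0, 85], [255, 170]]
```

Four closely spaced intensities (52 to 61) are stretched to 0, 85, 170, and 255, which is the kind of contrast gain the pipeline seeks before segmentation.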
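The wavelet and Gabor modalities can be illustrated with a small pure-Python sketch that turns an image patch into one concatenated feature vector. This is an assumption-laden illustration, not the paper's feature extractor: it uses a single-level orthonormal 2-D Haar transform (summarized by the mean energy of each sub-band) and the real part of a standard Gabor kernel at a few orientations (summarized by the mean absolute filter response). The kernel size (5 x 5), sigma = 2, aspect ratio 0.5, number of orientations, and all function names are hypothetical choices. SURF is left out because a faithful version is too long for a short sketch.

```python
import math


def haar_dwt2(image):
    """One level of the orthonormal 2-D Haar transform (even height and width assumed)."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(image), 2):
        rows = ([], [], [], [])
        for j in range(0, len(image[0]), 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # LL: local average
            rows[1].append((a + b - c - d) / 2)  # top-bottom difference
            rows[2].append((a - b + c - d) / 2)  # left-right difference
            rows[3].append((a - b - c + d) / 2)  # diagonal difference
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def subband_energy(band):
    values = [v for row in band for v in row]
    return sum(v * v for v in values) / len(values)


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a Gabor kernel: Gaussian envelope times an oriented cosine."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def mean_abs_response(image, kernel):
    """Mean |response| of a sliding-window correlation over valid positions only."""
    k, h, w = len(kernel), len(image), len(image[0])
    responses = [
        abs(sum(kernel[u][v] * image[i + u][j + v] for u in range(k) for v in range(k)))
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]
    return sum(responses) / len(responses)


def texture_features(image, orientations=4, wavelength=4.0):
    """Concatenate Haar sub-band energies with mean Gabor responses per orientation."""
    wavelet = [subband_energy(band) for band in haar_dwt2(image)]
    kernels = [gabor_kernel(5, 2.0, k * math.pi / orientations, wavelength)
               for k in range(orientations)]
    return wavelet + [mean_abs_response(image, kern) for kern in kernels]


stripes = [[255 if j % 2 == 0 else 0 for j in range(8)] for _ in range(8)]
features = texture_features(stripes, wavelength=2.0)
```

For an image of vertical stripes, the left-right Haar band carries energy while the top-bottom and diagonal bands stay at zero, and the Gabor response at theta = 0 (oscillating across the stripes) is much larger than at theta = pi/2. This illustrates why the two modalities are described as complementary: one summarizes frequency content per scale, the other per orientation.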
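The GA-based feature selection step can be pictured as a search over binary masks, where bit i keeps or drops feature i. The sketch below is a generic GA (tournament selection, uniform crossover, bit-flip mutation, elitism) and not the authors' configuration. The population size, rates, function names, and especially the toy fitness are hypothetical. In practice the fitness would typically be a validation score of the classifier on the selected subset, which this sketch replaces with made-up per-feature relevance scores minus a per-feature cost.

```python
import random


def ga_select_features(n_features, fitness, pop_size=20, generations=40,
                       crossover_rate=0.9, mutation_rate=None, elite=2, seed=0):
    """Search for a binary feature mask that maximizes `fitness(mask)`."""
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features

    def tournament(scored, k=3):
        # Sample k individuals and keep the fittest one.
        return max(rng.sample(scored, k), key=lambda s: s[0])[1]

    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda s: s[0], reverse=True)
        next_gen = [list(ind) for _, ind in scored[:elite]]  # elitism: keep the best masks
        while len(next_gen) < pop_size:
            a, b = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene is copied from either parent with equal probability.
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = list(a)
            # Bit-flip mutation toggles each gene independently.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen

    best = max(population, key=fitness)
    return best, fitness(best)


relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2]  # hypothetical per-feature usefulness scores


def toy_fitness(mask, penalty=0.3):
    # Reward useful features and charge a fixed cost per selected feature to discourage redundancy.
    return sum(r for r, m in zip(relevance, mask) if m) - penalty * sum(mask)


best_mask, best_score = ga_select_features(len(relevance), toy_fitness)
print(best_mask, round(best_score, 2))
```

Under this toy fitness, only features whose relevance exceeds the per-feature cost are worth keeping, so the search should settle on the first, third, and fifth features. The per-feature cost is what makes the GA prune redundant features rather than simply selecting everything.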
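Shifted-window self-attention is easiest to see on a small grid of token indices. The sketch below shows three ideas as we understand them from the Swin Transformer design: partitioning the feature map into non-overlapping M x M windows, cyclically shifting it by s tokens so that the next layer's windows straddle the previous window boundaries, and a boolean mask that blocks attention between tokens that were not neighbours before the wrap-around. The function names, the 4 x 4 grid, M = 2, and s = 1 are hypothetical choices for illustration. The attention computation itself is omitted.

```python
def cyclic_shift(grid, shift):
    """Roll a 2-D grid up and left by `shift` rows and columns (wrap-around)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(grid, m):
    """Split an H x W grid into non-overlapping m x m windows, listed row-major."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[i + di][j + dj] for di in range(m) for dj in range(m)]
        for i in range(0, h, m)
        for j in range(0, w, m)
    ]


def shifted_window_mask(h, w, m, s):
    """allowed[k][p][q] is True when tokens p and q of shifted window k may attend."""
    def band(i, n):
        # 0: untouched area, 1: strip just before the wrapped part, 2: wrapped-in strip.
        if i < n - m:
            return 0
        return 1 if i < n - s else 2

    labels = [[3 * band(i, h) + band(j, w) for j in range(w)] for i in range(h)]
    return [[[a == b for b in win] for a in win] for win in window_partition(labels, m)]


grid = [[4 * i + j for j in range(4)] for i in range(4)]  # 4 x 4 tokens labelled 0..15
regular = window_partition(grid, 2)
shifted = window_partition(cyclic_shift(grid, 1), 2)
masks = shifted_window_mask(4, 4, 2, 1)
print(regular)  # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
print(shifted)  # [[5, 6, 9, 10], [7, 4, 11, 8], [13, 14, 1, 2], [15, 12, 3, 0]]
```

The first shifted window, [5, 6, 9, 10], mixes tokens from all four regular windows, which is how information crosses window borders without global attention. In the last shifted window every token came from a different corner of the map, so the mask lets each token attend only to itself.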

 
