Received date April 29, 2025

Accepted date November 27

Innovative Advertising Data Analysis Method: Workflow Design Based on Federated Learning and,

Keywords #Federated learning #large language models #advertising data analysis #privacy‑preserving recommendations #prompt engineering

Abstract This paper presents an approach to analyzing advertising datasets that combines Federated Learning (FL) with Large Language Models (LLMs), and proposes a systematic workflow for improving the accuracy of advertising recommendation systems. Under the FL paradigm, heterogeneous data sources jointly train a model by exchanging model information rather than pooling raw data, which allows distributed analysis under privacy constraints. At the same time, prompts are engineered for LLMs to interpret the multidimensional features of advertising data, supporting more personalized and intelligent ad delivery. The main contribution is a systematic workflow for advertising analysis that covers data preprocessing, visualization, federated model training, prompt engineering, and strategy generation. In the data analysis stage, FL combined with visualization methods presents user behavior and advertising performance from multiple angles, so the model can be optimized while privacy is maintained. In addition, prompt designs specific to advertising analysis improve LLM interpretability, enabling in-depth analysis of user interests, advertising trends, and ad delivery strategies, and ultimately leading to highly personalized ad recommendations. Experimental results show gains in recommendation accuracy, strategy effectiveness, and data privacy protection. Compared with previous methods based on a centralized model, the proposed workflow is more flexible in handling datasets that differ in scale and structure. The approach offers not only an intelligent, privacy-preserving solution for advertising analytics but also a useful paradigm for applying cross-modal data processing and privacy-preserving technology in other fields.
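The federated training step named in the abstract can be illustrated with a minimal sketch. It assumes federated averaging (FedAvg) over a logistic-regression click model, which the abstract does not confirm as the paper's method. The function names, the synthetic data, and all hyperparameters below are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD for logistic regression on one client's rows."""
    w = list(weights)
    for _ in range(epochs):
        for features, clicked in data:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, features)))
            err = pred - clicked
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w


def fed_avg(client_weights, client_sizes):
    """Average client weight vectors, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def make_client(rng, n, true_w):
    """Generate synthetic (features, clicked) rows; purely illustrative."""
    rows = []
    for _ in range(n):
        x = [1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)]  # bias + 2 features
        p = sigmoid(sum(wi * xi for wi, xi in zip(true_w, x)))
        rows.append((x, 1 if rng.random() < p else 0))
    return rows


def train(rounds=20, seed=0):
    """Simulate FedAvg rounds; only weights leave each client, never rows."""
    rng = random.Random(seed)
    true_w = [-0.5, 2.0, -1.0]  # assumed ground truth for the toy data
    clients = [make_client(rng, n, true_w) for n in (50, 120, 80)]
    global_w = [0.0, 0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        global_w = fed_avg(updates, [len(d) for d in clients])
    return global_w


if __name__ == "__main__":
    print(train())
```

In this sketch each simulated client keeps its click records local and returns only an updated weight vector, which the server combines in proportion to the client's sample count. A real deployment would also need secure communication and, possibly, encryption of the updates, which this toy loop omits.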

The International Arab Journal of Information Technology, Vol. 23, No. 3, May 2026

Jialu Li is an undergraduate student at the School of Mathematics and Statistics, Central South University. She has participated in a number of scientific research competitions and social practice programs. Her research interests focus on Large Language Models, Data Analysis, AI+Education, Financial Technology, and Quantitative modeling.

keywords={Federated learning, large language models, advertising data analysis, privacy‑preserving recommendations, prompt engineering},
ISSN={2413-9351},
month={Jan}}