Received date May 31, 2020

Accepted date January 6, 2021

Text Summarization Technique for Punjabi Language Using Neural Networks

Author Arti Jain¹, Anuja Arora¹, Divakar Yadav², Jorge Morato³, and Amanpreet Kaur¹
¹Department of Computer Science and Engineering, Jaypee Institute of Information Technology, India
²Department of Computer Science and Engineering, National Institute of Information Technology, India
³Department of Computer Science and Engineering, Universidad Carlos III de Madrid, Spain

Keywords #Extractive method #Indian languages corpora initiative #natural language processing #neural networks #Punjabi language #text summarization

Abstract In the contemporary world, the use of digital content has risen exponentially. Newspaper and web articles, status updates, advertisements, and similar content have become an integral part of daily routine. There is therefore a need for automated systems that summarize such large text documents in order to save time and effort. Summarizers for languages such as English have reached a mature stage, since work on them began in the 1950s, but several languages, Punjabi among them, still need special attention. The Punjabi language has a richer morphological structure than English and other foreign languages. In this work, we present a three-phase extractive summarization methodology using neural networks that produces a compendious summary of a single Punjabi text document. The methodology comprises a pre-processing phase that cleans the text; a processing phase that extracts statistical and linguistic features; and a classification phase. The classification-based neural network applies a sigmoid activation function and gradient descent optimization for weighted error reduction to generate the output summary. The proposed summarization system is applied to the monolingual Punjabi text corpus from the Indian Languages Corpora Initiative Phase-II. The achieved precision, recall, and F-measure are 90.0%, 89.28%, and 89.65% respectively, which is reasonably good in comparison with the performance of other existing Indian-language summarizers.
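The classification phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network architecture, the loss function, or the feature set, so the sketch assumes a single sigmoid unit trained by batch gradient descent on a cross-entropy loss, with two hypothetical per-sentence features (a keyword score and a position score) and made-up toy data. The names `train`, `classify`, and `precision_recall_f` are illustrative only.

```python
import math
import random


def sigmoid(z):
    """Logistic activation: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Probability that the sentence with feature vector x belongs in the summary."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features, labels, lr=0.5, epochs=2000, seed=0):
    """Batch gradient descent for one sigmoid unit (assumed cross-entropy loss)."""
    rng = random.Random(seed)
    n = len(features[0])
    m = len(features)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = score(weights, bias, x) - y  # gradient of the loss w.r.t. the pre-activation
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        # Step against the mean gradient.
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def classify(weights, bias, features, threshold=0.5):
    """Label a sentence 1 (keep in summary) when its score reaches the threshold."""
    return [1 if score(weights, bias, x) >= threshold else 0 for x in features]


def precision_recall_f(predicted, gold):
    """Precision, recall and F-measure over binary keep/drop labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: [keyword_score, position_score] for each sentence,
# with label 1 meaning the sentence appears in a reference summary.
FEATURES = [[0.9, 0.8], [0.8, 0.3], [0.2, 0.1], [0.1, 0.4], [0.7, 0.9], [0.3, 0.2]]
LABELS = [1, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    w, b = train(FEATURES, LABELS)
    predicted = classify(w, b, FEATURES)
    print(predicted, precision_recall_f(predicted, LABELS))
```

In an extractive setting, the sentences labelled 1 would be emitted in their original order to form the summary. The 0.5 threshold and the learning rate are arbitrary choices for this toy example, not values taken from the paper.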

The International Arab Journal of Information Technology, Vol. 18, No. 6, November 2021

Arti Jain received her Ph.D. in Computer Science and Engineering (2019) from Jaypee Institute of Information Technology, Noida, India. She has more than 18 years of teaching experience. Currently, she is working as Assistant Professor (Sr. Grade), CSE, JIIT, Noida, India. She is a member of IEEE, INSTICC, IAENG, IFERP and Life Member of TERA. She is a reviewer for several reputed International Journals and a TPC member of International Conferences. She has more than 20 publications in peer-reviewed International Journals, Book Chapters and International Conferences. Her research interests include Natural Language Processing, Machine Learning, Data Science, Deep Learning, Social Media Analysis, Big Data and Data Mining.

Anuja Arora is working as Associate Professor in the Department of Computer Science and Engineering at Jaypee Institute of Information Technology, Noida, India. She has 15 years of academic experience and 1.5 years of industry experience. She is Senior IEEE Member, ACM Member, SIAM Member, INSTICC and Life Member of IAENG. She is also Vice-Chair for the Delhi ACM-W Chapter. She has more than 70 research papers in peer-reviewed International Journals, Book Chapters, and International Conferences. She has supervised 3 Ph.D. theses and 2 more are in progress. Her research interests include Data Science, Deep Learning, Information Retrieval Systems, Machine Learning, Social Network Analysis, Software Testing and Web Intelligence. She is a reviewer for many reputed and peer-reviewed IEEE transactions (TKDE, TNSM, IEEE Transaction of Cybernetics) and for Springer, IGI Global, Inderscience, and De Gruyter Journals. She has guided more than 17 M.Tech theses and around 100 B.Tech projects.

Divakar Yadav (SM’2017) is working as Associate Professor in the Department of Computer Science and Engineering at National Institute of Technology (NIT), Hamirpur (HP), India. He completed his undergraduate degree in Computer Science and Engineering (1999), his postgraduate degree in Information Technology (2005) and his PhD in Computer Science and Engineering (2010). He is Senior Member IEEE. He also worked as Post-Doctoral Fellow at University of Carlos-III, Madrid, Spain from 2011 to 2012. He has supervised 5 PhD theses and 22 Master dissertations. He has more than 20 years of teaching and research experience. He has published 85 research articles in reputed International Journals and Conference Proceedings. His areas of research are Machine Learning and Information Retrieval.

Jorge Morato received his B.S. degree in Sciences from University of Alcala in 1992, and his Ph.D. degree in Information Sciences from University Carlos III in 1999. Since 2000, he has been a researcher and professor with the Computer Science Department, Carlos III University, Spain. From 1991 to 2016, he held grants and fellowships from the Spanish National Research Council and the Spanish Government. He is the author of more than one hundred papers and book chapters. His research interests include NLP applications, Information Retrieval Algorithms, Web Positioning and Readability, and Knowledge Organization Systems.

Amanpreet Kaur received her B.E. degree in CSE from Guru Nanak Dev University, Amritsar, India, in 2006, her M.E. degree in Computer Science Engineering from NIT Jalandhar in 2009, and her Ph.D. in Computer Science and Engineering from Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India, in 2020. She is currently working as Assistant Professor, CSE, Jaypee Institute of Information Technology, Noida (UP), India. Her research interests include Wireless Sensor Networks, Information Security and Performance Analysis.