A Hybrid Technique for Annotating Book Tables

Asima Latif¹, Shah Khusro¹, Irfan Ullah¹, and Nasir Ahmad²

¹Department of Computer Science, University of Peshawar, Pakistan
²Department of Computer Systems Engineering, University of Engineering and Technology Peshawar, Pakistan
Table extraction is usually complemented by table annotation to uncover the hidden semantics of a document or a book. These semantics are determined by identifying a type for each column, finding the relationships between columns (if any), and identifying the entities in each cell. Although such approaches have been applied to small documents and web pages, they have not been extended to the extraction and annotation of tables in books. This paper focuses on detecting, locating, and annotating entities in book tables. More specifically, it contributes algorithms for identifying and locating tables in books and for annotating table entities using the online DBpedia Spotlight service. Entities that DBpedia Spotlight fails to annotate are then annotated using Google snippets. The combined approach was found to give higher accuracy and better performance than DBpedia Spotlight alone. The approach complements existing table annotation approaches, as it enables the discovery and annotation of entities that are not present in the catalogue. We tested our scheme on Computer Science books and obtained promising results in terms of accuracy and performance.
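As a rough illustration of the hybrid idea, the sketch below assumes that candidate tables are found by scanning extracted page text for caption lines of the form "Table N.M", and treats the DBpedia Spotlight step and the Google-snippet step as interchangeable annotator functions that return an entity identifier or None. The caption pattern, the function names (`locate_tables`, `annotate_table`) and the toy lookup dictionaries are hypothetical. This is not the authors' algorithm, and no real web service is called.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# An annotator maps a cell's text to an entity identifier, or None if it finds nothing.
# Both annotators used below are hypothetical stand-ins; no web service is called.
Annotator = Callable[[str], Optional[str]]

# Assumed caption convention, e.g. "Table 3.1: Sorting algorithms".
# This also matches prose lines that start with "Table 3 ...", so hits are candidates only.
CAPTION_RE = re.compile(r"^\s*Table\s+(\d+(?:\.\d+)*)\b")


def locate_tables(pages: List[str]) -> List[Dict[str, object]]:
    """Return the page index, line index and label of every caption-like line."""
    hits: List[Dict[str, object]] = []
    for page_no, text in enumerate(pages):
        for line_no, line in enumerate(text.splitlines()):
            match = CAPTION_RE.match(line)
            if match:
                hits.append({"page": page_no, "line": line_no, "label": match.group(1)})
    return hits


def annotate_table(
    rows: List[List[str]], primary: Annotator, fallback: Annotator
) -> Dict[Tuple[int, int], Tuple[Optional[str], str]]:
    """Annotate each non-empty cell, trying the primary annotator first.

    The fallback is consulted only for cells the primary annotator leaves
    unresolved, mirroring the DBpedia Spotlight -> Google snippets order.
    """
    result: Dict[Tuple[int, int], Tuple[Optional[str], str]] = {}
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            text = cell.strip()
            if not text:
                continue  # empty cells are not annotated
            uri = primary(text)
            if uri is not None:
                result[(r, c)] = (uri, "primary")
                continue
            uri = fallback(text)
            result[(r, c)] = (uri, "fallback" if uri is not None else "unannotated")
    return result


if __name__ == "__main__":
    # Toy lookups standing in for the two real annotation steps (hypothetical data).
    primary = {"Quicksort": "dbr:Quicksort", "Sorting": "dbr:Sorting_algorithm"}.get
    fallback = {"Timsort": "dbr:Timsort"}.get
    pages = ["Chapter 3\nTable 3.1: Sorting algorithms\nQuicksort | O(n log n)"]
    print(locate_tables(pages))
    rows = [["Quicksort", "Sorting"], ["Timsort", ""], ["FooSort", "Sorting"]]
    for cell, annotation in annotate_table(rows, primary, fallback).items():
        print(cell, annotation)
```

Keeping the annotators as plain functions makes the fallback order explicit and lets either step be replaced, for example by a real call to an annotation service, without touching the loop that walks the table cells.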
Asima Latif obtained her BS and MS degrees in Computer Science from the Department of Computer Science, University of Peshawar, Pakistan. Her research interests include information retrieval, information extraction, information semantics, and search engines.

Shah Khusro received his Ph.D. degree from Vienna University of Technology, Vienna, Austria. He is currently a Professor at the Department of Computer Science, University of Peshawar, Pakistan. His research interests include Web Semantics, Web Engineering, Information Retrieval, Web-based Systems, Ambient Assisted Living, and Mobile Technology for People with Special Needs.

Irfan Ullah received his MS degree in Web Engineering from the Department of Computer Science, University of Peshawar, Pakistan, in 2014, and is now pursuing his PhD at the same institute. He is also an Assistant Professor at Shaheed Benazir Bhutto University, Sharingal, Pakistan. His research interests include Web Semantics, Linked Open Data, Information Retrieval, Web Engineering, and Digital Libraries.

Nasir Ahmad received his PhD degree from Loughborough University, UK. He is currently an Assistant Professor at the Department of Computer Systems Engineering, University of Engineering and Technology Peshawar, Pakistan. His research interests include Speech and Video Processing, Digital Signal Processing, Pattern Recognition, and Machine Learning.