Received date November 17, 2013

Accepted date June 23, 2014

Texts Semantic Similarity Detection Based Graph Approach

Author Department of Computer Engineering, Shahid Beheshti University, Iran

Abstract Measuring the similarity of text documents is important for analyzing documents, extracting useful information from them, and generating appropriate data. Several lexical matching techniques have been proposed to determine the similarity between documents. They have been successful up to a certain limit, but they fail to capture the semantic similarity between two texts. Semantic similarity approaches were therefore suggested, such as corpus-based methods and knowledge-based methods (e.g., WordNet-based methods). This paper offers a new method for Paraphrase Identification (PI) that measures the semantic similarity of texts using a graph-based idea. We intend to take into account the order of the words in a sentence. We offer a graph-based algorithm, with a specific implementation, for similarity identification that makes extensive use of word similarity information extracted from WordNet. Experiments were performed on the Microsoft Research Paraphrase Corpus, and we show that our approach achieves appropriate performance.
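
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a graph-based sentence similarity could be computed, not the authors' implementation. It assumes a weighted bipartite graph between the words of the two sentences, a greedy one-to-one matching, and a normalized score. The `TOY_SIMILARITY` table and the function names are hypothetical stand-ins for WordNet-derived word similarity, and word order is not modeled here.

```python
from itertools import product

# Hypothetical word-to-word similarity scores in [0, 1].
# Assumption: in a real system these would come from a WordNet-based measure.
TOY_SIMILARITY = {
    frozenset(("car", "automobile")): 1.0,
    frozenset(("buys", "purchases")): 0.9,
    frozenset(("man", "person")): 0.7,
}


def word_similarity(a, b):
    """Return a similarity in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def sentence_similarity(s1, s2):
    """Greedy matching on a weighted bipartite word graph (illustrative only)."""
    words1 = s1.lower().split()
    words2 = s2.lower().split()
    if not words1 or not words2:
        return 0.0

    # One edge per word pair, weighted by word similarity.
    edges = [
        (word_similarity(a, b), i, j)
        for (i, a), (j, b) in product(enumerate(words1), enumerate(words2))
    ]
    # Heaviest edges first; the sort is stable, so ties keep their original order.
    edges.sort(key=lambda edge: -edge[0])

    used1, used2 = set(), set()
    total = 0.0
    for weight, i, j in edges:
        if weight <= 0.0:
            break  # remaining edges carry no similarity
        if i in used1 or j in used2:
            continue  # each word may be matched at most once
        used1.add(i)
        used2.add(j)
        total += weight

    # Normalize by the combined sentence length so the score lies in [0, 1].
    return 2.0 * total / (len(words1) + len(words2))


if __name__ == "__main__":
    print(sentence_similarity("a man buys a car",
                              "a person purchases an automobile"))
```

A pair would then be labeled a paraphrase when the score exceeds a threshold tuned on training data. Greedy matching is used here only for brevity; an exact maximum-weight matching could be substituted without changing the overall structure.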

Majid Mohebbi received the MSc degree in software engineering from Shahid Beheshti University, Iran, in 2013. His research interests include semantic similarity and NLP.

Alireza Talebpour received his MSc degree in Artificial Intelligence and his PhD degree in Image Processing from the University of Surrey, United Kingdom. His research interests include image processing, pattern recognition, and intelligent methods for classification of massive data.