Using Semantic Graph for Keyword Extraction in Vertical Search Engine

Rachada Kongkachandra; Wasit Limprasert; Pokpong Songmuang; Chainarong Kesamoon

Authors

Rachada Kongkachandra, Department of Computer Science, Faculty of Science and Technology, Thammasat University Rangsit Centre, Pathum Thani
Wasit Limprasert, Department of Computer Science, Faculty of Science and Technology, Thammasat University Rangsit Centre, Pathum Thani
Pokpong Songmuang, Department of Computer Science, Faculty of Science and Technology, Thammasat University Rangsit Centre, Pathum Thani
Chainarong Kesamoon, Department of Mathematics and Statistics, Faculty of Science and Technology, Thammasat University Rangsit Centre, Pathum Thani

Keywords:

Vertical search, Semantic search engine, Conceptual graph, Keyword extraction, Semantic graph

Abstract

This paper presents the use of semantic graphs as a knowledge resource for semantically comparing queries with document contents. The semantic graphs are generated using natural language processing techniques in three steps. The first step is document preprocessing: word tokenization, stop word removal, and part-of-speech tagging. The second step is sentence parsing, for which this paper uses dependency parsing. The third step is text-to-semantic-graph conversion, in which the semantic graphs are represented as conceptual graphs. Finally, the semantic relatedness of these graphs is measured and used to extract keywords. To evaluate the proposed technique, semantic graphs were generated and keywords extracted from 380 documents from the IEEE website and 144 documents from the SemEval standard corpus. The precision, recall, and F1 scores are 40%, 53%, and 40%, respectively.
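The general shape of the pipeline and its evaluation can be illustrated with a minimal sketch. It is not the authors' method: a simple word co-occurrence graph stands in for the conceptual graphs built from dependency parses, and node degree stands in for the semantic relatedness measure. The stop-word list, the window size, and all function names (`preprocess`, `build_graph`, `extract_keywords`, `precision_recall_f1`) are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "is", "are",
              "to", "as", "by", "from", "on", "with"}


def preprocess(text):
    """Lowercase the text, tokenize on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_graph(tokens, window=2):
    """Build an undirected co-occurrence graph.

    Each token is linked to the tokens that follow it within `window`
    positions. This is a stand-in for a semantic graph, not a conceptual
    graph derived from a dependency parse.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def extract_keywords(text, top_k=3):
    """Rank words by number of distinct neighbours and keep the top_k.

    Ties are broken alphabetically so the result is deterministic.
    """
    graph = build_graph(preprocess(text))
    ranked = sorted(graph, key=lambda w: (-len(graph[w]), w))
    return ranked[:top_k]


def precision_recall_f1(predicted, gold):
    """Compute set-based precision, recall, and F1 for one document."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    sample = "semantic graph keyword extraction semantic graph search"
    keywords = extract_keywords(sample, top_k=2)
    # The gold keywords below are invented for this usage example.
    print(keywords, precision_recall_f1(keywords, ["graph", "keyword"]))
```

When scores are averaged over many documents, the mean F1 need not equal the harmonic mean of the mean precision and mean recall, because F1 is computed per document before averaging.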


