Keyphrase Extraction in Citation Networks: How Do Citation Contexts Help?

Cornelia Caragea
Assistant Professor
Computer Science and Engineering Department, University of North Texas
SERC 306
Wednesday, April 26, 2017 - 11:00
Keyphrase extraction is the problem of automatically extracting descriptive phrases or concepts from documents. A document's keyphrases act as a concise summary of it and have been used successfully in many applications, such as query formulation, document clustering, classification, recommendation, indexing, and summarization. Previous approaches to keyphrase extraction generally use the textual content of the target document, or of a local neighborhood made up of textually similar documents. We posit that, in the scholarly domain, other informative neighborhoods exist beyond a document's textual content and its textually similar neighbors, and that these have the potential to improve keyphrase extraction. In particular, research papers are not isolated. Rather, they are highly interconnected in large citation networks, in which papers cite or are cited by other papers, and each citation appears within a citation context, i.e., a short text segment surrounding the citation's mention. These contexts are not arbitrary; they serve as brief summaries of the cited paper. We show how to exploit citation context information effectively for keyphrase extraction and report remarkable improvements in performance over strong baselines in both supervised and unsupervised settings.