Collective Mining of Vertical Social Communities

Supported by NSF Award #1838145. $733,875. September 15, 2018 - August 2022.
BIGDATA: F: Collaborative Research: Collective Mining of Vertical Social Communities


A large fraction of internet social media content is found in thousands of specialized communities that are hosted by news outlets, typically in the form of reader forums or comments on news articles. The users of such a site are said to form a vertical social community (VSC), because they deeply engage with a single media source. While each VSC is tiny compared to broad communities such as Facebook, VSCs are important because they expose how different segments of society feel about various world events. This can be a very useful resource for downstream intelligence and predictive analytics. However, current web crawlers cannot effectively access VSCs. Their data is therefore invisible to search engines and remains hidden from analytics tools. The goals of this project are to enable effective access to the vertical social communities that coalesce around online news reports, and to mine their comments and debates. This project will provide researchers with tools to collect and analyze data from these communities. The educational component of the project includes training graduate and undergraduate students, involving them in research, and incorporating research projects and results into courses.

The researchers will develop algorithms to unearth the content generated at thousands of vertical social communities and make it transparently accessible to data management and analytics tools. The researchers will develop novel deep learning techniques for content detection, and build a novel scalable end-to-end system for real-time access and collective mining of these communities, capable of handling large parallel data streams based on shifting ideas. The specific algorithms will include user population estimation, bootstrap communication patterns for automatic crawling of content, and fine-grained sentiment analysis for intelligence and predictive analytics. Software tools will be made available to researchers in academia and industry. Distribution of free, open-source software implementing the techniques developed will enhance existing research infrastructure.




  • [Temple University] Lihong He
  • [Temple University] Zhijia Chen
  • [Temple University] Abdullah Aljebreen
  • [University of Houston] Fan Yang
  • [University of Houston] Marjan Hosseinia
  • [University of Houston] Yifan Zhang


  • [ICWSM’21] Lihong He, Chen Shen, Arjun Mukherjee, Slobodan Vucetic, Eduard Dragut: [PDF] [BibTex]

    Cannot Predict Comment Volume of a News Article before (a few) Users Read It. The International AAAI Conference on Web and Social Media. 2021.

  • [COLING'20] Fan Yang, Eduard Dragut, and Arjun Mukherjee. [PDF] [BibTex]

    Predicting Personal Opinion on Future Events with Fingerprints. International Conference on Computational Linguistics (COLING'20). Dec. 2020. [acceptance rate: 33.4%]

  • [ASONAM'20] Fan Yang, Eduard Dragut, and Arjun Mukherjee. [PDF] [BibTex]

    Claim Verification under Positive Unlabeled Learning. The international conference series on Advances in Social Network Analysis and Mining (ASONAM'20). Dec. 2020.

  • [SBP’20] Yigeng Zhang, Fan Yang, Yifan Zhang, Eduard Dragut and Arjun Mukherjee: [PDF] [BibTex]

    Birds of a Feather Flock Together: Satirical News Detection via Language Model Differentiation. International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction. 2020.

  • [SocialNLP’20] Marjan Hosseinia, Eduard Dragut and Arjun Mukherjee: [PDF] [BibTex]

    Stance Prediction for Contemporary Issues: Data and Experiments. The 8th International Workshop on Natural Language Processing for Social Media. July 9-10, 2020.

  • [SBP’19] Marjan Hosseinia, Eduard Dragut and Arjun Mukherjee: [PDF] [BibTex]

    Pro/Con: Neural Detection of Stance in Argumentative Opinion. International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction. 2019.

  • [WIREs'19] Lihong He, Chao Han, Arjun Mukherjee, Zoran Obradovic, Eduard Dragut: [PDF] [BibTex]

    On the Dynamics of User Engagement in News Comment Media. WIREs Data Mining and Knowledge Discovery. 2019.

Research Experience for Undergraduates

  • Kunal Waghray. JSON Schema Matching: Empirical Observations. SIGMOD/PODS-SRC. Pages: 2887–2889. 2020.