
Probabilistic Models for Fine-Grained Opinion Mining: Algorithms and Applications

Bookmark or cite this item: http://hdl.handle.net/10027/19083

Files in this item: Mukherjee_Arjun.pdf (PDF, 2MB)

Title: Probabilistic Models for Fine-Grained Opinion Mining: Algorithms and Applications
Author(s): Mukherjee, Arjun
Advisor(s): Liu, Bing
Contributor(s): Buy, Ugo; Kanich, Chris; Wang, Junhui; Yu, Philip S.
Department / Program: Computer Science
Graduate Major: Computer Science
Degree Granting Institution: University of Illinois at Chicago
Degree: PhD, Doctor of Philosophy
Genre: Doctoral
Subject(s): Data mining; Graphical models; Natural language processing; Social media; Statistics; Text mining
Abstract: Public sentiment expressed in online debates, discussions, and comments is crucial to government agencies for passing new bills and policies, gauging upheaval, predicting elections, and so on. However, leveraging the sentiments expressed in social opinions poses two major challenges: (1) fine-grained opinion mining, and (2) filtering opinion spam to ensure credible opinion mining.

We start with mining opinions from social conversations, focusing on fine-grained sentiment dimensions such as agreement (e.g., “I’d agree”) and disagreement (e.g., “I refute”). This is a major departure from the traditional polar (positive/negative) sentiments (e.g., good, nice vs. poor, bad) of standard opinion mining. In the domain of debates, joint topic and sentiment models are proposed to discover agreement and disagreement expressions, as well as contention points or topics, at both the discussion level and the individual post level. The proposed models also encode interactions among discussants through quoting and replying relations.

Next, we address the problem of semantic incoherence in aspect extraction through knowledge induction using seeds. Seeds are coarse, user-defined groupings that guide the modeling process. Specifically, we build on topic models to propose novel aspect-specific sentiment models guided by aspect seeds.

The latter part of this thesis proposes solutions for detecting opinion spam. Opinion spam refers to “illegitimate” human activities (e.g., writing fake reviews) that try to mislead readers by giving undeserved opinions or ratings to some entities (e.g., hotels, products) in order to promote or demote them. We address two problems in opinion spam. The first is group spam, i.e., a group of spammers working in collusion. A novel relational ranking algorithm called GSRank is proposed for ranking spam groups based on mutual reinforcement. The second problem is opinion spam detection in the absence of labeled data.
This setting is important because manually labeling fake reviews or reviewers is difficult and error-prone. Our solution is based on the hypothesis that spammers differ markedly from others on behavioral dimensions, which creates a distributional divergence between two (latent) population clusters: spammers and non-spammers. By modeling the spamicity of users as latent, with observed behavioral footprints, we propose novel generative models for detecting opinion spam and fraud.
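The hypothesis that spammers and non-spammers form two latent populations that diverge on behavioral dimensions can be illustrated with a much simpler model than the generative models proposed in the thesis. The sketch below is a generic two-component Bernoulli mixture fit by EM, not the thesis's method; the function name `fit_two_cluster_bernoulli`, the binary behavioral flags, and the toy data are all hypothetical. Each user's posterior probability of belonging to the more "abnormal" cluster stands in for a spamicity score.

```python
import math
from typing import List, Sequence, Tuple


def fit_two_cluster_bernoulli(
    X: Sequence[Sequence[int]],
    n_iter: int = 100,
    eps: float = 1e-6,
) -> Tuple[List[float], float, List[List[float]]]:
    """Fit a two-component Bernoulli mixture to binary behavior vectors with EM.

    Generic illustration only (hypothetical helper, not the thesis's model).
    Returns (posterior P(cluster 1 | user) for each user, mixing weight of
    cluster 1, per-cluster feature probabilities theta[k][j]).
    """
    n, d = len(X), len(X[0])
    pi = 0.5  # prior probability of cluster 1 (the hypothetical "spammer" cluster)
    # Asymmetric start (an assumption): cluster 1 begins with higher
    # "abnormal behavior" rates, so it can be read as the spammer cluster.
    theta = [[0.3] * d, [0.7] * d]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: responsibility of cluster 1 for each user, computed in log space.
        for i, x in enumerate(X):
            log_p = []
            for k, prior in ((0, 1.0 - pi), (1, pi)):
                lp = math.log(prior)
                for j in range(d):
                    p = theta[k][j]
                    lp += math.log(p) if x[j] else math.log(1.0 - p)
                log_p.append(lp)
            m = max(log_p)
            w0 = math.exp(log_p[0] - m)
            w1 = math.exp(log_p[1] - m)
            resp[i] = w1 / (w0 + w1)
        # M-step: re-estimate mixing weight and feature rates, clamped away from 0 and 1.
        n1 = sum(resp)
        n0 = n - n1
        pi = min(max(n1 / n, eps), 1.0 - eps)
        for j in range(d):
            s1 = sum(r * x[j] for r, x in zip(resp, X))
            s0 = sum((1.0 - r) * x[j] for r, x in zip(resp, X))
            theta[1][j] = min(max(s1 / max(n1, eps), eps), 1.0 - eps)
            theta[0][j] = min(max(s0 / max(n0, eps), eps), 1.0 - eps)
    return list(resp), pi, theta


# Toy data: each row is a user, each column a hypothetical binary behavioral flag
# (e.g. "many reviews in one day", "only extreme ratings", "duplicate review text").
toy_users = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1],
    [0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1],
]
spamicity, spammer_weight, theta = fit_two_cluster_bernoulli(toy_users)
print([round(s, 3) for s in spamicity], round(spammer_weight, 3))
```

On the toy data the first three users, who raise most flags, should receive high posterior scores and the rest low ones. Which cluster counts as "spammers" is fixed only by the asymmetric initialization, so with real data the features would need to be encoded so that a 1 marks suspicious behavior.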
Issue Date: 2014-10-28
Genre: thesis
URI: http://hdl.handle.net/10027/19083
Rights Information: Copyright 2014 Arjun Mukherjee
Date Available in INDIGO: 2016-11-05
Date Deposited: 2014-08