Aspect and Entity Extraction from Opinion Documents

Bookmark or cite this item: http://hdl.handle.net/10027/9477

Files in this item

File: Zhang_Lei.pdf (768KB)
Description: (no description provided)
Format: PDF
Title: Aspect and Entity Extraction from Opinion Documents
Author(s): Zhang, Lei
Advisor(s): Liu, Bing
Contributor(s): Cruz, Isabel; Yu, Clement; Gmytrasiewicz, Piotr; Zhou, Chi
Department / Program: Computer Science
Graduate Major: Computer Science
Degree Granting Institution: University of Illinois at Chicago
Degree: PhD, Doctor of Philosophy
Genre: Doctoral
Subject(s): Opinion Mining; Sentiment Analysis; Text Mining; Information Extraction; Text Classification
Abstract: Opinion mining has been an active research area in Web mining and Natural Language Processing (NLP) in recent years. In this thesis, we present a comprehensive study of aspect and entity extraction from opinion documents for opinion mining. We first introduce the aspect-based opinion mining model. We then propose a new method for aspect extraction and ranking based on language patterns and dependency grammar. The method also ranks the extracted aspects by their importance, i.e., relevance and frequency. In addition, we find that some domains have two special kinds of product aspects. One is the noun aspect that implies an opinion. The other is the resource term. We propose novel extraction algorithms to identify both from opinion documents. The entity extraction task is similar to the classic named entity recognition (NER) problem, with one major difference. In a typical opinion mining application, users often want to find opinions on competing entities, e.g., competing or relevant products. This implies that the discovered entities must be of the same type/class, which makes the task essentially a set expansion problem. To address it, we present two set expansion algorithms for entity extraction from opinion documents. One is based on the positive and unlabeled (PU) learning model. The other is based on Bayesian Sets. We also discuss extracting topic documents from a collection. An opinion mining system first crawls and indexes opinion documents, which are later used for different specific tasks. Typically, the documents are not well categorized because one does not know what the future tasks will be. Normally, keyword search is used to find relevant opinion documents for analysis. However, the documents retrieved in this way can have both low recall and low precision. Another way is to train a document classifier, but the training procedure is time-consuming and labor-intensive. We propose an unsupervised technique to solve this problem based on a new PU learning algorithm.
Issue Date: 2012-12-13
Genre: thesis
URI: http://hdl.handle.net/10027/9477
Rights Information: Copyright 2012 Lei Zhang
Date Available in INDIGO: 2012-12-13
Date Deposited: 2012-08
 


Statistics

Country Code Views
United States of America 407
China 263
Russian Federation 38
United Kingdom 13
Malaysia 10
