Lifelong Machine Learning for Topic Modeling and Classification

Bookmark or cite this item: http://hdl.handle.net/10027/20234

Files in this item

File: Chen_Zhiyuan.pdf (2MB)
Description: (no description provided)
Format: PDF
Title: Lifelong Machine Learning for Topic Modeling and Classification
Author(s): Chen, Zhiyuan
Advisor(s): Liu, Bing
Contributor(s): Yu, Philip S.; Berger-Wolf, Tanya; Ziebart, Brian; Amatriain, Xavier
Department / Program: Computer Science
Graduate Major: Computer Science
Degree Granting Institution: University of Illinois at Chicago
Degree: PhD, Doctor of Philosophy
Genre: Doctoral
Subject(s): Lifelong Machine Learning; Lifelong Learning; Topic Modeling; Classification; Big Data
Abstract: Machine Learning (ML) has been widely and successfully applied to many computational tasks and applications. However, most ML algorithms are designed to address a specific problem using a single dataset: given a dataset, an ML algorithm is run on it to build a model. Although this one-shot learning is important and useful, it can never make an AI system intelligent, and its accuracy is limited. Lifelong Machine Learning (LML), on the other hand, aims to design and develop computational systems and algorithms that learn as humans do, i.e., by retaining the results learned in the past, abstracting knowledge from them, and using that knowledge to help future learning and problem solving. The rationale is that when faced with a new situation, we humans use our previous experience and knowledge to deal with and learn from it. It is essential to incorporate such a capability into a computational system to make it more versatile, holistic, and intelligent.
This thesis presents my Ph.D. research on designing lifelong machine learning approaches for both unsupervised and supervised learning. For unsupervised learning, we focus on topic modeling, which aims to discover coherent semantic topics from documents. For supervised learning, we propose to improve classification by integrating lifelong machine learning.
Topic modeling has been widely used to uncover topics from document collections. Such topics are important in many text mining and machine learning tasks, such as classification, retrieval, clustering, and summarization. However, classic unsupervised topic models can generate many incoherent topics. To address this problem, we proposed several knowledge-based topic models (Chen et al., 2013d; Chen et al., 2013b; Chen et al., 2013c), which require the knowledge to be provided by domain experts. To further improve topic quality, in (Chen and Liu, 2014b; Chen and Liu, 2014a) we proposed to automatically extract, accumulate, and filter knowledge following the idea of LML. The experimental results reported in these papers demonstrate the effectiveness of the proposed LML approaches.
We also apply LML to supervised learning, specifically classification. Classification is a widely studied machine learning task whose goal is to assign objects to a fixed set of categories. Departing from the traditional classification problem, which focuses on a single domain, we proposed the Lifelong Sentiment Classification (LSC) model (Chen et al., 2015), which automatically extracts and accumulates sentiment-oriented knowledge. This knowledge is incorporated through regularization under the Naive Bayesian optimization framework. The experimental results demonstrate that the LSC model achieves progressively better classification performance as knowledge accumulates from an increasing number of domains, which shows the advantage of LML. Based on this thesis, we believe that the LML capability can lead to more robust computational systems that cope with the dynamics and complexity of real-world problems and achieve better predictability.
Issue Date: 2016-02-17
Genre: thesis
URI: http://hdl.handle.net/10027/20234
Rights Information: Copyright 2015 Zhiyuan Chen
Date Available in INDIGO: 2018-02-18
Date Deposited: 2015-12
 

Statistics

Country Code Views
United States of America 155
China 139
Russian Federation 38
Ukraine 23
Germany 16
