Modeling Big Data Variety with Graph Mining Techniques

Bookmark or cite this item: http://hdl.handle.net/10027/19119

Files in this item

File: Kong_Xiangnan.pdf (7MB)
Description: (no description provided)
Format: PDF
Title: Modeling Big Data Variety with Graph Mining Techniques
Author(s): Kong, Xiangnan
Advisor(s): Yu, Philip S.
Contributor(s): Liu, Bing; Lillis, John; Wang, Junhui; Ragin, Ann B.
Department / Program: Computer Science
Graduate Major: Computer Science
Degree Granting Institution: University of Illinois at Chicago
Degree: PhD, Doctor of Philosophy
Genre: Doctoral
Subject(s): Graph Mining; Data Mining; Big Data; Data Variety; Subgraph Pattern; Feature Selection; Uncertain Data; Drug Discovery; Brain Network
Abstract: Graphs are ubiquitous and have become increasingly important in modeling diverse kinds of objects. In many real-world applications, instances are represented not as feature vectors but as graphs with complex structures, e.g., chemical compounds, program flows, XML web documents, and brain networks. One central issue in graph mining research is graph classification, which has a wide variety of real-world applications, e.g., drug activity prediction, toxicology testing, and kinase inhibition. Real-world graph classification problems pose several major challenges: 1) Learning from graphs with multiple labels: For example, a chemical compound can inhibit the activities of multiple types of kinases, e.g., ATPase and MEK kinase; one drug molecule can have anti-cancer efficacy against multiple types of cancer. 2) Learning from a small number of labeled graphs: In many real-world applications, the labels of graph data are very expensive or difficult to obtain. Creating a large training dataset can be too expensive, time-consuming, or even infeasible. For example, in molecular medicine, testing drugs' anti-cancer efficacy through pre-clinical studies and clinical trials requires substantial time, effort, and resources, while copious amounts of unlabeled drugs or molecules are often available from various sources. 3) Learning from uncertain graphs: For example, in neuroimaging, the functional connectivities among different brain regions are highly uncertain. In such applications, each human brain can be represented as an uncertain graph rather than a certain graph. In this thesis, we explore four different settings of graph classification: the multi-label setting, the semi-supervised setting, the active learning setting, and the uncertain graph setting. In the multi-label setting, each graph object can be assigned multiple labels. In the semi-supervised and active learning settings, we explore two different approaches to reducing labeling costs in graph classification problems. In the uncertain graph setting, we explore how to incorporate the uncertainty information in the graph structure into graph classification.
Issue Date: 2014-10-28
Genre: thesis
URI: http://hdl.handle.net/10027/19119
Rights Information: Copyright 2014 Xiangnan Kong
Date Available in INDIGO: 2016-10-29
Date Deposited: 2014-08
 

Statistics

Country Code Views
China 259
United States of America 174
Russian Federation 37
United Kingdom 12
Ukraine 10
