
Analysis of presence-only data via semi-supervised learning approaches


Bookmark or cite this item: http://hdl.handle.net/10027/10652

Files in this item

File: posonly.pdf (163KB)
Description: Main Article
Format: PDF
Title: Analysis of presence-only data via semi-supervised learning approaches
Author(s): Wang, Junhui; Fang, Yixin
Subject(s): Cross validation; Functional genomics; Stability; Support vector machine
Abstract: Presence-only data arise in classification problems; they consist of a sample of observations from the presence class together with a large number of background observations whose presence/absence status is unknown. Because absence data are generally unavailable, conventional semi-supervised learning approaches are no longer appropriate: they tend to degenerate and assign all observations to the presence class. In this article, we propose a generalized class balance constraint that can be combined with semi-supervised learning approaches to prevent such degeneration. Furthermore, to circumvent the difficulty of model tuning with presence-only data, we develop a selection criterion based on classification stability, which measures the robustness of a given classification algorithm against sampling randomness. The effectiveness of the proposed approach is demonstrated through a variety of simulated examples, along with an application to gene function prediction.
Issue Date: 2013-03
Publisher: Elsevier
Citation Info: Wang JH, Fang YX. Analysis of presence-only data via semi-supervised learning approaches. Computational Statistics & Data Analysis. Mar 2013;59:134-143. DOI: 10.1016/j.csda.2012.10.007
Type: Article
Description: NOTICE: This is the author’s version of a work that was accepted for publication in Computational Statistics and Data Analysis. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computational Statistics and Data Analysis, Vol 59, (2012) DOI: 10.1016/j.csda.2012.10.007
URI: http://hdl.handle.net/10027/10652
ISSN: 0167-9473
Date Available in INDIGO: 2013-11-22
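
Illustrative note: the abstract describes a selection criterion based on classification stability, that is, robustness of a classifier against sampling randomness. The sketch below shows one generic way such instability can be estimated, by training the same rule on two random halves of the data and counting how often the two fitted rules disagree. It is an assumption-laden toy, not the authors' estimator: the 1-D threshold rule `fit_threshold`, the weight `w`, the function `instability`, and the simulated data are all hypothetical names and choices introduced here for illustration.

```python
import random
from statistics import mean


def fit_threshold(presence, background, w):
    """Toy 1-D rule (hypothetical): label x as presence when x >= t,
    where t interpolates between the two sample means with weight w."""
    t = w * mean(presence) + (1 - w) * mean(background)
    return lambda x: x >= t


def instability(fit, presence, background, w, n_splits=50, seed=0):
    """Average disagreement between rules fitted on two random halves.

    For each split, the presence and background samples are shuffled and
    halved, the rule is fitted on each half, and the fraction of background
    points on which the two fitted rules disagree is recorded.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_splits):
        p, b = presence[:], background[:]
        rng.shuffle(p)
        rng.shuffle(b)
        hp, hb = len(p) // 2, len(b) // 2
        rule_1 = fit(p[:hp], b[:hb], w)
        rule_2 = fit(p[hp:], b[hb:], w)
        disagreements = sum(rule_1(x) != rule_2(x) for x in background)
        total += disagreements / len(background)
    return total / n_splits


# Simulated presence-only data (assumption): presence points centred at 2,
# background is an unlabeled mixture of absence (centred at 0) and presence.
data_rng = random.Random(1)
presence = [data_rng.gauss(2.0, 1.0) for _ in range(40)]
background = [data_rng.gauss(0.0, 1.0) for _ in range(140)] + [
    data_rng.gauss(2.0, 1.0) for _ in range(60)
]

if __name__ == "__main__":
    # Compare candidate values of the tuning weight by estimated instability.
    for w in (0.25, 0.5, 0.75):
        print(w, instability(fit_threshold, presence, background, w))
```

In this toy setting a smaller value means the fitted rule changes less from one subsample to another. Note that a rule assigning every observation to one class has zero instability by construction, so a stability score on its own does not rule out a degenerate classifier.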
 


