Classification of array CGH data using smoothed logistic regression model

  • Jian Huang
  • Agus Salim
  • Kaibin Lei
  • Kathleen O'Sullivan
  • Yudi Pawitan

Research output: Contribution to journal › Article › peer-review

Abstract

Array comparative genomic hybridization (aCGH) provides genome-wide information on DNA copy number that is potentially useful for disease classification. One immediate problem is that the data contain many features (probes) but only a few samples. Existing approaches to overcome this problem include feature selection, ridge regression and partial least squares. However, these methods typically ignore the spatial characteristics of aCGH data. To make explicit use of this spatial information, we develop a procedure called the smoothed logistic regression (SLR) model. The procedure is based on a mixed logistic regression model, where the random component is a mixture distribution that controls smoothness and sparseness. Conceptually such a procedure is straightforward, but its implementation is complicated by computational problems. We develop a fast and reliable iterative weighted least-squares algorithm based on the singular value decomposition. Simulated data and two real data sets are used to illustrate the procedure. For the real data sets, error rates are calculated using leave-one-out cross-validation. For both simulated and real data examples, SLR achieves lower misclassification error rates than previous methods.
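The general idea of a spatially smoothed logistic classifier fitted by iterative weighted least squares, with leave-one-out error estimation, can be sketched as follows. This is only an illustration under stated assumptions, not the SLR procedure of the paper: it replaces the mixture random-effect distribution with a simple second-difference penalty plus a small ridge term, and it solves each weighted least-squares step directly rather than through a singular value decomposition. The function names, the penalty weights `lam` and `ridge`, and the simulated data are all hypothetical.

```python
import numpy as np


def second_difference_penalty(p):
    """Return D'D, where D takes second differences of adjacent coefficients."""
    D = np.diff(np.eye(p), n=2, axis=0)  # shape (p - 2, p)
    return D.T @ D


def fit_smooth_logistic(X, y, lam=1.0, ridge=1e-2, n_iter=50, tol=1e-8):
    """Penalized logistic regression via iteratively reweighted least squares.

    Assumption: smoothness along the probe order is encouraged by a
    second-difference penalty; the intercept is left unpenalized.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    P = np.zeros((p + 1, p + 1))
    P[1:, 1:] = lam * second_difference_penalty(p) + ridge * np.eye(p)
    coef = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = np.clip(A @ coef, -30.0, 30.0)   # linear predictor, clipped for stability
        mu = 1.0 / (1.0 + np.exp(-eta))        # fitted probabilities
        w = np.maximum(mu * (1.0 - mu), 1e-6)  # IRLS weights, floored away from zero
        z = eta + (y - mu) / w                 # working response
        new = np.linalg.solve(A.T @ (w[:, None] * A) + P, A.T @ (w * z))
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    return coef[0], coef[1:]


def loocv_error(X, y, lam=1.0):
    """Leave-one-out misclassification rate, refitting once per held-out sample."""
    n = len(y)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        b0, beta = fit_smooth_logistic(X[keep], y[keep], lam=lam)
        pred = (b0 + X[i] @ beta) > 0
        wrong += int(pred != (y[i] == 1))
    return wrong / n


# Simulated example (hypothetical data): class 1 carries a contiguous shifted region.
rng = np.random.default_rng(0)
n, p = 30, 20
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, 5:10] += 1.0

b0, beta = fit_smooth_logistic(X, y, lam=1.0)
err = loocv_error(X, y, lam=1.0)
print("LOOCV error rate:", err)
```

The penalty ties neighbouring probe coefficients together, which is one simple way to encode the spatial structure that the abstract says feature selection, ridge regression and partial least squares typically ignore. In practice `lam` would need to be chosen from the data, for example by cross-validation.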

Original language: English
Pages (from-to): 3798-3810
Number of pages: 13
Journal: Statistics in Medicine
Volume: 28
Issue number: 30
Publication status: Published - 30 Dec 2009

Keywords

  • Cancer
  • Cross-validation
  • Genomics
  • High-throughput data
  • Machine learning
