The problem of bias in training data in regression problems in medical decision support

  • B. Mac Namee
  • P. Cunningham
  • S. Byrne
  • O. I. Corrigan

Research output: Contribution to journal › Article › peer-review

Abstract

This paper describes a bias problem encountered in a machine learning approach to outcome prediction in anticoagulant drug therapy. The outcome to be predicted is a measure of the clotting time for the patient; this measure is continuous and so the prediction task is a regression problem. Artificial neural networks (ANNs) are a powerful mechanism for learning to predict such outcomes from training data. However, experiments have shown that an ANN is biased towards values more commonly occurring in the training data and is thus less likely to be correct in predicting extreme values. This issue of bias in training data in regression problems is similar to the associated problem with minority classes in classification. However, this bias issue in classification is well documented and is an ongoing area of research. In this paper, we consider stratified sampling and boosting as solutions to this bias problem and evaluate them on this outcome prediction problem and on two other datasets. Both approaches produce some improvements, with boosting showing the most promise.
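
The abstract does not say how the stratified sampling was carried out, so the following is only a minimal sketch of one common way to apply the idea to a continuous target. It assumes equal-width bins over the target range and sampling with replacement within each non-empty bin. The function name `stratified_resample` and its parameters `n_bins`, `per_bin`, and `seed` are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_resample(samples, n_bins=5, per_bin=None, seed=0):
    """Resample (features, target) pairs so each target bin is equally represented.

    Assumptions (illustrative, not the paper's procedure): equal-width bins
    over the observed target range, and sampling with replacement within
    each non-empty bin.
    """
    rng = random.Random(seed)
    targets = [y for _, y in samples]
    lo, hi = min(targets), max(targets)
    # Fall back to a width of 1.0 when every target is identical.
    width = (hi - lo) / n_bins or 1.0

    # Group samples by the bin their target value falls into.
    bins = defaultdict(list)
    for x, y in samples:
        idx = min(int((y - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        bins[idx].append((x, y))

    # By default, bring every non-empty bin up to the size of the largest bin.
    if per_bin is None:
        per_bin = max(len(members) for members in bins.values())

    resampled = []
    for idx in sorted(bins):
        resampled.extend(rng.choices(bins[idx], k=per_bin))
    rng.shuffle(resampled)
    return resampled
```

Training a regressor on the output of `stratified_resample` instead of the raw data gives rare, extreme target values the same weight as common ones, which is the kind of rebalancing the abstract alludes to. Oversampling with replacement duplicates rare cases, so it may also increase the risk of overfitting to them.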

Original language: English
Pages (from-to): 51-70
Number of pages: 20
Journal: Artificial Intelligence in Medicine
Volume: 24
Issue number: 1
Publication status: Published - 2002
Externally published: Yes

Keywords

  • Anticoagulant drug therapy
  • Artificial neural networks
  • Medical decision support
  • Regression
