
A comprehensive pipeline to integrate preprocessing and machine learning techniques for accurate classification in Raman spectroscopy

  • University College Cork
  • Insight Centre for Data Analytics
  • School of Computer Science and Information Technology

Research output: Chapter in Book/Report/Conference proceedings > Conference proceeding > peer-review

Abstract

Raman spectroscopy, a non-invasive analytical method, offers insights into molecular structures and interactions in liquid and solid samples, with applications ranging from materials science and chemical analysis to medical diagnostics. Preprocessing of Raman spectra is vital to remove interferences such as background signals and calibration errors, ensuring precise data extraction. Artificial intelligence, particularly machine learning (ML), aids in extracting valuable information from complex datasets. However, effective data preprocessing is crucial because it influences model robustness. This study addresses the integration of preprocessing and ML algorithms, often treated as distinct entities despite their intrinsic interconnection, for Raman spectra of blood samples from patients with ovarian cancer. The optimal preprocessing configuration is not always evident because of the complexity of spectral data: numerous algorithms are available for background correction, normalization, outlier removal, noise filtering, and dimension reduction, and hyperparameter tuning is required to identify the best choices for each preprocessing step. In this work, we present a pipeline that co-optimizes preprocessing techniques and ML classification methods to promote objective selection and minimize processing time. In our approach, preprocessing methods are not chosen arbitrarily but are systematically evaluated to enhance the robustness of the models. The evaluation criteria focus on ensuring that the model performs well not only on the training data but also on unseen data, thus reducing the risk of overfitting and improving the generalization capability of the model. This systematic approach would reduce the time needed for new studies by identifying the most suitable preprocessing steps and hyperparameters and building a robust model for the task.
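
The abstract does not describe the authors' implementation, so the following is only a minimal sketch of what co-optimizing preprocessing and classification could look like. It assumes scikit-learn and NumPy, treats each preprocessing step as a searchable pipeline stage, and scores every configuration by cross-validation before checking it on held-out spectra. The spectra are synthetic (not the ovarian cancer data), and `PolynomialBaseline`, `make_synthetic_spectra`, and `co_optimize` are hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, StandardScaler
from sklearn.svm import SVC


class PolynomialBaseline(BaseEstimator, TransformerMixin):
    """Hypothetical background correction: subtract a per-spectrum polynomial fit."""

    def __init__(self, degree=3):
        self.degree = degree

    def fit(self, X, y=None):
        return self  # stateless: each spectrum is corrected independently

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        axis = np.linspace(-1.0, 1.0, X.shape[1])
        vander = np.vander(axis, self.degree + 1)
        # Least-squares polynomial fit of every spectrum at once.
        coefs, *_ = np.linalg.lstsq(vander, X.T, rcond=None)
        return X - (vander @ coefs).T


def make_synthetic_spectra(n_per_class=40, n_points=200, seed=0):
    """Synthetic two-class spectra: shared peak, class-dependent peak, random background."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, n_points)
    shared_peak = np.exp(-0.5 * ((axis - 0.3) / 0.02) ** 2)
    marker_peak = np.exp(-0.5 * ((axis - 0.7) / 0.02) ** 2)
    X, y = [], []
    for label, height in ((0, 0.6), (1, 1.0)):
        for _ in range(n_per_class):
            background = rng.uniform(0.5, 2.0) * (1.0 + axis) ** 2
            noise = rng.normal(0.0, 0.05, n_points)
            X.append(shared_peak + height * marker_peak + background + noise)
            y.append(label)
    return np.array(X), np.array(y)


def co_optimize(X, y, seed=0):
    """Search preprocessing choices and classifiers jointly; return search and held-out score."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    pipeline = Pipeline([
        ("baseline", PolynomialBaseline()),
        ("normalize", Normalizer()),
        ("reduce", PCA()),
        ("classify", SVC()),
    ])
    # Whole steps are swapped or skipped ("passthrough") alongside ordinary hyperparameters.
    param_grid = {
        "baseline": [PolynomialBaseline(degree=2), PolynomialBaseline(degree=4), "passthrough"],
        "normalize": [Normalizer(norm="l2"), StandardScaler(), "passthrough"],
        "reduce__n_components": [5, 10],
        "classify": [SVC(kernel="linear", C=1.0), LogisticRegression(max_iter=1000)],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="balanced_accuracy")
    search.fit(X_train, y_train)
    # The held-out score estimates performance on unseen data.
    return search, search.score(X_test, y_test)


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    search, test_score = co_optimize(X, y)
    print("Selected configuration:", search.best_params_)
    print("Cross-validated balanced accuracy:", search.best_score_)
    print("Held-out balanced accuracy:", test_score)
```

Because the preprocessing steps live inside the pipeline, they are refit on each training fold, so the cross-validation score is not inflated by information from the validation fold. Outlier removal and noise filtering, which the abstract also lists, are left out of this sketch for brevity.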

Original language: English
Title of host publication: Data Science for Photonics and Biophotonics
Editors: Thomas Bocklitz
Publisher: SPIE
ISBN (Electronic): 9781510673403
Publication status: Published - 2024
Event: Data Science for Photonics and Biophotonics 2024 - Strasbourg, France
Duration: 10 Apr 2024 to 12 Apr 2024

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 13011
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: Data Science for Photonics and Biophotonics 2024
Country/Territory: France
City: Strasbourg
Period: 10/04/24 to 12/04/24

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Classification
  • Machine Learning
  • Ovarian Cancer
  • Pipeline
  • Preprocessing
  • Raman spectroscopy
