Engineering an online collection point to standardize biosample spectral data and metadata allows for the creation of machine learning models.
Alvaro Fernandez-Galiana (Schmidt Science Fellow) (pictured)
Biosample Spectral Repository for Machine Learning
PI Alvaro Fernandez-Galiana (Schmidt Fellow, MIT)
The Biosample Spectral Repository (BSR) for Machine Learning, an initiative of Alvaro Fernandez-Galiana, Schmidt Science Fellow (PI), seeks to foster the development of machine learning models for detecting the presence of pathogens from spectroscopy data of patient biosamples. Because spectra measure the vibrational modes of sample molecules, they act as a kind of molecular fingerprint that may be altered in systematic but unpredictable ways in the presence of disease, presenting a tantalizing opportunity for machine learning disease classification. A successful ML classification technology would process spectra from routinely collected biosamples, such as blood plasma or urine, and provide an accurate initial screening for a variety of disease conditions. Positive screens would then be followed up with traditional confirmatory tests, leading to much earlier diagnosis and more effective treatments.
Major obstacles to this vision are the scarcity and complexity of biosample spectral data. The data are expensive to create and label; there are many combinations of diseases, biosamples, patient characteristics and spectral modalities, each with associated metadata; and protocols and instrumentation can differ widely among clinical laboratories. The BSR aims to address these obstacles through collaboration, providing an online community collection point for biosample spectral data and metadata, organized according to a formal data model. Crowdsourced data are filtered and reduced into administrator-defined, quality-controlled datasets and made available to the community in an expanding digital catalog. The proposed BSR resource is thus envisioned, alongside spectroscopy and machine learning, as the third enabling component of a solution to this problem.
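As a rough illustration of how a formal data model and administrator-defined filtering might fit together, the following Python sketch defines a record for one contributed spectrum and reduces a set of submissions to a quality-controlled dataset. It is only a sketch under stated assumptions: the class name, field names, quality checks and sample values are all hypothetical and are not taken from the BSR itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpectrumRecord:
    """One contributed spectrum plus its metadata (hypothetical schema)."""
    sample_id: str
    biosample_type: str            # e.g. "blood plasma" or "urine"
    modality: str                  # spectral modality; "Raman" is an illustrative value
    disease_label: Optional[str]   # None when the contributor gave no label
    wavenumbers: List[float]
    intensities: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def passes_quality_control(rec: SpectrumRecord, required_metadata: List[str]) -> bool:
    """Assumed checks: consistent non-empty axes, a label, and required metadata keys."""
    if not rec.wavenumbers or len(rec.wavenumbers) != len(rec.intensities):
        return False
    if rec.disease_label is None:
        return False
    return all(key in rec.metadata for key in required_metadata)


def build_dataset(records: List[SpectrumRecord], biosample_type: str,
                  modality: str, required_metadata: List[str]) -> List[SpectrumRecord]:
    """Reduce crowdsourced records to one administrator-defined dataset."""
    return [
        r for r in records
        if r.biosample_type == biosample_type
        and r.modality == modality
        and passes_quality_control(r, required_metadata)
    ]


# Made-up submissions, for illustration only.
full_meta = {"instrument": "lab-A", "protocol": "p1"}
submissions = [
    SpectrumRecord("s1", "blood plasma", "Raman", "positive",
                   [400.0, 401.0], [0.12, 0.15], dict(full_meta)),
    # Unlabeled spectrum: rejected by quality control.
    SpectrumRecord("s2", "blood plasma", "Raman", None,
                   [400.0, 401.0], [0.10, 0.11], dict(full_meta)),
    # Valid record, but a different biosample type than the dataset requests.
    SpectrumRecord("s3", "urine", "Raman", "negative",
                   [400.0, 401.0], [0.20, 0.30], dict(full_meta)),
    # Axis lengths disagree: rejected by quality control.
    SpectrumRecord("s4", "blood plasma", "Raman", "negative",
                   [400.0, 401.0], [0.20], dict(full_meta)),
    # Missing the "protocol" metadata key: rejected by quality control.
    SpectrumRecord("s5", "blood plasma", "Raman", "negative",
                   [400.0, 401.0], [0.10, 0.20], {"instrument": "lab-C"}),
]

dataset = build_dataset(submissions, "blood plasma", "Raman",
                        ["instrument", "protocol"])
print([r.sample_id for r in dataset])
```

Keeping the quality checks separate from the dataset definition means a single pool of submissions could, in principle, feed several catalog entries that differ only in biosample type, modality or required metadata.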