Abstract

Poster Presentations

Day 2: Thursday, May 15, Poster Room (Gekko)

Evaluation of Data Mining and Machine Learning Approaches for the Comparison of the Mass Spectra of Biological Tissues

(¹Univ. Yamanashi, ²Waseda Univ.)
○Hisashi Johno¹, Masataka Kawai¹, Kazunori Nakamoto¹, Kentaro Yoshimura¹, Satoshi Funayama¹, Jyo Nakamura¹, Kunio Tanabe², Sen Takeda¹

Mass spectrometry (MS) holds great promise for biomarker discovery and disease diagnosis. A variety of statistical methods have been used to handle large, high-dimensional MS data. However, few studies have systematically compared these statistical tools with one another.
Using principal component analysis (PCA), we previously evaluated the applicability of probe electrospray ionization mass spectrometry (PESI-MS) to cancer diagnosis. Although PCA is a powerful and widely used technique for dimensionality reduction, it is not necessarily suitable for classification because it uses no class information. Moreover, the resulting principal components are not easy to interpret, because they are the data expressed in a new coordinate system.
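The point that PCA ignores class labels can be illustrated with a minimal sketch. The two-dimensional data below are synthetic and purely illustrative (they are not PESI-MS measurements), and the helper names `first_principal_axis` and `pc1_score` are hypothetical. The toy set is built so that the direction of largest variance carries no class information.

```python
import math

def first_principal_axis(points):
    """Return (unit axis of largest variance, mean) for 2-D points.

    Class labels are never passed in: PCA sees only the pooled data.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (mx, my)

# Synthetic toy data: feature 0 varies widely but identically in both classes;
# feature 1 varies little but is the only feature that separates the classes.
class_a = [(-10.0, 0.0), (-5.0, 0.0), (0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
class_b = [(-10.0, 1.0), (-5.0, 1.0), (0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]

axis, mean = first_principal_axis(class_a + class_b)

def pc1_score(p):
    """Project a point onto the first principal axis."""
    return (p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1]

scores_a = sorted(pc1_score(p) for p in class_a)
scores_b = sorted(pc1_score(p) for p in class_b)

print("first principal axis:", axis)
print("PC1 scores, class A:", scores_a)
print("PC1 scores, class B:", scores_b)
```

In this constructed case the first principal axis aligns with feature 0, so the two classes receive the same PC1 scores even though feature 1 separates them perfectly. Real spectra will rarely be this extreme; the sketch only shows why a label-free projection need not be discriminative.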
In this study, we used multiclass data sets obtained from PESI-MS measurements to compare the performance of several data mining tools for identifying class-specific peaks, including Student's t-test, the Wilcoxon rank-sum test, analysis of variance, and Bayesian model selection. We also assessed the potential of various classification algorithms, including linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), the logistic regression model (LRM), and support vector machines (SVMs).
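As a rough illustration of per-peak screening with two of the tests named above, the following sketch computes a pooled-variance Student t statistic and a Mann-Whitney U statistic (equivalent to the Wilcoxon rank-sum statistic) for each peak and ranks the peaks by |t|. It is not the procedure used in this study: the intensities are synthetic, the peak names `peak_1` and `peak_2` and the function names are hypothetical, only the two-class case is covered, and no p-values or multiple-testing corrections are computed.

```python
import math

def student_t(a, b):
    """Two-sample Student t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    pooled = (ssa + ssb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))

def rank_sum_u(a, b):
    """Mann-Whitney U for sample a: count pairs with a > b, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Synthetic intensities per peak: (class A samples, class B samples).
peaks = {
    "peak_1": ([10.0, 12.0, 11.0, 13.0], [20.0, 22.0, 21.0, 23.0]),
    "peak_2": ([5.0, 7.0, 6.0, 8.0], [6.0, 5.0, 8.0, 7.0]),
}

stats = {
    name: {"t": student_t(a, b), "u": rank_sum_u(a, b)}
    for name, (a, b) in peaks.items()
}

# Rank peaks from most to least class-specific by absolute t statistic.
ranked = sorted(stats, key=lambda name: abs(stats[name]["t"]), reverse=True)

for name in ranked:
    print(name, stats[name])
```

A U value near 0 or near the product of the two sample sizes indicates strong separation, while a value near half that product indicates none. In practice the statistics would be converted to p-values and adjusted for the number of peaks tested.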