2P-02
Evaluation of Data Mining and Machine Learning Methods Useful for Comparing Mass Spectra
Mass spectrometry (MS) holds great promise for biomarker discovery and disease diagnosis. A variety of statistical methods have been used to handle large, high-dimensional MS data. However, few studies have systematically compared these statistical tools with one another.
Using principal component analysis (PCA), we previously evaluated the applicability of probe electrospray ionization mass spectrometry (PESI-MS) to cancer diagnosis. Although PCA is a powerful and widely used technique for dimensionality reduction, it is not entirely suitable for classification because it uses no class information. Moreover, the resulting principal components are not easy to interpret, because they are the data transformed into a new coordinate system.
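The limitation noted above can be illustrated with a small sketch. It is not the analysis performed in this study: the two-feature data are synthetic, the names `class_a`, `class_b`, `first_pc` and `fisher_direction` are hypothetical, and Fisher's linear discriminant is used only as a simple example of a direction that does take class labels into account. The first principal component follows the feature with the largest variance, while the discriminant direction follows the feature that separates the classes.

```python
import math

# Synthetic 2-D data (not PESI-MS measurements): feature 0 varies widely in
# both classes, feature 1 carries the class difference.
class_a = [(i - 5.0, 0.0 + 0.1 * ((i % 3) - 1)) for i in range(11)]
class_b = [(i - 5.0, 1.0 + 0.1 * ((i % 3) - 1)) for i in range(11)]


def scatter(points):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return sxx, sxy, syy


def first_pc(points):
    """Leading eigenvector of the 2x2 scatter matrix; class labels are never used."""
    sxx, sxy, syy = scatter(points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # principal-axis angle
    return math.cos(theta), math.sin(theta)


def fisher_direction(a, b):
    """Unit vector along Sw^-1 (mean_b - mean_a), Sw being the within-class scatter."""
    axx, axy, ayy = scatter(a)
    bxx, bxy, byy = scatter(b)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    dx = sum(p[0] for p in b) / len(b) - sum(p[0] for p in a) / len(a)
    dy = sum(p[1] for p in b) / len(b) - sum(p[1] for p in a) / len(a)
    det = sxx * syy - sxy * sxy
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    return wx / norm, wy / norm


def project(points, w):
    """Project each 2-D point onto the direction w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


pc1 = first_pc(class_a + class_b)       # unsupervised: pooled data only
w = fisher_direction(class_a, class_b)  # supervised: uses the class split

print("first principal component:", pc1)
print("Fisher discriminant direction:", w)
```

In this toy case the projections of the two classes onto `pc1` overlap, whereas their projections onto `w` do not. Real spectra have far more than two features, so a library implementation would be used in practice.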
In this study, we used multiclass data sets measured with PESI-MS to compare the performance of several data mining tools for identifying class-specific peaks, including Student's t-test, the Wilcoxon rank sum test, analysis of variance, and Bayesian model selection. We also assessed the potential of various classification algorithms, including linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), the logistic regression model (LRM), and support vector machines (SVMs).
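As a rough illustration of peak-wise testing, the sketch below computes the two-sample Student t statistic (pooled variance) and a normal-approximation z score for the Wilcoxon rank sum statistic, for each peak in a two-class comparison. This is a minimal sketch under stated assumptions, not the procedure used in this study: the intensities in `peaks` are invented, the function names are hypothetical, no p-values are computed, and the rank sum variance omits the tie correction.

```python
import math
import statistics

# Invented intensities for two hypothetical peaks: (class A, class B).
peaks = {
    "peak_1": ([10.1, 9.8, 10.3, 10.0, 9.9], [10.2, 9.7, 10.1, 10.4, 9.6]),
    "peak_2": ([5.0, 5.2, 4.9, 5.1, 5.3], [8.0, 8.3, 7.9, 8.1, 8.2]),
}


def student_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1.0 / na + 1.0 / nb))


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_z(a, b):
    """Normal-approximation z score of the rank sum of a (no tie correction)."""
    na, nb = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    w = sum(ranks[:na])                      # rank sum of the first sample
    expected = na * (na + nb + 1) / 2.0
    variance = na * nb * (na + nb + 1) / 12.0
    return (w - expected) / math.sqrt(variance)


# Score every peak with both statistics, then rank by the absolute t value.
results = {name: (student_t(a, b), rank_sum_z(a, b))
           for name, (a, b) in peaks.items()}
ranked = sorted(results, key=lambda name: abs(results[name][0]), reverse=True)

for name in ranked:
    t, z = results[name]
    print(f"{name}: t = {t:.3f}, rank-sum z = {z:.3f}")
```

Both statistics place `peak_2`, whose intensities are shifted between the classes, ahead of `peak_1`. With more than two classes, a multi-group test such as analysis of variance would take the place of the pairwise statistics.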