Poster Presentations
Day 3, June 12 (Fri.) Room P (5F 501+502)
- 3P-12
MassBank in silico: Construction of a Structural Annotation Analysis Pipeline for Low-Molecular-Weight Compounds
(1UOsaka, 2Kyushu Univ., 3Kyushu Univ., 4Tokyo Univ. Agr. Tech., 5Niigata Univ., 6AIST, 7UOsaka)
○Taihei Torigoe1, Masatomo Takahashi1, Heravizadeh Omidreza3, Taiki Hirose4, Kazuki Ikeda2, Kohta Nakatani5, Yuki Soma6, Takeshi Bamba2,3, Fumio Matsuda7, Hiroshi Tsugawa4, Yoshihiro Izumi1
Although recent LC/MS-based metabolomics can detect thousands of metabolic features, 80–90% of them remain unidentified. To address this issue, we developed an ensemble-based pipeline that integrates multiple structural annotation tools to improve the transparency and accuracy of structural annotation for MassBank spectra. In addition, we built a retention time prediction model for unified-HILIC/AEX in Python using scikit-learn and evaluated it with the SRM 1950 plasma sample. Model development focused on three elements: a high-quality training dataset, calculation of 12,420 molecular descriptors, and systematic descriptor selection and model construction. The pipeline was implemented in Python 3.11.5 on Ubuntu (WSL2). In the plasma data evaluation, retention time-based filtering eliminated approximately 50% of candidate structures as false positives, demonstrating its effectiveness in reducing annotation errors. We further integrated CFM-ID, MetFrag, MS-FINDER, and SIRIUS into the ensemble framework. Future work will validate the practical utility of the pipeline for retention time prediction and expand it into a new framework called MassBank in silico, which combines the complementary strengths of existing annotation tools.
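As a rough illustration of how retention time-based filtering and ensemble scoring might fit together, the following minimal sketch combines ranked candidate lists from several tools and then discards candidates whose predicted retention time is far from the observed one. It is not the authors' implementation: the function names, the candidate identifiers, the retention times, the 1.0-minute tolerance, and the reciprocal-rank scoring rule are all hypothetical assumptions chosen for illustration, and the predicted retention times are taken as given rather than produced by a trained model.

```python
from collections import defaultdict


def ensemble_rank(tool_rankings):
    """Order candidates by the sum of reciprocal ranks across tools.

    Reciprocal-rank summation is an assumed aggregation rule, not the
    scoring scheme described in the abstract.
    """
    scores = defaultdict(float)
    for ranking in tool_rankings.values():
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def filter_by_rt(candidates, observed_rt, predicted_rt, tolerance_min=1.0):
    """Keep candidates whose predicted RT lies within the tolerance window.

    Candidates without a prediction are kept, because they cannot be
    rejected on retention time grounds.
    """
    kept = []
    for candidate in candidates:
        rt = predicted_rt.get(candidate)
        if rt is None or abs(rt - observed_rt) <= tolerance_min:
            kept.append(candidate)
    return kept


# Hypothetical ranked outputs for one MS/MS spectrum (best candidate first).
tool_rankings = {
    "CFM-ID": ["cand_A", "cand_B", "cand_C"],
    "MetFrag": ["cand_B", "cand_A", "cand_D"],
    "MS-FINDER": ["cand_B", "cand_C", "cand_A"],
    "SIRIUS": ["cand_A", "cand_B", "cand_D"],
}

# Hypothetical predicted retention times in minutes.
predicted_rt = {"cand_A": 3.2, "cand_B": 5.1, "cand_C": 9.8, "cand_D": 5.6}

ranked = ensemble_rank(tool_rankings)
annotated = filter_by_rt(ranked, observed_rt=5.0, predicted_rt=predicted_rt)
print(ranked)
print(annotated)
```

In this sketch the filter is applied after aggregation, so the surviving candidates keep their ensemble order. A real pipeline would likely derive the tolerance from the prediction error of the retention time model rather than use a fixed value.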
