For the analysis of complex biological molecular systems, integrated computational approaches are now required, in which several software programs, operating at different scales and levels of description, are coupled and run synchronously. A grid architecture is well suited to this purpose. Here, a new markup language, the BioMolecular Simulation Markup Language (BMSML), is proposed. It is based on XML technology suitable for grid systems, so that different program pieces exchange data through a standard description that is independent of the individual program developers. With BMSML, application program units that have similar functions can easily be exchanged for one another.
Until now, many application programs in High-Performance Computing (HPC) have used only binary formats for intermediate and temporary data, because data described in text format produces very large files that are slow to access. BMSML allows three forms: text, hexadecimal, and Base64. The Base64 form in particular is only about 1.3 times the size of the binary form, since Base64 encodes every 3 bytes as 4 ASCII characters (hexadecimal, by comparison, doubles the size).
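The size trade-off between the three forms can be checked with a short sketch. It encodes an array of double-precision values as text, hexadecimal, and Base64 and compares each against the raw binary size. The little-endian byte order and the space-separated text layout are assumptions made for illustration; they are not taken from the BMSML specification, and `encode_forms` and `decode_base64` are hypothetical helper names.

```python
import base64
import math
import struct


def encode_forms(values):
    """Encode float64 values in binary, text, hexadecimal, and Base64 forms.

    Assumption: values are packed as little-endian IEEE 754 doubles; this is
    an illustrative choice, not the BMSML on-disk layout.
    """
    raw = struct.pack(f"<{len(values)}d", *values)
    return {
        "binary": raw,                                    # 8 bytes per value
        "text": " ".join(repr(v) for v in values),        # human-readable, variable length
        "hex": raw.hex(),                                 # 2 characters per byte
        "base64": base64.b64encode(raw).decode("ascii"),  # 4 characters per 3 bytes
    }


def decode_base64(encoded, count):
    """Recover the original doubles from a Base64 string (lossless round trip)."""
    return list(struct.unpack(f"<{count}d", base64.b64decode(encoded)))


if __name__ == "__main__":
    vals = [math.sin(0.1 * i) for i in range(1000)]
    forms = encode_forms(vals)
    n_bin = len(forms["binary"])
    for name in ("text", "hex", "base64"):
        print(f"{name:7s} {len(forms[name]) / n_bin:.3f} x binary")
    # Base64 is exact: every bit of every double is preserved.
    assert decode_base64(forms["base64"], len(vals)) == vals
```

Unlike a decimal text form, the Base64 form preserves every bit of each value exactly while staying within the XML character set, which is why its roughly 4/3 overhead is an attractive compromise.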
The advantage of BMSML for intermediate and output data is that arbitrary metadata can easily be added, as attributes or as tagged elements, alongside the actual computed data exchanged among application programs. In particular, the unit of each data item can always be specified in BMSML, so that different application programs can recognize and confirm the unit system used for computation and analysis. Researchers in different scientific fields usually work in their own conventional units, and without a common understanding of those units, integrating research across fields is difficult.
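The benefit of carrying the unit with the data can be illustrated with a small sketch using Python's standard `xml.etree.ElementTree`. A producing program writes values with a `unit` attribute and extra metadata, and a consuming program converts them into its own preferred unit before use. The element name `data`, the child `value`, and the attributes `name`, `unit`, and `program` are hypothetical, chosen only for illustration; they are not the BMSML schema. The conversion factor 1 kcal = 4.184 kJ is the thermochemical calorie.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only,
# not taken from the BMSML schema.
DOC = """<data name="potential_energy" unit="kcal/mol" program="md_step">
  <value>-1234.5</value>
  <value>-1230.1</value>
</data>"""

# Factors converting each supported unit into kJ/mol (1 kcal = 4.184 kJ).
TO_KJ_PER_MOL = {"kJ/mol": 1.0, "kcal/mol": 4.184}


def write_with_unit(name, unit, values, **meta):
    """Serialize values with their unit and any extra metadata as attributes."""
    elem = ET.Element("data", name=name, unit=unit, **meta)
    for v in values:
        ET.SubElement(elem, "value").text = repr(v)
    return ET.tostring(elem, encoding="unicode")


def read_in_kj_per_mol(xml_text):
    """Read values and convert them to kJ/mol using the declared unit.

    Raises ValueError instead of silently misinterpreting an unknown unit.
    """
    elem = ET.fromstring(xml_text)
    unit = elem.get("unit")
    if unit not in TO_KJ_PER_MOL:
        raise ValueError(f"unsupported unit: {unit!r}")
    factor = TO_KJ_PER_MOL[unit]
    return [float(v.text) * factor for v in elem.findall("value")]


if __name__ == "__main__":
    print(read_in_kj_per_mol(DOC))
    print(write_with_unit("energy", "kJ/mol", [1.0, 2.5], program="analysis"))
```

Because the consumer checks the declared unit rather than assuming one, a program written for kJ/mol can safely read output from a program that works in kcal/mol.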
The data structure in BMSML is simple, with a shallow hierarchy. BMSML version 1 defines only the data description, and the following three tags are used:
In a future version, a <workflow> tag will be added to describe the workflow of simulation programs on the grid architecture.
The BMSML schema and the corresponding API libraries for C, C++, and FORTRAN are provided at http://www.biogrid.jp/BMSML/.
BMSML version 1 was developed in 2004 by Prof. Haruki Nakamura (Institute for Protein Research, Osaka University) and Dr. Toshikazu Takada (Fundamental and Environmental Research Laboratories, NEC Corporation) in a research project supported by the IT-program of the Japanese Ministry of Education, Culture, Sports, Science and Technology.