BioPfuga
Development of BioPfuga
Grid architectures are usually used to run a single computation on many distributed CPUs connected by a fast network. However, analyzing much more complicated biological systems, which involve simulations at different levels, requires more integrated computational approaches, in line with the new paradigm for biological science. Each individual program should run on its own corresponding machine on the grid system. For this purpose, we have designed and developed a new platform, BioPfuga (Biosimulation Platform United on Grid Architecture), in which individual applications, corresponding to the different levels of bio-simulations, are united and executed as a hybrid application.

BioPfuga requires that (1) application programs are divided into a set of many pieces, each of which corresponds to a unit simulation procedure, and that (2) data communication between the program pieces uses a standard description. For the first requirement, each simulation unit should not be too small, so that the time spent on data communication among different machines stays minimal relative to the computation. For the second requirement, we have proposed a simple, standard XML description, UDS-XML (Universal Data Set-XML), for exchanging data between different computational programs during the actual execution of Grid computations. Until now, many application programs in the field of High-Performance Computing (HPC) have used only binary data for intermediate and temporary data, because data files become very large and access becomes very slow if the data are described in a text format. For the XML description, we designed three forms: a text form, a hexadecimal form, and a Base64 form. With the Base64 form in particular, the data are only about 1.3 times the size of the binary form. The advantage of the XML form for intermediate and output data is that any metadata can easily be added, as an attribute or as tagged information, to the actual computed data exchanged among the different application programs. It should be emphasized that the unit of each data item can always be provided in UDS-XML, so that the different application programs can recognize and confirm the unit system used for computation and analysis. Researchers in different scientific fields usually use their own conventional units, and without a common understanding of the units this becomes a barrier to research integration.
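The size trade-off among the three forms, and the idea of carrying the unit as an attribute, can be illustrated with a minimal Python sketch. It does not follow the actual UDS-XML schema: the element name `data`, the attributes `name`, `unit`, `form`, and `count`, and the little-endian 64-bit float layout are all assumptions made only for this illustration.

```python
import base64
import binascii
import struct
import xml.etree.ElementTree as ET


def encode_values(values, form):
    """Serialize a list of floats as a text, hexadecimal, or Base64 payload."""
    if form == "text":
        return " ".join(repr(v) for v in values)
    # Assumed binary layout: little-endian IEEE 754 double precision.
    raw = struct.pack(f"<{len(values)}d", *values)
    if form == "hex":
        return binascii.hexlify(raw).decode("ascii")
    if form == "base64":
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unknown form: {form}")


def decode_values(payload, form):
    """Inverse of encode_values."""
    if form == "text":
        return [float(tok) for tok in payload.split()]
    if form == "hex":
        raw = binascii.unhexlify(payload)
    elif form == "base64":
        raw = base64.b64decode(payload)
    else:
        raise ValueError(f"unknown form: {form}")
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def to_xml(name, unit, values, form):
    """Wrap the payload in a hypothetical element that records its unit."""
    attrs = {"name": name, "unit": unit, "form": form, "count": str(len(values))}
    elem = ET.Element("data", attrs)
    elem.text = encode_values(values, form)
    return ET.tostring(elem, encoding="unicode")


def from_xml(doc):
    """Return (unit, values) so the receiver can confirm the unit system."""
    elem = ET.fromstring(doc)
    return elem.get("unit"), decode_values(elem.text, elem.get("form"))


if __name__ == "__main__":
    coords = [0.125 * i for i in range(300)]
    binary_size = 8 * len(coords)  # bytes in the raw binary form
    for form in ("text", "hex", "base64"):
        payload = encode_values(coords, form)
        print(form, len(payload), round(len(payload) / binary_size, 2))
```

For 300 double-precision values (2400 bytes of binary data), the hexadecimal payload is 4800 characters, twice the binary size, while the Base64 payload is 3200 characters, a ratio of 4/3 that matches the factor of about 1.3 quoted above. The size of the text form depends on how many digits each value needs.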

The schema of UDS-XML, examples of the three forms (a text form, a hexadecimal form, and a Base64 form), and the corresponding program library are provided at http://www.biogrid.jp/BioPfuga/uds-xml/. Those who are interested in BioPfuga are invited to contact the authors.


Application of BioPfuga to hybrid-QM/MM calculations
The static and dynamic features of protein structures are now analyzed by molecular simulations. Chemical reactions at the active sites of proteins require information about the dynamic features of electrons, which cannot be obtained from a classical molecular mechanics simulation. Thus, we combined quantum chemical (QM) simulations, using AMOSS (1), with molecular mechanics (MM) simulations, using prestoX-basic (2, 3), in a way that is suitable for Grid computation. We first divided the two large programs into a set of many pieces. Then, a hybrid calculation was performed following the work flow in Fig. 1. As a computational platform, we used PC clusters composed of Pentium-3 and Pentium-4 processors, and the MD part was also run on MDGrape-2 (4), a special-purpose computer for MD simulations.


Figure 1: Work flow of BioPfuga for a hybrid-QM/MM calculation using AMOSS (1) and prestoX-basic (2, 3).

As an example of this hybrid-QM/MM calculation, a simple model system composed of ethanol in water was simulated (Fig. 2). Here, the ethanol molecule and the two water molecules close to the ethanol hydroxyl group were treated as the QM region, and the Hartree-Fock molecular orbitals (MO) were analyzed using 35 basis functions of MINI-4 (5). The other 226 water molecules were represented using a conventional classical model, TIP4P (6). The Nose-Hoover algorithm was applied to keep the temperature constant at 283 K, without any truncation of the non-bonded interactions. A CAP boundary with a radius of 13 Å was used.


Figure 2: A snapshot of the hybrid-QM/MM calculation for a simple model system composed of ethanol in water.
The shape of the highest occupied molecular orbital (HOMO) of the ethanol molecule and the two water molecules close to the ethanol hydroxyl group is shown in green (+0.003 e/au³) and orange (-0.003 e/au³). The ball-and-stick models of the ethanol and the two water molecules are also shown, with red oxygen atoms. The other 226 water molecules, treated by the classical TIP4P model, are shown with blue oxygen and white hydrogen atoms.

From the canonical ensemble of the molecular system, it was found that, for the ethanol dihedral angle around the C-O covalent bond, the gauche rotamer was as stable as the trans rotamer. In contrast, when the classical force field (7) was used for ethanol, only the trans conformer was stable. If more precise basis functions are applied to ethanol, more reliable results can be obtained, and the force-field problems may be overcome. In the current computational system, we used Gigabit Ethernet among the PCs, and at each MD unit step it took about 0.1 s to transfer 100 kBytes of data in the hexadecimal UDS-XML form, including the corresponding XML parsing and writing procedures.
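To put the reported 0.1 s in context, the following back-of-the-envelope sketch compares it with the ideal time needed to push 100 kBytes through a Gigabit Ethernet link. It assumes that 100 kBytes means 10^5 bytes, that the link runs at its nominal rate of 10^9 bits per second, and that latency and protocol overhead are ignored. None of these assumptions come from the measurements themselves.

```python
def wire_time_seconds(n_bytes, link_bits_per_s):
    """Ideal serialization time on the link (no latency, no protocol overhead)."""
    return n_bytes * 8 / link_bits_per_s


PAYLOAD_BYTES = 100_000        # "100 kBytes", assumed to mean 10^5 bytes
GIGABIT_PER_S = 1_000_000_000  # nominal Gigabit Ethernet line rate (assumption)
MEASURED_S = 0.1               # reported time per MD step, XML handling included

ideal_s = wire_time_seconds(PAYLOAD_BYTES, GIGABIT_PER_S)
non_wire_fraction = 1 - ideal_s / MEASURED_S

print(f"ideal wire time: {ideal_s * 1e3:.1f} ms")
print(f"share of the measured time not explained by the wire: {non_wire_fraction:.1%}")
```

Under these assumptions the ideal wire time is 0.8 ms, under 1% of the reported 0.1 s. This rough estimate suggests that most of the per-step cost lies outside the raw transfer, for example in the XML parsing and writing included in the measurement, although the sketch cannot separate those contributions.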



Issues for integration and future plans
We propose BioPfuga as a new platform for integrating several different simulation programs for complicated analyses in the bioscience fields. However, it is still preliminary, and several problems remain unresolved.

First, the format of UDS-XML should be improved for more general and efficient use, not only for the hybrid-QM/MM calculations, but also for other integrated simulations covering continuum models for molecular systems (8), and cell and organ simulations (9). More semantic information should be added to the current form, with the aid of recent Semantic Web technology. Second, handling the XML data currently requires more CPU time than the corresponding procedure using the simple binary form. The XML interface should be revised to make these procedures faster. Third, the current Globus has no tools for dynamic spawning, and so we mainly used LAM/MPI (10) in the current implementation of BioPfuga. Fourth, for the integration of the Computing Grid and the Data Grid, all the files used in the simulation calculations should use the XML form, in view of the expected rapid development of XML database systems. Finally, a new GUI application for managing the work flow is also necessary for future advanced usage of BioPfuga.

These issues are now being addressed through collaborative studies among the members of BioGrid in the core grid, data grid, and computing grid technology groups.



References
(1) Sakuma, T., Kashiwagi, H., Takada, T., and Nakamura, H., "Ab initio MO study of the chlorophyll dimer in the photosynthetic reaction center. I. A theoretical treatment of the electrostatic field created by the surrounding proteins," Int. J. Quant. Chem., 61, 1, pp. 137-151, 1997.

(2) Fukunishi, Y., Mikami, Nakamura, H., "The filling potential method: a method for estimating the free energy surface for protein-ligand docking," J. Phys. Chem. B, 2003, in press.

(3) Nakajima, N., Higo, J., Kidera, A., and Nakamura, H., "Free energy landscapes of peptides by enhanced conformational sampling," J. Mol. Biol., 296, 1, pp. 197-216, 2000.

(4) Narumi, T., Susukita, R., Ebisuzaki, T., McNiven, G., and Elmegreen, B., "Molecular dynamics machine: Special-purpose computer for molecular dynamics simulations," Mol. Simulation, 21, 5/6, pp. 401-415, 1999.

(5) Huzinaga, S., Gaussian Basis Sets for Molecular Calculations, Elsevier, New York, 1984.

(6) Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., "Comparison of simple potential functions for simulating liquid water," J. Chem. Phys., 79, 2, pp. 926-935, 1983.

(7) Jorgensen, W.L., "Optimized Intermolecular Potential Functions for Liquid Alcohols," J. Phys. Chem., 90, 7, pp. 1276-1284, 1986.

(8) Nakamura, H., "Roles of electrostatic interactions in proteins," Quart. Rev. Biophys., 29, 1, pp. 1-90, 1996.

(9) Noble, D., "Modeling the heart: from genes to cells to the whole organ," Science, 295, 5560, pp. 1678-1682, 2002.

(10) Squyres, J. M., Lumsdaine, A., George, W. L., Hagedorn, J. G., and Devaney, J. E., "The Interoperable Message Passing Interface (IMPI) Extensions to LAM/MPI," in Proceedings of MPIDC2000, 2000.