Ponomarenko J.V., Furman D.P., Mischenko T.M., Katokhina L.V., Valuev V.V., Peregoedova E.L., Frolov A.S., Podkolodny N.L.1, Ponomarenko M.P., Kolchanov N.A.
Institute of Cytology & Genetics, 630090, Novosibirsk, Russia; FAX: +7(3832)356-558; E-mail: jpon@bionet.nsc.ru;
1Institute of Computational Mathematics & Mathematical Geophysics, Novosibirsk, Russia;
Recent evaluations of genome annotation algorithms have shown the need to increase the recognition accuracy for functional DNA/RNA sites (Burset, 1996; Fickett, 1997). It is therefore timely to search for additional sources of experimental data applicable to recognizing functional sites from their sequences. We propose compiling the activity values of the sites together with the physico-chemical and conformational properties of DNA/RNA. Using the previously described linear-additive approximation (Kolchanov, 1998), these data make it possible to predict the activity of functional sites from their sequences. First, we have described in a data base over 240 experiments on promoters, protein-binding sites, mRNA leaders, pre-mRNA processing sites, and many other DNA and RNA sites whose activities are characterized quantitatively in terms of kinetic and equilibrium constants, lifetime and helical bend of DNA/protein complexes, cutting efficiencies, reporter gene expression, transcription or translation levels, etc. In the second data base, we have compiled over 30 complete sets of dinucleotide values of propeller, twist, tip, tilt, bend, wedge, direction, inclination, rise, depth, width, dist, size, persistence length, entropy, enthalpy, free energy, melting temperature, and other DNA and RNA properties. Then, we have documented this large body of experimental data in a reference base containing article titles, authors, journals, abstracts, and figure and table numbers. Currently, this base comprises over 60 articles. Next, the linear-additive approximation (Kolchanov, 1998) was applied to determine those mean values of the DNA/RNA properties in the neighborhood of the functional sites that can be used to predict the activities of the sites from their sequences. The resulting programs have been stored in the C-code data base.
Finally, the above data bases on (1) the functional DNA/RNA site activities, (2) the conformational and physico-chemical DNA/RNA properties, (3) the relevant references, and (4) the C-code programs predicting site activities from their sequences have been integrated using the SRS query language (Etzold, 1993). This resulted in the distributed knowledge base ACTIVITY on functional DNA/RNA site activities, available at http://wwwmgs.bionet.nsc.ru/systems/Activity/.
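The idea of predicting site activity from mean property values can be illustrated with a minimal sketch. It is not the C code stored in ACTIVITY: it assumes the simplest case of a single dinucleotide property averaged over one window and a straight-line fit, activity = a + b * mean. The table TOY_PROPERTY holds placeholder numbers rather than measured values, and the function names mean_property, fit_line and predict_activity are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Placeholder dinucleotide "property" values (AA=0, AC=1, ..., TT=15).
# These are NOT measured data; a real table would come from the
# property data base described in the text.
TOY_PROPERTY = {
    a + b: float(i) for i, (a, b) in enumerate(product(BASES, repeat=2))
}


def mean_property(seq, start, end, table=TOY_PROPERTY):
    """Mean of a dinucleotide property over the window seq[start:end]."""
    window = seq[start:end].upper()
    steps = [window[i:i + 2] for i in range(len(window) - 1)]
    if not steps:
        raise ValueError("window must contain at least one dinucleotide")
    return sum(table[s] for s in steps) / len(steps)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_activity(seq, start, end, a, b, table=TOY_PROPERTY):
    """Predicted activity from the mean property of the chosen window."""
    return a + b * mean_property(seq, start, end, table)


if __name__ == "__main__":
    # Toy training set: sequences with invented activity values.
    train = [("AAAA", 1.0), ("ACGT", 13.0), ("TTTT", 31.0)]
    xs = [mean_property(s, 0, len(s)) for s, _ in train]
    ys = [activity for _, activity in train]
    a, b = fit_line(xs, ys)
    print(predict_activity("ACGT", 0, 4, a, b))
```

In practice several properties and windows would be screened, keeping those whose mean values correlate with the measured activities; the single-variable fit here only shows the shape of the calculation.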
This work was supported by the Russian Human Genome, Russian Foundation for Basic Research, and SB RAS Young Scientists grants.