Searching for the repeats in regulatory regions of eukaryotic genes

Babenko V.N., Kosarev P.S., Basin V.G.

Institute of Cytology & Genetics, 630090, Novosibirsk, Russia; E-mail: bob@bionet.nsc.ru, FAX: (07)-383-235-6558

A preliminary investigation of the DNA structure of eukaryotic promoters revealed a relative abundance of imperfect repeats of DNA fragments in the promoter regions of eukaryotic genes. Clusters of such repeats could give rise to cruciform DNA, H-form DNA, loops and other non-canonical structures that lead to DNA relaxation [1,2]. This could affect the transcriptional activity of the promoters and modulate binding-site affinity, for TATA-less promoters in particular [1,2].

For the investigation we took eukaryotic promoter regions from the EPD promoter database [3]. By simulation we found that a first-order Markov chain model closely approximates the expected distribution of repeat occurrences in the promoter regions. The expected frequency of a repeat is therefore assessed on the basis of dinucleotide frequencies [4]:

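$$p(N_1 N_2 \ldots N_l) = \frac{f(N_1 N_2)\, f(N_2 N_3) \cdots f(N_{l-1} N_l)}{f(N_2)\, f(N_3) \cdots f(N_{l-1})} \qquad (1),$$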

where f(N_i) is the observed frequency of the (oligo)nucleotide N_i. The significance of an over-represented oligonucleotide (a significantly frequent oligonucleotide) is assessed with a chi-square test with 1 degree of freedom, where n is the observed number of copies of the particular oligonucleotide in a target DNA sequence of length L, N = L - l + 1 is the total number of l-length oligonucleotides in the sequence, and p is taken from (1).

We found that the total number of direct repeats is markedly higher than model (1) can account for. The same relative abundance is observed when complementary and inverted repeats are analysed with model (1). In contrast, the number of inverted complementary repeats is well described by model (1). This can be regarded as a control for the chosen statistical model, since there is currently no evidence that such repeats are functionally or structurally important.
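The calculation described above can be illustrated with a minimal Python sketch. It is not the authors' program: the function names are hypothetical, frequencies are estimated from the target sequence itself, and the statistic is assumed to take the common 1 d.f. form (n - Np)^2 / (Np(1 - p)) with N = L - l + 1, which the text does not spell out.

```python
from collections import Counter


def frequencies(seq, k):
    """Observed frequencies of all overlapping k-mers in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: c / total for kmer, c in counts.items()}


def expected_frequency(oligo, f1, f2):
    """First-order Markov expectation for an oligonucleotide:
    product of its dinucleotide frequencies divided by the product
    of the frequencies of its inner nucleotides."""
    p = 1.0
    for i in range(len(oligo) - 1):
        p *= f2.get(oligo[i:i + 2], 0.0)
    if p == 0.0:
        return 0.0  # some dinucleotide never occurs in the sequence
    for base in oligo[1:-1]:
        p /= f1[base]
    return p


def chi_square(seq, oligo):
    """Return (n, N*p, chi2) for one oligonucleotide (length >= 2).
    ASSUMPTION: chi2 = (n - N*p)**2 / (N*p*(1 - p)), 1 d.f."""
    l = len(oligo)
    N = len(seq) - l + 1  # number of l-length windows in the sequence
    n = sum(1 for i in range(N) if seq[i:i + l] == oligo)
    p = expected_frequency(oligo, frequencies(seq, 1), frequencies(seq, 2))
    if not 0.0 < p < 1.0:
        raise ValueError("expected frequency must lie strictly in (0, 1)")
    chi2 = (n - N * p) ** 2 / (N * p * (1.0 - p))
    return n, N * p, chi2


if __name__ == "__main__":
    toy = "ACGT" * 10  # toy sequence, not promoter data
    n, expected, chi2 = chi_square(toy, "ACG")
    # 3.841 is the 5% critical value of chi-square with 1 d.f.
    print(n, round(expected, 3), chi2 > 3.841)
```

Under these assumptions an oligonucleotide would be called significantly frequent when n exceeds Np and the statistic exceeds the chosen critical value (3.841 at the 5% level for 1 d.f.). For a dinucleotide the expectation reduces to its observed frequency, so the test is informative only for l of 3 or more.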

  1. Ackerman S.L., Minden A.G., Yeung C. The minimal self-sufficient element in a murine G+C rich promoter is a large element with imperfect dyad symmetry. PNAS, 1993, v. 90, pp. 11865-11869.
  2. Valerio D., Duyvesteyn M.G.C., Dekker B.M.M., Weeda G., Berkvens Th.M., van der Voorn L., van der Eb A.J. Adenosine deaminase: characterization and expression of a gene with a remarkable promoter. EMBO Journal, 1985, v. 4, n. 2, pp. 437-443.
  3. Perier R.C., Junier T., Bucher P. The eukaryotic promoter database EPD. Nucleic Acids Research, 1998, v. 26, pp. 353-357.
  4. Brendel V., Beckmann J.S., Trifonov E.N. Linguistics of nucleotide sequences: morphology and comparison of vocabularies. Journal of Biomolecular Structure and Dynamics, 1986, v. 4, n. 1, pp. 11-21.