Babenko V.N., Kosarev P.S., Basin V.G.
Institute of Cytology & Genetics, 630090, Novosibirsk, Russia; E-mail: bob@bionet.nsc.ru, Fax: (07)-383-235-6558
A preliminary investigation of the DNA structure of eukaryotic promoters revealed a relative abundance of imperfect repeats of DNA fragments in the promoter regions of eukaryotic genes. Clusters of such repeats could give rise to cruciform DNA structures, H-form DNA, loops and other non-canonical structures that lead to DNA relaxation [1,2]. This could affect the transcriptional activity of promoters and modulate binding-site affinity, for TATA-less promoters in particular [1,2].
For this investigation we took eukaryotic promoter regions from the EPD promoter
database [3]. Simulation showed that a first-order Markov chain
model closely approximates the expected distribution of repeat occurrences
in the promoter regions. The expected frequency of a repeat is therefore estimated
from dinucleotide frequencies [4]:
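$$p(N_1 N_2 \ldots N_l) = \frac{f(N_1 N_2)\, f(N_2 N_3) \cdots f(N_{l-1} N_l)}{f(N_2)\, f(N_3) \cdots f(N_{l-1})} \qquad (1),$$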
where f(Ni) is the observed frequency of the (oligo)nucleotide Ni.
The significance of the over-representation of a particular
oligonucleotide (significantly frequent oligonucleotides)
is assessed with a chi-square test with 1 d.f. Here
n is the number of observed copies of the particular oligonucleotide in a
target DNA sequence of length L, L - l + 1 is the total
number of l-length oligonucleotides in the sequence, and p is taken from
(1). We found that the total number of direct repeats is considerably
higher than model (1) predicts. The same excess is observed for
complementary and inverted repeats analysed with model (1). In contrast,
the number of inverted complementary repeats agrees well with model (1).
This can be regarded as a control for the chosen statistical model, since
there is currently no evidence that such repeats are functionally or structurally
important.
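
The procedure described above can be illustrated with a minimal Python sketch. It estimates the expected frequency of an oligonucleotide from mono- and dinucleotide frequencies using one common form of the first-order Markov estimate (product of the dinucleotide frequencies along the word, divided by the frequencies of its internal nucleotides), and then applies a chi-square test with 1 d.f. The function names, the overlapping counting of occurrences, and the binomial-variance form of the statistic, (n - Np)^2 / (Np(1 - p)) with N = L - l + 1, are assumptions made for this sketch and may differ from the authors' actual implementation.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov1_probability(word, seq):
    """Expected frequency of `word` under a first-order Markov model.

    Assumed form: product of observed dinucleotide frequencies along the
    word, divided by the observed frequencies of its internal nucleotides.
    """
    mono = kmer_counts(seq, 1)
    di = kmer_counts(seq, 2)
    n_mono = len(seq)
    n_di = len(seq) - 1
    p = 1.0
    for i in range(len(word) - 1):
        p *= di[word[i:i + 2]] / n_di
    for i in range(1, len(word) - 1):
        f = mono[word[i]] / n_mono
        if f == 0.0:
            # An internal nucleotide absent from seq: the word cannot occur.
            return 0.0
        p /= f
    return p


def chi_square_1df(word, seq):
    """Return (n, expected, statistic, p_value) for `word` in `seq`.

    The statistic (n - N*p)^2 / (N*p*(1 - p)) is an assumed form.
    """
    l = len(word)
    total = len(seq) - l + 1          # number of l-length windows, L - l + 1
    n = kmer_counts(seq, l)[word]     # observed copies (overlaps counted)
    p = markov1_probability(word, seq)
    expected = total * p
    if expected == 0.0 or p >= 1.0:
        # Degenerate case: the statistic is undefined.
        return n, expected, float("nan"), float("nan")
    stat = (n - expected) ** 2 / (expected * (1.0 - p))
    # Upper tail of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return n, expected, stat, p_value
```

For example, `chi_square_1df("ACG", promoter_seq)` would return the observed count, the expected count, the statistic and its p-value for the trinucleotide ACG in a hypothetical promoter sequence string `promoter_seq`. In practice the frequencies would more likely be pooled over the whole promoter set rather than computed from a single short sequence.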