bioin.replication.skew_array
bioin.replication.skew_array(genome)
Compute the skew array of genome as a list.
In the table of nucleotide counts for T. petrophila (reproduced below), we noted that not just C but also G has peculiar statistics on the forward and reverse half-strands.
                                       #C       #G       #A       #T
Entire strand                      427419   413241   491488   491363
Reverse half-strand                219518   201634   243963   246641
Forward half-strand                207901   211607   247525   244722
Difference (reverse minus forward) +11617    -9973    -3562    +1919
In practice, scientists use a more accurate approach that accounts for both G and C when searching for ori. As the table above shows, the difference between the total amount of guanine and the total amount of cytosine is negative on the reverse half-strand and positive on the forward half-strand.
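The sign change can be checked directly from the counts in the table. The sketch below only redoes that arithmetic; the dictionary and function names are illustrative assumptions and are not part of bioin.

```python
# Nucleotide counts copied from the T. petrophila table above.
reverse_half = {"C": 219518, "G": 201634, "A": 243963, "T": 246641}
forward_half = {"C": 207901, "G": 211607, "A": 247525, "T": 244722}


def gc_difference(counts):
    """Return #G - #C for one half-strand (hypothetical helper)."""
    return counts["G"] - counts["C"]


print(gc_difference(reverse_half))  # -17884, negative on the reverse half-strand
print(gc_difference(forward_half))  # 3706, positive on the forward half-strand
```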
We will keep track of the difference between the total number of occurrences of G and the total number of occurrences of C that we have encountered so far in Genome by using a skew array. This array, denoted Skew, is defined by setting Skew[i] equal to the number of occurrences of G minus the number of occurrences of C in the first i nucleotides of Genome (see the table below). We also set Skew[0] equal to zero.
The array Skew for Genome = “CATGGGCATCGGCCATACGCC”.
i        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
Skew[i]  0 -1 -1 -1  0  1  2  1  1  1  0  1  2  1  0  0  0  0 -1  0 -1 -2
Text        C  A  T  G  G  G  C  A  T  C  G  G  C  C  A  T  A  C  G  C  C
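Any single entry of the table can be checked straight from the definition by counting G and C in a prefix of Genome. The following is a minimal sketch of that check; skew_at is a hypothetical helper name, not a bioin function, and recomputing every prefix this way is quadratic, so it is meant for verification only.

```python
def skew_at(genome, i):
    """Skew[i] by definition: #G minus #C in the first i nucleotides."""
    prefix = genome[:i]
    return prefix.count("G") - prefix.count("C")


genome = "CATGGGCATCGGCCATACGCC"
print(skew_at(genome, 0))   # 0  (empty prefix)
print(skew_at(genome, 6))   # 2  (prefix CATGGG: three G, one C)
print(skew_at(genome, 21))  # -2 (whole genome)
```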
Given a string Genome, we can form its skew array by setting Skew[0] equal to 0 and then scanning the genome from left to right. At position i of Genome (counting from 0), if we encounter an A or a T, we set Skew[i+1] equal to Skew[i]; if we encounter a G, we set Skew[i+1] equal to Skew[i]+1; if we encounter a C, we set Skew[i+1] equal to Skew[i]-1.
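The update rule above can be sketched as a single pass over the string. This is an illustration under stated assumptions, not necessarily how bioin implements skew_array: the name skew_array_sketch is hypothetical, and treating any character other than G or C (for example N, or lowercase letters) as leaving the skew unchanged is an assumption of this sketch.

```python
def skew_array_sketch(genome):
    """Build the skew array in one left-to-right pass (illustrative sketch)."""
    step = {"G": 1, "C": -1}  # A and T (and, by assumption, anything else) add 0
    skew = [0]                # Skew[0] = 0
    for nucleotide in genome:
        # Skew[i+1] = Skew[i] + step for the nucleotide at position i
        skew.append(skew[-1] + step.get(nucleotide, 0))
    return skew


print(skew_array_sketch("CATGGGCATCGGCCATACGCC"))
# [0, -1, -1, -1, 0, 1, 2, 1, 1, 1, 0, 1, 2, 1, 0, 0, 0, 0, -1, 0, -1, -2]
```

The result always has one more element than the genome has nucleotides, because Skew[0] is included.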
Parameters: genome (str) – a DNA string.
Returns: list – the i-th element is Skew[i], the number of occurrences of G minus the number of occurrences of C in the first i nucleotides of genome, with Skew[0] = 0. The list has len(genome) + 1 elements.
Examples
The array Skew for Genome = “CATGGGCATCGGCCATACGCC”.
>>> from bioin.replication import skew_array
>>> genome = 'CATGGGCATCGGCCATACGCC'
>>> array_skew = skew_array(genome)
>>> array_skew
[0, -1, -1, -1, 0, 1, 2, 1, 1, 1, 0, 1, 2, 1, 0, 0, 0, 0, -1, 0, -1, -2]