bioin.replication.skew_array

bioin.replication.skew_array(genome)

Compute the skew array of genome as a list.

The table of nucleotide counts for T. petrophila (reproduced below) shows that not just C but also G is distributed unevenly between the forward and reverse half-strands.

                         #C      #G      #A      #T
Entire strand        427419  413241  491488  491363
Reverse half-strand  219518  201634  243963  246641
Forward half-strand  207901  211607  247525  244722
Difference           +11617   -9973   -3562   +1919

In practice, scientists use a more accurate approach that accounts for both G and C when searching for ori. As the table above shows, the difference between the total amount of guanine and the total amount of cytosine (#G - #C) is negative on the reverse half-strand and positive on the forward half-strand.
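The sign change can be checked with a few lines of arithmetic. The sketch below only re-uses the C and G counts from the table above; the variable names are illustrative and not part of the bioin package.

```python
# C and G counts copied from the T. petrophila table above.
counts = {
    "reverse": {"C": 219518, "G": 201634},
    "forward": {"C": 207901, "G": 211607},
}

# G minus C on each half-strand.
gc_difference = {strand: c["G"] - c["C"] for strand, c in counts.items()}
print(gc_difference)  # {'reverse': -17884, 'forward': 3706}
```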

We will keep track of the difference between the total number of occurrences of G and the total number of occurrences of C that we have encountered so far in Genome by using a skew array. This array, denoted Skew, is defined by setting Skew[i] equal to the number of occurrences of G minus the number of occurrences of C in the first i nucleotides of Genome (see the table below). In particular, Skew[0] equals zero.

The array Skew for Genome = “CATGGGCATCGGCCATACGCC”.

      i   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
Skew[i]   0  -1  -1  -1   0   1   2   1   1   1   0   1   2   1   0   0   0   0  -1   0  -1  -2
   Text       C   A   T   G   G   G   C   A   T   C   G   G   C   C   A   T   A   C   G   C   C
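The definition of Skew[i] can be applied literally to reproduce the first entries of the table above. This is a minimal sketch: skew_from_definition is a hypothetical helper name, not a function of the bioin package, and it recounts every prefix, so it takes quadratic time and is meant only as a check of the definition.

```python
def skew_from_definition(genome):
    """Hypothetical helper: Skew[i] = #G - #C in the first i nucleotides."""
    return [
        genome[:i].count("G") - genome[:i].count("C")
        for i in range(len(genome) + 1)
    ]


skew = skew_from_definition("CATGGGCATCGGCCATACGCC")
print(skew[:7])  # [0, -1, -1, -1, 0, 1, 2]
```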

Given a string Genome, we can form its skew array by setting Skew[0] equal to 0, and then ranging through the genome. At position i of Genome, if we encounter an A or a T, we set Skew[i+1] equal to Skew[i]; if we encounter a G, we set Skew[i+1] equal to Skew[i]+1; if we encounter a C, we set Skew[i+1] equal to Skew[i]-1.
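The update rule above translates into a single pass over the string. The following is a sketch under stated assumptions rather than the library's actual source: skew_array_sketch is a hypothetical name, and treating every symbol other than G and C (for example N or lowercase letters) like A or T is an assumption, since the text only covers the four uppercase nucleotides.

```python
def skew_array_sketch(genome):
    """Illustrative one-pass skew array; not necessarily how bioin does it."""
    # G raises the skew by 1, C lowers it by 1.
    # Assumption: every other symbol leaves the skew unchanged, like A and T.
    step = {"G": 1, "C": -1}
    skew = [0]  # Skew[0] = 0
    for nucleotide in genome:
        skew.append(skew[-1] + step.get(nucleotide, 0))
    return skew


print(skew_array_sketch("CATGGG"))  # [0, -1, -1, -1, 0, 1, 2]
```

Because each entry is built from the previous one, the running time is linear in the length of the genome, and the result always has one more element than the genome has nucleotides.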

Parameters: genome (str) – a DNA string.
Returns: list – the i-th element is Skew[i], the number of occurrences of G minus the number of occurrences of C in the first i nucleotides of genome; the element at index 0 is 0.

Examples

The array Skew for Genome = “CATGGGCATCGGCCATACGCC”.

>>> genome = 'CATGGGCATCGGCCATACGCC'
>>> array_skew = skew_array(genome)
>>> array_skew
[0, -1, -1, -1, 0, 1, 2, 1, 1, 1, 0, 1, 2, 1, 0, 0, 0, 0, -1, 0, -1, -2]