bioin.motif.greedy_motif_search

bioin.motif.greedy_motif_search(dna, k, t)

Find t k-mers, one from each string in dna, that together have the best score found by the greedy search (i.e. the k-mers that agree most closely with one another across the given dna).

Parameters:
    dna (list) – matrix with t rows (t strings in the list).
    k (int) – length of each k-mer, i.e. the number of nucleotides in each returned string.
    t (int) – number of k-mers to return, equal to the number of rows in the dna 2D matrix.
Returns:	list of t k-mers, one taken from each row of dna (can be viewed as a 2D matrix, a sub-matrix of dna).

Examples

GreedyMotifSearch starts by setting best_motifs to the first k-mer from each string in dna (one k-mer per row). It then ranges over all possible k-mers in dna[0]. For each such k-mer, taken as Motifs[0], the algorithm builds a profile matrix Profile from this lone k-mer and sets Motifs[1] to the profile_most_probable k-mer in dna[1]. It then updates Profile to the profile matrix formed from Motifs[0] and Motifs[1], and sets Motifs[2] to the profile_most_probable k-mer in dna[2]. In general, after finding the k-mers Motifs in the first i strings of dna, GreedyMotifSearch constructs Profile(Motifs) and sets Motifs[i] to the profile_most_probable k-mer from dna[i] based on this profile matrix. Once Motifs holds a k-mer from every string, it replaces best_motifs if its score is better.

>>> from bioin.motif import greedy_motif_search
>>> dna = ['GGCGTTCAGGCA', 'AAGAATCAGTCA', 'CAAGGAGTTCGC', 'CACGTCAATCAC', 'CAATAATATTCG']
>>> k = 3
>>> t = 5
>>> t_kmers = greedy_motif_search(dna, k, t)
>>> t_kmers
['CAG', 'CAG', 'CAA', 'CAA', 'CAA']
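
The steps described above can be sketched in plain Python as follows. This is only an illustrative sketch, not the bioin source: the helper names profile_matrix, most_probable_kmer, score and greedy_motif_search_sketch are hypothetical, and it assumes no pseudocounts, that ties are broken in favour of the earliest k-mer, and that the score counts the nucleotides in each column that differ from the most common one (lower is better).

```python
from collections import Counter


def profile_matrix(motifs):
    """Column-wise nucleotide frequencies of equal-length motifs (no pseudocounts)."""
    n = len(motifs)
    return [{base: column.count(base) / n for base in "ACGT"}
            for column in zip(*motifs)]


def most_probable_kmer(text, k, profile):
    """Return the first k-mer in text with the highest probability under profile."""
    best_kmer, best_prob = text[:k], -1.0
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        prob = 1.0
        for pos, base in enumerate(kmer):
            prob *= profile[pos][base]
        if prob > best_prob:  # strict comparison keeps the earliest k-mer on ties
            best_kmer, best_prob = kmer, prob
    return best_kmer


def score(motifs):
    """Count nucleotides that differ from the most common one in each column."""
    return sum(len(column) - Counter(column).most_common(1)[0][1]
               for column in zip(*motifs))


def greedy_motif_search_sketch(dna, k, t):
    """Hypothetical re-implementation of the greedy search described above."""
    # Start from the first k-mer of each of the t strings.
    best_motifs = [row[:k] for row in dna[:t]]
    # Try every k-mer of the first string as Motifs[0].
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]
        # Extend one string at a time using the profile of the motifs so far.
        for row in dna[1:t]:
            motifs.append(most_probable_kmer(row, k, profile_matrix(motifs)))
        # Keep the candidate set only if it scores strictly better.
        if score(motifs) < score(best_motifs):
            best_motifs = motifs
    return best_motifs
```

Under these assumptions, calling greedy_motif_search_sketch(dna, 3, 5) on the dna list from the example above yields the same five k-mers as the doctest. A different tie-breaking rule or the use of pseudocounts could give a different result.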