bioin.motif.greedy_motif_search
bioin.motif.greedy_motif_search(dna, k, t)
Select t k-mers, one from each string in dna, that together have the best score found by the greedy search (i.e. the k-mers that agree most closely with one another across the strings of dna).
Parameters: - dna (list) – list of t DNA strings, viewed as a matrix with t rows.
- k (int) – length of each k-mer, i.e. the number of nucleotides in each returned string.
- t (int) – number of k-mers to return, equal to the number of rows in the dna matrix.
Returns: list of t k-mers, one taken from each string of dna (it can also be viewed as a 2D matrix, a submatrix of dna).
Examples
GreedyMotifSearch starts by setting best_motifs equal to the first k-mer from each string in dna (one k-mer per row). It then ranges over all possible k-mers in dna[0], trying each one in turn as Motifs[0]. For a given choice, the algorithm builds a profile matrix Profile from this lone k-mer and sets Motifs[1] equal to the profile_most_probable k-mer in dna[1]. It then updates Profile as the profile matrix formed from Motifs[0] and Motifs[1], and sets Motifs[2] equal to the profile_most_probable k-mer in dna[2]. In general, after finding k-mers Motifs in the first i strings of dna, GreedyMotifSearch constructs Profile(Motifs) and sets Motifs[i] equal to the profile_most_probable k-mer from dna[i] based on this profile matrix. Once a k-mer has been chosen from every string, the resulting Motifs replace best_motifs if they have a better score.
>>> dna = ['GGCGTTCAGGCA', 'AAGAATCAGTCA', 'CAAGGAGTTCGC', 'CACGTCAATCAC', 'CAATAATATTCG']
>>> k = 3
>>> t = 5
>>> t_kmers = greedy_motif_search(dna, k, t)
>>> t_kmers
['CAG', 'CAG', 'CAA', 'CAA', 'CAA']
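The greedy procedure described above can be sketched in plain Python as follows. This is only an illustrative sketch, not the bioin source: the names build_profile, most_probable_kmer, score and greedy_motif_search_sketch are hypothetical. It also rests on three assumptions: the profile uses no pseudocounts, ties go to the first k-mer encountered, and the input contains only the uppercase letters A, C, G and T.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def build_profile(motifs):
    """Column-wise nucleotide frequencies of equal-length k-mers (assumption: no pseudocounts)."""
    k = len(motifs[0])
    profile = {n: [0.0] * k for n in NUCLEOTIDES}
    for j in range(k):
        counts = Counter(m[j] for m in motifs)
        for n in NUCLEOTIDES:
            profile[n][j] = counts[n] / len(motifs)
    return profile


def most_probable_kmer(text, k, profile):
    """Return the k-mer of text with the highest probability under profile.

    Ties (including the case where every k-mer has probability 0)
    are resolved in favour of the first k-mer in text.
    """
    best_kmer, best_prob = text[:k], -1.0
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        prob = 1.0
        for j, n in enumerate(kmer):
            prob *= profile[n][j]
        if prob > best_prob:
            best_kmer, best_prob = kmer, prob
    return best_kmer


def score(motifs):
    """Number of letters that differ from the most common letter in each column (lower is better)."""
    total = 0
    for column in zip(*motifs):
        total += len(column) - Counter(column).most_common(1)[0][1]
    return total


def greedy_motif_search_sketch(dna, k, t):
    """Hypothetical re-implementation of the greedy search described above."""
    # Start from the first k-mer of every string.
    best_motifs = [row[:k] for row in dna[:t]]
    # Try every k-mer of the first string as the seed motif.
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]
        # Extend one string at a time, rebuilding the profile from the motifs chosen so far.
        for row in range(1, t):
            profile = build_profile(motifs)
            motifs.append(most_probable_kmer(dna[row], k, profile))
        # Keep the candidate only if it is strictly better.
        if score(motifs) < score(best_motifs):
            best_motifs = motifs
    return best_motifs
```

Called with the dna, k and t values from the example above, this sketch returns the same list, ['CAG', 'CAG', 'CAA', 'CAA', 'CAA']. Because each k-mer is chosen greedily from a profile built without pseudocounts, a single zero entry in the profile rules out otherwise good k-mers, so the result is not guaranteed to be the globally best set of motifs.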