bioin.motif.greedy_motif_search_with_pseudocount
bioin.motif.greedy_motif_search_with_pseudocount(dna, k, t)
Find t k-mers, one from each string in dna, that together have the best score (i.e. the most conserved t k-mers in the given dna). Pseudocounts are applied when building the profile matrix.
Parameters: - dna (list) – List of t DNA strings; can be viewed as a 2D matrix with t rows.
- k (int) – Length of each k-mer.
- t (int) – Number of k-mers to return; equal to the number of rows (strings) in dna.
Returns: List of t k-mers, one taken from each row of dna (can be viewed as a 2D sub-matrix of dna).
Examples
GreedyMotifSearch starts by setting best_motifs equal to the first k-mer from each string in dna (one k-mer per row). It then ranges over all possible k-mers in dna[0], taking each in turn as Motifs[0]. For each such k-mer, the algorithm builds a profile matrix Profile from this lone k-mer and sets Motifs[1] equal to the profile_most_probable k-mer in dna[1]. It then updates Profile as the profile matrix formed from Motifs[0] and Motifs[1], and sets Motifs[2] equal to the profile_most_probable k-mer in dna[2]. In general, after finding k-mers Motifs in the first i strings of dna, GreedyMotifSearch constructs Profile(Motifs) and sets Motifs[i] equal to the profile_most_probable k-mer from dna[i] based on this profile matrix. Once a k-mer has been chosen from every string, the resulting Motifs replace best_motifs if they score better.
>>> dna = ["GGCGTTCAGGCA", "AAGAATCAGTCA", "CAAGGAGTTCGC", "CACGTCAATCAC", "CAATAATATTCG"]
>>> k = 3
>>> t = 5
>>> t_kmers_pseudo = greedy_motif_search_with_pseudocount(dna, k, t)
>>> t_kmers_pseudo
["TTC", "ATC", "TTC", "ATC", "TTC"]
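The procedure described under Examples can be sketched in plain Python as follows. This is an illustrative sketch, not the bioin source: the names greedy_motif_search_sketch, profile_with_pseudocount and score are hypothetical, the pseudocount value of 1 is an assumption (the page only says "with pseudocount"), and ties are assumed to be broken in favour of the first k-mer encountered. Because every profile column has the same total, the sketch compares products of raw counts instead of probabilities, which gives the same ranking with exact integer arithmetic.

```python
def score(motifs):
    """Total number of mismatches against the per-column consensus."""
    total = 0
    for column in zip(*motifs):
        most_common = max(column.count(base) for base in "ACGT")
        total += len(column) - most_common
    return total


def profile_with_pseudocount(motifs, pseudocount=1):
    """Per-column nucleotide counts, each raised by `pseudocount`.

    The value 1 is an assumption made for this sketch.
    """
    return [
        {base: column.count(base) + pseudocount for base in "ACGT"}
        for column in zip(*motifs)
    ]


def profile_most_probable(text, k, profile):
    """First k-mer in `text` with the highest weight under `profile`."""
    best_kmer, best_weight = None, -1
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        weight = 1
        for column, base in zip(profile, kmer):
            weight *= column[base]
        # Strict comparison keeps the earliest k-mer on ties.
        if weight > best_weight:
            best_kmer, best_weight = kmer, weight
    return best_kmer


def greedy_motif_search_sketch(dna, k, t):
    """Hypothetical stand-in for the documented function."""
    # Start from the first k-mer of every string.
    best_motifs = [text[:k] for text in dna]
    # Try every k-mer of the first string as the seed motif.
    for start in range(len(dna[0]) - k + 1):
        motifs = [dna[0][start:start + k]]
        for i in range(1, t):
            # Rebuild the profile from the motifs chosen so far.
            profile = profile_with_pseudocount(motifs)
            motifs.append(profile_most_probable(dna[i], k, profile))
        # A lower score means fewer mismatches, so lower is better.
        if score(motifs) < score(best_motifs):
            best_motifs = motifs
    return best_motifs
```

Under these assumptions, calling greedy_motif_search_sketch on the five strings from the example with k = 3 and t = 5 returns TTC, ATC, TTC, ATC, TTC, a collection whose score is 2.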