bioin.motif.greedy_motif_search_with_pseudocount

bioin.motif.greedy_motif_search_with_pseudocount(dna, k, t)

Return t k-mers, one from each row of dna, that form the best-scoring motif collection found by a greedy search (i.e. the k-mers that agree most closely with one another). Profile matrices are built with pseudocounts.

Parameters:
  • dna (list) – list of t DNA strings (can be viewed as a 2D matrix with t rows).
  • k (int) – length of each k-mer.
  • t (int) – number of k-mers to return; equal to the number of rows in dna.
Returns:

A list of t k-mers, one per row of dna (can be viewed as a 2D sub-matrix of dna).

Examples

GreedyMotifSearch starts by setting best_motifs to the first k-mer from each string in dna (one k-mer per row). It then ranges over all possible k-mers in dna[0], trying each one as Motifs[0]. For a given choice of Motifs[0], the algorithm builds a profile matrix Profile for this lone k-mer and sets Motifs[1] to the profile_most_probable k-mer in dna[1]. It then updates Profile to the profile matrix formed from Motifs[0] and Motifs[1], and sets Motifs[2] to the profile_most_probable k-mer in dna[2]. In general, after finding k-mers Motifs in the first i strings of dna, GreedyMotifSearch constructs Profile(Motifs) and sets Motifs[i] to the profile_most_probable k-mer from dna[i] based on this profile matrix. Once a k-mer has been chosen from every string, Motifs replaces best_motifs if it scores better. In this variant every Profile is built with pseudocounts, so no nucleotide is ever assigned probability zero.

>>> dna = ["GGCGTTCAGGCA", "AAGAATCAGTCA", "CAAGGAGTTCGC", "CACGTCAATCAC", "CAATAATATTCG"]
>>> k = 3
>>> t = 5
>>> t_kmers_pseudo = greedy_motif_search_with_pseudocount(dna, k, t)
>>> t_kmers_pseudo
    ["TTC", "ATC", "TTC", "ATC", "TTC"]
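The procedure described above can be sketched in a few standalone functions. This is a minimal illustration, not the bioin source: the names count_with_pseudocount, most_probable_kmer, score and greedy_sketch are hypothetical, and it assumes a pseudocount of 1 (Laplace's rule), a score equal to the number of letters that differ from the column-wise consensus (lower is better), and ties broken in favour of the first k-mer encountered.

```python
def count_with_pseudocount(motifs, pseudocount=1):
    """Per-column nucleotide counts, each raised by `pseudocount` (assumed to be 1)."""
    k = len(motifs[0])
    counts = {base: [pseudocount] * k for base in "ACGT"}
    for motif in motifs:
        for j, base in enumerate(motif):
            counts[base][j] += 1
    return counts


def most_probable_kmer(text, k, counts):
    """First k-mer in `text` with the highest probability under the profile.

    Every column shares the same total, so ranking by the product of raw
    counts is equivalent to ranking by the product of probabilities, and
    integer arithmetic keeps ties exact.
    """
    best_kmer, best_weight = text[:k], -1
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        weight = 1
        for j, base in enumerate(kmer):
            weight *= counts[base][j]
        if weight > best_weight:  # strict '>' keeps the earliest k-mer on ties
            best_kmer, best_weight = kmer, weight
    return best_kmer


def score(motifs):
    """Number of letters that disagree with the most common letter of their column."""
    total = 0
    for column in zip(*motifs):
        total += len(column) - max(column.count(base) for base in "ACGT")
    return total


def greedy_sketch(dna, k, t):
    """Greedy motif search with pseudocounts, following the description above."""
    best_motifs = [row[:k] for row in dna[:t]]
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]  # try each k-mer of dna[0] as Motifs[0]
        for row in dna[1:t]:
            counts = count_with_pseudocount(motifs)
            motifs.append(most_probable_kmer(row, k, counts))
        if score(motifs) < score(best_motifs):
            best_motifs = motifs
    return best_motifs
```

Under these assumptions, calling greedy_sketch(dna, 3, 5) on the five strings from the example above selects the same five k-mers as the documented output.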