bioin.motif.profile_most_probable_kmer

bioin.motif.profile_most_probable_kmer(text, k, motif_profile)

Find a profile-most probable k-mer in a string.

Parameters:
  • text (str) – genome string.
  • k (int) – length of the k-mer.
  • motif_profile (dict) – a 4 × k profile matrix, mapping each nucleotide ('A', 'C', 'G', 'T') to a list of k probabilities.
Returns:

str – a profile-most probable k-mer in text. If several k-mers tie, the first one occurring in text is returned.

Examples

Given a profile matrix (motif_profile, written Profile below), we can compute the probability of every k-mer in a string Text as the product of the profile entries for its nucleotides, one entry per position. A Profile-most probable k-mer in Text is a k-mer that was most likely to have been generated by Profile among all k-mers in Text. For the NF-κB profile matrix (not shown here), “ACGGGGATTACC” is the Profile-most probable 12-mer in “ggtACGGGGATTACCt”; indeed, every other 12-mer in this string has probability 0. If there are multiple Profile-most probable k-mers in Text, we select the first such k-mer occurring in Text.

>>> from bioin.motif import profile_most_probable_kmer
>>> text = 'ACCTGTTTATTGCCTAAGTTCCGAACAAACCCAATATAGCCCGAGGGCCT'
>>> k = 5
>>> motif_profile = {'A': [0.2, 0.2, 0.3, 0.2, 0.3], 'C': [0.4, 0.3, 0.1, 0.5, 0.1], 'G': [0.3, 0.3, 0.5, 0.2, 0.4], 'T': [0.1, 0.2, 0.1, 0.1, 0.2]}
>>> kmer = profile_most_probable_kmer(text, k, motif_profile)
>>> kmer
'CCGAG'
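
The computation described above can be sketched in a few lines. This is a minimal illustration, not the bioin source: the helper names kmer_probability and most_probable_kmer_sketch are hypothetical, and the sketch assumes that text contains only the uppercase letters used as keys in motif_profile and that len(text) >= k (it returns None otherwise).

```python
from math import prod  # available in Python 3.8+


def kmer_probability(kmer, motif_profile):
    """Probability that the profile generates kmer.

    This is the product of the profile entries selected by each
    nucleotide at its position in the k-mer.
    """
    return prod(motif_profile[base][i] for i, base in enumerate(kmer))


def most_probable_kmer_sketch(text, k, motif_profile):
    """Hypothetical stand-in for the documented function (assumed behavior)."""
    best_kmer, best_prob = None, -1.0
    # Slide a window of length k over every start position in text.
    for start in range(len(text) - k + 1):
        kmer = text[start:start + k]
        prob = kmer_probability(kmer, motif_profile)
        # Strict comparison keeps the first k-mer when probabilities tie.
        if prob > best_prob:
            best_kmer, best_prob = kmer, prob
    return best_kmer
```

On the example above, 'CCGAG' has probability 0.4 × 0.3 × 0.5 × 0.2 × 0.4 = 0.0048, which is higher than that of any other 5-mer in text. Starting best_prob below zero means a k-mer is still returned when every k-mer has probability 0.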