bioin.motif.profile_most_probable_kmer
bioin.motif.profile_most_probable_kmer(text, k, motif_profile)
Find a profile-most probable k-mer in a string.
Parameters: - text (str) – genome string.
- k (int) – length of the k-mer.
- motif_profile (dict) – a 4 × k profile matrix, mapping each nucleotide ('A', 'C', 'G', 'T') to a list of k probabilities, one per position.
Returns: str, the first profile-most probable k-mer in text.
Examples
Given a profile matrix (motif_profile, or Profile), we can compute the probability of every k-mer in a string Text as the product of the profile entries selected by its nucleotides, one entry per column. We can then find a profile-most probable k-mer in Text, i.e., a k-mer that is most likely to have been generated by Profile among all k-mers in Text. For example, for the NF-κB profile matrix, “ACGGGGATTACC” is the Profile-most probable 12-mer in “ggtACGGGGATTACCt”; indeed, every other 12-mer in this string has probability 0. If there are multiple Profile-most probable k-mers in Text, the first one occurring in Text is selected.
>>> from bioin.motif import profile_most_probable_kmer
>>> text = 'ACCTGTTTATTGCCTAAGTTCCGAACAAACCCAATATAGCCCGAGGGCCT'
>>> k = 5
>>> motif_profile = {'A': [0.2, 0.2, 0.3, 0.2, 0.3],
...                  'C': [0.4, 0.3, 0.1, 0.5, 0.1],
...                  'G': [0.3, 0.3, 0.5, 0.2, 0.4],
...                  'T': [0.1, 0.2, 0.1, 0.1, 0.2]}
>>> kmer = profile_most_probable_kmer(text, k, motif_profile)
>>> kmer
'CCGAG'
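For readers who want to see how such a search can work, the following is a minimal sketch of the computation described above. It is not the bioin source code: the names kmer_probability and most_probable_kmer_sketch are hypothetical, and the sketch assumes that motif_profile has the dict layout shown in the example and that text contains only the upper-case letters A, C, G and T.

```python
def kmer_probability(kmer, motif_profile):
    """Probability of kmer under the profile: product of one entry per column."""
    probability = 1.0
    for position, nucleotide in enumerate(kmer):
        probability *= motif_profile[nucleotide][position]
    return probability


def most_probable_kmer_sketch(text, k, motif_profile):
    """Hypothetical helper: first k-mer in text with the highest probability."""
    best_kmer = text[:k]
    best_probability = -1.0  # below any real probability, so the first k-mer wins
    for start in range(len(text) - k + 1):
        kmer = text[start:start + k]
        probability = kmer_probability(kmer, motif_profile)
        # Strict comparison: on a tie, the earlier k-mer is kept.
        if probability > best_probability:
            best_kmer = kmer
            best_probability = probability
    return best_kmer


text = 'ACCTGTTTATTGCCTAAGTTCCGAACAAACCCAATATAGCCCGAGGGCCT'
motif_profile = {'A': [0.2, 0.2, 0.3, 0.2, 0.3],
                 'C': [0.4, 0.3, 0.1, 0.5, 0.1],
                 'G': [0.3, 0.3, 0.5, 0.2, 0.4],
                 'T': [0.1, 0.2, 0.1, 0.1, 0.2]}
print(most_probable_kmer_sketch(text, 5, motif_profile))  # prints CCGAG
```

The strict comparison implements the tie-breaking rule stated above: a later k-mer replaces the current best only if its probability is strictly higher, so the first of several equally probable k-mers is returned.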