bioin.replication.approximate_pattern_count

bioin.replication.approximate_pattern_count(pattern, text, d)

Compute the number of occurrences of pattern in text with at most d mismatches. Given the strings pattern and text and an integer d, this function extends pattern_count: it counts every substring of text that has the same length as pattern and differs from it in at most d positions. For example, approximate_pattern_count('AAAAA', 'AACAAGCATAAACATTAAAGAG', 1) = 4.

This is because AAAAA appears four times in this string with at most one mismatch: AACAA, ATAAA, AAACA, and AAAGA. Notice that two of these occurrences overlap.
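The counting rule described above can be sketched as a sliding-window comparison. This is only an illustration under stated assumptions, not the library's actual source: the names hamming_distance and approximate_pattern_count_sketch are hypothetical, and a "mismatch" is assumed to mean a position where two equal-length strings differ.

```python
def hamming_distance(p, q):
    """Number of positions at which the equal-length strings p and q differ."""
    return sum(a != b for a, b in zip(p, q))


def approximate_pattern_count_sketch(pattern, text, d):
    """Hypothetical sketch: count windows of text within d mismatches of pattern."""
    k = len(pattern)
    count = 0
    # Slide a window of length k across text; overlapping windows all count.
    for i in range(len(text) - k + 1):
        if hamming_distance(pattern, text[i:i + k]) <= d:
            count += 1
    return count


print(approximate_pattern_count_sketch('AAAAA', 'AACAAGCATAAACATTAAAGAG', 1))  # 4
```

Because every starting position is checked independently, overlapping matches such as ATAAA and AAACA are both counted.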

Parameters:
  • pattern (str) – the DNA substring to search for.
  • text (str) – a DNA string.
  • d (int) – the maximum number of mismatches allowed.
Returns:

Integer, the number of occurrences of pattern in text with at most d mismatches.

Examples

Count the occurrences of pattern in text with at most d mismatches.

>>> from bioin.replication import approximate_pattern_count
>>> pattern = 'GAGG'
>>> text = 'TTTAGAGCCTTCAGAGG'
>>> d = 2
>>> approx_count = approximate_pattern_count(pattern, text, d)
>>> approx_count
4