bioin.replication.approximate_pattern_matching¶
-
bioin.replication.approximate_pattern_matching(pattern, text, d)[source]¶ Find all approximate occurences of a pattern in a string. We say that a k-mer Pattern appears as a substring of Text with at most d mismatches if there is some k-mer substring Pattern’ of Text having d or fewer mismatches with Pattern; that is, HammingDistance(Pattern, Pattern’) ≤ d. Our observation that a DnaA box may appear with slight variations leads to the following generalization of the Pattern Matching Problem.
Parameters: - text (str) – a DNA string genome.
- pattern (str) – a substring of DNA genome.
- d (int) – at most d mismatches between pattern and a substring in text.
Returns: List, all starting positions where pattern appears as a substring of text with at most d mismatches.
Examples
Positions that a pattern appears as a substring of text with at most d mismatches fulfilled.
>>> pattern = 'ATTCTGGA' >>> text = 'CGCCCGAATCCAGAACGCATTCCCATATTTCGGGACCACTGGCCTCCACGGTACGGACGTCAATCAAAT' >>> d = 3 >>> positions = approximate_pattern_matching(pattern, text, d) >>> positions [6, 7, 26, 27]