bioin.motif.GibbsSampler

bioin.motif.GibbsSampler(Dna, k, t, N)

Runs the Gibbs sampling motif search and returns the best-scoring collection of motifs it finds: t k-mers, one from each string in Dna.

Parameters:
  • Dna (list) – a list of t DNA strings, treated as the rows of a matrix.
  • k (int) – the length of each motif (k-mer).
  • t (int) – the number of k-mers to return, one per string; equal to the number of strings (rows) in Dna.
  • N (int) – the number of sampling iterations to run.
Returns:
list – the best-scoring motifs found: t k-mers, one from each string (row) in Dna.

Examples

Although GibbsSampler performs well in many cases, it may converge to a suboptimal solution, particularly for difficult search problems with elusive motifs. A local optimum is a solution that is optimal within a small neighboring set of solutions, in contrast to a global optimum, which is the optimal solution among all possible solutions. Since GibbsSampler explores just a small subset of solutions, it may “get stuck” in a local optimum. For this reason, like RandomizedMotifSearch, it should be run many times in the hope that one of these runs will produce the best-scoring motifs. Yet convergence to a local optimum is just one of many issues we must consider in motif finding.

>>> from bioin.motif import GibbsSampler
>>> Dna = ['CGCCCCTCTCGGGGGTGTTCAGTAAACGGCCA', 'GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG', 'TAGTACCGAGACCGAAAGAAGTATACAGGCGT', 'TAGATCAAGTTTCAGGTGCACGTCGGTGAACC', 'AATCCACCAGCTCCACGTGCAATGTTGGCCTA']
>>> k = 8
>>> t = 5
>>> N = 100
>>> best_motif_gibs = GibbsSampler(Dna, k, t, N)
>>> best_motif_gibs  # randomized: the output varies between runs
['AACGGCCA', 'AAGTGCCA', 'TAGTACCG', 'AAGTTTCA', 'ACGTGCAA']
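
The advice above to run the sampler many times and keep the best-scoring result can be sketched without the library. What follows is a minimal, self-contained illustration and not the bioin implementation. All helper names (score, profile_with_pseudocounts, sample_kmer, gibbs_sampler_sketch, best_of_restarts) are hypothetical. The scoring rule (mismatches against each column's most common base) and the pseudocount of 1 are assumptions taken from the usual textbook formulation of Gibbs sampling for motif search.

```python
import random
from collections import Counter


def score(motifs):
    """Total mismatches between each column and its most common base (lower is better)."""
    return sum(len(col) - Counter(col).most_common(1)[0][1] for col in zip(*motifs))


def profile_with_pseudocounts(motifs):
    """Per-position base frequencies, adding a pseudocount of 1 for every base."""
    total = len(motifs) + 4
    return [{b: (col.count(b) + 1) / total for b in "ACGT"} for col in zip(*motifs)]


def sample_kmer(text, k, profile, rng):
    """Draw a k-mer from text with probability proportional to its profile likelihood."""
    kmers = [text[i:i + k] for i in range(len(text) - k + 1)]
    weights = []
    for kmer in kmers:
        p = 1.0
        for pos, base in enumerate(kmer):
            p *= profile[pos][base]
        weights.append(p)
    return rng.choices(kmers, weights=weights, k=1)[0]


def gibbs_sampler_sketch(dna, k, n_iter, rng):
    """One run of the sampler; assumes dna holds at least two strings over ACGT."""
    # Start from one uniformly random k-mer per string.
    motifs = []
    for text in dna:
        start = rng.randrange(len(text) - k + 1)
        motifs.append(text[start:start + k])
    best = list(motifs)
    for _ in range(n_iter):
        # Resample the motif of one randomly chosen string from the profile of the others.
        i = rng.randrange(len(dna))
        others = motifs[:i] + motifs[i + 1:]
        motifs[i] = sample_kmer(dna[i], k, profile_with_pseudocounts(others), rng)
        if score(motifs) < score(best):
            best = list(motifs)
    return best


def best_of_restarts(dna, k, n_iter, restarts, seed=0):
    """Run the sampler several times and keep the lowest-scoring result."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    runs = [gibbs_sampler_sketch(dna, k, n_iter, rng) for _ in range(restarts)]
    return min(runs, key=score)


dna = [
    "CGCCCCTCTCGGGGGTGTTCAGTAAACGGCCA",
    "GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG",
    "TAGTACCGAGACCGAAAGAAGTATACAGGCGT",
    "TAGATCAAGTTTCAGGTGCACGTCGGTGAACC",
    "AATCCACCAGCTCCACGTGCAATGTTGGCCTA",
]
motifs = best_of_restarts(dna, k=8, n_iter=100, restarts=20)
print(score(motifs), motifs)
```

Because every restart begins from a fresh random selection of k-mers, different runs can settle in different local optima; taking the minimum score across runs can never do worse than a single run and often does better.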