Package Bio :: Package SeqUtils :: Module MeltingTemp
[hide private]
[frames] | no frames]

Source Code for Module Bio.SeqUtils.MeltingTemp

   1  # Copyright 2004-2008 by Sebastian Bassi. 
   2  # Copyright 2013 by Markus Piotrowski. 
   3  # All rights reserved. 
   4  # This code is part of the Biopython distribution and governed by its 
   5  # license.  Please see the LICENSE file that should have been included 
   6  # as part of this package. 
   7   
   8  """Calculate the melting temperature of nucleotide sequences. 
   9   
  10  This module contains three different methods to calculate the melting 
  11  temperature of oligonucleotides: 
  12   
  13  1. Tm_Wallace: 'Rule of thumb' 
  14  2. Tm_GC: Empirical formulas based on GC content. Salt and mismatch corrections 
  15     can be included. 
  16  3. Tm_NN: Calculation based on nearest neighbor thermodynamics. Several tables 
  17     for DNA/DNA, DNA/RNA and RNA/RNA hybridizations are included. 
  18     Correction for mismatches, dangling ends, salt concentration and other 
  19     additives are available. 
  20   
  21  Tm_staluc is the 'old' NN calculation and is kept for compatibility. It is, 
  22  however, recommended to use Tm_NN instead, since Tm_staluc may be depreceated 
  23  in the future. Also, Tm_NN has much more options. Using Tm_staluc and Tm_NN 
  24  with default parameters gives (essentially) the same results. 
  25   
  26  General parameters for most Tm methods: 
  27   - seq -- A Biopython sequence object or a string. 
  28   - check -- Checks if the sequence is valid for the given method (default=True). 
  29     In general, whitespaces and non-base characters are removed and 
  30     characters are converted to uppercase. RNA will be backtranscribed. 
  31   - strict - Do not allow base characters or neighbor duplex keys (e.g. 'AT/NA'). 
  32     that could not or not unambigiously be evaluated for the respective 
  33     method (default=True). Note that W (= A or T) and S (= C or G) are 
  34     not ambiguous for Tm_Wallace and Tm_GC. If 'False', average values 
  35     (if applicable) will be used. 
  36   
  37  This module is not able to detect self-complementary and it will not use 
  38  alignment tools to align an oligonucleotide sequence to its target sequence. 
  39  Thus it can not detect dangling-ends and mismatches by itself (don't even think 
  40  about bulbs and loops). These parameters have to be handed over to the 
  41  respective method. 
  42   
  43  Other public methods of this module: 
  44   - make_table     : To create a table with thermodynamic data. 
  45   - salt_correction: To adjust Tm to a given salt concentration by different 
  46     formulas. This method is called from Tm_GC and Tm_NN but may 
  47     also be accessed 'manually'. It returns a correction term, not 
  48     a corrected Tm! 
  49   - chem_correction: To adjust Tm regarding the chemical additives DMSO and 
  50     formaldehyde. The method returns a corrected Tm. Chemical 
  51     correction is not an integral part of the Tm methods and must 
  52     be called additionally. 
  53   
  54  Examples: 
  55   
  56      >>> from Bio.SeqUtils import MeltingTemp as mt 
  57      >>> from Bio.Seq import Seq 
  58      >>> mystring = 'CGTTCCAAAGATGTGGGCATGAGCTTAC' 
  59      >>> myseq = Seq(mystring) 
  60      >>> print('%0.2f' % mt.Tm_Wallace(mystring)) 
  61      84.00 
  62      >>> print('%0.2f' % mt.Tm_Wallace(myseq)) 
  63      84.00 
  64      >>> print('%0.2f' % mt.Tm_GC(myseq)) 
  65      58.73 
  66      >>> print('%0.2f' % mt.Tm_NN(myseq)) 
  67      60.32 
  68   
  69  Tm_NN with default values gives same result as 'old' Tm_staluc. However, values 
  70  differ for RNA, since Tm_staluc had some errors for RNA calculation. These 
  71  errors have been fixed in this version. 
  72   
  73  New Tm_NN can do slightly more: 
  74  Using different thermodynamic tables, e.g. from Breslauer '86 or Sugimoto '96: 
  75   
  76      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.DNA_NN1))  # Breslauer '86 
  77      72.19 
  78      >>> print('%0.2f' % mt.Tm_NN(myseq, nn_table=mt.DNA_NN2))  # Sugimoto '96 
  79      65.47 
  80   
  81  Several types of salc correction (for Tm_NN and Tm_GC): 
  82   
  83      >>> for i in range(1, 8): 
  84      ...     print("Type: %d, Tm: %0.2f" % (i, Tm_NN(myseq, saltcorr=i))) 
  85      ... 
  86      Type: 1, Tm: 54.27 
  87      Type: 2, Tm: 54.02 
  88      Type: 3, Tm: 59.60 
  89      Type: 4, Tm: 60.64 
  90      Type: 5, Tm: 60.32 
  91      Type: 6, Tm: 59.78 
  92      Type: 7, Tm: 59.78 
  93   
  94  Correction for other monovalent cations (K+, Tris), Mg2+ and dNTPs according 
  95  to von Ahsen et al. (2001) or Owczarzy et al. (2008) (for Tm_NN and Tm_GC): 
  96   
  97      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10)) 
  98      60.79 
  99      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5)) 
 100      67.39 
 101      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5, saltcorr=7)) 
 102      66.81 
 103      >>> print('%0.2f' % mt.Tm_NN(myseq, Na=50, Tris=10, Mg=1.5, dNTPs=0.6, 
 104      ...                          saltcorr=7)) 
 105      66.04 
 106   
 107  Dangling ends and mismatches, e.g.:: 
 108   
 109      Oligo:     CGTTCCaAAGATGTGGGCATGAGCTTAC         CGTTCCaAAGATGTGGGCATGAGCTTAC 
 110                 ::::::X:::::::::::::::::::::   or    ::::::X::::::::::::::::::::: 
 111      Template:  GCAAGGcTTCTACACCCGTACTCGAATG        TGCAAGGcTTCTACACCCGTACTCGAATGC 
 112   
 113  Here: 
 114   
 115      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC')) 
 116      60.32 
 117      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', 
 118      ...                    c_seq='GCAAGGCTTCTACACCCGTACTCGAATG')) 
 119      55.39 
 120      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC', shift=1, 
 121      ...                   c_seq='TGCAAGGCTTCTACACCCGTACTCGAATGC')) 
 122      55.69 
 123   
 124  Make your own tables, or update/extend existing tables. E.g., add values for 
 125  locked nucleotides. Here, 'locked A' (and its complement) should be represented 
 126  by '1': 
 127   
 128      >>> mytable = mt.make_table(oldtable=mt.DNA_NN3, 
 129      ...                         values={'A1/T1':(-6.608, -17.235), 
 130      ...                         '1A/1T':(-6.893, -15.923)}) 
 131      >>> print('%0.2f' % mt.Tm_NN('CGTTCCAAAGATGTGGGCATGAGCTTAC')) 
 132      60.32 
 133      >>> print('%0.2f' % mt.Tm_NN('CGTTCCA1AGATGTGGGCATGAGCTTAC', 
 134      ...                           nn_table=mytable, check=False)) 
 135      ... # 'check' must be False, otherwise '1' would be discarded 
 136      62.53 
 137   
 138  """ 
 139   
 140  from __future__ import print_function 
 141   
 142  import math 
 143  import warnings 
 144   
 145  from Bio import SeqUtils, Seq 
 146  from Bio import BiopythonWarning 
 147   
 148  __docformat__ = "restructuredtext en" 
 149   
 150  # Thermodynamic lookup tables (dictionaries): 
 151  # Enthalpy (dH) and entropy (dS) values for nearest neighbors and initiation 
 152  # process. Calculation of duplex initiation is quite different in several 
 153  # papers; to allow for a general calculation, all different initiation 
 154  # parameters are included in all tables and non-applicable parameters are set 
 155  # to zero. 
 156  # The key is either an initiation type (e.g., 'init_A/T') or a nearest neighbor 
 157  # duplex sequence (e.g., GT/CA, to read 5'GT3'-3'CA5'). The values are tuples 
 158  # of dH (kcal/mol), dS (cal/mol K). 
 159   
 160  # DNA/DNA 
 161  # Breslauer et al. (1986), Proc Natl Acad Sci USA 83: 3746-3750 
 162  DNA_NN1 = { 
 163      'init': (0, 0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 164      'init_oneG/C': (0, -16.8), 'init_allA/T': (0, -20.1), 'init_5T/A': (0, 0), 
 165      'sym': (0, -1.3), 
 166      'AA/TT': (-9.1, -24.0), 'AT/TA': (-8.6, -23.9), 'TA/AT': (-6.0, -16.9), 
 167      'CA/GT': (-5.8, -12.9), 'GT/CA': (-6.5, -17.3), 'CT/GA': (-7.8, -20.8), 
 168      'GA/CT': (-5.6, -13.5), 'CG/GC': (-11.9, -27.8), 'GC/CG': (-11.1, -26.7), 
 169      'GG/CC': (-11.0, -26.6)} 
 170   
 171  # Sugimoto et al. (1996), Nuc Acids Res 24 : 4501-4505 
 172  DNA_NN2 = { 
 173      'init': (0.6, -9.0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 174      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 175      'sym': (0, -1.4), 
 176      'AA/TT': (-8.0, -21.9), 'AT/TA': (-5.6, -15.2), 'TA/AT': (-6.6, -18.4), 
 177      'CA/GT': (-8.2, -21.0), 'GT/CA': (-9.4, -25.5), 'CT/GA': (-6.6, -16.4), 
 178      'GA/CT': (-8.8, -23.5), 'CG/GC': (-11.8, -29.0), 'GC/CG': (-10.5, -26.4), 
 179      'GG/CC': (-10.9, -28.4)} 
 180   
 181  # Allawi and SantaLucia (1997), Biochemistry 36: 10581-10594 
 182  DNA_NN3 = { 
 183      'init': (0, 0), 'init_A/T': (2.3, 4.1), 'init_G/C': (0.1, -2.8), 
 184      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 185      'sym': (0, -1.4), 
 186      'AA/TT': (-7.9, -22.2), 'AT/TA': (-7.2, -20.4), 'TA/AT': (-7.2, -21.3), 
 187      'CA/GT': (-8.5, -22.7), 'GT/CA': (-8.4, -22.4), 'CT/GA': (-7.8, -21.0), 
 188      'GA/CT': (-8.2, -22.2), 'CG/GC': (-10.6, -27.2), 'GC/CG': (-9.8, -24.4), 
 189      'GG/CC': (-8.0, -19.9)} 
 190   
 191  # SantaLucia & Hicks (2004), Annu. Rev. Biophys. Biomol. Struct 33: 415-440 
 192  DNA_NN4 = { 
 193      'init': (0.2, -5.7), 'init_A/T': (2.2, 6.9), 'init_G/C': (0, 0), 
 194      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 195      'sym': (0, -1.4), 
 196      'AA/TT': (-7.6, -21.3), 'AT/TA': (-7.2, -20.4), 'TA/AT': (-7.2, -20.4), 
 197      'CA/GT': (-8.5, -22.7), 'GT/CA': (-8.4, -22.4), 'CT/GA': (-7.8, -21.0), 
 198      'GA/CT': (-8.2, -22.2), 'CG/GC': (-10.6, -27.2), 'GC/CG': (-9.8, -24.4), 
 199      'GG/CC': (-8.0, -19.0)} 
 200   
 201  # RNA/RNA 
 202  # Freier et al. (1986), Proc Natl Acad Sci USA 83: 9373-9377 
 203  RNA_NN1 = { 
 204      'init': (0, -10.8), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 205      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 206      'sym': (0, -1.4), 
 207      'AA/TT': (-6.6, -18.4), 'AT/TA': (-5.7, -15.5), 'TA/AT': (-8.1, -22.6), 
 208      'CA/GT': (-10.5, -27.8), 'GT/CA': (-10.2, -26.2), 'CT/GA': (-7.6, -19.2), 
 209      'GA/CT': (-13.3, -35.5), 'CG/GC': (-8.0, -19.4), 'GC/CG': (-14.2, -34.9), 
 210      'GG/CC': (-12.2, -29.7)} 
 211   
 212  # Xia et al (1998), Biochemistry 37: 14719-14735 
 213  RNA_NN2 = { 
 214      'init': (3.61, -1.5), 'init_A/T': (3.72, 10.5), 'init_G/C': (0, 0), 
 215      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 216      'sym': (0, -1.4), 
 217      'AA/TT': (-6.82, -19.0), 'AT/TA': (-9.38, -26.7), 'TA/AT': (-7.69, -20.5), 
 218      'CA/GT': (-10.44, -26.9), 'GT/CA': (-11.40, -29.5), 
 219      'CT/GA': (-10.48, -27.1), 'GA/CT': (-12.44, -32.5), 
 220      'CG/GC': (-10.64, -26.7), 'GC/CG': (-14.88, -36.9), 
 221      'GG/CC': (-13.39, -32.7)} 
 222   
 223  # Chen et al. (2012), Biochemistry 51: 3508-3522 
 224  RNA_NN3 = { 
 225      'init': (6.40, 6.99), 'init_A/T': (3.85, 11.04), 'init_G/C': (0, 0), 
 226      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 227      'sym': (0, -1.4), 
 228      'AA/TT': (-7.09, -19.8), 'AT/TA': (-9.11, -25.8), 'TA/AT': (-8.50, -22.9), 
 229      'CA/GT': (-11.03, -28.8), 'GT/CA': (-11.98, -31.3), 
 230      'CT/GA': (-10.90, -28.5), 'GA/CT': (-13.21, -34.9), 
 231      'CG/GC': (-10.88, -27.4), 'GC/CG': (-16.04, -40.6), 
 232      'GG/CC': (-14.18, -35.0), 'GT/TG': (-13.83, -46.9), 
 233      'GG/TT': (-17.82, -56.7), 'AG/TT': (-3.96, -11.6), 
 234      'TG/AT': (-0.96, -1.8), 'TT/AG': (-10.38, -31.8), 'TG/GT': (-12.64, -38.9), 
 235      'AT/TG': (-7.39, -21.0), 'CG/GT': (-5.56, -13.9), 'CT/GG': (-9.44, -24.7), 
 236      'GG/CT': (-7.03, -16.8), 'GT/CG': (-11.09, -28.8)} 
 237   
 238  # RNA/DNA 
 239  # Sugimoto et al. (1995), Biochemistry 34: 11211-11216 
 240  R_DNA_NN1 = { 
 241      'init': (1.9, -3.9), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 
 242      'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 'init_5T/A': (0, 0), 
 243      'sym': (0, 0), 
 244      'AA': (-11.5, -36.4), 'AC': (-7.8, -21.6), 'AG': (-7.0, -19.7), 
 245      'AT': (-8.3, -23.9), 'CA': (-10.4, -28.4), 'CC': (-12.8, -31.9), 
 246      'CG': (-16.3, -47.1), 'CT': (-9.1, -23.5), 'GA': (-8.6, -22.9), 
 247      'GC': (-8.0, -17.1), 'GG': (-9.3, -23.2), 'GT': (-5.9, -12.3), 
 248      'TA': (-7.8, -23.2), 'TC': (-5.5, -13.5), 'TG': (-9.0, -26.1), 
 249      'TT': (-7.8, -21.9)} 
 250   
 251  # Internal mismatch and inosine table (DNA) 
 252  # Allawi & SantaLucia (1997), Biochemistry 36: 10581-10594 
 253  # Allawi & SantaLucia (1998), Biochemistry 37: 9435-9444 
 254  # Allawi & SantaLucia (1998), Biochemistry 37: 2170-2179 
 255  # Allawi & SantaLucia (1998), Nucl Acids Res 26: 2694-2701 
 256  # Peyret et al. (1999), Biochemistry 38: 3468-3477 
 257  # Watkins & SantaLucia (2005), Nucl Acids Res 33: 6258-6267 
 258  DNA_IMM1 = { 
 259      'AG/TT': (1.0, 0.9), 'AT/TG': (-2.5, -8.3), 'CG/GT': (-4.1, -11.7), 
 260      'CT/GG': (-2.8, -8.0), 'GG/CT': (3.3, 10.4), 'GG/TT': (5.8, 16.3), 
 261      'GT/CG': (-4.4, -12.3), 'GT/TG': (4.1, 9.5), 'TG/AT': (-0.1, -1.7), 
 262      'TG/GT': (-1.4, -6.2), 'TT/AG': (-1.3, -5.3), 'AA/TG': (-0.6, -2.3), 
 263      'AG/TA': (-0.7, -2.3), 'CA/GG': (-0.7, -2.3), 'CG/GA': (-4.0, -13.2), 
 264      'GA/CG': (-0.6, -1.0), 'GG/CA': (0.5, 3.2), 'TA/AG': (0.7, 0.7), 
 265      'TG/AA': (3.0, 7.4), 
 266      'AC/TT': (0.7, 0.2), 'AT/TC': (-1.2, -6.2), 'CC/GT': (-0.8, -4.5), 
 267      'CT/GC': (-1.5, -6.1), 'GC/CT': (2.3, 5.4), 'GT/CC': (5.2, 13.5), 
 268      'TC/AT': (1.2, 0.7), 'TT/AC': (1.0, 0.7), 
 269      'AA/TC': (2.3, 4.6), 'AC/TA': (5.3, 14.6), 'CA/GC': (1.9, 3.7), 
 270      'CC/GA': (0.6, -0.6), 'GA/CC': (5.2, 14.2), 'GC/CA': (-0.7, -3.8), 
 271      'TA/AC': (3.4, 8.0), 'TC/AA': (7.6, 20.2), 
 272      'AA/TA': (1.2, 1.7), 'CA/GA': (-0.9, -4.2), 'GA/CA': (-2.9, -9.8), 
 273      'TA/AA': (4.7, 12.9), 'AC/TC': (0.0, -4.4), 'CC/GC': (-1.5, -7.2), 
 274      'GC/CC': (3.6, 8.9), 'TC/AC': (6.1, 16.4), 'AG/TG': (-3.1, -9.5), 
 275      'CG/GG': (-4.9, -15.3), 'GG/CG': (-6.0, -15.8), 'TG/AG': (1.6, 3.6), 
 276      'AT/TT': (-2.7, -10.8), 'CT/GT': (-5.0, -15.8), 'GT/CT': (-2.2, -8.4), 
 277      'TT/AT': (0.2, -1.5), 
 278      'AI/TC': (-8.9, -25.5), 'TI/AC': (-5.9, -17.4), 'AC/TI': (-8.8, -25.4), 
 279      'TC/AI': (-4.9, -13.9), 'CI/GC': (-5.4, -13.7), 'GI/CC': (-6.8, -19.1), 
 280      'CC/GI': (-8.3, -23.8), 'GC/CI': (-5.0, -12.6), 
 281      'AI/TA': (-8.3, -25.0), 'TI/AA': (-3.4, -11.2), 'AA/TI': (-0.7, -2.6), 
 282      'TA/AI': (-1.3, -4.6), 'CI/GA': (2.6, 8.9), 'GI/CA': (-7.8, -21.1), 
 283      'CA/GI': (-7.0, -20.0), 'GA/CI': (-7.6, -20.2), 
 284      'AI/TT': (0.49, -0.7), 'TI/AT': (-6.5, -22.0), 'AT/TI': (-5.6, -18.7), 
 285      'TT/AI': (-0.8, -4.3), 'CI/GT': (-1.0, -2.4), 'GI/CT': (-3.5, -10.6), 
 286      'CT/GI': (0.1, -1.0), 'GT/CI': (-4.3, -12.1), 
 287      'AI/TG': (-4.9, -15.8), 'TI/AG': (-1.9, -8.5), 'AG/TI': (0.1, -1.8), 
 288      'TG/AI': (1.0, 1.0), 'CI/GG': (7.1, 21.3), 'GI/CG': (-1.1, -3.2), 
 289      'CG/GI': (5.8, 16.9), 'GG/CI': (-7.6, -22.0), 
 290      'AI/TI': (-3.3, -11.9), 'TI/AI': (0.1, -2.3), 'CI/GI': (1.3, 3.0), 
 291      'GI/CI': (-0.5, -1.3)} 
 292   
 293  # Terminal mismatch table (DNA) 
 294  # SantaLucia & Peyret (2001) Patent Application WO 01/94611 
 295  DNA_TMM1 = { 
 296      'AA/TA': (-3.1, -7.8), 'TA/AA': (-2.5, -6.3), 'CA/GA': (-4.3, -10.7), 
 297      'GA/CA': (-8.0, -22.5), 
 298      'AC/TC': (-0.1, 0.5), 'TC/AC': (-0.7, -1.3), ' CC/GC': (-2.1, -5.1), 
 299      'GC/CC': (-3.9, -10.6), 
 300      'AG/TG': (-1.1, -2.1), 'TG/AG': (-1.1, -2.7), 'CG/GG': (-3.8, -9.5), 
 301      'GG/CG': (-0.7, -19.2), 
 302      'AT/TT': (-2.4, -6.5), 'TT/AT': (-3.2, -8.9), 'CT/GT': (-6.1, -16.9), 
 303      'GT/CT': (-7.4, -21.2), 
 304      'AA/TC': (-1.6, -4.0), 'AC/TA': (-1.8, -3.8), 'CA/GC': (-2.6, -5.9), 
 305      'CC/GA': (-2.7, -6.0), 'GA/CC': (-5.0, -13.8), 'GC/CA': (-3.2, -7.1), 
 306      'TA/AC': (-2.3, -5.9), 'TC/AA': (-2.7, -7.0), 
 307      'AC/TT': (-0.9, -1.7), 'AT/TC': (-2.3, -6.3), 'CC/GT': (-3.2, -8.0), 
 308      'CT/GC': (-3.9, -10.6), 'GC/CT': (-4.9, -13.5), 'GT/CC': (-3.0, -7.8), 
 309      'TC/AT': (-2.5, -6.3), 'TT/AC': (-0.7, -1.2), 
 310      'AA/TG': (-1.9, -4.4), 'AG/TA': (-2.5, -5.9), 'CA/GG': (-3.9, -9.6), 
 311      'CG/GA': (-6.0, -15.5), 'GA/CG': (-4.3, -11.1), ' GG/CA': (-4.6, -11.4), 
 312      'TA/AG': (-2.0, -4.7), 'TG/AA': (-2.4, -5.8), 
 313      'AG/TT': (-3.2, -8.7), 'AT/TG': (-3.5, -9.4), 'CG/GT': (-3.8, -9.0), 
 314      'CT/GG': (-6.6, -18.7), 'GG/CT': (-5.7, -15.9), 'GT/CG': (-5.9, -16.1), 
 315      'TG/AT': (-3.9, -10.5), 'TT/AG': (-3.6, -9.8)} 
 316   
 317  # Dangling ends table (DNA) 
 318  # Bommarito et al. (2000), Nucl Acids Res 28: 1929-1934 
 319  DNA_DE1 = { 
 320      'AA/.T': (0.2, 2.3), 'AC/.G': (-6.3, -17.1), 'AG/.C': (-3.7, -10.0), 
 321      'AT/.A': (-2.9, -7.6), 'CA/.T': (0.6, 3.3), 'CC/.G': (-4.4, -12.6), 
 322      'CG/.C': (-4.0, -11.9), 'CT/.A': (-4.1, -13.0), 'GA/.T': (-1.1, -1.6), 
 323      'GC/.G': (-5.1, -14.0), 'GG/.C': (-3.9, -10.9), 'GT/.A': (-4.2, -15.0), 
 324      'TA/.T': (-6.9, -20.0), 'TC/.G': (-4.0, -10.9), 'TG/.C': (-4.9, -13.8), 
 325      'TT/.A': (-0.2, -0.5), 
 326      '.A/AT': (-0.7, -0.8), '.C/AG': (-2.1, -3.9), '.G/AC': (-5.9, -16.5), 
 327      '.T/AA': (-0.5, -1.1), '.A/CT': (4.4, 14.9), '.C/CG': (-0.2, -0.1), 
 328      '.G/CC': (-2.6, -7.4), '.T/CA': (4.7, 14.2), '.A/GT': (-1.6, -3.6), 
 329      '.C/GG': (-3.9, -11.2), '.G/GC': (-3.2, -10.4), '.T/GA': (-4.1, -13.1), 
 330      '.A/TT': (2.9, 10.4), '.C/TG': (-4.4, -13.1), '.G/TC': (-5.2, -15.0), 
 331      '.T/TA': (-3.8, -12.6)} 
 332   
 333  # Dangling ends table (RNA) 
 334  # Turner & Mathews (2010), Nucl Acids Res 38: D280-D282 
 335  RNA_DE1 = { 
 336      'AA/T.': (-4.9, -13.2), 'AC/T.': (-0.9, -1.3), 'AG/T.': (-5.5, -15.1), 
 337      'AT/T.': (-2.3, -5.5), 
 338      'CA/G.': (-9.0, -23.5), 'CC/G.': (-4.1, -10.6), 'CG/G.': (-8.6, -22.2), 
 339      'CT/G.': (-7.5, -20.31), 
 340      'GA/C.': (-7.4, -20.3), 'GC/C.': (-2.8, -7.7), 'GG/C.': (-6.4, -16.4), 
 341      'GT/C.': (-3.6, -9.7), 
 342      'GA/T.': (-4.9, -13.2), 'GC/T.': (-0.9, -1.3), 'GG/T.': (-5.5, -15.1), 
 343      'GT/T.': (-2.3, -5.5), 
 344      'TA/A.': (-5.7, -16.1), 'TC/A.': (-0.7, -1.9), 'TG/A.': (-5.8, -16.4), 
 345      'TT/A.': (-2.2, -6.8), 
 346      'TA/G.': (-5.7, -16.1), 'TC/G.': (-0.7, -1.9), 'TG/G.': (-5.8, -16.4), 
 347      'TT/G.': (-2.2, -6.8), 
 348      'A./TA': (-0.5, -0.6), 'A./TC': (6.9, 22.6), 'A./TG': (0.6, 2.6), 
 349      'A./TT': (0.6, 2.6), 
 350      'C./GA': (-1.6, -4.5), 'C./GC': (0.7, 3.2), 'C./GG': (-4.6, -14.8), 
 351      'C./GT': (-0.4, -1.3), 
 352      'G./CA': (-2.4, -6.1), 'G./CC': (3.3, 11.6), 'G./CG': (0.8, 3.2), 
 353      'G./CT': (-1.4, -4.2), 
 354      'G./TA': (-0.5, -0.6), 'G./TC': (6.9, 22.6), 'G./TG': (0.6, 2.6), 
 355      'G./TT': (0.6, 2.6), 
 356      'T./AA': (1.6, 6.1), 'T./AC': (2.2, 8.1), 'T./AG': (0.7, 3.5), 
 357      'T./AT': (3.1, 10.6), 
 358      'T./GA': (1.6, 6.1), 'T./GC': (2.2, 8.1), 'T./GG': (0.7, 3.5), 
 359      'T./GT': (3.1, 10.6)} 
 360   
 361   
362 -def make_table(oldtable=None, values=None):
363 """Return a table with thermodynamic parameters (as dictionary). 364 365 Parameters: 366 oldtable: An existing dictionary with thermodynamic parameters. 367 values: A dictionary with new or updated values. 368 369 E.g., to replace the initiation parameters in the Sugimoto '96 dataset with 370 the initiation parameters from Allawi & SantaLucia '97: 371 372 >>> from Bio.SeqUtils.MeltingTemp import make_table, DNA_NN2 373 >>> table = DNA_NN2 # Sugimoto '96 374 >>> table['init_A/T'] 375 (0, 0) 376 >>> newtable = make_table(oldtable=DNA_NN2, values={'init': (0, 0), 377 ... 'init_A/T': (2.3, 4.1), 378 ... 'init_G/C': (0.1, -2.8)}) 379 >>> print("%0.1f, %0.1f" % newtable['init_A/T']) 380 2.3, 4.1 381 382 """ 383 if oldtable is None: 384 table = {'init': (0, 0), 'init_A/T': (0, 0), 'init_G/C': (0, 0), 385 'init_oneG/C': (0, 0), 'init_allA/T': (0, 0), 386 'init_5T/A': (0, 0), 'sym': (0, 0), 'AA/TT': (0, 0), 387 'AT/TA': (0, 0), 'TA/AT': (0, 0), 'CA/GT': (0, 0), 388 'GT/CA': (0, 0), 'CT/GA': (0, 0), 'GA/CT': (0, 0), 389 'CG/GC': (0, 0), 'GC/CG': (0, 0), 'GG/CC': (0, 0)} 390 else: 391 table = oldtable.copy() 392 if values: 393 table.update(values) 394 return table
395 396
397 -def _check(seq, method):
398 """Return a sequence which fullfils the requirements of the given method (PRIVATE). 399 400 All Tm methods in this package require the sequence in uppercase format. 401 Most methods make use of the length of the sequence (directly or 402 indirectly), which can only be expressed as len(seq) if the sequence does 403 not contain whitespaces and other non-base characters. RNA sequences are 404 backtranscribed to DNA. This method is private. 405 406 Arguments: 407 seq: The sequence as given by the user (passed as string). 408 method: Tm_Wallace, Tm_GC or Tm_NN. 409 410 >>> from Bio.SeqUtils import MeltingTemp as mt 411 >>> mt._check('10 ACGTTGCAAG tccatggtac', 'Tm_NN') 412 'ACGTTGCAAGTCCATGGTAC' 413 414 """ 415 seq = ''.join(seq.split()).upper() 416 seq = str(Seq.Seq(seq).back_transcribe()) 417 if method == 'Tm_Wallace': 418 return seq 419 if method == 'Tm_GC': 420 baseset = ('A', 'B', 'C', 'D', 'G', 'H', 'I', 'K', 'M', 'N', 'R', 'S', 421 'T', 'V', 'W', 'X', 'Y') 422 if method == 'Tm_NN': 423 baseset = ('A', 'C', 'G', 'T', 'I') 424 seq = ''.join([base for base in seq if base in baseset]) 425 return seq
426 427
428 -def salt_correction(Na=0, K=0, Tris=0, Mg=0, dNTPs=0, method=1, seq=None):
429 """Calculate a term to correct Tm for salt ions. 430 431 Depending on the Tm calculation, the term will correct Tm or entropy. To 432 calculate corrected Tm values, different operations need to be applied: 433 434 - methods 1-4: Tm(new) = Tm(old) + corr 435 - method 5: deltaH(new) = deltaH(old) + corr 436 - methods 6+7: Tm(new) = 1/(1/Tm(old) + corr) 437 438 Parameters: 439 - Na, K, Tris, Mg, dNTPS: Millimolar concentration of respective ion. To have 440 a simple 'salt correction', just pass Na. If any of K, Tris, Mg and 441 dNTPS is non-zero, a 'sodium-equivalent' concentration is calculated 442 according to von Ahsen et al. (2001, Clin Chem 47: 1956-1961): 443 [Na_eq] = [Na+] + [K+] + [Tris]/2 + 120*([Mg2+] - [dNTPs])^0.5 444 If [dNTPs] >= [Mg2+]: [Na_eq] = [Na+] + [K+] + [Tris]/2 445 - method: Which method to be applied. Methods 1-4 correct Tm, method 5 446 corrects deltaS, methods 6 and 7 correct 1/Tm. The methods are: 447 448 1. 16.6 x log[Na+] 449 (Schildkraut & Lifson (1965), Biopolymers 3: 195-208) 450 2. 16.6 x log([Na+]/(1.0 + 0.7*[Na+])) 451 (Wetmur (1991), Crit Rev Biochem Mol Biol 126: 227-259) 452 3. 12.5 x log(Na+] 453 (SantaLucia et al. (1996), Biochemistry 35: 3555-3562 454 4. 11.7 x log[Na+] 455 (SantaLucia (1998), Proc Natl Acad Sci USA 95: 1460-1465 456 5. Correction for deltaS: 0.368 x (N-1) x ln[Na+] 457 (SantaLucia (1998), Proc Natl Acad Sci USA 95: 1460-1465) 458 6. (4.29(%GC)-3.95)x1e-5 x ln[Na+] + 9.40e-6 x ln[Na+]^2 459 (Owczarzy et al. (2004), Biochemistry 43: 3537-3554) 460 7. Complex formula with decision tree and 7 empirical constants. 461 Mg2+ is corrected for dNTPs binding (if present) 462 (Owczarzy et al. (2008), Biochemistry 47: 5336-5353) 463 464 Examples: 465 466 >>> from Bio.SeqUtils import MeltingTemp as mt 467 >>> print('%0.2f' % mt.salt_correction(Na=50, method=1)) 468 -21.60 469 >>> print('%0.2f' % mt.salt_correction(Na=50, method=2)) 470 -21.85 471 >>> print('%0.2f' % mt.salt_correction(Na=100, Tris=20, method=2)) 472 -16.45 473 >>> print('%0.2f' % mt.salt_correction(Na=100, Tris=20, Mg=1.5, 474 ... method=2)) 475 -10.99 476 477 """ 478 if method in (5, 6, 7) and not seq: 479 raise ValueError('sequence is missing (is needed to calculate ' + 480 'GC content or sequence length).') 481 if seq: 482 seq = str(seq) 483 corr = 0 484 if not method: 485 return corr 486 Mon = Na + K + Tris / 2.0 # Note: all these values are millimolar 487 mg = Mg * 1e-3 # Lowercase ions (mg, mon, dntps) are molar 488 # Na equivalent according to von Ahsen et al. (2001): 489 if sum((K, Mg, Tris, dNTPs)) > 0 and not method == 7: 490 if dNTPs < Mg: 491 # dNTPs bind Mg2+ strongly. If [dNTPs] is larger or equal than 492 # [Mg2+], free Mg2+ is considered not to be relevant. 493 Mon += 120 * math.sqrt(Mg - dNTPs) 494 mon = Mon * 1e-3 495 # Note: math.log = ln(), math.log10 = log() 496 if method in range(1, 7) and not mon: 497 raise ValueError('Total ion concentration of zero is not allowed in ' + 498 'this method.') 499 if method == 1: 500 corr = 16.6 * math.log10(mon) 501 if method == 2: 502 corr = 16.6 * math.log10((mon) / (1.0 + 0.7 * (mon))) 503 if method == 3: 504 corr = 12.5 * math.log10(mon) 505 if method == 4: 506 corr = 11.7 * math.log10(mon) 507 if method == 5: 508 corr = 0.368 * (len(seq) - 1) * math.log(mon) 509 if method == 6: 510 corr = (4.29 * SeqUtils.GC(seq) / 100 - 3.95) * 1e-5 * math.log(mon) + \ 511 9.40e-6 * math.log(mon) ** 2 512 if method == 7: 513 a, b, c, d = 3.92, -0.911, 6.26, 1.42 514 e, f, g = -48.2, 52.5, 8.31 515 if dNTPs > 0: 516 dntps = dNTPs * 1e-3 517 ka = 3e4 # Dissociation constant for Mg:dNTP 518 # Free Mg2+ calculation: 519 mg = (-(ka * dntps - ka * mg + 1.0) + 520 math.sqrt((ka * dntps - ka * mg + 1.0) ** 2 + 4.0 * ka * mg)) / (2.0 * ka) 521 if Mon > 0: 522 R = math.sqrt(mg) / mon 523 if R < 0.22: 524 corr = (4.29 * SeqUtils.GC(seq) / 100 - 3.95) * \ 525 1e-5 * math.log(mon) + 9.40e-6 * math.log(mon) ** 2 526 return corr 527 elif R < 6.0: 528 a = 3.92 * (0.843 - 0.352 * math.sqrt(mon) * math.log(mon)) 529 d = 1.42 * (1.279 - 4.03e-3 * math.log(mon) - 530 8.03e-3 * math.log(mon) ** 2) 531 g = 8.31 * (0.486 - 0.258 * math.log(mon) + 532 5.25e-3 * math.log(mon) ** 3) 533 corr = (a + b * math.log(mg) + (SeqUtils.GC(seq) / 100) * 534 (c + d * math.log(mg)) + (1 / (2.0 * (len(seq) - 1))) * 535 (e + f * math.log(mg) + g * math.log(mg) ** 2)) * 1e-5 536 if method > 7: 537 raise ValueError('Allowed values for parameter \'method\' are 1-7.') 538 return corr
539 540
541 -def chem_correction(Tm, DMSO=0, fmd=0, DMSOfactor=0.75, fmdfactor=0.65, 542 fmdmethod=1, GC=None):
543 """Correct a given Tm for DMSO and formamide. 544 545 Please note that these corrections are +/- rough approximations. 546 547 Arguments: 548 - Tm: Melting temperature. 549 - DMSO: Percent DMSO. 550 - fmd: Formamide concentration in %(fmdmethod=1) or in molar (fmdmethod=2). 551 - DMSOfactor: How much should Tm decreases per percent DMSO. Default=0.65 552 (von Ahsen et al. 2001). Other published values are 0.5, 0.6 and 0.675. 553 - fmdfactor: How much should Tm decrease per percent formamide. Default=0.65. 554 Several papers report factors between 0.6 and 0.72. 555 - fmdmethod: 556 557 1. Tm = Tm - factor(%formamide) (Default) 558 2. Tm = Tm + (0.453(f(GC)) - 2.88) x [formamide] 559 560 Here f(GC) is fraction of GC. 561 Note (again) that in fmdmethod=1 formamide concentration is given in %, 562 while in fmdmethod=2 it is given in molar. 563 - GC: GC content in percent. 564 565 Examples: 566 567 >>> from Bio.SeqUtils import MeltingTemp as mt 568 >>> mt.chem_correction(70) 569 70 570 >>> print('%0.2f' % mt.chem_correction(70, DMSO=3)) 571 67.75 572 >>> print('%0.2f' % mt.chem_correction(70, fmd=5)) 573 66.75 574 >>> print('%0.2f' % mt.chem_correction(70, fmdmethod=2, fmd=1.25, 575 ... GC=50)) 576 66.68 577 578 """ 579 if DMSO: 580 Tm -= DMSOfactor * DMSO 581 if fmd: 582 # McConaughy et al. (1969), Biochemistry 8: 3289-3295 583 if fmdmethod == 1: 584 Tm -= fmdfactor * fmd # Note: fmd is percent 585 # Blake & Delcourt (1996), Nucl Acids Res 11: 2095-2103 586 if fmdmethod == 2: 587 if GC is None or GC < 0: 588 raise ValueError('\'GC\' is missing or negative') 589 Tm += (0.453 * (GC / 100.0) - 2.88) * fmd # Note: fmd is molar 590 if fmdmethod not in (1, 2): 591 raise ValueError('\'fmdmethod\' must be 1 or 2') 592 return Tm
593 594
595 -def Tm_Wallace(seq, check=True, strict=True):
596 """Calculate and return the Tm using the 'Wallace rule'. 597 598 Tm = 4 degC * (G + C) + 2 degC * (A+T) 599 600 The Wallace rule (Thein & Wallace 1986, in Human genetic diseases: a 601 practical approach, 33-50) is often used as rule of thumb for approximate 602 Tm calculations for primers of 14 to 20 nt length. 603 604 Non-DNA characters (e.g., E, F, J, !, 1, etc) are ignored by this method. 605 606 Examples: 607 608 >>> from Bio.SeqUtils import MeltingTemp as mt 609 >>> mt.Tm_Wallace('ACGTTGCAATGCCGTA') 610 48.0 611 >>> mt.Tm_Wallace('ACGT TGCA ATGC CGTA') 612 48.0 613 >>> mt.Tm_Wallace('1ACGT2TGCA3ATGC4CGTA') 614 48.0 615 616 """ 617 seq = str(seq) 618 if check: 619 seq = _check(seq, 'Tm_Wallace') 620 621 Tm = 2 * (sum(map(seq.count, ('A', 'T', 'W')))) + \ 622 4 * (sum(map(seq.count, ('C', 'G', 'S')))) 623 624 # Intermediate values for ambiguous positions: 625 tmp = 3 * (sum(map(seq.count, ('K', 'M', 'N', 'R', 'Y')))) + \ 626 10 / 3.0 * (sum(map(seq.count, ('B', 'V')))) + \ 627 8 / 3.0 * (sum(map(seq.count, ('D', 'H')))) 628 if strict and tmp: 629 raise ValueError('ambiguous bases B, D, H, K, M, N, R, V, Y not ' + 630 'allowed when strict=True') 631 else: 632 Tm += tmp 633 return Tm
634 635
636 -def Tm_GC(seq, check=True, strict=True, valueset=7, userset=None, Na=50, K=0, 637 Tris=0, Mg=0, dNTPs=0, saltcorr=0, mismatch=True):
638 """Return the Tm using empirical formulas based on GC content. 639 640 General format: Tm = A + B(%GC) - C/N + salt correction - D(%mismatch) 641 642 A, B, C, D: empirical constants, N: primer length 643 D (amount of decrease in Tm per % mismatch) is often 1, but sometimes other 644 values have been used (0.6-1.5). Use 'X' to indicate the mismatch position 645 in the sequence. Note that this mismatch correction is a rough estimate. 646 647 >>> from Bio.SeqUtils import MeltingTemp as mt 648 >>> print("%0.2f" % mt.Tm_GC('CTGCTGATXGCACGAGGTTATGG', valueset=2)) 649 69.20 650 651 Arguments: 652 - valueset: A few often cited variants are included: 653 654 1. Tm = 69.3 + 0.41(%GC) - 650/N 655 (Marmur & Doty 1962, J Mol Biol 5: 109-118; Chester & Marshak 1993), 656 Anal Biochem 209: 284-290) 657 2. Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch 658 'QuikChange' formula. Recommended (by the manufacturer) for the design 659 of primers for QuikChange mutagenesis. 660 3. Tm = 81.5 + 0.41(%GC) - 675/N + 16.6 x log[Na+] 661 (Marmur & Doty 1962, J Mol Biol 5: 109-118; Schildkraut & Lifson 1965, 662 Biopolymers 3: 195-208) 663 4. Tm = 81.5 + 0.41(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) - %mismatch 664 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). This is the 665 standard formula in approximative mode of MELTING 4.3. 666 5. Tm = 78 + 0.7(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) - %mismatch 667 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). For RNA. 668 6. Tm = 67 + 0.8(%GC) - 500/N + 16.6 x log([Na+]/(1.0 + 0.7 x [Na+])) - %mismatch 669 (Wetmur 1991, Crit Rev Biochem Mol Biol 126: 227-259). For RNA/DNA 670 hybrids. 671 7. Tm = 81.5 + 0.41(%GC) - 600/N + 16.6 x log[Na+] 672 Used by Primer3Plus to calculate the product Tm. Default set. 673 8. Tm = 77.1 + 0.41(%GC) - 528/N + 11.7 x log[Na+] 674 (von Ahsen et al. 2001, Clin Chem 47: 1956-1961). Recommended 'as a 675 tradeoff between accuracy and ease of use'. 676 677 - userset: Tuple of four values for A, B, C, and D. Usersets override 678 valuesets. 679 - Na, K, Tris, Mg, dNTPs: Concentration of the respective ions [mM]. If any 680 of K, Tris, Mg and dNTPS is non-zero, a 'sodium-equivalent' 681 concentration is calculated and used for salt correction (von Ahsen et 682 al., 2001). 683 - saltcorr: Type of salt correction (see method salt_correction). Default=5. 684 0 or None means no salt correction. 685 - mismatch: If 'True' (default) every 'X' in the sequence is counted as 686 mismatch. 687 688 """ 689 if saltcorr == 5: 690 raise ValueError('salt-correction method 5 not applicable' + 691 'to Tm_GC') 692 seq = str(seq) 693 if check: 694 seq = _check(seq, 'Tm_GC') 695 pGC = SeqUtils.GC(seq) 696 # Ambiguous bases: add 0.5, 0.67 or 0.33% depending on G+C probability: 697 tmp = sum(map(seq.count, ('K', 'M', 'N', 'R', 'Y'))) * 50.0 / len(seq) + \ 698 sum(map(seq.count, ('B', 'V'))) * 66.67 / len(seq) + \ 699 sum(map(seq.count, ('D', 'H'))) * 33.33 / len(seq) 700 if strict and tmp: 701 raise ValueError('ambiguous bases B, D, H, K, M, N, R, V, Y not ' + 702 'allowed when strict=True') 703 else: 704 pGC += tmp 705 if userset: 706 A, B, C, D = userset 707 else: 708 if valueset == 1: 709 A, B, C, D = (69.3, 0.41, 650, 1) 710 saltcorr = 0 711 if valueset == 2: 712 A, B, C, D = (81.5, 0.41, 675, 1) 713 saltcorr = 0 714 if valueset == 3: 715 A, B, C, D = (81.5, 0.41, 675, 1) 716 saltcorr = 2 717 if valueset == 4: 718 A, B, C, D = (81.5, 0.41, 500, 1) 719 saltcorr = 3 720 if valueset == 5: 721 A, B, C, D = (78.0, 0.7, 500, 1) 722 saltcorr = 3 723 if valueset == 6: 724 A, B, C, D = (67.0, 0.8, 500, 1) 725 saltcorr = 3 726 if valueset == 7: 727 A, B, C, D = (81.5, 0.41, 600, 1) 728 saltcorr = 2 729 if valueset == 8: 730 A, B, C, D = (77.1, 0.41, 528, 1) 731 saltcorr = 4 732 if valueset > 8: 733 raise ValueError('allowed values for parameter \'valueset\' are 0-8.') 734 735 Tm = A + B * pGC - C / (len(seq) * 1.0) 736 if saltcorr: 737 Tm += salt_correction(Na=Na, K=K, Tris=Tris, Mg=Mg, dNTPs=dNTPs, 738 seq=seq, method=saltcorr) 739 if mismatch: 740 Tm -= D * (seq.count('X') * 100.0 / len(seq)) 741 return Tm
742 743
744 -def Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=DNA_NN3, 745 tmm_table=DNA_TMM1, imm_table=DNA_IMM1, de_table=DNA_DE1, 746 dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, 747 dNTPs=0, saltcorr=5):
748 """Return the Tm using nearest neighbor thermodynamics. 749 750 Arguments: 751 - seq: The primer/probe sequence as string or Biopython sequence object. For 752 RNA/DNA hybridizations seq must be the RNA sequence. 753 - c_seq: Complementary sequence. The sequence of the template/target in 754 3'->5' direction. c_seq is necessary for mismatch correction and 755 dangling-ends correction. Both corrections will automatically be 756 applied if mismatches or dangling ends are present. Default=None. 757 - shift: Shift of the primer/probe sequence on the template/target sequence, 758 e.g.:: 759 760 shift=0 shift=1 shift= -1 761 Primer (seq): 5' ATGC... 5' ATGC... 5' ATGC... 762 Template (c_seq): 3' TACG... 3' CTACG... 3' ACG... 763 764 The shift parameter is necessary to align seq and c_seq if they have 765 different lengths or if they should have dangling ends. Default=0 766 - table: Thermodynamic NN values, eight tables are implemented: 767 For DNA/DNA hybridizations: 768 769 - DNA_NN1: values from Breslauer et al. (1986) 770 - DNA_NN2: values from Sugimoto et al. (1996) 771 - DNA_NN3: values from Allawi & SantaLucia (1997) (default) 772 - DNA_NN4: values from SantaLucia & Hicks (2004) 773 774 For RNA/RNA hybridizations: 775 776 - RNA_NN1: values from Freier et al. (1986) 777 - RNA_NN2: values from Xia et al. (1998) 778 - RNA_NN3: valuse from Chen et al. (2012) 779 780 For RNA/DNA hybridizations: 781 782 - R_DNA_NN1: values from Sugimoto et al. (1995) 783 784 Use the module's maketable method to make a new table or to update one 785 one of the implemented tables. 786 - tmm_table: Thermodynamic values for terminal mismatches. Default: DNA_TMM1 787 (SantaLucia & Peyret, 2001) 788 - imm_table: Thermodynamic values for internal mismatches, may include 789 insosine mismatches. Default: DNA_IMM1 (Allawi & SantaLucia, 1997-1998; 790 Peyret et al., 1999; Watkins & SantaLucia, 2005) 791 - de_table: Thermodynamic values for dangling ends: 792 793 - DNA_DE1: for DNA. Values from Bommarito et al. (2000). Default 794 - RNA_DE1: for RNA. Values from Turner & Mathews (2010) 795 796 - dnac1: Concentration of the higher concentrated strand [nM]. Typically this 797 will be the primer (for PCR) or the probe. Default=25. 798 - dnac2: Concentration of the lower concentrated strand [nM]. In PCR this is 799 the template strand which concentration is typically very low and may 800 be ignored (dnac2=0). In oligo/oligo hybridization experiments, dnac1 801 equals dnac1. Default=25. 802 MELTING and Primer3Plus use k = [Oligo(Total)]/4 by default. To mimic 803 this behaviour, you have to divide [Oligo(Total)] by 2 and assign this 804 concentration to dnac1 and dnac2. E.g., Total oligo concentration of 805 50 nM in Primer3Plus means dnac1=25, dnac2=25. 806 - selfcomp: Is the sequence self-complementary? Default=False. If 'True' the 807 primer is thought binding to itself, thus dnac2 is not considered. 808 - Na, K, Tris, Mg, dNTPs: See method 'Tm_GC' for details. Defaults: Na=50, 809 K=0, Tris=0, Mg=0, dNTPs=0. 810 - saltcorr: See method 'Tm_GC'. Default=5. 0 means no salt correction. 811 812 """ 813 seq = str(seq) 814 if not c_seq: 815 # c_seq must be provided by user if dangling ends or mismatches should 816 # be taken into account. Otherwise take perfect complement. 817 c_seq = Seq.Seq(seq).complement() 818 c_seq = str(c_seq) 819 if check: 820 seq = _check(seq, 'Tm_NN') 821 c_seq = _check(c_seq, 'Tm_NN') 822 tmpseq = seq 823 tmp_cseq = c_seq 824 deltaH = 0 825 deltaS = 0 826 dH = 0 # Names for indexes 827 dS = 1 # 0 and 1 828 829 # Dangling ends? 830 if shift or len(seq) != len(c_seq): 831 # Align both sequences using the shift parameter 832 if shift > 0: 833 tmpseq = '.' * shift + seq 834 if shift < 0: 835 tmp_cseq = '.' * abs(shift) + c_seq 836 if len(tmp_cseq) > len(tmpseq): 837 tmpseq += (len(tmp_cseq) - len(tmpseq)) * '.' 838 if len(tmp_cseq) < len(tmpseq): 839 tmp_cseq += (len(tmpseq) - len(tmp_cseq)) * '.' 840 # Remove 'over-dangling' ends 841 while tmpseq.startswith('..') or tmp_cseq.startswith('..'): 842 tmpseq = tmpseq[1:] 843 tmp_cseq = tmp_cseq[1:] 844 while tmpseq.endswith('..') or tmp_cseq.endswith('..'): 845 tmpseq = tmpseq[:-1] 846 tmp_cseq = tmp_cseq[:-1] 847 # Now for the dangling ends 848 if tmpseq.startswith('.') or tmp_cseq.startswith('.'): 849 left_de = tmpseq[:2] + '/' + tmp_cseq[:2] 850 deltaH += de_table[left_de][dH] 851 deltaS += de_table[left_de][dS] 852 tmpseq = tmpseq[1:] 853 tmp_cseq = tmp_cseq[1:] 854 if tmpseq.endswith('.') or tmp_cseq.endswith('.'): 855 right_de = tmp_cseq[-2:][::-1] + '/' + tmpseq[-2:][::-1] 856 deltaH += de_table[right_de][dH] 857 deltaS += de_table[right_de][dS] 858 tmpseq = tmpseq[:-1] 859 tmp_cseq = tmp_cseq[:-1] 860 861 # Now for terminal mismatches 862 left_tmm = tmp_cseq[:2][::-1] + '/' + tmpseq[:2][::-1] 863 if left_tmm in tmm_table: 864 deltaH += tmm_table[left_tmm][dH] 865 deltaS += tmm_table[left_tmm][dS] 866 tmpseq = tmpseq[1:] 867 tmp_cseq = tmp_cseq[1:] 868 right_tmm = tmpseq[-2:] + '/' + tmp_cseq[-2:] 869 if right_tmm in tmm_table: 870 deltaH += tmm_table[right_tmm][dH] 871 deltaS += tmm_table[right_tmm][dS] 872 tmpseq = tmpseq[:-1] 873 tmp_cseq = tmp_cseq[:-1] 874 875 # Now everything 'unusual' at the ends is handled and removed and we can 876 # look at the initiation. 877 # One or several of the following initiation types may apply: 878 879 # Type: General initiation value 880 deltaH += nn_table['init'][dH] 881 deltaS += nn_table['init'][dS] 882 883 # Type: Duplex with no (allA/T) or at least one (oneG/C) GC pair 884 if SeqUtils.GC(seq) == 0: 885 deltaH += nn_table['init_allA/T'][dH] 886 deltaS += nn_table['init_allA/T'][dS] 887 else: 888 deltaH += nn_table['init_oneG/C'][dH] 889 deltaS += nn_table['init_oneG/C'][dS] 890 891 # Type: Penalty if 5' end is T 892 if seq.startswith('T'): 893 deltaH += nn_table['init_5T/A'][dH] 894 deltaS += nn_table['init_5T/A'][dS] 895 if seq.endswith('A'): 896 deltaH += nn_table['init_5T/A'][dH] 897 deltaS += nn_table['init_5T/A'][dS] 898 899 # Type: Different values for G/C or A/T terminal basepairs 900 ends = seq[0] + seq[-1] 901 AT = ends.count('A') + ends.count('T') 902 GC = ends.count('G') + ends.count('C') 903 deltaH += nn_table['init_A/T'][dH] * AT 904 deltaS += nn_table['init_A/T'][dS] * AT 905 deltaH += nn_table['init_G/C'][dH] * GC 906 deltaS += nn_table['init_G/C'][dS] * GC 907 908 # Finally, the 'zipping' 909 for basenumber in range(len(tmpseq) - 1): 910 neighbors = tmpseq[basenumber:basenumber + 2] + '/' + \ 911 tmp_cseq[basenumber:basenumber + 2] 912 if neighbors in imm_table: 913 deltaH += imm_table[neighbors][dH] 914 deltaS += imm_table[neighbors][dS] 915 elif neighbors[::-1] in imm_table: 916 deltaH += imm_table[neighbors[::-1]][dH] 917 deltaS += imm_table[neighbors[::-1]][dS] 918 elif neighbors in nn_table: 919 deltaH += nn_table[neighbors][dH] 920 deltaS += nn_table[neighbors][dS] 921 elif neighbors[::-1] in nn_table: 922 deltaH += nn_table[neighbors[::-1]][dH] 923 deltaS += nn_table[neighbors[::-1]][dS] 924 else: 925 # We haven't found the key... 926 if strict: 927 raise ValueError('no data for neighbors \'' + neighbors + '\'') 928 else: 929 warnings.warn('no data for neighbors \'' + neighbors + 930 '\'. Calculation will be wrong', 931 BiopythonWarning) 932 933 k = (dnac1 - (dnac2 / 2.0)) * 1e-9 934 if selfcomp: 935 k = dnac1 * 1e-9 936 deltaH += nn_table['sym'][dH] 937 deltaS += nn_table['sym'][dS] 938 R = 1.987 # universal gas constant in Cal/degrees C*Mol 939 if saltcorr: 940 corr = salt_correction(Na=Na, K=K, Tris=Tris, Mg=Mg, dNTPs=dNTPs, 941 method=saltcorr, seq=seq) 942 if saltcorr == 5: 943 deltaS += corr 944 Tm = (1000 * deltaH) / (deltaS + (R * (math.log(k)))) - 273.15 945 if saltcorr in (1, 2, 3, 4): 946 Tm += corr 947 if saltcorr in (6, 7): 948 # Tm = 1/(1/Tm + corr) 949 Tm = (1 / (1 / (Tm + 273.15) + corr) - 273.15) 950 951 return Tm
952 953 __docformat__ = "restructuredtext en" 954 955
956 -def Tm_staluc(s, dnac=50, saltc=50, rna=0):
957 """Returns DNA/DNA tm using nearest neighbor thermodynamics (OBSOLETE). 958 959 This method may be depreceated in the future. Use Tm_NN instead. Tm_NN 960 with default values gives the same result as Tm_staluc. 961 962 s is the sequence as string or Seq object 963 dnac is DNA concentration [nM] 964 saltc is salt concentration [mM]. 965 rna=0 is for DNA/DNA (default), use 1 for RNA/RNA hybridisation. 966 967 For DNA/DNA, see Allawi & SantaLucia (1997), Biochemistry 36: 10581-10594 968 For RNA/RNA, see Xia et al (1998), Biochemistry 37: 14719-14735 969 970 Example: 971 972 >>> print("%0.2f" % Tm_staluc('CAGTCAGTACGTACGTGTACTGCCGTA')) 973 59.87 974 >>> print("%0.2f" % Tm_staluc('CAGTCAGTACGTACGTGTACTGCCGTA', rna=True)) 975 77.90 976 977 You can also use a Seq object instead of a string, 978 979 >>> from Bio.Seq import Seq 980 >>> from Bio.Alphabet import generic_nucleotide 981 >>> s = Seq('CAGTCAGTACGTACGTGTACTGCCGTA', generic_nucleotide) 982 >>> print("%0.2f" % Tm_staluc(s)) 983 59.87 984 >>> print("%0.2f" % Tm_staluc(s, rna=True)) 985 77.90 986 987 """ 988 # Original method was by Sebastian Bassi <sbassi@genesdigitales.com>. It is 989 # now superseded by Tm_NN. 990 991 warnings.warn('Tm_staluc may be depreciated in the future. Use Tm_NN ' + 992 'instead.', PendingDeprecationWarning) 993 if not rna: 994 return Tm_NN(s, dnac1=dnac / 2.0, dnac2=dnac / 2.0, Na=saltc) 995 elif rna == 1: 996 return Tm_NN(s, dnac1=dnac / 2.0, dnac2=dnac / 2.0, Na=saltc, 997 nn_table=RNA_NN2) 998 else: 999 raise ValueError("rna=%r not supported" % rna)
1000 1001
1002 -def _test():
1003 """Run the module's doctests (PRIVATE).""" 1004 import doctest 1005 print("Running doctests...") 1006 doctest.testmod() 1007 print("Done")
1008 1009 if __name__ == "__main__": 1010 _test() 1011