Dynamic evolution of venom proteins in squamate reptiles. Nicholas R. Casewell, Gavin A. Huttley and Wolfgang Wüster

Supplementary Information

Supplementary Figure S1. Phylogeny of the Toxicofera and evolution of the venom system under the single early origin (SEO) hypothesis of Fry et al. [10]. Simplified from Fry et al. [10] and Vidal and Hedges [52]. Key: LJ = lower jaw, UJ = upper jaw, red = venom, blue = non-venom.

Supplementary Figure S2. Bayesian DNA gene tree of the crotamine toxin family. Multiple support values are given at key nodes in the following order: Bayesian DNA, maximum likelihood (ML) DNA, Bayesian amino acid (aa), ML aa. x indicates no support for the node in that analysis. Tips of the tree coloured in red indicate Toxicoferan sequences sourced from the venom gland and blue those sourced from non-venom gland tissues (physiological non-toxins). Pie charts represent the Bayesian posterior probabilities (bpp) of ancestral state reconstructions at that node: red = venom, blue = non-venom. The numbered codes for each sequence presented in the gene tree represent GenBank GI accession numbers.

Supplementary Figure S3. Bayesian DNA gene tree of the CVF toxin family. Multiple support values are given at key nodes in the following order: Bayesian DNA, ML DNA, Bayesian aa, ML aa. x indicates no support for the node in that analysis. Tips of the tree coloured in red indicate Toxicoferan sequences sourced from the venom gland and blue those sourced from non-venom gland tissues (physiological non-toxins). Pie charts represent the bpp of ancestral state reconstructions at that node: red = venom, blue = non-venom. The numbered codes for each sequence presented in the gene tree represent GenBank GI accession numbers.

Supplementary Figure S4. Bayesian DNA gene tree of the hyaluronidase toxin family. Multiple support values are given at key nodes in the following order: Bayesian DNA, ML DNA, Bayesian aa, ML aa. x indicates no support for the node in that analysis. Tips of the tree coloured in red indicate Toxicoferan sequences sourced from the venom gland and blue those sourced from non-venom gland tissues (physiological non-toxins). Pie charts represent the bpp of ancestral state reconstructions at that node: red = venom, blue = non-venom. The numbered codes for each sequence presented in the gene tree represent GenBank GI accession numbers.

Supplementary Figure S5. Bayesian DNA gene tree of the NGF toxin family. Multiple support values are given at key nodes in the following order: Bayesian DNA, ML DNA, Bayesian aa, ML aa. x indicates no support for the node in that analysis. Tips of the tree coloured in red indicate Toxicoferan sequences sourced from the venom gland and blue those sourced from non-venom gland tissues (physiological non-toxins). Pie charts represent the bpp of ancestral state reconstructions at that node: red = venom, blue = non-venom. The numbered codes for each sequence presented in the gene tree represent GenBank GI accession numbers.

Supplementary Figure S6. Bayesian DNA gene tree of the veficolin toxin family. Multiple support values are given at key nodes in the following order: Bayesian DNA, ML DNA, Bayesian aa, ML aa. x indicates no support for the node in that analysis. Tips of the tree coloured in red indicate Toxicoferan sequences sourced from the venom gland and blue those sourced from non-venom gland tissues (physiological non-toxins). Pie charts represent the bpp of ancestral state reconstructions at that node: red = venom, blue = non-venom. The numbered codes for each sequence presented in the gene tree represent GenBank GI accession numbers.

Supplementary Table S1. Comparison of Bayes factors generated by Bayesian analysis of codon partitioned and unpartitioned DNA datasets.

Gene family    Codon             Unpartitioned:    Unpartitioned:        Bayes factors
               partitioned (H0)  mixed model (HA)  model selected (HB)   2(H0 - HA)      2(H0 - HB)
Crotamine      -2540.18          -2614.77          -2544.88              149.18 **       9.40 *
CVF            -21616.34         -22825.84         -21617.95             2419.00 ***     3.22 *
Cystatin       -8041.55          -8244.54          -8049.09              405.97 ***      15.08 *
Hyaluronidase  -16704.83         -17827.16         N/A                   2244.65 ***     -
Lectin         -12743.79         -13229.68         -12646.24             971.78 ***      4.90 *
Kallikrein     -25130.34         -26321.00         -25138.82             2381.32 ***     16.96 *
Natriuretic    -21010.74         -21518.12         -21012.66             1014.75 ***     3.84 *
NGF            -13985.25         -14620.33         N/A                   1270.17 ***     -
Veficolin      -11122.39         -11735.18         -11124.01             1225.57 ***     3.24 *

Codon partitions utilised are displayed in Supplementary Table S2. Unpartitioned datasets were analysed with: (i) mixed models of sequence evolution (HA) and (ii) models selected by MrModelTest v2.3 (HB) [15]. The GTR + I + Γ model was selected by MrModelTest for each gene family except crotamine, where GTR + Γ was selected. The marginal log-likelihoods produced by the codon partitioned datasets (H0) were compared with those of the unpartitioned datasets to generate Bayes factors: (2logB10) = 2(H0 - HA) or 2(H0 - HB). Interpretation of the differences between Bayes factors is taken from Kass and Raftery [40]: *** very strong, ** strong, * positive, NE little to no evidence. In all gene families Bayes factors advocate the use of codon partitioned datasets implementing multiple models of sequence evolution. N/A indicates where codon partitioned models are identical at each position and also the same as the unpartitioned model.
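The Bayes factor arithmetic described in the footnote to Supplementary Table S1 can be illustrated with a minimal Python sketch. It assumes only the relationship stated there, 2(H0 - HA) or 2(H0 - HB), applied to marginal log-likelihoods copied from the table; the function name `bayes_factor` and the dictionary `table_s1` are hypothetical names introduced for illustration and are not part of the original analysis pipeline.

```python
def bayes_factor(lnl_h0, lnl_h1):
    """Return 2 * (lnL_H0 - lnL_H1), the 2logB10 statistic used in Table S1."""
    return 2.0 * (lnl_h0 - lnl_h1)


# Marginal log-likelihoods copied from three rows of Supplementary Table S1:
# (codon partitioned H0, unpartitioned mixed model HA, unpartitioned selected model HB).
# None marks the N/A entries.
table_s1 = {
    "Crotamine": (-2540.18, -2614.77, -2544.88),
    "CVF": (-21616.34, -22825.84, -21617.95),
    "Hyaluronidase": (-16704.83, -17827.16, None),
}

for family, (h0, ha, hb) in table_s1.items():
    bf_a = bayes_factor(h0, ha)
    bf_b = bayes_factor(h0, hb) if hb is not None else None
    bf_b_text = "-" if bf_b is None else f"{bf_b:.2f}"
    print(f"{family}: 2(H0 - HA) = {bf_a:.2f}, 2(H0 - HB) = {bf_b_text}")
```

Positive values favour the codon partitioned analysis (H0), because its marginal log-likelihood is the higher of the two being compared.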

Supplementary Table S2. Estimated models of sequence evolution for DNA and amino acid datasets determined by MrModelTest and ModelGenerator.

Data type    Dataset         Codon position   Model
DNA          Crotamine       1                SYM + Γ
                             3                HKY + Γ
             CVF             1                GTR + I + Γ
                             3                GTR + Γ
             Cystatin        1                HKY + Γ
                             3                HKY + Γ
             Hyaluronidase   1                GTR + I + Γ
                             3                GTR + I + Γ
             Kallikrein      1                GTR + I + Γ
                             3                HKY + I + Γ
             Lectin          1                GTR + I + Γ
                             3                GTR + Γ
             Natriuretic     1                GTR + Γ
                             2                GTR + Γ
                             3                GTR + Γ
             NGF             1                GTR + I + Γ
                             3                GTR + I + Γ
             Veficolin       1                GTR + Γ
                             3                GTR + Γ
Amino acid   Crotamine       -                WAG + Γ
             CVF             -                WAG + I + Γ
             Cystatin        -                WAG + Γ
             Hyaluronidase   -                WAG + I + Γ
             Kallikrein      -                WAG + Γ
             Lectin          -                WAG + I + Γ
             Natriuretic     -                WAG + Γ
             NGF             -                WAG + I + Γ
             Veficolin       -                WAG + Γ

Supplementary Table S3. Test statistics for positive selection analyses undertaken on non-toxin branches observed in the gene trees.

Gene family    Non-toxin branch           Test statistic (lnL)    Free parameters   LR       Significance  P
                                          Null        Alt         Null     Alt               cut-off
Crotamine      FS3E17002I7SX              -1152.425   -1152.367   74       77       0.117    0.050         0.990
CVF            contig79280                -3789.094   -3789.094   74       77       0.000    0.017         1.000
               N. naja 213372             -3743.629   -3742.759   74       77       1.739                  0.628
               P. bivittatus contig25850  -4194.289   -4194.289   74       77       0.000                  1.000
Cystatin       contig75753                -1090.813   -1089.864   72       75       1.896    0.010         0.594
               FT7MHCY04JXDK              -1360.246   -1356.857   72       75       6.779                  0.079
               contig09116                -1253.044   -1249.724   72       75       6.640                  0.084
               contig56006                -1331.481   -1329.206   72       75       4.548                  0.208
               contig66504                -1319.006   -1317.102   72       75       3.808                  0.283
Hyaluronidase  contig26789                -5597.587   -5595.536   82       85       4.101    0.050         0.251
Kallikrein     contig77142                -6004.159   -6003.059   84       87       2.199    0.025         0.532
               P. bivittatus contig02483  -5974.993   -5972.765   84       87       4.455                  0.216
Lectin         contig03054                -3740.909   -3736.461   81       84       8.895    0.050         0.031
               contig10774                -3813.828   -3806.575   81       84       14.506                 0.002
Natriuretic    B. jararaca 16124242       -4243.996   -4243.163   84       87       1.666    0.050         0.645
NGF            P. bivittatus contig15144  -1872.165   -1872.165   78       81       0.000    0.025         1.000
               FT7MHCY03HENJ              -1948.934   -1948.934   78       81       0.000                  1.000
Veficolin      contig79448                -2382.297   -2380.974   70       73       2.645    0.025         0.450
               contig06062                -2255.387   -2253.109   70       73       4.556                  0.207

Positive selection tests were calculated for non-toxin branches observed nested within toxin clades of the gene trees. The probability (P) of the test is calculated using likelihood ratio (LR) tests of the test statistic generated from null and alternative (alt) models and three degrees of freedom. Significance of the test is indicated by bold and underline font where the P-value of the test falls below the sequential Bonferroni correction significance cut-off.
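The likelihood ratio test and sequential Bonferroni correction described for Supplementary Table S3 can be sketched as follows. This is an illustrative Python sketch under two assumptions: that LR = 2(lnL alt - lnL null) is compared against a chi-square distribution with three degrees of freedom, as the footnote indicates, and that the sequential Bonferroni correction takes its standard Holm step-down form. The function names `lr_test` and `sequential_bonferroni` are hypothetical and do not come from the original analysis.

```python
import math


def lr_test(lnl_null, lnl_alt):
    """Return (LR, P) for a likelihood ratio test with three degrees of freedom."""
    lr = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Closed-form upper tail of the chi-square distribution for df = 3.
    p = math.erfc(math.sqrt(lr / 2.0)) + math.sqrt(2.0 * lr / math.pi) * math.exp(-lr / 2.0)
    return lr, p


def sequential_bonferroni(p_values, alpha=0.05):
    """Flag tests that pass a Holm-style sequential Bonferroni correction."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    significant = [False] * n
    for rank, idx in enumerate(order):
        cutoff = alpha / (n - rank)  # alpha/n for the smallest P, then alpha/(n-1), ...
        if p_values[idx] >= cutoff:
            break  # stop at the first test that fails its cut-off
        significant[idx] = True
    return significant


# Null and alternative log-likelihoods for the two lectin branches in Table S3.
lectin = [(-3740.909, -3736.461), (-3813.828, -3806.575)]
p_values = [lr_test(null, alt)[1] for null, alt in lectin]
print(sequential_bonferroni(p_values))
```

Applying the correction within each gene family separately is also an assumption of this sketch; it is suggested by the cut-off values in the table, which match 0.05 divided by the number of branches tested in families where no test is significant.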

Supplementary Table S4. Additional non-venom gland sequences previously identified from snake physiological tissues that were incorporated into the toxin family datasets.

Dataset       Species               Accession number   Source of sequence
Crotamine     Crotalus durissus     14915765           Liver
CVF           Naja naja             213372             Liver
Natriuretic   Bothrops jararaca     16124242           Brain
              Pseudonaja textilis   157169075          Heart

Supplementary Table S5. DNA sequences excluded from phylogenetic analyses due to significant (P < 0.05) evidence of recombination by the RDP, GENECONV or Bootscan methods.

Dataset       Species               Accession number   Source of sequence
Crotamine     Crotalus durissus     14915765           Liver
Lectin        Enhydris polylepis    156485255          Venom gland
              Enhydris polylepis    156485259          Venom gland
Kallikrein    Crotalus durissus     76365437           Venom gland