Supplementary Information

The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan

Zhuo Wang 1 *, Juan Pascual-Anaya 2 *, Amonida Zadissa 3 *, Wenqi Li 4 *, Yoshihito Niimura 5, Zhiyong Huang 1, Chunyi Li 4, Simon White 3, Zhiqiang Xiong 1, Dongming Fang 1, Bo Wang 1, Yao Ming 1, Yan Chen 1, Yuan Zheng 1, Shigehiro Kuraku 2, Miguel Pignatelli 6, Javier Herrero 6, Kathryn Beal 6, Masafumi Nozawa 7, Juan Wang 1, Hongyan Zhang 4, Lili Yu 1, Shuji Shigenobu 7, Junyi Wang 1, Jiannan Liu 4, Paul Flicek 6, Steve Searle 3, Jun Wang 1,8,9, Shigeru Kuratani 2, Ye Yin 4, Bronwen Aken 3, Guojie Zhang 1,10,11, Naoki Irie 2

*: Equally contributed co-first authors.
: To whom correspondence and requests for materials should be addressed.

1 BGI-Shenzhen: Beishan Industrial Zone, Yantian District, Shenzhen , China
2 RIKEN Center for Developmental Biology, Minatojima-minami, Chuo-ku, Kobe, Hyogo , Japan
3 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom
4 BGI-Japan, Kobe KIMEC Center BLDG. 8F, Minatojima-minamicho, Chuo-ku, Kobe City, Hyogo , Japan
5 Medical Research Institute, Tokyo Medical and Dental University, Yushima, Bunkyo-ku, Tokyo , Japan
6 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
7 NIBB Core Research Facilities, National Institute for Basic Biology, Nishigonaka 38, Myodaiji, Okazaki, Aichi, Japan
8 Department of Biology, University of Copenhagen, DK-1165 Copenhagen, Denmark
9 King Abdulaziz University, Jeddah 21589, Saudi Arabia
10 China National GeneBank, BGI-Shenzhen, , China
11 Centre for Social Evolution, Department of Biology, University of Copenhagen, DK-2200, Copenhagen, Denmark

Contents

Supplementary Note
Genomic DNA sequencing and assembly
Prediction of protein sequences for phylogenetic analysis
Phylogenetic analysis
Divergence time estimation
Gene family contraction and expansion in turtle lineages
Extensive expansion of olfactory receptors (ORs)
Turtle-specific gene loss and pseudogenization
Genes with accelerated evolutionary rate
Hourglass-like gene expression divergence during embryogenesis and un-shifted phylotype
Developmental timetables by expression profile
Genes that characterize the stages after the phylotypic period
Molecular development of the carapacial ridge
Genomic DNA extraction
Library construction and sequencing
K-mer estimation of genome size
Raw read filtering
Error correction of short libraries before assembly
Genome assembly
Repeat annotation
Assessment of genome assembly coverage
Sanger-based quality check of the soft-shell turtle genome
Whole genome alignment
Dataset used for gene predictions
Gene prediction pipeline for two turtles
Additional gene prediction for soft-shell turtle by the Ensembl-prediction-pipeline
Gene ontology analysis
Turtle ultra conserved non-coding elements (TUCNE)
Turtle-specific genes
Phylogenetic tree reconstruction
Divergence time estimation
Gene loss analysis
Statistical analysis of gene family expansion and contractions (E/C genes)
Pseudogene / frame-shifted gene detection
Predictions of Olfactory Receptor (OR) Genes
Genes of accelerated evolutionary rate in the turtle lineage
Embryo sampling and mRNA extraction
RNA-Seq for transcriptome identification
De novo transcriptome assembly
RNA-Seq for gene expression analysis
Obtaining gene expression scores from RNA-Seq data
Comparison of gene expression profiles
Wnt gene identification, cloning and whole-mount in situ hybridization
microRNA extraction, prediction, and expression analysis
microRNA target predictions
Statistical tests
Software and computation environment
Supplementary Figures and Tables

Supplementary Note

The turtle's anatomical features (Supplementary Figure 1), which are atypical for an amniote, have been the subject of an extensive debate regarding their evolutionary origin. To clarify the phylogenetic position of turtles with a genome-scale data set and to further investigate the origin of their unique body plan from an evolutionary developmental perspective, we determined the genome sequences of a soft-shell turtle (P. sinensis) and a green sea turtle (C. mydas) and performed further analyses as described in this supplementary text.

Genomic DNA sequencing and assembly

We utilized massively parallel short-read sequencing with 8 types of short- to long-insert libraries (Supplementary Table 1) to determine the genomic sequences of a soft-shell turtle and a green sea turtle; each of these libraries was constructed from a single female individual. As shown in Supplementary Table 2, our assembly strategy allowed us to obtain an N50 scaffold size of more than 3 megabases. To assess the quality of our assembly, we performed three different evaluations as described below.

First, we checked the GC content of the sequenced DNA. The mode GC contents for soft-shell turtle and green sea turtle (44% and 43%, respectively) were comparable to those of human, chicken, and anole lizard (Supplementary Figure 2A). A more detailed GC content distribution analysis (Supplementary Figure 2B-C) revealed a low-average-depth island that was found only in soft-shell turtle, which may reflect the ZW karyotype of the female sample (female soft-shell turtles are known to have the ZW karyotype 73, while green sea turtles have 56 diploid chromosomes in both sexes 74). We then evaluated the coverage of the two turtle assemblies against 248 core eukaryotic genes (CEGs) and confirmed that both assemblies covered more than 70% of the core eukaryotic genes as complete sequences and more than 90% when partial coverage was included (Supplementary Table 3).
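The N50 scaffold size quoted above is the scaffold length at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation follows; the function name and the toy scaffold lengths are illustrative assumptions and are not taken from the turtle assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bases)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths): total = 300, half = 150.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

In the toy example the two longest scaffolds (80 + 70) already reach half of the total length, so the N50 is the length of the second one.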
Finally, for the soft-shell turtle genome sequence, we compared the scaffolds to independently constructed fosmid clones sequenced by Sanger-based sequencing, and we found that the weighted average coverage rate for five fosmid clones was greater than 95% (96.7%; see Supplementary Table 4). In addition, the assembled genome sequences from soft-shell turtle were assessed to evaluate the coverage rate of transcribed regions (Supplementary Table 5). BLAT alignment of the expressed sequence tag (EST) clusters against the assembled genome (requiring alignment over more than 50% of the sequence length) showed that 98.6% of EST clusters generated by Illumina short reads were covered by the assembly. Similarly, 98.1% of the 79,305 EST clusters generated by 454 sequencing were covered by the assembly, suggesting high coverage of gene regions by the assembled genome. Although the overall characteristics of the turtle genomes were comparable to those of other vertebrates (Supplementary Figures 4-5), repetitive elements were more frequent in the turtle genomes than in other sauropsid genomes (Supplementary Table 6). The Whole Genome Shotgun project data for soft-shell turtle and green sea turtle have been deposited in DDBJ / EMBL / GenBank under the accession numbers AGCU and AJIM , respectively. The genome versions described in this paper are the first versions for these genomes.

Prediction of protein sequences for phylogenetic analysis

While recent studies have shed light on the transcription and regulation of non-coding genomic elements, protein-coding genes have provided the measure of evolutionary relationships with higher fidelity because their cross-species relationships, namely their orthologies, are more reliably inferable within the established methodological framework 75.
We therefore performed gene predictions for soft-shell turtle (Ps-BGI_gene), green sea turtle (Cm_gene), saltwater crocodile, and American alligator (Supplementary Table 7) to determine the conserved sets of protein-coding genes. The lengths of predicted genes, coding sequences, exons, and introns, and the exon numbers of the genes in the two turtle species are comparable to those of other vertebrate species (Supplementary Figure 4). Additionally, the evolutionarily conserved profiles of genes in the two turtle species were found to be comparable to those of species with existing genome sequences (Supplementary Figure 5). A Gene Ontology (GO) analysis was also performed for soft-shell turtle and green sea turtle. Our analysis indicates that the predicted gene sets cover more than 89% of conserved eukaryotic genes (Supplementary Table 8). The basic statistics of the two genomes are summarized in Supplementary Table 9.

Phylogenetic analysis

Despite efforts using both morphological and molecular approaches, the origin of turtles remains controversial, and researchers debate between three major hypotheses 1,76 (Supplementary Figure 6): [I] turtles are the sister group to the lizard-snake-tuatara (Lepidosauria) clade 9,10, [II] turtles are the sister group to birds and crocodilians (the Archosauria) 77,78, or [III] turtles are basal to the Diapsida (a clade composed of the Archosauria and the Lepidosauria 79). The main inconsistency among molecular-based analyses, including a recent study using microRNAs (miRNAs) 9, is presumably due to a lack of comprehensive data (e.g., without a genome sequence, we cannot distinguish whether an miRNA is absent from the genome or is present yet expressed below the detection level of the sequencing protocol). We therefore performed phylogenetic analyses using the large gene sets from two turtles (green sea turtle and soft-shell turtle) that were sequenced in this project together with 10 vertebrates with sequenced genomes

(chicken, zebra finch, saltwater crocodile, American alligator, anole lizard, dog, human, platypus, Western clawed frog and medaka). Using concatenated sequences constructed from 1,113 single-copy orthologues (Supplementary Table 10) with a variety of sequence sets (including amino acid sequences, whole coding sequences, and 1st, 2nd, 1st & 2nd, and 3rd codon positions), we performed phylogenetic reconstruction analyses (Supplementary Figure 6). To increase the robustness of our results, phylogenetic trees were inferred using two different programs, RAxML 80 and PhyML 47,48 (Supplementary Table 10). The sequence length and log-likelihood value for each set of sequences are shown in Supplementary Table 10. Based on our analysis using various datasets, turtles were clearly grouped with the Archosauria, while the Lepidosauria served as the sister group to this clade. The significance of this result was supported by statistical tests, in which alternative tree topologies were explicitly rejected (Supplementary Table 11). The tree topology inferred with the third codon position was different, as turtles were grouped together with crocodiles and formed a sister group to birds. This could be largely due to the faster evolutionary rate at the third codon position (Supplementary Figure 6) and the resulting saturation effect. We therefore decided not to rely on the data from the third codon position in the phylogenetic tree inferences, as is commonly practiced 81,82.

Divergence time estimation

Based on the phylogenetic tree and data obtained from the 1st and 2nd codon positions, we next focused on estimating the divergence time of each species using the Bayesian MCMC method in PAML together with several fossil records that can be used as calibrating time points (Supplementary Tables 12-13).
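The codon-position data sets used for the phylogenetic and dating analyses (1st, 2nd, 1st & 2nd, and 3rd positions of concatenated orthologues) can be illustrated with a short sketch. It assumes in-frame, gap-free toy coding sequences; the function names and the toy alignments are hypothetical, and this is not the pipeline used in the study.

```python
def concatenate(alignments, species):
    """Concatenate per-gene aligned sequences, in a fixed gene order, per species."""
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


def codon_positions(cds, positions):
    """Keep only the requested codon positions (1, 2 and/or 3) of an in-frame CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    wanted = {p - 1 for p in positions}
    return "".join(base for i, base in enumerate(cds) if i % 3 in wanted)


# Two toy single-copy orthologues (hypothetical sequences).
toy_alignments = [
    {"turtle": "ATGGCC", "chicken": "ATGGCA"},
    {"turtle": "TTTAAA", "chicken": "TTCAAA"},
]
supermatrix = concatenate(toy_alignments, ["turtle", "chicken"])

first_second = {sp: codon_positions(seq, (1, 2)) for sp, seq in supermatrix.items()}
third = {sp: codon_positions(seq, (3,)) for sp, seq in supermatrix.items()}
print(first_second)
print(third)
```

In this toy case the two species differ only at third positions, which mirrors why third-position data sets evolve faster and saturate sooner than 1st & 2nd position data sets.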
The divergence between soft-shell turtles and green sea turtles was estimated to have occurred approximately million years ago (Mya); using the 95% credibility interval, this divergence could have occurred approximately Mya. The crocodilian/bird split and the turtle/(crocodilian/bird) split were estimated to have occurred Mya and Mya, respectively (Supplementary Figure 7). We also used several other methods and data sets to estimate divergence times, which are listed in Supplementary Table 12. Note that the estimated time ranges for the turtle/bird split did not differ to a large extent regardless of the method or data set used.

Gene family contraction and expansion in turtle lineages

After obtaining a statistically reliable phylogenetic tree for turtle evolution and predicted gene sets, we explored genes that underwent a large expansion in gene family size during turtle evolution. We found several turtle lineage-specific expansions of gene families by analyzing 12 vertebrate genomes using the CAFE program 83 (Supplementary Table 14). Olfactory receptor (OR) genes are the typical example of genes that have evolved in vertebrates following a birth-and-death model. Indeed, we found many changes in the size of the OR gene families, notably a very large expansion of the OR52 gene family in the soft-shell turtle genome. Other OR families such as OR10 are also expanded, while a few subfamilies are contracted. A more detailed study on the expansion of olfactory receptor genes is provided in a subsequent section. Several gene families involved in the immune system, and more specifically in the innate immune response, are also expanded in the turtle lineage. We also observed several family expansions of zinc-finger proteins. Turtle lineage-specific gene family contractions were also evaluated by the CAFE program and included OR protein subfamilies and some zinc-finger families.

Supplementary_Table 15.xls: Nucleotide sequences of genes that are predicted to be expanded in the turtle lineage.
Supplementary_Table 16.xls: Table of gene family IDs and genes that are predicted to be contracted / expanded in the turtle lineage.

Examination of the gene repertoire in selected gene families

It is of great interest whether gene families other than those mentioned above exhibit a standard pattern in their gene repertoires or any unexpected feature. For example, lineage-specific gene duplications observed across gene families are recognized as signs of a whole genome duplication unique to that lineage 84. To obtain a snapshot of the turtle protein-coding gene repertoire, we focused on a few selected gene families that are known to exhibit a relatively high level of variation in gene retention among major vertebrate lineages.

First, we focused on the prolactin-releasing hormone receptor (PRLHR) family, in which large differences in gene repertoires were previously documented between vertebrate lineages 85. In the soft-shell turtle genome, orthologues of all four PRLHR subtype genes (PRLHR1-4) have been retained (Supplementary Figure 8), as was reported for the anole lizard 85. This suggests that PRLHR1, which is present in the anole lizard and soft-shell turtle genomes but absent in the chicken genome, was most likely lost somewhere in the archosaurian lineage after its separation from the turtle lineage. Functional characterization of the identified turtle genes and their relatives should highlight the biological significance of these changes in the gene repertoire.

Second, we attempted to identify soft-shell turtle orthologues of selected genes that are absent in chicken, namely Pax4 86. No putative orthologue was identified for any of these genes. Their absence marks a relatively ancient characteristic of the gene repertoire that was possibly established, at the latest, at the radiation of extant sauropsids and would not account for phenotypic differences between turtles and other sauropsid lineages.
Third, we found eye globin (also called globin E or GbE), which has been reported only in birds (including chicken) 87 and is considered to be involved in supplying oxygen to the thick avian retina 88. Our search in the soft-shell turtle genome detected a possible GbE orthologue (Ensembl Gene ID ENSPSIG ), and we further confirmed the orthology with the chicken GbE by reconstructing a molecular phylogenetic tree (Supplementary Figure 8). Remarkably, among vertebrates, turtles are the group with the highest rate of gene retention in the globin gene family examined here, which delineates a turtle-specific gene repertoire that differs from those of anole lizards and birds. Our study on turtle genomes identified the first GbE sequences outside birds. Although its exact role remains to be clarified 87, the characterization of turtle and bird GbE functionality may account for the defining

phenotypic characteristics of the turtle-archosaurian clade, and the extraordinarily thick plexiform layer of the P. scripta retina 89 deserves further study. If crocodilian genomes have retained the GbE orthologue, this would also be of great interest. This result, which was initially obtained based on the soft-shell turtle resource, was later confirmed with the green sea turtle genome sequence information. Overall, we did not observe any sign of whole genome duplication(s) unique to the turtle lineage.

Extensive expansion of olfactory receptors (ORs)

Diverse odor molecules in the environment are detected by olfactory receptors (ORs) expressed in the olfactory epithelium of the nasal cavity. Typically, mammalian genomes harbor ~1,000 OR genes, which form the largest multi-gene family in vertebrates 46,90. From the genome sequences of Chinese soft-shell turtle and green sea turtle, we identified 1,137 and 254 intact (potentially functional) OR genes, respectively. The number for the soft-shell turtle is the largest among the non-mammalian vertebrates examined so far 91. We also found hundreds of OR pseudogenes and OR gene fragments in both the soft-shell turtle and green sea turtle genomes. The amino acid and nucleotide sequences of intact OR genes identified in this study are available as Supplementary_Table 17.xls and Supplementary_Table 18.xls, respectively, through the online version of the paper.

Supplementary_Table 17.xls: Amino acid sequences of intact OR genes identified from genome sequences. Pesi and Chmy represent soft-shell turtle and green sea turtle genes, respectively. Each gene name contains a scaffold number, the initial and the terminal positions of the gene, and a transcriptional direction.

Supplementary_Table 18.xls: Nucleotide sequences of intact OR genes identified from genome sequences.
Gene names are the same as those in Supplementary_Table 17.xls.

The OR genes of Osteichthyes (teleost fishes and tetrapods) are classified into seven groups (α to η), each of which corresponds to at least one ancestral gene in the last common ancestor of osteichthyans 91,92. It was reported that genes from groups α and γ are present only in tetrapods (mammals, birds, reptiles, and amphibians), while genes from groups δ, ε, ζ, and η are present exclusively in amphibians and bony fishes (Supplementary Table 19). Genes from group β are present in both tetrapods and bony fishes. Mammalian OR genes are usually classified into Class I and Class II genes. The former corresponds to groups α and β, while the latter corresponds to group γ 91,92.

We identified OR genes belonging to groups α, β, and γ in the two turtle genomes. This observation is consistent with the distribution of OR groups in the amniotes that have been previously examined. Interestingly, however, we found that group α (Class I) OR genes are largely expanded in the turtle lineage, which is unique among amniotes. Group α genes generally represent 10-20% of OR genes in mammals, and this proportion is much smaller in birds and lizards (Supplementary Table 20). In contrast, group α genes represent >45% of OR genes in the two turtle species. Genome-scale analyses demonstrated that a drastic expansion of group α genes has occurred in the turtle lineage. There are three major turtle-specific clades, each of which contains > soft-shell turtle OR genes (Figure 2). An estimation of the numbers of ancestral genes suggested that the soft-shell turtle, for example, acquired more than 500 functional group α OR genes after its separation from the green sea turtle (Supplementary Table 20, Figure 2).

Several studies have demonstrated that aquatic mammals (cetaceans, sirenians, and pinnipeds) have a greater proportion of OR pseudogenes than terrestrial mammals 13,14,93.
Moreover, Kishida and Hikida 94 reported that the fraction of OR pseudogenes in fully aquatic viviparous sea snakes is significantly higher than in oviparous sea snakes, which depend on a terrestrial environment for laying eggs. Contrary to these observations, our analyses showed that the OR gene repertoires of the two turtle species (soft-shell turtle and green sea turtle) have expanded despite their adaptation to aquatic life. In fact, several lines of evidence indicate that aquatic turtles have good olfactory abilities in general. Endres et al. 49 reported that sea turtles can detect airborne odorants as well as water-soluble odorants and proposed that this ability may play a role in navigation and/or foraging under natural conditions. The dynamic expansion of group α (Class I) OR genes in turtles implies the importance of this group of genes for the turtles' living environment. It was suggested that ligands of Class I ORs tend to be hydrophilic while ligands of Class II ORs tend to be hydrophobic 95. Therefore, the expansion of Class I genes may reflect the turtles' reliance on an aquatic environment.

We next examined the distribution of OR genes in the genomes. In general, the distribution of OR genes in mammalian genomes has the following features 90,96: (i) OR genes form genomic clusters that are scattered on many chromosomes, and (ii) Class I and Class II OR genes are located in distinct genomic clusters and do not coexist within a single cluster. Our analysis showed that these two features are also characteristic of both the soft-shell turtle and green sea turtle OR gene families. As shown in Figure 2c-d, the OR genes of soft-shell turtle are arrayed in tandem in a fairly regular pattern in a contig. The largest Class I and Class II OR gene clusters were found in scaffold 55 and scaffold 145 of the soft-shell turtle genome, which contain 53 intact Class I and 41 intact Class II OR genes, respectively. These contigs do not include any other genes.
We did not find any cases in which intact Class I and Class II genes are present together within one contig.

Turtle-specific gene loss and pseudogenization

To clarify the functional aspects of genes lost in the turtle lineage, we performed a more detailed gene loss analysis, together with detection of enriched GO terms using GOstat 97. The genes lost in turtles (GLT) were defined as genes that cannot be found in either of the two turtle genomes but can be found both in archosaurians (in at least one of American alligator, saltwater crocodile, chicken, or zebra finch) and in mammals (in at least one of human, dog, or platypus). Consistent with the family expansion/contraction analysis, the ontology "olfactory receptor activity" was found to have a large number of gene losses (Supplementary Table 21). In relation to the evolution of body plan,

the loss of genes having the GO assignment of "multicellular organismal process" was also statistically significant. A similar observation was further implied by the protein family-level analysis (Supplementary Table 22), as Kruppel-associated box, GPCR rhodopsin-like, and 7TM were found to be over-represented among the turtle-lost protein families. One surprising finding was the loss of the hunger-stimulating hormone ghrelin 98 in the two turtles (Supplementary Table 23). The loss was also confirmed by manual investigation, including BLAST searches against the two turtle genomes (only a partial hit was observed in both genomes, and no EST cluster was found in the soft-shell turtle genome) and against de novo assembled soft-shell turtle ESTs (a partial hit corresponding to the BLAST hit against the genome). Furthermore, we also investigated genes that presumably underwent pseudogenization in turtles and found two genes that were predicted to have undergone complete pseudogenization (see also Methods). Interestingly, two of these three genes, Forkhead box M1 (FoxM1) and serine/arginine-rich splicing factor 1, are related to developmental processes. In particular, the FoxM1 gene is known to be indispensable: the knock-out phenotype is embryonic lethal, and the gene is also known to be involved in the development of liver, heart, lung and blood vessels in mice 99. Based on these results, however, no definitive conclusion can be drawn about the evolution of the turtle body plan, and the effects of the loss of these genes deserve further investigation.

Supplementary_Table 23.xls: List of genes that were predicted to be lost in the two turtles but exist in any of chicken, zebra finch, anole lizard, or X. tropicalis.
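The definition of genes lost in turtles (GLT) given in this section is a set test on presence/absence calls: absent from both turtle genomes, present in at least one archosaurian, and present in at least one mammal. The sketch below illustrates that rule only; the data structure, function name, and toy gene names are hypothetical, and how presence was actually determined is described in the Methods rather than here.

```python
TURTLES = {"soft-shell turtle", "green sea turtle"}
ARCHOSAURS = {"American alligator", "saltwater crocodile", "chicken", "zebra finch"}
MAMMALS = {"human", "dog", "platypus"}


def genes_lost_in_turtles(presence):
    """Return genes absent from both turtles but present in archosaurs and mammals.

    `presence` maps a gene name to the set of species in which it was found.
    """
    lost = []
    for gene, species in presence.items():
        in_turtles = bool(species & TURTLES)
        in_archosaurs = bool(species & ARCHOSAURS)
        in_mammals = bool(species & MAMMALS)
        if not in_turtles and in_archosaurs and in_mammals:
            lost.append(gene)
    return sorted(lost)


# Toy presence/absence calls (hypothetical genes).
toy_presence = {
    "geneA": {"chicken", "human"},                      # lost in turtles
    "geneB": {"chicken", "human", "green sea turtle"},  # still found in a turtle
    "geneC": {"human", "dog"},                          # no archosaur evidence
    "geneD": {"saltwater crocodile", "platypus"},       # lost in turtles
}
print(genes_lost_in_turtles(toy_presence))
```

Requiring evidence from both archosaurians and mammals keeps genes that are simply missing from one outgroup annotation from being counted as turtle-specific losses.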
Genes with accelerated evolutionary rate

To identify genes that have experienced positive selection and an accelerated evolutionary rate in the turtle lineage, we compared the soft-shell turtle genome and the green sea turtle genome and calculated the dN/dS ratio of the coding sequences (Supplementary Table 24). Among the coding sequences, several genes (MGST3, ABCB1, FAH, RFC4, HEATR2, APOBEC2, SCYL3, PDC, and METTL15) were found to have a dN/dS ratio higher than 1, which implies positive selection after the split of these two turtle lineages. Of note is the gene with the highest evolutionary rate, microsomal glutathione S-transferase 3 (MGST3), which is reported to be involved in protecting cells from oxidative stress. Interestingly, disruption of the Drosophila homolog MGST-like reduces life-span in Drosophila 15. The relationship between the accelerated evolution of MGST3 and longevity deserves further investigation.

Hourglass-like gene expression divergence during embryogenesis and un-shifted phylotype

In general, all animals build their complex body from a single fertilized egg. The process begins with establishing basic polarity information (such as information regarding body axes) followed by the further addition of polarity and topological information. The important role of the earliest (or upstream) developmental processes fits well with the hypothesis inspired by von Baer 25 and Haeckel 101, that is, the idea that earlier embryonic processes are typically better conserved in evolution (funnel-like model) 102. Meanwhile, observations that vertebrate cleavage and gastrulation patterns are rather divergent among species led to the formulation of the developmental hourglass model 16,17. This model predicts divergent early stages and more highly conserved stages around the so-called phylotypic period 19,21,22,103,104, which is considered to be the source of the basic vertebrate body plan.
Recent molecular studies supported developmental hourglass-like divergence in vertebrates 18,20,21 and Drosophila 19; however, none of these analyses was performed in a non-model organism. To examine whether turtles, which have a rather atypical anatomy for amniotes, also follow hourglass-like 16,17 divergence during embryogenesis, we compared the whole embryonic gene expression profiles of soft-shell turtle embryos against those of chicken embryos using orthologous genes with non-biased slim GO profiles (Supplementary Figures 9-10). We first explored the number of genes expressed during each developmental stage. Although the predicted gene number for soft-shell turtle (18,175) differs considerably from that of chicken (16,736), the number of genes expressed (with at least one tag mapped to their coding region) was comparable between the two species (Supplementary Figure 11). Interestingly, the overall number of genes detected during embryogenesis was also comparable among the various embryonic stages (see also Supplementary Figure 13). We next compared the similarity of expression profiles between turtle and chicken embryos using one-to-one orthologues to evaluate the conserved nature of the embryonic stages. Consistent with the hourglass model, the mid-embryonic stages exhibited a higher expression similarity than the earliest and the latest stages of the sampled embryos (Supplementary Figure 14). This result could not be explained by the moderate and monotonic increase in the number of genes expressed during embryogenesis (Supplementary Figure 13). While the above results suggest an hourglass-like divergence between turtle and chicken embryos, we cannot exclude the possibility that these results were biased by the pairs of turtle-chicken embryos we selected. We therefore performed more robust all-to-all comparisons to corroborate the hourglass-like divergence.
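An all-to-all comparison of this kind scores every turtle stage against every chicken stage over the same set of one-to-one orthologues and then looks for the most similar pair. The sketch below uses Spearman rank correlation as the similarity measure; that choice, the function names, the stage labels, and the toy expression values are assumptions for illustration and are not the distance or normalization methods used in the study.

```python
from math import sqrt


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def all_to_all(turtle, chicken):
    """Similarity of every (turtle stage, chicken stage) pair."""
    return {(t, c): spearman(tv, cv)
            for t, tv in turtle.items() for c, cv in chicken.items()}


# Toy expression values for four orthologues per stage (hypothetical).
toy_turtle = {"early": [1, 5, 3, 9], "mid": [2, 4, 6, 8], "late": [9, 1, 7, 3]}
toy_chicken = {"early": [3, 1, 9, 2], "mid": [20, 40, 60, 80], "late": [5, 8, 1, 7]}

scores = all_to_all(toy_turtle, toy_chicken)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

Because every stage is compared with every other stage, the most conserved pair is found without assuming in advance which turtle and chicken stages correspond.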
As shown in Supplementary Figure 14, the all-to-all comparison analysis supported the highest conservation at mid-embryonic stages, and importantly, this result did not change regardless of which distance calculation method or normalization method was used. The essence of the hourglass model resides in the waist region. This region represents the conserved phylotypic period, which is believed to illustrate the basic body plan of vertebrates 16,17. However, as indicated by the observations of von Baer 25 and Haeckel 101, relatively closely related species such as turtles and birds remain similar in appearance until late embryogenesis. This observation prompts the expectation that the most conserved period emerges at developmental stages later than the phylotype when comparisons are made within relatively small clades (inner or red part of the nested hourglasses model in Fig. 3a), whereas the vertebrate phylotype still emerges when comparisons are made among distantly related vertebrate species (outer or blue part of the nested hourglasses model in Fig. 3a). Thus, the hierarchical relationship between phylogeny and ontogeny would still hold (as von Baer once proposed) after the phylotypic period. This idea, or the nested hourglass model 18, can be tested by identifying the most conserved developmental stages between turtles

and chickens and assessing whether they correspond to the previously identified vertebrate phylotypic stage in chicken 8, namely stage HH16. As shown in Supplementary Table 25, our statistical analysis, which is based on the hierarchical Bayes method, robustly demonstrated that TK11 of soft-shell turtle and HH16 of chicken show the maximum expression similarity among the stages we analyzed. All combinations of normalization and similarity calculation methods that we used supported the same result. The result suggests that these stages are highly conserved in terms of gene regulation. Notably, chicken HH16 is the stage at which the phylotypic period was observed in our previous study 8, which compared the embryonic gene expression profiles of mouse, zebrafish, and Xenopus. The identified turtle stage resembles the chicken phylotypic period in external appearance and also shows a similar repertoire of organ primordia shared among the phylotypic periods of four other vertebrates (Supplementary Table 26). In accordance with this, the identified turtle/chicken phylotypic stages exhibited shared expression of developmental toolkit genes 105, that is, genes known to be involved in developmental processes in various animals (Fig. 4a, Supplementary Table 27). Furthermore, the group of genes associated with the GO assignment of "multicellular organismal development" showed a significantly similar expression level between the embryos (Supplementary Table 28). This result implies that the conserved nature of the phylotypic stage is encoded more by regulatory sequences and less by the primary sequences of coding genes. Considering the different sizes of their eggs (around 1.5 cm for turtle eggs and 5 cm for chicken eggs) and the actual time required for development 24,61, a similarity in both gene expression and morphology is surprising.
Taken together, the maximal similarity of the expressed gene repertoire between the turtle TK11 embryo and the chicken phylotypic period, together with the shared anatomical features, demonstrates that even animals that have an atypical body plan (e.g., turtles) follow the hourglass model with a conserved phylotypic period. Additionally, a temporal shift to later stages does not seem to occur within relatively small phylogenetic clades, suggesting the highly conserved nature of these phylotypic stages in various vertebrate clades (Supplementary Figure 15). In accordance with this, gene families that possibly experienced expansion or contraction (see Supplementary Table 16) showed lower expression levels, particularly during the mid-embryonic stages (Supplementary Figure 16).

Developmental timetables by expression profile

We next explored whether the current understanding of corresponding developmental timetables 24,106 can be reproduced by gene expression similarities. An estimation of corresponding developmental timetables (CDT)* between different species is especially important in the search for developmental changes that have occurred during evolution, including heterochrony and heterotopy 107. However, both morphological and candidate molecular approaches (e.g., using Hox expression as a marker) have subjective biases; the morphological approach is often limited to structures that have clearly distinguishable elements, and the molecular candidate approach is often confined to genes that have known functions. Thus, we investigated a revised molecular approach using microarray-generated expression profiles that contain all expressed genes, as reported in our previous study 21. Despite the fact that the actual time needed for soft-shell turtle and chicken development differs almost twofold, the molecular analysis corresponded reasonably well with the current understanding 24,106 (Supplementary Figure 17).
This type of molecular approach could be a new strategy for the adjustment of developmental timetables between different species, which in turn would be helpful for estimating the embryogenesis of common ancestors that are now extinct. Finally, the above results and conclusions did not change depending on which soft-shell turtle gene model (Ps-BGI_genes or Ps-ens_genes) was used.

*Corresponding developmental timetables (CDT): Because there are no equal developmental stages between different species due to evolutionary changes (e.g., heterochronic shifts), "corresponding" indicates the pair of stages that are expected to have diverged from the same stage of the last common ancestor with minimum changes in developmental events.

Genes that characterize the stages after the phylotypic period

The result that turtle-chicken embryogenesis also follows the developmental hourglass model implies that the gene regulation that characterizes the turtle morphology also becomes evident after the phylotypic period. We therefore searched for genes that potentially explain the turtle-specific characteristics that appear after the phylotypic period, such as the sequence of ossification events, including the shell 108. By searching for turtle genes that become more highly expressed after the phylotypic period (excluding orthologous genes that also show increasing expression in chicken embryogenesis), we found 233 genes whose expression increased after the phylotypic period (turtle IAP genes) (Supplementary Figure 18). As expected from the well-ossified and collagen-rich anatomy of turtles, these IAP genes showed enriched GO assignments related to ossification and extracellular matrices (Fig. 4b). We further narrowed down the period to TK13-TK15 to identify genetic programs that potentially explain turtle-specific morphogenesis, such as CR formation, the axial arrest of the ribs, and the folding of the body wall that occurs during this period 23.
Based on the phylotypic period and the CDT predicted above, the expression profile of the TK13-15 period was compared to that of HH19-28. We found that the genes highly expressed in these turtle stages, but not in the corresponding chicken stages, are enriched for the GO terms collagen fibril organization and positive regulation of chondrocyte differentiation (data not shown). However, these results do not necessarily indicate that these genes are the key players that cause turtle-specific morphogenesis, and their analysis awaits further investigation.

Molecular development of the carapacial ridge

The carapacial ridge (CR) of turtles represents a major morphological innovation within vertebrates 1. However, the molecular pathways leading to its development are still not well known. In previous work, we have shown that several genes are specifically involved in CR formation 28,109. Among them, genes
downstream of the Wnt/β-catenin signaling pathway, such as the transcription factor LEF-1, are expressed in both the CR mesenchyme and ectoderm 28. Accordingly, β-catenin protein translocation to the nucleus was also shown 28. Thus, a Wnt gene must be upstream of this signaling cascade, and this gene was likely co-opted for the innovation of the CR in the turtle lineage. However, no Wnt gene identified so far in turtle has been detected in the CR, and this part of the pathway has been thought to be activated by HGF/c-Met signaling 109. Thus, to identify the complete set of Wnt genes, we screened the soft-shell turtle genome together with the extensive RNA-seq data and identified a total of 20 members of the Wnt family (Fig. 5), including Wnt10b, which seems to be lacking in the bird genomes (Garriock et al. 110 and our observations), and Wnt11b, which has been lost in mammals 85. The expression patterns illustrated by whole-mount in situ hybridizations show that out of the 20 Wnt genes analyzed, only Wnt5a was expressed in the CR at TK14 (Fig. 5), the stage at which the CR becomes more apparent. Although Wnt5a is typically involved in non-canonical Wnt signaling (i.e., exerting its function independently of β-catenin), this depends on the receptors present, and it has been shown that Wnt5a can also activate the canonical pathway 111. Nonetheless, further experiments are needed to clarify whether Wnt5a is responsible for β-catenin translocation in CR cells.

MicroRNAs (miRNAs) are a type of non-coding RNA, nucleotides in length, that regulate genes either by inhibiting translation or by directing mRNA cleavage via binding to complementary sequences. These sequences mainly occur in the 3′ UTR of mRNAs 112. Moreover, miRNAs are thought to have an important role in development 113.
To investigate the miRNA repertoire associated with CR formation, we performed small RNA-seq with various tissues (CR, limbs, and body walls) micro-dissected from soft-shell turtle embryos at stage TK14 (Fig. 5b-c; see also Online Methods for library construction and miRNA prediction, Supplementary Figures 19-22). For each tissue, we analyzed the small RNA sequences at more than 62,000X sequencing depth (calculated against the predicted mature sequences). From the mapped reads, the miRDeep2 software 114 predicted a total of 715 miRNAs expressed in the CR, 564 in limbs and 868 in body walls (see Supplementary Table 29 and Supplementary Table 30.xls). In total, 1082 unique miRNAs were predicted in soft-shell turtle, and mature sequences of 22% of them were found to match % in the green sea turtle genome (Supplementary Figure 20). Among the 1082 unique miRNAs, 212 were found to be specific to the CR and were not detected in either the limbs or the body walls (Supplementary Figure 19). For example, miR-187 was found to be one of the most highly expressed miRNAs in the CR (741 reads, ~1.5% of CR-specific reads) (Supplementary Figure 19b-c). This unexpectedly high number of specific miRNAs suggests that miRNAs have an important regulatory function in the CR, which opens the door to a new field of research on the unique morphological novelty of turtles.

If Wnt signaling plays a critical role in turtle development, certain circumstantial evidence should be observable. We first confirmed that none of the Wnt signaling components are lacking in either of the two turtle genomes (Supplementary Figure 21a,b, Supplementary Table 31). In addition, Wnt5a and some of its possible downstream components were found to be potential targets of miRNAs expressed in the CR, body wall, and limbs (Supplementary Figure 21c, Supplementary Table 32), suggesting that Wnt signaling components are also regulated at the level of protein translation.
Of note, Wnt5a is predicted to be controlled in these three tissues by different miRNAs, implying that, although Wnt5a is transcribed in all three tissues, its translational regulation may differ and may be important for the differential patterning and development of these structures. Moreover, the important downstream components Tcf7 and β-catenin were predicted to be targeted by tissue-specific miRNAs, which may explain why Wnt5a is transcribed in the body wall even though β-catenin nuclear translocation is absent there, as was reported previously 28. We also found 6 non-coding regions in the 10 kb upstream region of Wnt5a that are conserved in the two turtles (Supplementary_Table 33.xls) with % identity but not conserved in the human, chicken or anole lizard genomes. Taken together, our results suggest that the expression of the Wnt cascade components and their Wnt ligands is differentially controlled by multiple miRNAs in each tissue, implying that the Wnt cascade co-option in the CR might also have been accompanied by the evolution of miRNA regulation. Finally, we believe that the turtle genome and RNA sequences presented here are invaluable tools for the investigation of key facets of the development of morphological innovations, such as the CR and its associated aspects 1,23.

Supplementary_Table 30.xls: Predictions of miRNAs from P. sinensis TK14 carapacial ridge, limb and body wall.

Supplementary_Table_33.xls: Turtle conserved (with % identity) non-coding elements located within the 10 kb upstream range of each gene. Regions shorter than 30 bp and regions with a good alignment with human, chicken or anole lizard are excluded.

All of the sequenced data have been made available and are freely accessible from the online databases. The details are summarized in Supplementary Table 34.
Genomic DNA extraction

For the soft-shell turtle, 3.58 mg of DNA (256 µg/ml) was extracted from 8 ml of whole blood of an anesthetized (diethyl ether) female (anatomically confirmed) purchased from a local farmer in Japan. For the green sea turtle, a total of 4 mg of high-quality DNA was extracted from the whole blood of a female individual provided by the G10K project (originally collected in Ocean Park, Hong Kong). DNA extraction was performed by an overnight treatment of whole blood with proteinase K, followed by phenol extraction and ethanol extraction using a glass rod. No column-based kits were used for the extraction step, to avoid fragmentation of the sample DNA. The purity and integrity (especially regarding length) of the DNA samples were confirmed by a Qubit Fluorometer and agarose gel electrophoresis, respectively.

Library construction and sequencing

For both the soft-shell turtle and green sea turtle genomes, we first constructed three different short-insert (170 bp, 500 bp, 800 bp) mate-pair libraries from the genomic DNA samples and sequenced them using the Illumina HiSeq 2000 system to obtain data for survey analyses (e.g., genome size, GC content, complexity). After obtaining basic information on genome size and complexity, we further constructed 2 Kb, 5 Kb, 10 Kb, 20 Kb and 40 Kb mate-pair libraries from the same DNA sample and sequenced them for further assembly. For long-insert (>1 Kb) mate-pair libraries, approximately µg of genomic DNA was fragmented, biotin labeled, self-ligated to form circularized DNA, merged at the two ends of the DNA fragment, broken into linear DNA fragments again, enriched using biotin/streptavidin, and prepared for sequencing. In total, we constructed 18 libraries for the soft-shell turtle genome and 17 libraries for the green sea turtle genome.

K-mer estimation of genome size

Genome size can be estimated by analyzing the occurrence and distribution of K-mers with the following formula:

Estimated genome size (bp) = K-mer number / depth

Based on the rate of occurrence of K-mers in each genome, the read depths for soft-shell turtle and green sea turtle were estimated as 38 and 36.1, respectively, leading to genome size estimations of approximately 2.0 Gbp for soft-shell turtle and 2.2 Gbp for green sea turtle.
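As a minimal sketch of the formula above, the following Python snippet divides a total K-mer count by the estimated depth. The K-mer total in the example is a hypothetical round number chosen only for illustration (it is not reported in this document); the depth of 38 is the soft-shell turtle value quoted above.

```python
def estimate_genome_size(kmer_number: int, depth: float) -> float:
    """Estimated genome size (bp) = K-mer number / depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return kmer_number / depth


# Hypothetical K-mer total, used only to exercise the formula.
example_kmer_number = 76_000_000_000
size_bp = estimate_genome_size(example_kmer_number, 38)
print(f"{size_bp / 1e9:.1f} Gbp")
```

The depth is normally read off the main peak of the K-mer frequency distribution, so the estimate is only as good as the peak call.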
Raw read filtering

HiSeq raw reads with the following features were regarded as low-quality reads and were filtered out (discarded) according to previously published methods 32:
[1] Reads with more than 10 bp (base pair) aligned to the adapter sequence (allowing 3 bp mismatch);
[2] Reads with more than 2% N residues;
[3] Reads containing polyA structures;
[4] Possible PCR duplicates (two paired-end reads with completely identical sequences);
[5] Small-insert library reads that have 40 bases with quality scores 7;
[6] Large-insert library reads that have more than 30 bases with quality scores 7;
[8] Small-insert reads with more than 10 bp of overlap between read1 and read2 (allowing 10% mismatch).

Error correction of short libraries before assembly

Artificial K-mers generated from sequencing errors normally occur at a low frequency. This indicates that K-mer frequency information can be used for the correction of reads that contain low-frequency K-mers. We used K=17 because 4^17 = 17,179,869,184 (> 17 Gbp) is larger than the estimated soft-shell turtle and green sea turtle genome sizes (2.1 Gbp and 2.2 Gbp, respectively); this K is also sensitive enough to identify reads embracing the K-mer. Based on the K-mer distribution curve, frequencies lower than the turning point (13 for soft-shell turtle and 10 for green sea turtle) were considered to be low-frequency K-mers. We then constructed a hash table storing the frequencies of all 17-mers (which occupied 16 GB of memory) and tested whether the substitution of any residue with one of the other three nucleotides could lead to high-frequency 17-mers. Starting from regions of high K-mer frequency, we extended our correction base by base to both sides of the low-frequency K-mer regions that are considered to have potential erroneous sites. When all 17-mers that covered the altered residue were changed into high-frequency ones, we assumed that the erroneous residue was indeed an error and corrected it.
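The substitution test described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the correction tool used for the assemblies: it uses K=4 and a frequency cutoff of 3 on made-up reads (the text above used K=17 with cutoffs of 13 and 10), scans each read left to right instead of extending from high-frequency regions, and the function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every K-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_kmers(read, pos, k):
    """All K-mers of the read that overlap position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(read) - k)
    return [read[i:i + k] for i in range(first, last + 1)]


def correct_read(read, counts, k, min_freq):
    """Substitute a base when doing so makes all covering K-mers high frequency."""
    for pos in range(len(read)):
        if all(counts[km] >= min_freq for km in covering_kmers(read, pos, k)):
            continue  # position is already supported by frequent K-mers
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if all(counts[km] >= min_freq
                   for km in covering_kmers(candidate, pos, k)):
                read = candidate  # accept the substitution
                break
    return read


# Made-up data: five error-free copies of one sequence plus one read with an error.
truth = "ACGTTGCAAGGCT"
noisy = "ACGTTGTAAGGCT"  # C -> T at index 6
kmer_counts = count_kmers([truth] * 5 + [noisy], k=4)
print(correct_read(noisy, kmer_counts, k=4, min_freq=3))
```

Positions that cannot be rescued by any single substitution are left unchanged here; the trimming of uncorrectable low-frequency K-mers is not modeled.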
If the erroneous sites could not be corrected, we trimmed the low-frequency K-mers from the reads. We did not correct the reads from long-insert libraries, which were only used in scaffolding and can tolerate some sequencing errors. In total, we corrected 0.21% of the bases and trimmed 2.39% of the bases from the filtered reads.

Genome assembly

We assembled the genome with the filtered and corrected (=clean) data described above using the SOAPdenovo 31,32 software (updated version based on SOAPdenovo1.05). The assembly was carried out using the following steps:
[1] Contig construction. The reads from short-insert (less than 1 kb) library data were split into K-mers and used to construct a de Bruijn graph. The graph was then simplified to obtain the contigs by removing tips, merging bubbles and solving repeats.
[2] Scaffold construction. All the sequenced reads were re-aligned onto the contig sequences. Scaffolds were then constructed by weighting the rate of consistent and conflicting paired-end relationships, and only the scaffolds supported by a high weight of paired-end relationships were kept in the assembly.
[3] Gap filling. We retrieved the read pairs that had one end uniquely mapped to a contig and the other end located in the gap region, and carried out a local assembly of these collected reads to fill the gaps.
The parameters used for the SOAPdenovo assembler are as follows: the options K=27, -d, -M 2 were set for the soft-shell turtle and K=35 was set for the green sea turtle.

Repeat annotation

Repeat detection was performed with the program RepeatMasker 33 using the repeat library issued on April 18, 2012, by the Genetic Information Research Institute (GIRI). The categorization of the repeat types in Supplementary Figure 4 is based on the classification produced by RepeatMasker. Prior to the gene prediction process, the repeat elements were further screened and masked using a combination of homology-based and de novo approaches.
For homology-based prediction of repeats, we used the known repeat library in the Repbase 34 database (Repbase-16.02) with the software RepeatMasker 33 (version 3.2.6) and RepeatProteinMask to identify TEs at the DNA and protein level, respectively. De novo prediction of repeats involved building a de novo repeat library using RepeatModeler 35 and subsequently employing RepeatMasker to find and classify repeats in the genome. We searched for tandem repeats in the genome with the TRF (Tandem Repeats Finder) 36 program.

Assessment of genome assembly coverage

The coverage of the assemblies was assessed using the CEGMA 115 program (version 2.3) 116. Coverage of 248 core eukaryotic genes (CEGs) that are present in a wide range of taxa was assessed to
measure the completeness of genome assembly. CEGMA combines TBLASTN (blast), GeneWise (wise2.2.3), HMMER (hmmer-3.0), and geneid (geneid v1.4) to find gene models in the genome.

Sanger-based quality check of the soft-shell turtle genome

The CopyControl Fosmid Library Production Kit with the pCC1FOS vector (EPICENTRE Biotechnologies) was used to construct the fosmid libraries from the same DNA sample that was used for genome sequencing. After randomly selecting 5 fosmid clones, we sheared the DNA (using Gene Machines) into fragments of approximately 1-3 kb in length and ligated these into the pUC118 vector. Cloned DNA fragments were transformed into E. coli by electroporation, plated and grown overnight on LB plates containing X-gal, IPTG and ampicillin. Sanger sequencing was performed with an ABI 3730 to approximately 6-fold coverage. We then assembled the Sanger reads by overlapping and filled gaps by further rounds of sequencing to obtain complete maps of the fosmid clones. Alignment of the assembled genome sequences to the five fosmid clones was performed using the BLAST algorithm (blastn, E-value = 1e-20).

Whole genome alignment

Whole genome pair-wise alignments were generated by LASTZ 37,38, which is known to have a higher sensitivity than BLAST and is suitable for animal genome comparisons. The parameters used for the LASTZ program are as follows: T=2, C=2, H=2000, Y=3400, L=6000, K=2200.
T=2 is short for --seed=12of19 --notransition, which sets the seed pattern and whether transitions are allowed in the lastz alignment seeds;
C=2 is short for --chain --gapped, which means that chaining is performed after the lastz alignment and that gaps are allowed in the alignment;
H=2000, Y=3400, L=6000, K=2200 are short for --inner=2000, --ydrop=3400, --gappedthresh=6000, --hspthresh=2200, which are thresholds controlling the alignment process.
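The short-form to long-form correspondence listed above can be written down as a small helper that builds a lastz argument list. This is only a sketch of the mapping as stated in the text: it handles just the values quoted here, the function name is hypothetical, and the FASTA file names are placeholders rather than files from this study.

```python
def lastz_args(params):
    """Expand the short-form parameters quoted above into lastz long options."""
    args = []
    if "T" in params:
        if params["T"] != 2:
            raise ValueError("only T=2 is described in the text")
        args += ["--seed=12of19", "--notransition"]
    if "C" in params:
        if params["C"] != 2:
            raise ValueError("only C=2 is described in the text")
        args += ["--chain", "--gapped"]
    # Numeric thresholds map one-to-one onto long options.
    for short, long_name in (("H", "inner"), ("Y", "ydrop"),
                             ("L", "gappedthresh"), ("K", "hspthresh")):
        if short in params:
            args.append(f"--{long_name}={params[short]}")
    return args


params = {"T": 2, "C": 2, "H": 2000, "Y": 3400, "L": 6000, "K": 2200}
# Placeholder target and query file names.
command = ["lastz", "target.fa", "query.fa"] + lastz_args(params)
print(" ".join(command))
```

Keeping the mapping in one place makes it easy to record the exact options alongside each pairwise alignment run.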
The main chains of the two selected genomes were calculated by chainNet 37, which is designed to link aligned segments into larger structures such as chains and nets. The alignments were used as a basis for comparative analysis, e.g., detecting conserved blocks, comparing indels within genomes, and calculating evolutionary distances.

Dataset used for gene predictions

Genomic sequences, together with registered sequences, were used for predicting the gene sets of soft-shell turtle and green sea turtle. For soft-shell turtle, additional RNA-Seq data of G bases were used for predicting genes. In addition to the gene sets prepared for the two turtles, we performed an additional prediction for soft-shell turtle to ensure the robustness of our results (e.g., comparative expression analysis). The additional prediction for soft-shell turtle was performed by taking advantage of the Ensembl prediction pipeline (Ps-ens_gene). Both of these soft-shell turtle gene sets can be accessed through the Ensembl website. The sequencing depth for the miRNA libraries was higher than 62,000 for each sample (see the following section for more details).

Gene prediction pipeline for two turtles

[1] Gene prediction for the two turtle genomes: The BGI annotation pipeline employs both an ab initio approach (GENSCAN 39, AUGUSTUS <ver. 1.0> 40) and a homolog-based approach (Western clawed frog, chicken, human, anole lizard, bottlenose dolphin) against the repeat-masked genome, and it further consolidates the results with the GLEAN 41 program. Homolog-based gene prediction employs GeneWise 43 as the core program. Homologous proteins of other species (human, anole lizard, chicken, bottlenose dolphin and Western clawed frog from the Ensembl 61 release) were mapped to the genome using TBLASTN (Legacy BLAST 56) with an E-value cutoff of 1e-5. The aligned sequences were then filtered and passed to GeneWise 43 along with the query sequences to search for accurate spliced alignments.
Source evidence generated from the above two approaches was integrated by GLEAN 41 to produce a consensus gene set. Gene functions were also assigned according to the best match of the alignments using BLASTP against the SwissProt 117 and TrEMBL databases. The motifs and domains of genes were annotated by InterProScan 44 against protein databases such as ProDom, PRINTS, Pfam, SMART, PANTHER and PROSITE. Gene Ontology 45 IDs for each gene were obtained from the corresponding InterPro entries. All genes were aligned against KEGG 118 proteins, and the pathway in which each gene might be involved was derived from the matching genes in KEGG. Both of the soft-shell turtle gene sets (Ps-ens_gene and Ps-BGI_gene) are available through the Ensembl website, and the C. mydas gene set is available through the NCBI website (Accession number: AJIM00000).

[2] Gene prediction for crocodilians: For gene predictions in crocodilian genomes, two primary assemblies 119 (saltwater crocodile and American alligator) were provided by the crocodile genome consortium, courtesy of the project coordinator, Dr. David A. Ray. The same BGI prediction pipeline used for the green sea turtle was applied to the gene prediction of the saltwater crocodile. For the American alligator, the annotation procedure differed from that of the saltwater crocodile only in the integration step. The predicted gene sets were further analyzed for orthology and used for phylogenetic analysis.

[3] Gene family identification: We identified gene families with TreeFam 32 using the following steps:
[1] BLASTP was used to compare all the protein sequences within the database containing sequences of all species, with E-values less than 1e-7.
[2] HSP segments were concatenated between the same pair of proteins with solar; this was followed by identification of homologous gene-pair relationships among protein sequences with Bit-score.
[3] Gene families were constructed by clustering with hcluster_sg, the algorithm of which is similar to average hierarchical clustering.

Additional gene prediction for soft-shell turtle by the Ensembl-prediction-pipeline

[1] Raw Compute Stage: This initial stage involved searching for sequence patterns as well as aligning proteins and cDNAs to the genome. The annotation process of the high-coverage Chinese soft-shell turtle assembly began with the raw compute stage whereby
the genomic sequence was screened for sequence patterns (including repeats) using RepeatMasker 33 (with the parameters -nolow -species pelodiscus_sinensis s), RepeatModeler 120 (used to obtain a repeat library, which was filtered for an additional RepeatMasker run), Dust 121 and TRF 36. A combination of all the repeat analyses (RepeatMasker, RepeatModeler, Dust and TRF) brought the total proportion of the masked genome to 43.59%. Transcription start sites were predicted using Eponine scan 122 and FirstEF 123. CpG islands longer than 400 bases and tRNAs 124 were also predicted. Genscan 125 was run across the RepeatMasked sequence, and the results were used as input for UniProt 126, UniGene 127 and Vertebrate RNA 128 alignments by WU-BLAST 129. (Passing only Genscan results to BLAST is an effective way of reducing the search space and therefore the computational resources required.) This resulted in UniProt, UniGene and Vertebrate RNA sequences aligning to the genome.

[2] Targeted Stage: This stage involved the generation of coding models from Chinese soft-shell turtle evidence. Turtle protein sequences were downloaded from the public databases UniProt, SwissProt/TrEMBL 126 and RefSeq 127. Models of the coding sequences (CDS) were produced from the proteins using Genewise 43 and Exonerate 130. The generation of transcript models using turtle-specific data is referred to as the Targeted stage. This stage uncovered 32 of the 33 turtle proteins that were used to build coding models. However, none of these models were used in subsequent analyses, as they were overridden by longer models from the Similarity stage.

[3] cDNA and EST Alignment: Turtle cDNAs and ESTs were downloaded from GenBank, clipped to remove polyA tails, and aligned to the genome using Exonerate. Of the 334 turtle cDNAs, 216 sequences aligned, while 142 of the 178 ESTs aligned. The cutoffs for both data sets were 90% coverage and 90% identity.
Contig sequences generated by the Chinese soft-shell turtle consortium using 454 sequencing were also aligned to the genome. Of the initial set of 84,680 contigs, 54,739 aligned with a cut-off of 90% coverage and 95% identity.

[4] Similarity Stage: This stage involved the generation of additional coding models using proteins from related species. Due to the scarcity of turtle-specific protein and cDNA evidence, the majority of gene models were based on proteins from other species. UniProt alignments from the raw compute step were filtered, and only sequences belonging to UniProt's Protein Existence (PE) classification levels 1 and 2 were kept. WU-BLAST was rerun for these sequences and the results were passed to Genewise 43 to build coding models. The generation of transcript models using data from related species is referred to as the Similarity stage. This stage resulted in 53,646 coding models.

[5] Alignment of Ensembl chicken and anole lizard translations: Ensembl chicken and anole lizard translations were aligned against the turtle genome. The cutoff values for coverage and identity were set at 80% and 60%, respectively. Of the 16,736 chicken translations retrieved, 14,935 aligned. Of the 17,805 lizard translations, 17,264 aligned above the set thresholds. The resulting coding models were taken through all subsequent steps.

[6] Filtering Coding Models: Coding models from the Similarity stage were filtered using modules such as TranscriptConsensus and LayerAnnotation. RNA-Seq spliced alignments supporting introns were used to assist in filtering the set. Apollo software 131 was used to visualise the results of filtering.

[7] Addition of RNA-seq models: The largest set of turtle-specific evidence was from paired-end RNA-seq, which was used where appropriate to help inform our gene annotation. A set of 1.2 billion reads that passed QC were aligned to the genome using BWA, resulting in 1.1 billion (87.6%) reads aligning and properly pairing.
The Ensembl RNA-Seq pipeline was used to process the BWA alignments and create a further 120 million split read alignments using Exonerate. The split reads and the processed BWA alignments were combined to produce 21,417 transcript models in total (one transcript per locus). The predicted open reading frames were compared to UniProt Protein Existence (PE) classification level 1 and 2 proteins using WU-BLAST; models with no BLAST alignment or poorly scoring BLAST alignments were discarded. The resulting models were added into the gene set where they produced a novel model or splice variant. In total, models were added.

[8] Addition of UTRs to coding models: The set of coding models was extended into the untranslated regions (UTRs) using turtle cDNA and contigs from the 454 sequencing project. This resulted in 5,935 of 32,470 coding models with UTRs. In addition, 10,892 RNA-Seq models also contributed to the addition of UTRs to the final models.

[9] Generating multi-transcript genes: The above steps generated a large set of potential transcript models, many of which overlapped one another. Redundant transcript models were removed, and the remaining unique set of transcript models was clustered into multi-transcript genes, where each transcript in a gene has at least one coding exon that overlaps a coding exon from another transcript within the same gene. The final gene set of 18,272 genes included 4,603 genes built only using proteins from other species and 8,070 genes built only from RNA-Seq evidence. Overall, 3,322 genes had a mixture of RNA-Seq evidence and evidence from proteins of other species. A further 1,263 genes were supported only by Ensembl chicken or Ensembl lizard translations. The remaining 917 genes contained transcripts from all four sources.
The final set of 20,752 transcripts included 12,384 transcripts with support from RNA-Seq evidence, 8,616 transcripts with support from proteins of other species and 2,236 transcripts with support from Ensembl chicken or lizard data. A small subset of the transcripts (2,581) was supported by evidence from two sources.

[10] Pseudogenes, non-coding genes, Stable Identifiers: The gene set was screened for potential pseudogenes. Before public release, the transcripts and translations were given external references (cross-references to external databases), while translations were searched for domains/signatures of interest and labeled where appropriate. Stable Identifiers were assigned to each gene, transcript, exon and translation. (When annotating a species for the first time, these identifiers are auto-generated. In all subsequent annotations, the stable identifiers are propagated based on comparison of the new gene set to the previous gene set.) Small structured non-coding genes were added using annotations taken from RFAM 132 and miRBase 133. The
final gene set consists of protein coding genes including mitochondrial genes, these contain containing transcripts. A total of 97 pseudogenes and 1018 ncRNAs were identified.

Gene ontology analysis

Over-represented GO IDs were investigated by testing the bias in frequency relative to other GO IDs among certain gene sets (e.g., genes that were expressed differentially), using the total defined GO files as a control distribution. Fisher's exact test was used for this analysis, with an alpha level of Developmental genes were defined as genes that have developmental GOs, and developmental GOs were defined as GOs having GO: (developmental process) as an ancestor. A total of 5659 developmental GOs were extracted from ver. 1.2 (downloaded on Sep. 2nd, 2012) of the OBO-formatted Gene Ontology file using the OBO-Edit program.

Turtle ultra conserved non-coding elements (TUCNE)

Turtle conserved non-coding regions were determined using a pairwise alignment between the P. sinensis and C. mydas genomes and filtering out: (i) regions with a good alignment with human, chicken or anole lizard; (ii) regions that are coding; (iii) final elements that are shorter than 30 bp (this can happen if an element partially overlaps a coding element). The turtle ultra conserved regions were searched using a 30 bp sliding window with perfect matches in the P. sinensis - C. mydas alignment and no match in any other alignment. After removing all regions that correspond to repeats (and deleting elements shorter than 30 bp), the conserved regions were further filtered to those that reside within the upstream region (10 kb upstream from the start codon) of each gene.

Turtle-specific genes

The genes existing in both the soft-shell turtle and green sea turtle genomes, but not in A. mississippiensis, C. porosus, G. gallus, T. guttata, A. carolinensis, C. familiaris, H. sapiens, O. anatinus, and X. tropicalis, were extracted.
Then, enrichment analysis for the turtle-specific genes was performed based on the algorithm presented by GOstat 97, with the soft-shell turtle genes (Ps-ens_genes) as the control. The p-value was approximated by the chi-square test. Fisher's exact test was used when any expected value was below 5, which would make the chi-square test inaccurate. This program was implemented as a pipeline 134. To provide succinct results in the GO and IPR enrichment analyses, if one of the items was ancestral to another and the enriched gene lists of these two items were the same, the ancestral item was deleted from the results. To adjust for multiple testing, we calculated the False Discovery Rate (FDR) using the Benjamini-Hochberg method 151 for each class. We found 8 enriched GO categories and 3 enriched IPR categories.

Phylogenetic tree reconstruction

To reconstruct the phylogenetic tree, we used single-copy gene families conserved among P. sinensis (Ps-BGI_gene), C. mydas, A. mississippiensis, C. porosus, G. gallus, T. guttata, A. carolinensis, C. familiaris, H. sapiens, O. anatinus, and X. tropicalis. Single-copy gene families were defined as families in which each species has just one gene copy. Using the single-copy gene families determined above, we next extracted the CDS sequences from each family and made CDS sequence alignments guided by the amino acid alignments created by the MUSCLE program 60. The sequences were then concatenated into one super gene sequence for each species. Codon position 1, position 2, position 3 and position 1+2 sequences were extracted from the CDS alignments, concatenated and aligned, and each used for building trees, along with the protein and CDS sequences. Then, PhyML 47,48 was applied to construct the phylogenetic tree under the HKY85+gamma or GTR+gamma model for nucleotide sequences and the JTT+gamma model for protein sequences. aLRT values were taken to assess the branch reliability in PhyML.
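The concatenation and codon-position extraction steps described above can be sketched as follows. This is an illustrative Python fragment, not the pipeline used in this study: the function names are hypothetical, the two toy families and their sequences are made up, and each input is assumed to be an in-frame, already aligned CDS whose length is a multiple of three.

```python
def concatenate_families(families, species):
    """Join per-family aligned CDS sequences into one super gene per species."""
    return {sp: "".join(family[sp] for family in families) for sp in species}


def codon_positions(cds_alignment, positions):
    """Keep only the requested codon positions (1-based: 1, 2 or 3)."""
    if len(cds_alignment) % 3 != 0:
        raise ValueError("aligned CDS length must be a multiple of 3")
    offsets = sorted(p - 1 for p in positions)
    return "".join(cds_alignment[start + offset]
                   for start in range(0, len(cds_alignment), 3)
                   for offset in offsets)


# Made-up single-copy families with one aligned CDS per species.
families = [
    {"turtle": "ATGGCC", "chicken": "ATGGCA"},
    {"turtle": "TTTAAA", "chicken": "TTCAAA"},
]
supergene = concatenate_families(families, ["turtle", "chicken"])
print(supergene["turtle"])                           # concatenated CDS
print(codon_positions(supergene["turtle"], (1, 2)))  # codon positions 1+2
```

Gap columns are carried through unchanged, so the extracted position-specific alignments stay column-consistent across species.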
In addition, RAxML 80 was applied to the same set of sequences to build phylogenetic trees under the GTR+gamma model for nucleotide sequences and the JTT+gamma model for protein sequences; 1,000 rapid bootstrap replicates were used to assess branch reliability in RAxML.

Divergence time estimation
The same set of codon position 1+2 sequences that was used for phylogenetic tree construction was used for estimating divergence time. Fossil calibration times were set as described in Table 1.1. The PAML mcmctree program (PAML version 4.5) was used to determine split times with the approximate likelihood calculation method, the correlated molecular clock and the REV substitution model. The shape and scale parameters describing the gamma prior for the overall substitution rate were set according to the substitution rate per time unit computed by PAML baseml 52. The alpha parameter for gamma rates at sites was set to that computed by PAML baseml. The MCMC process of PAML mcmctree was set to sample 10,000 times with the sample frequency set to 5,000 after a burn-in of 5,000,000 iterations. The fine-tune parameters were set so that the acceptance proportions fell in the interval (20%, 40%). The other parameters were set at the default values. Tracer (v1.5.0) was applied to check convergence, and two independent runs were performed to confirm convergence. Additionally, codon position 1, codon position 2, and protein sequences were used to estimate divergence time with similar methods in PAML mcmctree. When the multidivtime 135,136 program was used to calculate split times, the MCMC chain was run for 10,000,000 generations as burn-in and approximately 50,000,000 generations to calculate posterior distributions. The Western clawed frog was specified as the outgroup taxon in the estimation and was discarded from the tree by multidivtime. Other parameters were set as suggested in the manual.
Likelihood values in the first (baseml) and second (estbnew 135,136) steps were checked to ensure that the global optima were reached. In addition, two independent runs were performed to check convergence. For r8s 137, the maximum likelihood trees inferred by RAxML (with branch lengths) were used as input to calculate split times under the global molecular clock with default settings.

Gene loss analysis
We BLAST-searched the protein sequences of the two turtles and their related species (Gallus gallus, Anolis carolinensis, Xenopus tropicalis and Taeniopygia guttata) against the human protein sequences (downloaded from Ensembl gene v.68). The proteins with BLAST hits to human proteins (thresholds: identity 30 and align ratio 30) were identified as the homologs of the human protein. Subsequently, the human proteins that lacked homologs in both of the turtles but had homologs in some of the other related species (Gallus gallus, Anolis carolinensis, Xenopus tropicalis and Taeniopygia guttata) were identified as turtle lost genes.

Statistical analysis of gene family expansion and contractions (E/C genes)
We generated pairwise whole genome alignments for anole lizard - soft-shell turtle and anole lizard - green sea turtle using LASTZ 38,53. Subsequently, we created three-way alignments using MULTIZ 54, which showed approximately 61 Mbp of conserved genome alignment sequences. When an anole lizard gene fell in an area of conserved sequence (overlap > bp), and there was no homologous gene in the corresponding aligned sequences of soft-shell turtle and green sea turtle (Ps-ens_gene), we hypothesized that a potential gene loss had occurred at that locus in the turtle genomes. Further, we added a -Kb extension at each end of the conserved region for the two turtle genomes to find genes by GENEWISE (2.2.0) using the corresponding anole lizard gene as a query. When the alignment rate of a predicted homologous gene fragment in the syntenic locus was less than 30% compared to the query sequence, and there were no large gaps in the neighboring genome sequence, we considered it a gene loss. Additionally, when there was a frame-shift or premature termination in the predicted homologous gene fragment at the syntenic locus, we also considered this to be a gene loss. For each gene family we estimated rates of gene gain and loss using the CAFE software 83. This software models gene family evolution as a stochastic birth-and-death process in which genes are gained and lost independently along each branch of a phylogenetic tree.
We used a species tree containing the Chinese soft-shell turtle and the green sea turtle plus 10 additional species (human, dog, platypus, chicken, zebra finch, anole lizard, alligator, crocodile, western clawed frog and medaka). In CAFE, the lambda parameter describes the rate of change as the probability that a gene family either expands or contracts (via gene gain and loss) per gene per million years. We estimate over gene gains and losses per million years for both turtles and the lineage from their common ancestors. CAFE assumes at least one member of each gene family at the root of the species tree; for this reason, only the gene families having at least 1 gene at the root of the species tree were used. Genes in these expanded or contracted families were defined as E/C genes (see also Supplementary Table 16). We then analyzed gene gains and losses at the gene-family level, focusing on those families having significant changes in either of the two turtles or in their ancestors (two turtles and Sauropsida). In these analyses, we ran CAFE several times, grouping the gene families by lowest common ancestor to preserve the assumption of at least one member at the root of the tree. The significant families were further inspected manually.

Pseudogene / frame-shifted gene detection
Human proteins (Ensembl release 61) were used as the reference proteins to detect pseudogenes in turtles. First, for each gene, we chose the longest transcript as the query for the homolog in the two turtles, and TblastN 56 was used to identify the probable location of each gene in the turtle genomes with the parameters -m 8 -a 4 -F F -e 1e-5. GeneWise 43 was then used to identify the gene structure in turtles with the parameters -genesf -trev -quiet. After obtaining the gene structures for each species, the genes that were defined to be pseudogenes in both turtles were used for further analysis.
Subsequently, manual checking was performed to test whether these mutations were artifacts of poor assembly or incorrect homolog prediction, and the false-positive cases were filtered out. Transcripts of the human genes were mapped to the turtle genomes to ascertain whether possible splicing variants exist that skip the mutated exon(s).

Predictions of Olfactory Receptor (OR) Genes
OR genes were identified as described previously 55 with minor modifications. We first conducted TBLASTN 56 searches against the genome sequences of a given species with known OR genes as queries. For query sequences, we used 119 functional OR genes in human, mouse, and zebrafish, all of which show 50% or less amino acid identity to one another. We did this, rather than using the same 920 OR genes that were used in our previous study 55, to reduce the computational time. The query genes include OR genes from groups δ, ε, ζ, and η that are absent from the amniote genomes 91. We used an E-value cutoff of 1e-5 for the TBLASTN searches. The OR gene identification methods were the same as those previously described 55 with the exception of this first-round TBLASTN search. To construct the phylogenetic tree in Figure 1C, the translated amino acid sequences of OR genes were aligned using the program E-INS-i in MAFFT 57. Poisson correction distances were calculated after all the alignment gaps were eliminated. We then constructed a phylogenetic tree from these distances by the neighbor-joining (NJ) method 58 using the program LINTREE 59 (available online). The numbers of OR genes in ancestral species and the numbers of gene gains and losses in evolution (Figure 2b) were calculated by the reconciled tree method described in Niimura et al. 55. We used 70% as the bootstrap value cutoff. Group β genes were used as the out-group for the estimation.
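The Poisson correction distance used for the OR gene tree above is the standard estimator d = -ln(1 - p), where p is the proportion of differing amino acid sites between two aligned sequences. The following sketch shows the calculation after removing every alignment column that contains a gap (complete deletion). The function names and the three toy sequences are hypothetical; this is not the LINTREE implementation.

```python
import math


def strip_gap_columns(alignment, gap="-"):
    """Drop every column in which at least one sequence has a gap."""
    length = len(alignment[0])
    keep = [i for i in range(length) if all(seq[i] != gap for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def poisson_distance(seq_a, seq_b):
    """Poisson correction distance d = -ln(1 - p) for two gap-free sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


# Toy protein alignment (illustrative only); column 4 contains a gap.
alignment = ["MKT-LV", "MKSALV", "MRTALI"]
clean = strip_gap_columns(alignment)
d01 = poisson_distance(clean[0], clean[1])  # 1 of 5 sites differ
d02 = poisson_distance(clean[0], clean[2])  # 2 of 5 sites differ
```

A matrix of such pairwise distances is the input expected by a neighbor-joining program.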
The programs are available online.

Genes of accelerated evolutionary rate in the turtle lineage
Homologous genes in soft-shell turtle, green sea turtle, and other related species (chicken, zebra finch, anole lizard, Xenopus tropicalis and platypus) were first detected by the all-versus-all blastp program. The orthologs were defined by a reciprocal best BLAST hit between human and the other species. The full orthologous gene sets were aligned using the program MUSCLE 60. We then compared a series of evolutionary models within the likelihood framework using the phylogenetic tree obtained by our analysis. A branch model 52 was used to estimate the average ω across the tree (ω0), the ω of the branch ancestral to the soft-shell turtle and the green sea turtle (ω2), and the ω of all of the other branches (ω1).

Embryo sampling and mRNA extraction
Fertilized soft-shell turtle and chicken eggs were obtained from local farms in Japan. Turtle eggs were obtained during the breeding season, which ranges from June to September. After allowing the eggs to develop in a humidified incubator (30 °C for turtle eggs and 38 °C for chicken eggs), the developmental stages were determined based on previous descriptions (the TK stage for turtle 24 and the HH stage for chicken 61, Supplementary Figure 13), and the embryos were collected. Amniotic membranes (a yellow-like region in gastrula to TK9 embryos for turtle, and in primitive streak to HH16 embryos for chicken) were removed before mRNA extraction, and more than two individual embryos were pooled for each sample. For the mRNA extraction, excised staged embryos (2 to 60 individuals depending on stage) were quickly frozen and crushed in liquid nitrogen, and total RNA was extracted using the RNeasy Lipid Tissue kit (QIAGEN, cat. #: 74804, 75842). After testing the integrity and purity of the RNA with gel electrophoresis and an absorption spectrometer, the mRNA was extracted using the Ambion MicroPoly(A) Purist kit (Life Technologies, cat. # AM1919). To obtain samples with minimal rRNA contamination, up to µg of total RNA was used for this step. The quality of purified mRNA samples was checked with an Agilent 2 BioAnalyzer before preparing the libraries for sequencing.

RNA-Seq for transcriptome identification
[1] 454 Titanium sequencing: Six independent libraries were constructed from soft-shell turtle embryos at the gastrula, TK5, TK7, TK12, TK18, and TK26 stages. For each sample, sequencing libraries were constructed by following the manufacturer's standard protocol.
[2] HiSeq strand-specific RNA-Seq: Approximately equal amounts of mRNA from each embryonic stage were mixed together, and libraries were constructed for further sequencing. Two libraries were prepared by methods that retain strand-specific information.
One of these is a dUTP-based method 62,63, which was modified to comply with Illumina's TruSeq RNA sample prep kit (Ps_stranded_RNA-Seq_dUTP); the other is an original method developed at BGI (Ps_stranded_RNA-Seq_BGI). In brief, the dUTP-based method takes advantage of complementary strand-specific dUTP degradation. The original method begins with fragmentation of mRNA and reverse transcription to synthesize first-strand cDNA. Next, the second strand of cDNA was synthesized, the ends were repaired, 3' adenosines were added along with adapters, and agarose gel electrophoresis was run to retrieve the fragments. For the version developed at BGI, after digesting the products with the Uracil-N-glycosylase (UNG) enzyme, we PCR-amplified and gel-purified the products to obtain the cDNA library (Ps_stranded_RNA-Seq_BGI). Sequencing was performed with an Illumina HiSeq 2000 (paired-end, bp), followed by read clean-up with cutadapt 138 (ver 1.0) with the options -q 20 -O 4 -e 0.1 -m 50 --discard 139. In brief, we trimmed low-quality ends (quality score lower than Phred 20) and adapter sequences (minimum overlap 4 bp, allowing 10% mismatches) and discarded reads shorter than 50 bp. The RNA-Seq data are available through DRA under the accession number DRA (ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/).
[3] HiSeq non-stranded RNA-Seq: The deep sequencing data from the RNA-Seq for gene expression analysis (see following sections) were also used for transcriptome identification. See Gene prediction by Ensembl-prediction-pipeline for details regarding how we integrated the data for transcriptome identification and gene predictions.

De novo transcriptome assembly
Using the two strand-specific RNA-Seq read sets, we first merged the paired reads that had overlapping regions using SeqPrep 140 (with default settings).
Next, de novo transcriptome assembly was performed by Trinity 141,142 <ver > with default parameter settings except for the heap space setting of the butterfly program (--bflyheapspace 300G).

RNA-Seq for gene expression analysis
Samples at each stage were composed of two or more individual embryos to obtain ample RNA and to average out the fluctuations derived from individual embryos. Biological replicates for each developmental stage were created from an independent sample pool so that proper statistical populations could be estimated. For each mRNA sample prepared (see the Embryo sampling and mRNA extraction section), a barcoded sequencing library was constructed using the standard protocol of the TruSeq RNA Sample Prep Kit (Illumina) with a minor modification for RNA fragmentation (four minutes instead of eight minutes at 94 °C, no poly(A) selection). Multiplex sequencing with 101-bp single-end reads was performed on an Illumina HiSeq2000 instrument at the NIBB Core Research Facilities. This was followed by raw data processing, base calling and quality control by the manufacturer's standard pipeline using RTA, OLB and CASAVA. The quality of the output sequences was inspected using the FastQC program. Adapter sequences were trimmed using the cutadapt program 138 (ver 1.0) with the options -q 20 -O 4 -e 0.1.

Sequencing depth of RNA-Seq: The curves in Supplementary Figure 9A suggest that the number of HiSeq reads for each sample was large enough to cover almost all the expressed genes in each sample, a condition that is practically accepted as being less biased by read depth. Nonetheless, unfavorable biases remain possible, as noted by a recent study 143. We therefore made a depth-adjusted dataset for further expression analyses by randomly selecting 10 million mapped reads. Random selection of mapped reads is more advantageous than simply making a random selection from raw reads.
This is because mapped reads are theoretically free of the bias that arises from different levels of rRNA contamination.

Obtaining gene expression scores from RNA-Seq data
Clean reads of each sample were mapped to the genome using the aln command of the bwa 65 software (ver r16) with the -t 12 option. Reads from turtle samples were mapped against the assembled soft-shell turtle genome, and reads from chicken samples against the chicken genome sequences downloaded from the Ensembl database (Ensembl66). SAMtools 66, BEDtools 67 and the DEGseq package 68 for the R program <ver > were used for calculating counts of reads that mapped to coding regions. Only the coding sequences of the soft-shell turtle <Ps-ens_gene version, available through the Ensembl website> and chicken <Ensembl66> gene sets were used as references for mapped tag counting.

Comparison of gene expression profiles
In comparing the gene expression profiles of soft-shell turtle and chicken embryos, we first determined the genes to compare by defining one-to-one orthologous genes by Reciprocal Best BLAST Hit (RBBH). A BLAST search was performed between the coding sequences of soft-shell turtle and chicken <Ensembl 66> using the BLASTP algorithm of NCBI BLAST+ 64 <ver > with an E-value of 1e-5. In total, we found ortholog-pairs. We next normalized gene expression scores by two independent methods (RPKM and TMM normalization 69) so that conclusions robust to the choice of method could be drawn. Normalization was performed with all samples at once so that similarity scores obtained from each sample pair were analytically comparable. After obtaining the normalized expression scores of genes for each sample, the scores were log (base = 2) transformed and plotted as a scattergraph to compare the expression scores of two samples. Next, the distributions of the scattergraphs were evaluated by the Pearson product-moment correlation coefficient, the Spearman correlation coefficient, total Euclidean distances (t-euclidean), or total Manhattan distances (t-manhattan) to estimate the similarity in the gene expression profiles of the two samples being compared. Because t-euclidean and t-manhattan distances require a rectangular coordinate system as a prerequisite, mean quantile normalization was applied to meet this requirement before the calculation of distance. For the analysis with mapped-10M reads, two independent random selections of mapped-10M reads were performed, and we averaged the mapped counts to make hypothetical data files.
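The four similarity measures named above can be sketched in plain Python as follows. The sketch assumes a pseudocount of 1 before the log2 transformation (the text does not state how zero scores were handled) and omits the mean quantile normalization step; all function names and the toy expression vectors are hypothetical.

```python
import math


def log2_scores(values, pseudocount=1.0):
    """log2-transform expression scores; the pseudocount is an assumption."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def total_manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def total_euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Toy normalized scores for four one-to-one orthologs (illustrative only).
turtle = log2_scores([0, 1, 3, 7])
chicken = log2_scores([0, 3, 1, 15])
r_pearson = pearson(turtle, chicken)
r_spearman = spearman(turtle, chicken)
d_manhattan = total_manhattan(turtle, chicken)
d_euclidean = total_euclidean(turtle, chicken)
```

Higher correlation coefficients and smaller total distances both indicate more similar expression profiles between the two samples.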
The most conserved stages were estimated by comparing the similarities of gene expression profiles among all pairs of turtle-chicken embryos (9 turtle stages x 8 chicken stages = 72 pairs). Combinations of two biological replicates for each sample yielded 2 x 2 = 4 data points for each turtle-chicken pair, and these values were compared by the Welch two-sample t-test or the Wilcoxon signed-rank test, depending on whether the statistical requirements were satisfied. The Holm-corrected alpha level was applied for these multiple comparisons. Only results reproduced with the datasets from the two different normalizations (RPKM 70, TMM 69) were considered to be significant. For the Reciprocal Best Hit Stages (RBTS) analysis, stages that exhibited the most similar gene expression profiles were tested by the Welch two-sample t-test or the Wilcoxon signed-rank test, depending on whether the statistical requirements were satisfied, with a Holm-corrected alpha level.

Soft-shell turtle genes that showed a significant increase in expression level after the phylotypic period (IAP)
The turtle IAP genes were screened by the following criteria:
(1) The mean expression level after the phylotypic period (stages that begin to show turtle-specific morphologies, TK15-TK23) is more than five times higher (Wilcoxon test, alpha level = 0.01) than that of earlier stages (gastrula, neurula, TK7 and TK9).
(2) The chicken orthologs (if there are any) of turtle IAP genes do not show such increases (the average expression levels in HH28 and HH38 are not more than five times higher than those in the Prim-HH14 stages).

Wnt gene identification, cloning and whole-mount in situ hybridization
Wnt genes were identified by TBLASTN against the genomic sequence or the Illumina RNA-seq data assembled by Trinity 141, using the corresponding Wnt amino acid sequences of mouse, chicken or anole lizard as queries. The corresponding genomic sequences were retrieved and a gene model was predicted by GeneScan (ver. 1.0) 125 or GeneWise2 (ver ).
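The Holm correction applied in the stage comparisons above is a step-down procedure: the p-values are sorted in ascending order and the i-th smallest is compared with alpha / (m - i + 1), stopping at the first test that fails. A minimal sketch follows; the function name `holm_reject` and the example p-values are hypothetical, and the default alpha of 0.01 is taken from the general significance level stated in the Statistical tests section.

```python
def holm_reject(p_values, alpha=0.01):
    """Return a list of booleans: True where the Holm procedure rejects H0.

    The result is given in the original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # rank is 0-based, so the threshold is alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: all larger p-values are also not rejected
    return reject


# Illustrative p-values only.
decisions_a = holm_reject([0.001, 0.003, 0.2])
decisions_b = holm_reject([0.001, 0.04, 0.004, 0.0049])
```

In the second example the second-smallest p-value (0.004) exceeds its threshold of 0.01 / 3, so only the smallest p-value remains significant even though 0.004 and 0.0049 are below the uncorrected alpha.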
The orthology of the predicted genes was confirmed by BLASTP against the NCBI GenBank database. cDNA was generated from mRNA extracted from different stages of soft-shell turtle using the GeneRacer Kit (Invitrogen). We designed primers to clone partial sequences of the 20 Wnt genes. The sequences of the primers, the vectors and the GenBank accession numbers of the different genes are listed in Supplementary Table 27. The clones were used to synthesize digoxigenin-labeled RNA probes according to the manufacturer's protocol (Roche). The whole-mount in situ hybridizations were performed as previously described 28.

microRNA extraction, prediction, and expression analysis
We microdissected limbs, body walls and CRs from 20 embryos of the Chinese soft-shell turtle. The microdissections were performed in cold PBS, and the tissues were stored in RNAlater (Ambion) at -20 °C. The small RNA phase was extracted from the dissected tissues using the mirVana™ microRNA Isolation Kit (Ambion, Life Technologies). Small RNA libraries were prepared and sequenced using a HiSeq2000 platform (Illumina) at the Beijing Genomics Institute. We obtained 24,168,754 reads for the CR, 24,772,037 reads for the limbs and 28,025,813 reads for the body walls. The sequencing depth for each tissue was 63,020X for CR, 84,540X for limb, and 62,495X for body wall tissues (calculated against predicted mature sequences). We used the miRDeep2 (ver ) software, which is written in the Perl programming language 71. Briefly, the soft-shell turtle genome v1.0 was indexed by the bowtie-build software using default parameters. The reads were processed using the mapper module of miRDeep2 to trim adapters, eliminate reads shorter than 18 nucleotides, collapse those with the same sequence and map them to the genome index.
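The length filtering and collapsing of identical reads described above can be illustrated with a short stand-in. This is not the miRDeep2 mapper module itself, only a sketch of the same two operations; the function name `collapse_reads` and the example reads are hypothetical, and adapter trimming and genome mapping are not shown.

```python
from collections import Counter


def collapse_reads(reads, min_length=18):
    """Drop reads shorter than min_length and collapse identical sequences.

    Returns (sequence, count) pairs sorted by decreasing count, with ties
    broken alphabetically so that the output order is reproducible.
    """
    counts = Counter(read for read in reads if len(read) >= min_length)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative adapter-trimmed reads.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",  # 22 nt
    "ACGT",                    # 4 nt, removed by the length filter
    "TGAGGTAGTAGGTTGTATAGTT",  # duplicate of the first read
    "TGAGGTAGTAGGTTGTAT",      # 18 nt, kept
]
collapsed = collapse_reads(reads)
```

Collapsing keeps one copy of each distinct sequence together with its read count, which reduces the number of sequences that have to be mapped while preserving abundance information.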
The reads that mapped to the genome were used by the miRDeep2 core algorithm to predict probable hairpin structures around a stack of a minimum number of reads determined by the internal controls of the software 71. We used the miRDeep2 module with default parameters except for the score cutoff, for which we used the option -b 5. This was the value that yielded a signal-to-noise ratio larger than 15, a condition that is 50% more stringent than that used in the description of the algorithm 71. Additionally, we selected mature microRNA sequences and their precursors from chicken, zebra finch and Anolis carolinensis from miRBase v.18 to serve as microRNAs from related species. Using these microRNAs, miRDeep2 identified predicted microRNAs with the same seed sequence. As a result, the programs generated a list of microRNAs for every sample (Supplementary_Table 23.xls). Only microRNA predictions that passed the Randfold significance threshold (P < 0.05, mononucleotide shuffling, 999 permutations; see Friedländer et al. for details) were taken into account for subsequent comparisons. The predictions were processed with ordinary bash shell scripts to sort unique sequences and to compare common vs. differential predictions among samples. The predicted miRNAs were aligned against the chicken or green sea turtle genomes using bowtie with the -k 1 and --best parameters, allowing 0 (-v 0), 1 (-v 1) or 3 (-v 3) mismatches.

microRNA target predictions
Annotated 3'-UTRs of P. sinensis transcripts were obtained from Ensembl build 68 using BioMart. Because some genes produce several alternatively spliced transcripts, the number of genes is smaller than that of transcripts (see Supplementary Table 25). The prediction of the targets was performed with miRanda 72 <v. 3.3a> according to the method reported by John et al. 144

Statistical tests
To avoid an inflated type I error rate, an alpha level of 0.01 (with further Bonferroni correction in the case of multiple comparisons) was accepted for statistical significance throughout the analyses unless otherwise specified. Statistical methods were carefully chosen to properly reflect the population of interest. The Welch two-sample t-test was used for two-sample comparisons when the data passed the Kolmogorov-Smirnov test for normal distribution; otherwise, the Wilcoxon signed-rank test was used.

Software and computation environment
Data processing and command pipelining were performed using customized Python, Perl, R and C shell scripts. Heavy calculations were performed using the cluster computers at RIKEN, NIBB, BGI, and the Sanger Institute.

* Species names (general name: binomial name)
Green sea turtle: C. mydas
Soft-shell turtle: P. sinensis
Chicken: G. gallus
Zebra finch: T. guttata
Saltwater crocodile: C. porosus
American alligator: A. mississippiensis
Anole lizard: A. carolinensis
Dog: C. familiaris
Human: H. sapiens
Platypus: O. anatinus
Western clawed frog: X. tropicalis
Medaka: O. latipes
Bottlenose dolphin: T. truncatus

Supplementary Figures and Tables

Supplementary Figure 1. Skeletal structure of a Chinese soft-shell turtle
The skeletal structure of a Chinese soft-shell turtle (P. sinensis) is illustrated here. The turtle shell consists of a dorsal shell called a carapace (colored in blue) and a ventral shell called a plastron. Note that the carapace consists of axial skeleton (vertebrae and ribs) and dermal bone, which is one of the unique features of the turtle. Many of the trunk muscles are also absent in adult turtles. Unlike other tetrapods, the shoulder blade (scapula, colored in red) of the turtle is located ventral to the axial skeleton.

Supplementary Table 1. Paired-end DNA libraries sequenced for soft-shell turtle and green sea turtle genome assembly.
Columns: Insert Size; Read Length (bp); soft-shell turtle: Total Data (G bases), Sequence Depth (X), Physical Depth (X); green sea turtle: Total Data (G bases), Sequence Depth (X), Physical Depth (X).
Rows: 170 bp bp bp Kbp Kbp Kbp Kbp Kbp Total
Sequencing depths were calculated for high-quality clean data based on genome size (2.1 Gb for soft-shell turtle and 2.2 Gb for green sea turtle). The basic statistics were calculated on cleaned sequencing data.

Supplementary Table 2. Basic statistics of the assembled genomes of soft-shell turtle and green sea turtle. The statistics were generated based on the original assembly files for both soft-shell turtle and green sea turtle.
Columns: Contig Size (bp); Contig Number; Scaffold Size (bp); Scaffold Number.
soft-shell turtle
N90 5,526, ,
N80 9,577 71,950 1,039,
N70 13,470 53,421 1,686,
N60 17,505 39,658 2,486,
N50 21,907 28,859 3,331,
Longest 177, ,023,
Total Size 2,114,220, ,210,337,
Total Number (>= bp) , ,151
Total Number (>=2 kb) , ,548
green sea turtle
N90 3, , ,
N80 8,131 79,427 1,258,
N70 12,065 57,970 2,114,
N60 16,060 42,643 3,012,
N50 20,392 30,830 3,777,
Longest 177, ,916,
Total Size 2,139,401, ,236,138,
Total Number (>= bp) , ,958
Total Number (>=2 kb) , ,442

Supplementary Figure 2. GC content of the two turtle genomes.
(a) A comparison of the GC contents of five vertebrates. The x-axis indicates GC content and the y-axis indicates the proportion of the bin number divided by the total windows. We used 500-bp bins (with a 250-bp overlap) sliding along the genome. The data show that the mode GC contents of the soft-shell turtle genome and the green sea turtle genome are approximately 44% and 43%, respectively, which is similar to those of the human, chicken and anole lizard genomes. (b-c) GC content and sequencing depth. A scatter plot was made by sliding 50-kb non-overlapping windows along the assembled soft-shell turtle (b) and green sea turtle (c) genomes, and the GC content and average depth were calculated within each window. The x-axis in the scatter plot represents GC content (%), whereas the y-axis represents the average depth. The average depth was obtained by aligning the filtered reads onto the assembled genome sequence using SOAPaligner, allowing 3 mismatches for 49-bp reads and 5 mismatches for -bp reads. Depth frequency was then calculated for each of the genome bases.
Summary graphs are provided as histograms illustrating the frequency at various depths (histogram at the right) and GC contents (histogram at the top). The percentage of sequencing depths below 10 was less than 3% in both genomes, indicating an extremely high sequencing depth covering the whole genome.

Supplementary Table 3. Coverage rate of core eukaryotic genes (CEGs) in the assembled genomes by CEGMA.
Columns: soft-shell turtle: Proteins, Completeness (%); green sea turtle: Proteins, Completeness (%).
Rows: Complete (followed by four Group rows); Partial (followed by four Group rows).
Complete indicates that proteins from 248 core eukaryotic genes (CEGs) were covered by the genome assembly with an alignment length longer than 70%. Partial indicates that CEG proteins were covered by the assembly with a coverage rate that exceeded a pre-computed minimum alignment score. For soft-shell turtle, the coverage rates were calculated based on the assembly PelSin_1.0 obtained from NCBI after the original assembly had been uploaded to NCBI; for green sea turtle, they were calculated based on the original assembly before processing and uploading to NCBI.

Supplementary Figure 3. Quality check of the assembled genome by alignment with five randomly selected, Sanger-sequenced control libraries.
Red bars and blue bars indicate the scaffold sequences and the Sanger-sequenced control clone sequences, respectively. Scaffold sequences that aligned well to the fosmid clone sequences are indicated by yellow polygons. Further analysis clarified that the region covered by the zhbcxa clone is rich in repetitive sequences (data not shown), which explains its low alignment quality. Sanger sequencing was performed with an ABI 3730 to approximately 6-fold coverage. Sanger reads were assembled by overlapping, and gaps were filled by further rounds of sequencing to obtain complete maps of the fosmid clones. Alignment of the assembled genome sequences to the five fosmid clones was performed using the BLAST algorithm (blastn, E-value = 1e-20).

Supplementary Table 4. High coverage of the assembled genome against Sanger-sequenced control libraries.
Columns: Fosmid ID; Length (bp); Coverage Ratio (%); Alignment Blocks; Aligned Scaffold; Aligned Scaffold Length (bp); Gaps; Gap Length (bp); Gap Ratio (%).
zhbaxa 33, ,958,
zhbbxa 40, ,187,
zhbcxa 33, , 4, ,
zhbexa 39, ,664,
zhbfxa 34, ,370,

Supplementary Table 5. EST cluster coverage rate by the soft-shell turtle genome.
Columns: Dataset; Number of EST clusters; Total length (bp); Coverage rate of the EST clusters by the assembly (%); With > 90% sequence in one scaffold (Number, Percent); With > 50% sequence in one scaffold (Number, Percent).
Ps_454_mRNA-Seq
> 0bp 79,305 80,382, , ,
> 200bp 71,973 79,257, , ,
> 500bp 51,023 71,828, , ,
> 0bp 22,793 52,043, , ,
Ps_dUTP_RNA-Seq and Ps_strand_RNA-Seq assembled together
> 0bp 307, ,345, , ,
> 200bp 307, ,345, , ,
> 500bp 152, ,387, , ,
> 0bp 69, ,690, , ,
Transcriptome data generated by 454 sequencing and Illumina sequencing (see Methods section) were assembled to produce 103,823 and 355,258 EST clusters, respectively. We then mapped whole genome sequencing reads to these EST clusters using SOAP, and filtered out the EST clusters with low alignment (< 30% alignment or average depth < 5) to remove potentially misassembled clusters. The remaining 79,305 EST clusters generated by 454 sequencing and 307,462 EST clusters generated by Illumina sequencing were used to assess the genome assembly quality.

Supplementary Figure 4. Basic statistics of predicted genes of soft-shell turtle and green sea turtle.
Basic statistics of predicted genes of soft-shell turtle (Ps-BGI_gene) and green sea turtle. The x-axis indicates the length (bp) of each genetic feature and the y-axis indicates the percentage of genes that have the corresponding length. Features of the predicted (GLEAN) gene sets for soft-shell turtle and green sea turtle, together with those of Western clawed frog, chicken and anole lizard (from Ensembl release 64), are illustrated.

Supplementary Figure 5. Evolutionarily conserved profiles of genes in two turtles.
(a-b) Protein orthology comparisons of predicted genes. (a) Gene orthologues of chicken, zebra finch, green sea turtle, soft-shell turtle, saltwater crocodile, American alligator, anole lizard, dog, human, platypus, Western clawed frog and medaka. 1:1:1 indicates conserved single-copy genes. N:N:N indicates genes that are not 1:1:1, or genes that have sub-family homologs in any of the 12 species. Sauropsida specific indicates genes that are present in at least two Sauropsida species and not in any non-Sauropsida species. Non-Sauropsida specific indicates genes that are present in at least two non-Sauropsida species and not in any Sauropsida species. Patchy indicates orthologs that are present in at least one Sauropsida species and at least one non-Sauropsida species, but not present in all of the 12 species simultaneously. SD, species-specific duplicated genes; ND, species-specific genes. (b) Venn diagram showing the shared orthologous groups among the genomes of green sea turtle, soft-shell turtle, anole lizard and human. Numbers of miRNAs are not to scale in the Venn diagram.

Repeat type (Category, Subcategory) | Proportion (%) in total nucleotide bases: lizard | soft-shell turtle | green sea turtle | chicken
ALU SINE MIR Total LINE LINE LINE L3/CR Total ERVL ERVL-MaLRs LTR elements ERV_class I ERV_class II Total hAT-Charlie DNA elements TcMar-Tigger Total Unclassified Total interspersed repeats Small RNA Satellites Simple repeats Low complexity Total number of bases masked

Supplementary Table 6. General statistics of repeat elements in anole lizard, two turtles and chicken.

Gene set | Number of predicted coding genes | Average CDS length (bp) | Average exon per gene | Average exon length (bp) | Average intron length (bp)
Soft-shell turtle (Ps-ENS_gene) 18,188 1, ,644
Soft-shell turtle (Ps-BGI_gene) 23,649 1, ,211
Green sea turtle 19,633 1, ,860
Saltwater crocodile 17, ,715
American alligator 17,611 1, ,690

Supplementary Table 7. General statistics of predicted protein-coding genes of turtles and crocodiles.

Type | Ps-BGI_gene (Number, Percent (%)) | Ps-ens_gene (Number, Percent (%)) | Cm_gene (Number, Percent (%))
total KOGs
one KOG aligns one gene
CDS overlap>
CDS overlap>
one KOG aligns several genes
one KOG aligns zero genes

Supplementary Table 8. Coverage rate of CEGMA-predicted Eukaryotic Orthologous Group (KOG) genes by annotated gene sequences. Cm_gene and Ps-BGI_gene indicate the gene sets predicted by BGI's annotation pipeline for the green sea turtle and soft-shell turtle, respectively. Ps-ens_gene indicates the gene set predicted by the Ensembl annotation for the soft-shell turtle. Statistical results indicated an overlap in gene numbers between the KOG genes predicted by CEGMA and the three gene sets in the paper. The coverage of the assemblies was assessed using the CEGMA 115 program (version 2.3).

Feature | Soft-shell turtle | Green sea turtle
Genome size (Gb)
Total clean data (Gb)
Assembly contig N50 length (Kb)
Assembly scaffold N50 length (Mb)
Number of scaffolds (>2Kb) 4,548 5,442
Percent of repeat (%)
GC content (%)

Feature | Ps-ens_gene | Ps-BGI_gene | gene
Number of genes 19,327 23,649 19,633
Genes with InterPro annotation 16,062 14,719 14,164
Genes with GO annotation 15,154 12,492 12,043
Total length of coding region (bp) 27,939,847 29,981,919 28,581,024

Supplementary Table 9. Genome features of the soft-shell and green sea turtles.

Sequence type | Gene number | Sequence length (bp) | RAxML log (likelihood) | RAxML Model | PhyML log (likelihood) | PhyML Model
Protein 1, ,304 -8,187,925 JTT+gamma -8,188,695 JTT+gamma
Position 1 1, ,304 -4,565,465 GTR+gamma -4,565,465 GTR+gamma
Position 2 1, ,304 -3,808,462 GTR+gamma -3,833,686 HKY85+gamma
Position 1+2 1,113 1,826,608 -8,422,863 GTR+gamma -8,422,862 GTR+gamma
Position 3 1, ,304 -7,708,981 GTR+gamma -7,708,981 GTR+gamma
CDS 1,113 2,739,912 -16,558,373 GTR+gamma -16,579,999 HKY85+gamma

Supplementary Table 10. Single-copy gene families identified for phylogenetic analysis.

Tree # | Tree topology a | ΔlogL ±S.E. | P-value (KH-test, SH-test, AU-test)
1 (((((HS,CF),OA),((((TG,GG),(CP,AM)),(PS,CM)),AC)),XT),OL) ML
(((((HS,CF),OA),(((TG,GG),((CP,AM),(PS,CM))),AC)),XT),OL) ±
(((((HS,CF),OA),(((CP,AM),((TG,GG),(PS,CM))),AC)),XT),OL) ±
(((((HS,CF),OA),((((TG,GG),(CP,AM)),AC),(PS,CM))),XT),OL) ±

Supplementary Table 11. Statistical assessment of the phylogenetic position of the turtles using the maximum-likelihood method. a The first letters of the genus and species names are shown for the individual species involved in the analysis (see Supplementary Figure 6 for details).

[Supplementary Figure 6 image. Panel a: hypotheses [I], [II] and [III]. Panel b: trees for CDS, Peptide, position 1+2, position 1, position 2 and position 3.]

Supplementary Figure 6. Phylogenetic analysis supports a close relationship between turtle and bird/crocodilian lineages. (a) Three major hypotheses of turtle origin, illustrating turtles as the [I] sister group to the lizard-snake-tuatara (Lepidosauria) clade, [II] sister group to birds and crocodilians (Archosauria), or [III] outside Diapsida (a clade composed of Archosauria and Lepidosauria). (b) Phylogenetic trees of 12 species constructed with RAxML under the GTR/JTT+gamma model based on CDS, peptides, 1st codon position, 2nd codon position, 3rd codon position, and 1st + 2nd codon positions of the codon sequences of 1,113 genes. Each tree was run with 1,000 replications. All internal branches of the above trees were % bootstrap supported. Phylogenies based on CDS, peptide, 1st + 2nd codon position, 1st codon position, 2nd codon position, and 3rd codon position sequences are shown in the above figure. Note that all tree topologies, except those from codon position 3 and CDS, support hypothesis [II] of panel a, and none of the trees support hypotheses [I] or [III]. The slightly different topology of the trees constructed with CDS and codon position 3 is presumably due to mutation saturation in the 3rd codon position, as observed in long branches.

Method | Sequence type | Soft-shell turtle / green sea turtle: Age (Mya), 95% CI (Mya) | Crocodiles/birds: Age (Mya), 95% CI (Mya) | Turtles/birds: Age (Mya), 95% CI (Mya)
PAML mcmctree: Codon Codon Codon Protein
Multidivtime: Codon Codon
r8s 137: Codon Codon Protein

Supplementary Table 12. Estimation of divergence time between turtles and crocodilians / birds. Several different sets of sequences were taken as input to estimate split times. The fossil calibration times used for the PAML mcmctree and r8s analyses are described in Supplementary Table 13, whereas only four calibrations, (2), (3), (4) and (7) in Supplementary Table 13, were used for the Multidivtime analysis.

Species 1 | Species 2 | Lower bound (Mya) | Upper bound (Mya) | Reference
(1) Alligatoridae Crocodylidae Brochu et al
(2) Galliformes Passeriformes Benton et al
(3) Aves Crocodylidae Benton et al
(4) Euarchontoglires Laurasiatheria Benton et al
(5) Homo sapiens Ornithorhynchus anatinus Benton et al
(6) Sauropsida Mammalia Benton et al
(7) Mammalia Amphibia Benton et al
(8) Homo sapiens Oryzias latipes

Supplementary Table 13. Calibration time points used in split time estimation.

[Supplementary Figure 7 image. Panels a and b: time-calibrated trees of Green sea turtle, Soft-shell turtle, American alligator, Saltwater crocodile, Chicken, Anole lizard, Dog, Human, Platypus, Western clawed frog and Medaka against a geological time scale (Silurian to Paleogene; Palaeozoic, Mesozoic, Cenozoic); x-axis: Myr ago.]

Supplementary Figure 7. Estimated divergence time of vertebrate lineages. The divergence times of 12 species were estimated by PAML mcmctree based on the 1st + 2nd codon positions (a) and the 2nd codon position (b), respectively. Myr is short for million years. The eight calibration times (dark red circles) were adopted from Brochu et al , Benton et al. 146, and 147, and include the Alligatoridae-Crocodylidae divergence (70.6 ~ 83.5 Myr ago), Galliformes-Passeriformes divergence ( Myr ago), Aves-Crocodylidae divergence (235 ~ Myr ago), Euarchontoglires-Laurasiatheria divergence ( Myr ago), Homo sapiens-Ornithorhynchus anatinus divergence ( Myr ago), Sauropsida-Mammalia divergence ( Myr ago), Mammalia-Amphibia divergence ( Myr ago), and Homo sapiens-Oryzias latipes divergence ( Myr ago).

Family ID | Medaka | Western clawed frog | Anole lizard | American alligator | Saltwater crocodile | Zebra finch | Chicken | Soft-shell turtle (Ps-BGI_gene) | Green sea turtle | Platypus | Dog | Human | Turtles/Others ratio | Annotation
Olfactory receptor, Class I
Olfactory receptor, Class II
Olfactory receptor, Class II
Olfactory receptor, Class I
Olfactory receptor, Class I
Zinc finger protein
Immunoglobulin V-set
Zinc finger protein
Olfactory receptor, Class II
Immunoglobulin V-set
HAT dimerization
Olfactory receptor, Class II
Immunoglobulin V-set
Peptidase S1/S6, chymotrypsin/hap
Olfactory receptor, Class I
Olfactory receptor, Class II

Supplementary Table 14. Gene families expanded in the turtle lineage. For each species, the number of genes belonging to each gene family is shown. Gene families are listed in descending order of the number of soft-shell turtle and green sea turtle genes. Only the gene families containing at least one non-turtle gene are shown. Turtles/Others was calculated as the mean number of genes in the two turtle species divided by the mean number among the 10 non-turtle species, and the gene families with a Turtles/Others value > 5 are depicted in bold.
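The Turtles/Others ratio defined in the caption of Supplementary Table 14 is a ratio of two means and can be sketched as below. The species keys and function names are hypothetical; only the formula and the > 5 threshold come from the caption.

```python
from statistics import mean

# Hypothetical keys for the two turtle columns of the table.
TURTLES = ("soft_shell_turtle", "green_sea_turtle")


def turtles_others_ratio(counts):
    """Mean gene count in the two turtles divided by the mean in all other species."""
    turtle_mean = mean(counts[s] for s in TURTLES)
    others = [n for s, n in counts.items() if s not in TURTLES]
    others_mean = mean(others)
    if others_mean == 0:
        # The table only lists families with at least one non-turtle gene.
        raise ValueError("family has no non-turtle genes")
    return turtle_mean / others_mean


def is_highlighted(counts, threshold=5.0):
    """True for families that the caption says are depicted in bold."""
    return turtles_others_ratio(counts) > threshold
```

For example, a family with 30 and 10 genes in the two turtles and 2 genes in each of the 10 other species has a ratio of 20 / 2 = 10 and would be highlighted.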

[Supplementary Figure 8 image. Panel a: maximum-likelihood tree of PRLHR-related sequences (scale bar: 0.1 substitutions / site). Panel b: orthology table of PRLHR1, PRLHR2, PRLHR3 and PRLHR4 for sea lamprey, elephant shark, teleost fishes, Xenopus tropicalis, chicken, turtles, anole lizard, platypus, opossum and eutherian mammals. Panel c: maximum-likelihood tree of Myoglobin, Globin E, Cytoglobin, Cyclostome Hbs and Globin Y sequences (scale bar: 0.2 substitutions / site).]

Supplementary Figure 8. PRLHR and globin gene family members are well retained in the turtle lineage. (a) Molecular phylogenetic tree inferred with the maximum-likelihood method. Turtle sequences are in red, and non-turtle Sauropsida sequences are in blue. Although the topology of the tree is not always consistent with the results of our phylogenetic analysis (Supplementary Figure 6), this is likely due to the low bootstrap values in this analysis. (b) Orthology table showing the identified gene repertoire of diverse vertebrates. (c) A molecular phylogenetic tree including vertebrate myoglobin, globin E, globin Y, cytoglobin, and cyclostome haemoglobin (Hbs) sequences was inferred with the maximum-likelihood method. Grouping of these genes was supported in a previous study by Hoffmann et al. 148, in which other globin subfamilies, such as neuroglobin, were excluded from this group. Although the topology of the tree is not always consistent with the results of our phylogenetic analysis (Supplementary Figure 6), this is likely due to the low bootstrap values in this analysis.

Group | α (Class I) | β (Class I) | γ (Class II) | δ | ε | ζ | η
Rows: Turtles; Birds; Lizards; Mammals; Frogs; Bony Fishes a)

Supplementary Table 19. Turtles have group α, β, and γ OR genes. indicates the presence of OR genes. a) One gene from group γ is present in the zebrafish genome, but members of this group of OR genes are absent from all other fish genomes examined 91.

Species | Class I: α (%) | Class I: β | Class II: γ | Number of Intact Genes | Number of Pseudogenes or Gene Fragments | Total number of OR Genes
soft-shell turtle 532 (46.8)
green sea turtle 158 (62.2)
Chicken 9 (4.3)
Zebra finch 2 (1.1)
Anole lizard 1 (0.9)
Human 61 (15.4)
Rat 132 (10.9)
Dog 159 (19.6)
Western clawed frog 8 (1.0) *

Supplementary Table 20. A large number of OR genes is found in turtles. The entire OR gene repertoire from the zebra finch genome 149 was identified using the same methods as for the two turtle genomes. The numbers of OR genes from chicken, anole lizard, and Western clawed frog were adopted from Niimura , those for human from Matsui et al , and those for rat and dog from Niimura et al * The sum of the group α, β, and γ genes is not equal to this number because it includes OR genes that belong to other groups.

GO_ID | GO_Term | GO_Class | GO_level | Gene_Num | FDR
GO: sensory perception of taste BP E-31
GO: taste receptor activity MF E-14
GO: detection of chemical stimulus involved in sensory perception of taste BP E-12
GO: sensory perception BP E-10
GO: detection of chemical stimulus BP E-06
GO: keratin filament CC E-06
GO: virion CC E-05
GO: structural constituent of tooth enamel MF E-05
GO: detection of stimulus BP E-04
GO: response to pheromone BP E-04
GO: pheromone receptor activity MF E-04
GO: MHC class I protein complex CC E-03
GO: G-protein coupled receptor activity MF E-03
GO: protein localization to chromosome, telomeric region BP E-03
GO: endopeptidase inhibitor activity MF E-03
GO: bitter taste receptor activity MF E-03
GO: centromeric core chromatin assembly BP E-03
GO: extracellular region CC E-03
GO: signaling receptor activity MF E-02
GO: negative regulation of telomere maintenance BP E-02
GO: G-protein coupled receptor protein signaling pathway BP E-02
GO: natural killer cell activation BP E-02
GO: transmembrane signaling receptor activity MF E-02
GO: chemokine receptor binding MF E-02
GO: sensory perception of bitter taste BP E-02
GO: cytokine activity MF E-02
GO: serine-type endopeptidase inhibitor activity MF E-02
GO: Slx1-Slx4 complex CC E-02
GO: nuclear telomere cap complex CC E-02
GO: positive regulation of leukocyte chemotaxis BP E-02
GO: positive regulation of response to external stimulus BP E-02
GO: G-protein-coupled receptor binding MF E-02
GO: chemokine activity MF E-02
GO: gamma-tubulin complex CC E-02
GO: DNA double-strand break processing involved in repair via single-strand annealing BP E-02
GO: crossover junction endodeoxyribonuclease activity MF E-02
GO: protection from non-homologous end joining at telomere BP E-02
GO: immunoglobulin receptor activity MF E-02
GO: positive regulation of natural killer cell activation BP E-02
GO: system process BP E-02
GO: positive regulation of behavior BP E-02
GO: telomere maintenance via TERF2IPTERF BP E-02
GO: neurological system process BP E-02
GO: telomere assembly BP E-02
GO: negative regulation of telomere maintenance via telomerase BP E-02

Supplementary Table 21. GO enrichment analysis of genes lost in the turtle lineage. An enrichment analysis for the genes lost in both turtles was performed based on the algorithm presented by GOstat 97, using human genes as the background. The p-value was approximated using the chi-square test. Fisher's exact test was used when any gene count was below 5, because the chi-square approximation is inaccurate for such small counts. This program was implemented as a pipeline 134. To provide succinct results in the GO and IPR enrichment analyses, if one of the items was ancestral to another and the enriched gene lists of these two items were the same, the ancestral item was deleted from the results. To adjust for multiple testing, we calculated the False Discovery Rate (FDR) using the Benjamini-Hochberg method 151 for each class.
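The testing scheme in the caption of Supplementary Table 21 (chi-square test, Fisher's exact test for small counts, Benjamini-Hochberg FDR) can be sketched with the standard library alone. This is not the GOstat implementation: the 2x2 layout (a, b / c, d), the two-sided form of Fisher's test, the use of observed rather than expected counts for the "below 5" rule, and all function names are assumptions made for illustration.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def chi_square_p(a, b, c, d):
    """Chi-square test (1 d.f., no continuity correction); margins must be non-zero."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(stat / 2))  # survival function of chi-square with 1 d.f.


def enrichment_p(a, b, c, d):
    """Use Fisher's exact test when any count is below 5, else the chi-square test."""
    if min(a, b, c, d) < 5:
        return fisher_exact_two_sided(a, b, c, d)
    return chi_square_p(a, b, c, d)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In such a sketch, `a` would be the number of lost genes carrying a GO term, `b` the lost genes without it, and `c`, `d` the corresponding background counts; the adjusted values would be computed separately for each GO class, as the caption states.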

IPR_ID | IPR_Title | Gene_Num | P-value
IPR Krueppel-associated box E-18
IPR Olfactory receptor E-18
IPR GPCR, rhodopsin-like, 7TM E-17
IPR Zinc finger, C2H2-like E-13
IPR Zinc finger, C2H2-type E-12
IPR ARF/SAR superfamily
IPR Ubiquitin conserved site
IPR Mammalian taste receptor
IPR Palmitoyl protein thioesterase
IPR Zinc finger, RING-type, conserved site
IPR Short-chain dehydrogenase/reductase, conserved site
IPR Ubiquitin
IPR Ubiquitin supergroup
IPR Protein of unknown function DUF
IPR Short-chain dehydrogenase/reductase SDR
IPR Apolipoprotein M
IPR Glucose/ribitol dehydrogenase
IPR Zinc finger, C3HC4 RING-type

Supplementary Table 22. IPR enrichment analysis of genes lost in both turtles.

Symbol | Gene name | ω0 (average) | ω1 (other) | ω2 (turtle) | p-value
MGST3 microsomal glutathione S-transferase E-04
ABCB1 ATP-binding cassette, sub-family B (MDR/TAP), member E+00
FAH fumarylacetoacetate hydrolase (fumarylacetoacetase) E-04
RFC4 replication factor C (activator 1) 4, 37kDa E-06
HEATR2 HEAT repeat containing E-03
APOBEC2 apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like E-03
SCYL3 SCY1-like 3 (S. cerevisiae) E-03
PDC phosducin E-04
METTL15 methyltransferase like E-03
MFSD1 major facilitator superfamily domain containing E-04
EIF4H eukaryotic translation initiation factor 4H E-03
SLC38A4 solute carrier family 38, member E-04
DDX20 DEAD (Asp-Glu-Ala-Asp) box polypeptide E-03
ABCE1 ATP-binding cassette, sub-family E (OABP), member E-04
MSH3 mutS homolog 3 (E. coli) E-05
ADHFE1 alcohol dehydrogenase, iron containing, E-04
HRH3 histamine receptor H E-03
SLC34A2 solute carrier family 34 (sodium phosphate), member E-04
NDFIP2 Nedd4 family interacting protein E-04
MIPEP mitochondrial intermediate peptidase E-04
VAT1L vesicle amine transport protein 1 homolog (T. californica)-like E-07
CPNE3 copine III E-07
TRAK2 trafficking protein, kinesin binding E-09
RAD21 RAD21 homolog (S. pombe) E-05
VPS13C vacuolar protein sorting 13 homolog C (S. cerevisiae) E-04
CLUAP1 clusterin associated protein E-03
COL11A1 collagen, type XI, alpha E-05
SLC40A1 solute carrier family 40 (iron-regulated transporter), member E-03
ZBTB37 zinc finger and BTB domain containing E+00
JAK1 Janus kinase E-06
DENND4C DENN/MADD domain containing 4C E-04
ANKIB1 ankyrin repeat and IBR domain containing E-06
MKX mohawk homeobox E-05

PPID peptidylprolyl isomerase D E-04
DEPDC1B DEP domain containing 1B E-04
MORC3 MORC family CW-type zinc finger E-04
IBTK inhibitor of Bruton agammaglobulinemia tyrosine kinase E-03
KIT v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog E-04
GALNS galactosamine (N-acetyl)-6-sulfate sulfatase E-04
TMEM106B transmembrane protein 106B E-04
OSBPL8 oxysterol binding protein-like E-05
NUP107 nucleoporin 107kDa E-10
ITSN2 intersectin E-06
NR5A2 nuclear receptor subfamily 5, group A, member E-07
WDR69 WD repeat domain E-05
BMPR1B bone morphogenetic protein receptor, type IB E-08
TTC21B tetratricopeptide repeat domain 21B E-04
MYBPC1 myosin binding protein C, slow type E-05
PARP1 poly (ADP-ribose) polymerase E-04
ADCY1 adenylate cyclase 1 (brain) E-05

Supplementary Table 24. Genes with accelerated evolutionary rates in the turtle lineage. Orthologous genes of soft-shell turtle, green sea turtle, and other related species (chicken, zebra finch, anole lizard, Xenopus tropicalis and platypus) were aligned using the program MUSCLE 60 and compared under a series of evolutionary models in the likelihood framework, using the phylogenetic tree obtained by our analysis. A branch model 52 was used to estimate the average dN/dS ratio (ω) across the tree (ω0), the ω of the branch ancestral to soft-shell turtle and green sea turtle (ω2), and the ω of all of the other branches (ω1).

Supplementary Figure 9. Early to late stages of turtle and chicken embryos. Representative images of the turtle and chicken embryos used in this study are shown (images are not to scale). TK (Tokita-Kuratani) stages for turtle 24 and HH stages for chicken 61 are denoted for each image.

Supplementary Figure 10. Gene Ontology (Biological Process) profiles of turtle-chicken orthologous genes. Cumulative bar plots showing biological process-related slim Gene Ontologies (slim GO) of C. mydas (Cm), P. sinensis (Ps) and G. gallus (Gg) genes. The slim GOs of the orthologous genes used for the comparative expression analysis are shown between the Ps and Gg bar plots (Ps_gg, soft-shell turtle orthologs; Gg_ps, chicken orthologs). Importantly, the similar profiles of Ps_gg and Gg_ps confirm that the gene set used for the GXP analysis is comprehensive and not biased in terms of the GO-predicted biological processes.

Supplementary Figure 11. Saturating sequencing depth and detected gene number in each sample. Plots of the expressed gene count versus the simulated number of mapped RNA-Seq reads show that the RNA-Seq reads for each sample are at a quasi-saturation point. The red bar represents the percentile of the gene repertoire detected in either sample; the blue lines indicate the height of the total gene number. (a-b) Detected genes were counted against a random selection of mapped reads for soft-shell turtle (a) and chicken (b) samples. (c-d) Detected genes among the orthologues were counted against random selections of mapped reads for soft-shell turtle (c) and chicken (d) samples. The actual read numbers are larger than the numbers of mapped reads. The x-axis indicates the total number of reads for each sample, and the y-axis indicates the number of genes detected (at least one tag mapped to gene coding regions). Tips of the colored pins indicate the read number and the number of expressed genes of each sample. Error bars in each curve indicate the standard deviation produced from two independent random selections. Ps-ens_gene sets were used for this analysis. Essentially the same saturation curve was also observed for the Ps-BGI_gene sets (data not shown).
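The saturation analysis of Supplementary Figure 11 (counting detected genes in random selections of mapped reads) can be illustrated with a minimal subsampling sketch. The input representation (one gene ID per mapped read), the function name, and the seed handling are assumptions; real data would be read from alignment files rather than from an in-memory list.

```python
import random


def detected_gene_counts(read_genes, depths, seed=0):
    """Count distinct genes hit by a random selection of reads at each depth.

    read_genes: list with one gene ID per mapped read (hypothetical input).
    depths: numbers of reads to draw, each at most len(read_genes).
    """
    rng = random.Random(seed)
    counts = []
    for depth in depths:
        sample = rng.sample(read_genes, depth)  # draw without replacement
        counts.append(len(set(sample)))         # gene detected if >= 1 read maps to it
    return counts
```

Running the function with two different seeds gives two independent random selections, from which a standard deviation per depth could be derived as in the figure.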

Supplementary Figure 12. Hourglass-like gene expression divergence between selected stages of soft-shell turtle and chicken. Gene expression divergences were calculated for five embryos from soft-shell turtle and chicken with various distance calculation methods and datasets of various depths. (a-d) Expression divergence calculated with the all-reads data. (a) Expression divergence calculated as 1 - Pearson correlation coefficient. (b) Expression divergence calculated as 1 - Spearman correlation coefficient. (c) Expression divergence calculated as the total Euclidean distance. (d) Expression divergence calculated as the total Manhattan distance. (e-g) Expression divergence calculated with the mapped-10 M reads data. (e) Expression divergence calculated as 1 - Pearson correlation coefficient. (f) Expression divergence calculated as the total Euclidean distance. (g) Expression divergence calculated as the total Manhattan distance. P values were calculated by ANOVA with heteroskedasticity. Note that similar tendencies are observed among the various distance evaluation methods. Essentially the same results were obtained for the RPKM-normalized data set (data not shown). A similar saturation curve was also observed for the Ps-BGI_gene sets (data not shown). Error bars: S.D.

Supplementary Figure 13. The number of genes expressed does not correlate with the highest GXP similarity in the mid-embryonic stages. The bar plots indicate the number of genes detected (> 1 read count) during soft-shell turtle (left) and chicken (right) embryogenesis (read-depth-controlled data: mapped 10 M). Although the number of genes detected in each developmental stage showed statistically significant differences (tested by ANOVA, alpha level = 0.01), no correlation with the conserved expression profiles found in the mid-embryonic stages (Supplementary Figure 18) was observed. In contrast, the number of genes detected in each developmental stage showed a moderate increase during development (5-6% increase between the earliest and latest stages). Essentially the same saturation curve was also observed for the Ps-BGI_gene sets (data not shown). Error bars: S.D.
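The four divergence measures named in Supplementary Figure 12 can be written out for two expression vectors over the same orthologous genes, as in the sketch below. The function names are hypothetical, and reading "total" Euclidean and Manhattan distance as the distance summed over all genes is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def expression_divergence(x, y):
    """Return the four distance measures for expression vectors x and y."""
    return {
        "1-pearson": 1 - pearson(x, y),
        "1-spearman": 1 - pearson(ranks(x), ranks(y)),  # Spearman = Pearson on ranks
        "euclidean": sqrt(sum((a - b) ** 2 for a, b in zip(x, y))),
        "manhattan": sum(abs(a - b) for a, b in zip(x, y)),
    }
```

The correlation-based measures ignore overall scale, whereas the Euclidean and Manhattan distances depend on it, which is one reason to compare results across normalization methods.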

Supplementary Figure 14. All-to-all comparison of turtle-chicken embryos reveals the period of most similar GXP. Gene expression divergence scores were compared among all combinations of paired embryos from soft-shell turtle and chicken (9 Ps stages x 8 Gg stages = 72 combinations) using orthologous genes. See Supplementary Table 25 for the statistical testing of this result. Panels in (a) represent similarity scores obtained from the all-reads dataset, and panels in (b) represent those from the mapped-10 M reads dataset. Distance calculation methods and normalization methods (in brackets) are shown at the top of each panel. Similar results were also observed for the Ps-BGI_gene sets (data not shown). Error bars: S.D.

Distance method | All reads | Randomly mapped-10 M reads
Row labels: 1 - Pearson correlation coefficient; 1 - Spearman correlation coefficient; Total Euclidean distance; Total Manhattan distance
PS_TK11 ↔ GG_HH16 PS_TK15 ↔ GG_HH28 PS_TK11 ↔ GG_HH16 PS_TK15 ↔ GG_HH28 PS_TK11 ↔ GG_HH16 PS_TK11 ↔ GG_HH16 PS_TK15 ↔ GG_HH28 PS_TK11 ↔ GG_HH16 PS_TK15 ↔ GG_HH28 PS_TK13 ↔ GG_HH16 PS_TK11 ↔ GG_HH14 PS_TK13 ↔ GG_HH28 PS_TK13 ↔ GG_HH19 PS_TK11 ↔ GG_HH19 PS_TK11 ↔ GG_HH11 PS_TK11 ↔ GG_HH16 PS_TK13 ↔ GG_HH16 PS_TK11 ↔ GG_HH14
Total Manhattan distance PS_TK11 ↔ GG_HH16 PS_TK13 ↔ GG_HH16 PS_TK11 ↔ GG_HH14 PS_TK15 ↔ GG_HH28 PS_TK7 ↔ GG_HH11 PS_TK13 ↔ GG_HH19 PS_TK11 ↔ GG_HH19 PS_TK11 ↔ GG_HH16 PS_TK13 ↔ GG_HH16 PS_TK11 ↔ GG_HH14 PS_TK15 ↔ GG_HH28 PS_TK11 ↔ GG_HH11

Supplementary Table 25. Pairs of turtle-chicken embryos with the highest expression similarity. Pairs of turtle-chicken embryos with the highest similarity in gene expression profiles are shown. Statistical tests (Welch two-sample t-test or Wilcoxon test, depending on whether the statistical prerequisites were satisfied, with a Holm-corrected alpha level; all results p < 0.01) were performed to test the significance of the pairs of embryos with the highest similarity in gene expression. Results that were reproduced (statistically significant) with the datasets from both normalization methods (RPKM 70, TMM 69) are shown. Although the results varied depending on the dataset (all reads and 10 M-mapped reads), the normalization method (RPKM and TMM), and the distance calculation method (1-Pearson, 1-Spearman, total Euclidean, and total Manhattan), the PS_TK11 ↔ GG_HH16 pair was robustly supported by these analyses.

Developmental event / organ structures | Mouse E9.5 | Chicken HH16 | Soft-shell turtle TK11 | X. laevis stage 28 / stage 31 | D. rerio 24 hpf
Axial structures: Rhombomere / + +
Axial structures: Neural crest cells / + +
Axial structures: Notochord / + +
Axial structures: Somite / + +
Axial structures: Neural tube / neural folds partially fused / + +
Pharyngeal: Pharyngeal arch / + +
Olfactory: Olfactory pit / placode / + +
Otic: Auditory system / Otic placode / + +
Optic: Lens / lens placode / + +
Cardiovascular system: Aortic arches / Heart with chambers / + +
Kidney: Mesonephric duct anlagen / + +
Epidermal: Epidermis / + +

Supplementary Table 26. Anatomical structures shared between the TK11 turtle embryo and the phylotypic periods of four vertebrate species. +, observed or rudimentary structure can be found; not observed. The anatomical features of each embryonic stage were adopted from Kimmel et al. 152, Faber et al. 153, Hamburger et al. 61, and Matthew et al. The phylotypic periods of mouse, chicken, X. laevis, and zebrafish were adopted from Irie et al.

Common description | Number of genes in soft-shell turtle | Ensembl gene family ID
Hox genes | 35 | ENSFM , ENSFM , ENSFM , NSFM , ENSFM , ENSFM , ENSFM , NSFM , ENSFM , ENSFM , ENSFM , NSFM , ENSFM , ENSFM , ENSFM , NSFM , ENSFM , ENSFM
Paired box pax | 7 | ENSFM , ENSFM , ENSFM , NSFM
FGF / FGF receptor | 22 | ENSFM , ENSFM , ENSFM , NSFM , ENSFM , ENSFM , ENSFM , NSFM , ENSFM , ENSFM , ENSFM , NSFM , ENSFM
BMP / BMP receptor | 16 | ENSFM , ENSFM , ENSFM , NSFM , ENSFM , ENSFM , ENSFM
Hedgehog (N-term, C-term) | 6 | ENSFM , ENSFM , ENSFM , NSFM , ENSFM
Wnt | 19 | ENSFM , ENSFM , ENSFM , NSFM , ENSFM , ENSFM
Notch / Notch ligands | | ENSFM , ENSFM , ENSFM
Frizzled | 7 | ENSFM
Activin receptors | 6 | ENSFM
TGF-β / receptor | 5 | ENSFM , ENSFM , ENSFM

Supplementary Table 27. Identified developmental toolkit gene families in soft-shell turtle. Developmental toolkit genes 105 in soft-shell turtle classified by Ensembl family IDs.

GO ID | GO description | No. of genes within 2-fold range | No. of genes out of 2-fold range | % of genes in 2-fold range
GO: protein export from nucleus 8 0 %
GO: histone lysine methylation 10 0 %
GO: semaphorin-plexin signaling pathway 9 0 %
GO: histone-lysine N-methyltransferase activity %
GO: histone methylation %
GO: lamellipodium assembly %
GO: autophagic vacuole assembly %
GO: transcription initiation from RNA polymerase II promoter %
GO: cellular process %
GO: nuclear pore %
GO: SCF ubiquitin ligase complex %
GO: positive regulation of proteasomal ubiquitin-dependent protein catabolic process %
GO: tRNA aminoacylation for protein translation %
GO: nuclear speck %
GO: kinetochore %
GO: aminoacyl-tRNA ligase activity %
GO: ATP catabolic process %
GO: mitochondrial outer membrane %
GO: spindle %
GO: cytoplasmic membrane-bounded vesicle %
GO: ribonucleoprotein complex %
GO: helicase activity %
GO: regulation of gene expression %
GO: ubiquitin protein ligase binding %
GO: catalytic step 2 spliceosome %
GO: negative regulation of canonical Wnt receptor signaling pathway %
GO: transcription corepressor activity %
GO: binding %
GO: ubiquitin-dependent protein catabolic process %
GO: multicellular organismal development %

Supplementary Table 28. Groups of genes that show similar expression levels in turtle/chicken phylotypic stages. Genes grouped by their GO IDs were investigated for similar expression levels (less than 2-fold change) between soft-shell turtle TK11 and chicken HH16 embryos. Groups having a significantly higher ratio of similarly expressed genes than all genes are listed (Fisher's exact test, alpha level = 0.01; the table is restricted to GO groups in which more than 70% of the genes are similarly expressed). All of the results were corroborated by all of the data sets (all reads, mapped 10 M reads, RPKM normalization, and TMM normalization). The numbers of genes and the percentages were averaged over the four data sets and rounded to the closest whole number.
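The per-group summary in Supplementary Table 28 (genes within and outside a 2-fold range between TK11 and HH16) can be sketched as follows. The function names and the treatment of zero expression values are assumptions; the caption only specifies the "less than 2-fold change" criterion and the rounding to whole numbers.

```python
def within_two_fold(turtle_expr, chicken_expr):
    """True when two expression values differ by less than 2-fold."""
    if turtle_expr <= 0 or chicken_expr <= 0:
        # Assumption: two zeros count as similar, a single zero does not.
        return turtle_expr == chicken_expr
    ratio = turtle_expr / chicken_expr
    return 0.5 < ratio < 2.0


def two_fold_summary(pairs):
    """Return (genes within range, genes out of range, rounded % within range)."""
    n_in = sum(1 for t, c in pairs if within_two_fold(t, c))
    n_out = len(pairs) - n_in
    return n_in, n_out, round(100 * n_in / len(pairs))
```

Here `pairs` would hold the (turtle TK11, chicken HH16) expression values of the orthologous genes annotated with one GO ID.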

39 Supplementary Figure 15. Turtle-chicken divergence also follows the hourglass model (a) The horizontal width of the hourglass model 16,17 represents the evolutionary divergence observed during embryogenesis (which flows upward in this drawing), which achieves maximum similarity at the phylotypic period (constricted part of the model). The possible phylotypic periods of mice (E 9.5), chicken (HH16), X. laevis (stages 28-31), and Zebrafish (24hr post fertilization) were identified by similarity of gene expression profiles, and reported previously 21. Although recent studies supported the idea that the model could be expanded to include bilaterian or larger phylogenetic groups 104, the exact relationship between the vertebrate hourglass and that of bilaterian or larger animal groups 104 remains to be clarified. What is more, the hierarchical relationship of evolution and development proposed by von Baer 25 that The general features of a large group of animals appear earlier in development than do the specialized features of a smaller group could still be explained by the nested hourglasses model with a later-shifted hourglass (b), however this was not supported by our investigation (Figure 2). Although direct evidence is yet to be provided, the hourglass model was originally proposed with the mechanism that makes the phylotypic period as the most conserved stages. Circles and Hox on the right side of the model (a) represent explanations regarding the cause of phylotypic period conservation, particularly modularity 17,103 and Hox co-linearity 16, respectively. However, the possibility that the developmental system itself could be a major reason for this conservation is still under discussion (22). Additionally, the early flexibility issue, or the mechanism of how vertebrate embryogenesis tolerated divergence during early development while conserving the subsequent phylotypic period is not well understood 21,155. 
Embryos in the dark circle represent the possible vertebrate phylotypic period identified in our previous study 21 and in the current study (for soft-shell turtle). However, the exact relationship of these stages to the vertebrate phylotypic period awaits further study that directly compares all other vertebrates, especially cyclostomes. The hourglass model in (a) was adapted with permission from Irie N., Kuratani S

Supplementary Figure 16. Expanded/contracted genes in the turtle lineage show a low expression level during embryogenesis. The expression levels of genes in families predicted to have experienced significant expansion or contraction in the turtle lineage (both soft-shell turtle and green sea turtle) were analyzed. (a) Genes in families with significant expansion or contraction in the turtle lineage (E/C genes, 244 genes; see the Online Methods for the prediction of E/C genes) showed significantly lower levels of expression throughout soft-shell turtle embryogenesis (Wilcoxon test, Bonferroni-corrected multiple comparison, p < 0.01). Y-axis, relative expression level; X-axis, developmental stages. The lowest average expression level was observed at the phylotypic period of soft-shell turtle (TK11). TMM-normalized, 10 M-mapped data sets were used for this plot. The RPKM-normalized data set showed similar results and supported the same conclusion. (b) A large portion (65-79%) of the E/C genes were not expressed (no tag count in the read-depth-controlled data set [mapped 10 M]) during embryogenesis (upper, blue bar-plots), and this was significantly lower when compared to all of the genes (bottom, green bar-plots; Wilcoxon test, p < 0.01 at all stages). The highest un-expressed ratio of E/C genes was observed at the phylotypic period (TK11). Error bars: S.D.
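The un-expressed ratio plotted in panel (b) can be illustrated with a minimal Python sketch. It assumes a hypothetical mapping from gene identifiers to per-stage tag counts; the gene names and counts below are invented for illustration and are not data from this study.

```python
def unexpressed_ratio(tag_counts, stage):
    """Return the fraction of genes with no tag count at the given stage.

    tag_counts: dict mapping gene id -> dict of stage -> tag count
    (a hypothetical layout, not the format used in the study).
    """
    if not tag_counts:
        raise ValueError("tag_counts must contain at least one gene")
    silent = sum(1 for counts in tag_counts.values() if counts.get(stage, 0) == 0)
    return silent / len(tag_counts)


# Invented toy counts for four hypothetical genes at two stages.
toy_counts = {
    "geneA": {"TK7": 0, "TK11": 0},
    "geneB": {"TK7": 3, "TK11": 0},
    "geneC": {"TK7": 0, "TK11": 0},
    "geneD": {"TK7": 5, "TK11": 2},
}

ratios = {stage: unexpressed_ratio(toy_counts, stage) for stage in ("TK7", "TK11")}
```

The sketch covers only the ratio itself; the Wilcoxon comparison against all genes is not reproduced here.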

Supplementary Figure 17. Molecular estimation of the correspondence between turtle and chicken developmental timetables. Pairs of turtle-chicken embryos with reciprocal best transcriptome similarities (RBTS) are connected by arrowed lines. Only RBTS supported by all distance calculation methods and both normalizations are drawn. Multiple lines extending from a single embryonic stage indicate that there were no significant differences in expression similarity. Left: RBTS embryos estimated with the all-reads data set. Right: RBTS embryos estimated with the mapped-10m reads data set. Similar results were also observed for the Ps-BGI_gene sets (data not shown). For RBTS identification, the stages that exhibited the most similar gene expression profiles were tested by the Welch two-sample t-test or the Wilcoxon signed-rank test, depending on which test's statistical requirements were satisfied, with a Holm-corrected alpha level.
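The reciprocal-best idea behind RBTS can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used in the study: it takes a single precomputed distance table (smaller means more similar), ignores the statistical tests and the multiple distance methods described above, and uses invented distance values.

```python
def reciprocal_best_pairs(distance):
    """Return (turtle_stage, chicken_stage) pairs that are mutually closest.

    distance[t][c] is a hypothetical expression distance between turtle
    stage t and chicken stage c; every row must list the same chicken stages.
    """
    turtle_stages = list(distance)
    chicken_stages = list(next(iter(distance.values())))
    # Closest chicken stage for every turtle stage.
    best_chicken = {
        t: min(chicken_stages, key=lambda c: distance[t][c]) for t in turtle_stages
    }
    # Closest turtle stage for every chicken stage.
    best_turtle = {
        c: min(turtle_stages, key=lambda t: distance[t][c]) for c in chicken_stages
    }
    # Keep only pairs where each stage is the other's best match.
    return [(t, c) for t, c in best_chicken.items() if best_turtle[c] == t]


# Invented distances, for illustration only.
toy_distance = {
    "TK9": {"HH16": 0.15, "HH28": 0.6, "HH38": 0.9},
    "TK11": {"HH16": 0.10, "HH28": 0.5, "HH38": 0.8},
    "TK15": {"HH16": 0.40, "HH28": 0.2, "HH38": 0.6},
    "TK23": {"HH16": 0.90, "HH28": 0.5, "HH38": 0.3},
}

pairs = reciprocal_best_pairs(toy_distance)
```

In this toy table, TK9 is closest to HH16, but HH16 is closest to TK11, so TK9 receives no reciprocal partner.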

Supplementary Figure 18. Genes increasingly expressed after the phylotypic period in turtle. (a: Same as Fig. 4b, but shown for comparison with the chicken ortholog data set on the right.) The expression dynamics of soft-shell turtle genes that showed a significant increase in expression level after the phylotypic period (IAP). Each line represents the average expression level of one IAP gene, calculated from the biological replicates of each stage. The turtle IAP genes were screened by the following criteria: (1) The mean expression level after the phylotypic period (stages that begin to show turtle-specific morphologies, TK15-TK23) is more than five times higher (Wilcoxon test, alpha level = 0.01) than that of earlier stages (gastrula, neurula, TK7 and TK9). (2) The chicken orthologs (if there are any) of turtle IAP genes do not show such increases (the average expression levels in HH28 and HH38 are not more than five times higher than those in the Prim-HH14 stages). Consequently, 233 turtle IAP genes were found. The names of the three genes with the highest expression levels in TK23 are shown (right panel). The chicken orthologs of turtle IAP genes (206 genes) are also shown in the left panel.

Supplementary Table 29. miRNAs detected in turtle embryonic tissues. Column headings: Tissues | Detected hairpin precursor | with miRNAs | significant Randfold p-value | Unique sequences. Rows: CR, Limbs, Body walls. [Table values not preserved in this transcription.] Of these predictions, 94 ± 1% are estimated to be true positives.
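The two screening criteria for IAP genes in Supplementary Figure 18 can be sketched as a simple fold-change filter. This is only an illustration under stated assumptions: the Wilcoxon test is omitted, only the endpoint stages TK15 and TK23 stand in for the TK15-TK23 range, the chicken early-stage labels "Prim" and "HH14" are hypothetical placeholders for the "Prim-HH14 stages", and all expression values are invented.

```python
from statistics import mean

# Stage groupings; the exact stage lists are assumptions made for this sketch.
TURTLE_EARLY = ("gastrula", "neurula", "TK7", "TK9")
TURTLE_LATE = ("TK15", "TK23")
CHICKEN_EARLY = ("Prim", "HH14")  # hypothetical labels
CHICKEN_LATE = ("HH28", "HH38")


def fold_increase_exceeds(expression, early_stages, late_stages, fold=5.0):
    """True if the mean late expression is more than `fold` times the mean early one."""
    early = mean([expression[s] for s in early_stages])
    late = mean([expression[s] for s in late_stages])
    return late > fold * early


def is_turtle_iap(turtle_expr, chicken_expr=None):
    """Apply criterion (1) to the turtle gene and criterion (2) to its chicken ortholog, if any."""
    turtle_up = fold_increase_exceeds(turtle_expr, TURTLE_EARLY, TURTLE_LATE)
    if chicken_expr is None:
        return turtle_up
    chicken_up = fold_increase_exceeds(chicken_expr, CHICKEN_EARLY, CHICKEN_LATE)
    return turtle_up and not chicken_up


# Invented expression levels, for illustration only.
turtle_gene = {"gastrula": 1.0, "neurula": 1.0, "TK7": 2.0, "TK9": 2.0, "TK15": 10.0, "TK23": 20.0}
chicken_flat = {"Prim": 2.0, "HH14": 2.0, "HH28": 3.0, "HH38": 4.0}
chicken_rising = {"Prim": 1.0, "HH14": 1.0, "HH28": 8.0, "HH38": 6.0}
```

With these toy values, the turtle gene passes when its chicken ortholog stays flat or is absent, and is rejected when the ortholog shows a comparable increase.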

Supplementary Table 31. Primers, vectors and GenBank accession numbers of soft-shell turtle Wnt genes.

Column headings: Gene | Expected band size | Forward primer sequence | Reverse primer sequence | Antisense PCR or restriction enzyme | GenBank ID

First row: PsWnt1 | 607 | TCCTGCACGTGTGACTATCG | ACCAGTGGAAGGTGCAGTTG | M13R/PsWnt1_F

The remaining entries are listed column by column, in transcription order.

Gene: PsWnt2, PsWnt2b, PsWnt3, PsWnt3a, PsWnt4, PsWnt5a, PsWnt5b, PsWnt6, PsWnt7a, PsWnt7b, PsWnt8a, PsWnt8b, PsWnt9a, PsWnt9b, PsWnt10a, PsWnt10b, PsWnt11, PsWnt11b, PsWnt

Primer sequences: TGGAACTGCAACACCCTCCA, ACAAGCAACGGCAACTCTG, CGTTAGGCCAGCAATACACC, TGAGTCAAGCCATGTCAAGC, CGAGGAGTGCCAGTACCAGT, CAAAGACGGGCATTAAGGAA, AGTCGAGAGGCTGCATTCAC, GCATCTGCCGGAAGACGAAG, GGGGAAAGGACCGTCTTTGG, TCCGCTATGGCAGGTGGAAC, CCCTTCAGCTCTCCACTCAC, TCTCGCGTCCTTCCTCTGCT, TGGAGACGACGTGTAAGTGC, TCATGGAGTGCCAGTACCAA, AGATCGCCACCCACGAGT, CCTCAGCCCTGCAGGGCATC, GGCACCGCTCTACTACCTTG, CCTGGAGCTCATGCACAGTA, AAGGAGCTGTGCAAGAGGAA, GCCTCTCCCACAGCACATCA, AACTTGCACTCGCACTTGGT, GTCTCTGGTCCCAAAGGAG, AAAGTTGGGGGAGTTCTCGT, GAACGCTCCGTGTCTTTCTC, CCCCTATTTGCACACGAACT, GGTCGACGATCTCTGTGCAC, CACAGGCAGTTCTCCTCCAG, CCTCGACCGCAACACATCAG, TGTTGCACTTCACAAAGCAG, TTGTCATGAGTCCGCTTGAG, CGCTTCATGGTCCCCTTCAC, TGCCACTTGAGGAACAACTG, ACAGCAGCAGAAGGGCTAAG, AGAAGCGCTCCTTGAGCA, GGTTGTGTGCGTTGATGAAG, GGTCTGGTGGTGTCTCAGGT, AAGCTCACAGCTTCCCATGT, ATCTTCCTTGCGGATTGGTA

Antisense PCR or restriction enzyme: M13R/PsWnt2_F3, M13R/PsWnt2b_F, M13R/PsWnt3_F, M13R/PsWnt3a_F, M13F/PsWnt4_F, M13R/PsWnt5a_F, M13R/PsWnt5b_F2, M13R/PsWnt7b_R2, M13R/PsWnt8a_F, M13F/PsWnt8b_F3, M13R/PsWnt9a_F, M13F/PsWnt9b_F, EcoRV, M13R/PsWnt11_F, M13R/PsWnt11b_F, M13R/PsWnt16_F2, NotI, NotI, NotI

GenBank ID: JQ (20 entries; accession digits not preserved in this transcription)

Supplementary Table 32. miRNA target prediction statistics.

Column headings: MicroRNAs from | Targets predicted* (Ensembl transcripts; Ensembl genes) | Specific targets predicted** (Ensembl transcripts; Ensembl genes). Rows: Carapacial ridge, Body wall, Limbs. [Table values not preserved in this transcription.]

*Ensembl transcripts or genes that have been predicted by miRanda to be putative targets of the specific microRNAs in each tissue. For example, the 236 (212 unique sequences) specific microRNAs of the CR are predicted to target 8358 genes in the CR. **From the previous predictions, the three tissues were compared to identify the specific targets of each, i.e., the targets that have been predicted in the CR and not in the rest of the body, predicted in the body wall and not in the rest of the body, or predicted in the limbs and not in the rest of the body.
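The "specific targets" comparison described in the second footnote amounts to a set difference across tissues. A minimal Python sketch follows; the function name and the gene identifiers are invented for illustration, and the real target lists would come from the miRanda predictions.

```python
def tissue_specific_targets(targets_by_tissue):
    """For each tissue, keep only the targets not predicted in any other tissue."""
    specific = {}
    for tissue, targets in targets_by_tissue.items():
        # Union of the targets predicted in all other tissues.
        others = set().union(
            *(t for name, t in targets_by_tissue.items() if name != tissue)
        )
        specific[tissue] = set(targets) - others
    return specific


# Invented target sets, for illustration only.
toy_targets = {
    "CR": {"g1", "g2", "g3"},
    "body wall": {"g2", "g4"},
    "limbs": {"g3", "g4", "g5"},
}

specific = tissue_specific_targets(toy_targets)
```

In this toy input, every body wall target is shared with another tissue, so its specific set is empty.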

Supplementary Figure 19. Tissue-specific miRNAs found in the soft-shell turtle. (a) Venn diagram illustrating the number of predicted microRNAs, represented as unique sequences, that are shared among structures or specific to a single structure (limb, body wall, or CR). Numbers of miRNAs are not to scale in the Venn diagram. (b) Hairpin prediction for miR-187, the most abundant microRNA (in terms of number of reads) specific to the CR, showing both the mature miRNA (in red) and the star miRNA (in purple). (c) RNA-level evidence for the miR-187 prediction. In the upper section, the relative frequency of reads, which reflects the read depth over the predicted precursor sequence, is shown. Below this, the hairpin precursor prediction is shown linearly, with the mature sequence in red, the loop in yellow, the expected star sequence in light blue, and the observed star sequence in purple. The reads that mapped to the mature miRNA and miRNA* are listed in the lower section, with the read number at the right. The complete output files illustrate several more features of the prediction 156 (Friedländer et al., 2012) that are not shown here for clarity. (d) Potential targets of soft-shell turtle transcripts and genes (Ps-ens_genes) predicted by miRanda.

Supplementary Figure 20. Sequence conservation of soft-shell turtle miRNAs in green sea turtle and chicken. Sequences of soft-shell turtle mature miRNAs were searched in the green sea turtle and chicken genomes. The larger pie charts indicate the percentage (%) of soft-shell turtle mature miRNA sequences (1082 unique miRNAs in total) found in the two species with different numbers of allowed mismatches (0, 1, and 3 bases). The smaller pie charts show the same for the soft-shell turtle miRNAs (393 unique miRNAs in total) that were predicted to target the Wnt and Wnt components listed in Supplementary Figure 29. Note that the miRNAs predicted to target Wnt downstream components have a higher ratio of conservation. The alignments were performed using bowtie with the -k 1 and --best parameters.
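What "found with up to n mismatches" means can be illustrated with a naive Python scan. This is not bowtie and does not reproduce its behavior; it is a toy sketch that checks the forward strand only, counts substitutions (no gaps), and uses invented sequences.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def found_with_mismatches(query, genome, max_mismatches):
    """True if any window of the genome matches the query within the mismatch allowance."""
    n = len(query)
    return any(
        hamming(query, genome[i:i + n]) <= max_mismatches
        for i in range(len(genome) - n + 1)
    )


# Invented sequences, for illustration only.
toy_genome = "ACGTTGCAAGGCTTACGA"
exact_query = "TGCAAGG"    # present verbatim in toy_genome
variant_query = "TGCATGG"  # differs from that window at one position
```

A query counted at one allowance is also counted at every larger one, so the 0-, 1-, and 3-mismatch categories in the figure are nested.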

Supplementary Figure 21. Conserved molecular components of the Wnt signaling cascade are potential targets of miRNAs. (a) A simplified scheme depicting a general β-catenin-dependent Wnt pathway 157. (b) Table representing the genes found in the genomes of P. sinensis and C. mydas. Dark green, found in the gene predictions (Ps-ens_gene; in-house gene predictions for C. mydas); light green, found using blastp against the genome with the human orthologs as queries. The asterisk in FZD8 indicates that it was found by BLAST against the RNA-Seq data. FZD, frizzled; SMOH, smoothened; DVL, dishevelled-like; Tcf7, transcription factor 7; Tcf7l, transcription factor 7-like; Lef1, lymphoid enhancer-binding factor 1; LRP, low-density lipoprotein receptor-related protein; APC, adenomatous polyposis coli protein; GSK3, glycogen synthase kinase-3. (c) List of genes that are potential targets of miRNAs expressed in each tissue. Genes predicted to be targeted by one or more miRNAs expressed in a tissue are colored (yellow, body wall; red, limb; blue, CR).

Supplementary Figure 22. Wnt gene expression in the carapacial ridge. Among the 20 Wnt genes investigated, a close-up is shown for those with expression in or near the CR. fl, forelimb; hl, hindlimb. The red asterisk marks the CR. Only the Wnt5a gene seems to be clearly expressed in the CR, while Wnt6, Wnt7a and Wnt8a are expressed in more medial parts of the embryo.

Epigenetic regulation of Plasmodium falciparum clonally. variant gene expression during development in An. gambiae

Epigenetic regulation of Plasmodium falciparum clonally. variant gene expression during development in An. gambiae Epigenetic regulation of Plasmodium falciparum clonally variant gene expression during development in An. gambiae Elena Gómez-Díaz, Rakiswendé S. Yerbanga, Thierry Lefèvre, Anna Cohuet, M. Jordan Rowley,

More information

CLADISTICS Student Packet SUMMARY Phylogeny Phylogenetic trees/cladograms

CLADISTICS Student Packet SUMMARY Phylogeny Phylogenetic trees/cladograms CLADISTICS Student Packet SUMMARY PHYLOGENETIC TREES AND CLADOGRAMS ARE MODELS OF EVOLUTIONARY HISTORY THAT CAN BE TESTED Phylogeny is the history of descent of organisms from their common ancestor. Phylogenetic

More information

Species: Panthera pardus Genus: Panthera Family: Felidae Order: Carnivora Class: Mammalia Phylum: Chordata

Species: Panthera pardus Genus: Panthera Family: Felidae Order: Carnivora Class: Mammalia Phylum: Chordata CHAPTER 6: PHYLOGENY AND THE TREE OF LIFE AP Biology 3 PHYLOGENY AND SYSTEMATICS Phylogeny - evolutionary history of a species or group of related species Systematics - analytical approach to understanding

More information

Evidence for Evolution by Natural Selection. Hunting for evolution clues Elementary, my dear, Darwin!

Evidence for Evolution by Natural Selection. Hunting for evolution clues Elementary, my dear, Darwin! Evidence for Evolution by Natural Selection Hunting for evolution clues Elementary, my dear, Darwin! 2006-2007 Evidence supporting evolution Fossil record shows change over time Anatomical record comparing

More information

These small issues are easily addressed by small changes in wording, and should in no way delay publication of this first- rate paper.

These small issues are easily addressed by small changes in wording, and should in no way delay publication of this first- rate paper. Reviewers' comments: Reviewer #1 (Remarks to the Author): This paper reports on a highly significant discovery and associated analysis that are likely to be of broad interest to the scientific community.

More information

Sec KEY CONCEPT Reptiles, birds, and mammals are amniotes.

Sec KEY CONCEPT Reptiles, birds, and mammals are amniotes. Thu 4/27 Learning Target Class Activities *attached below (scroll down)* Website: my.hrw.com Username: bio678 Password:a4s5s Activities Students will describe the evolutionary significance of amniotic

More information

Testing Phylogenetic Hypotheses with Molecular Data 1

Testing Phylogenetic Hypotheses with Molecular Data 1 Testing Phylogenetic Hypotheses with Molecular Data 1 How does an evolutionary biologist quantify the timing and pathways for diversification (speciation)? If we observe diversification today, the processes

More information

Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution

Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution Background How does an evolutionary biologist decide how closely related two different species are? The simplest way is to compare

More information

UNIT III A. Descent with Modification(Ch19) B. Phylogeny (Ch20) C. Evolution of Populations (Ch21) D. Origin of Species or Speciation (Ch22)

UNIT III A. Descent with Modification(Ch19) B. Phylogeny (Ch20) C. Evolution of Populations (Ch21) D. Origin of Species or Speciation (Ch22) UNIT III A. Descent with Modification(Ch9) B. Phylogeny (Ch2) C. Evolution of Populations (Ch2) D. Origin of Species or Speciation (Ch22) Classification in broad term simply means putting things in classes

More information

Modern Evolutionary Classification. Lesson Overview. Lesson Overview Modern Evolutionary Classification

Modern Evolutionary Classification. Lesson Overview. Lesson Overview Modern Evolutionary Classification Lesson Overview 18.2 Modern Evolutionary Classification THINK ABOUT IT Darwin s ideas about a tree of life suggested a new way to classify organisms not just based on similarities and differences, but

More information

muscles (enhancing biting strength). Possible states: none, one, or two.

muscles (enhancing biting strength). Possible states: none, one, or two. Reconstructing Evolutionary Relationships S-1 Practice Exercise: Phylogeny of Terrestrial Vertebrates In this example we will construct a phylogenetic hypothesis of the relationships between seven taxa

More information

Title: Phylogenetic Methods and Vertebrate Phylogeny

Title: Phylogenetic Methods and Vertebrate Phylogeny Title: Phylogenetic Methods and Vertebrate Phylogeny Central Question: How can evolutionary relationships be determined objectively? Sub-questions: 1. What affect does the selection of the outgroup have

More information

Ch 1.2 Determining How Species Are Related.notebook February 06, 2018

Ch 1.2 Determining How Species Are Related.notebook February 06, 2018 Name 3 "Big Ideas" from our last notebook lecture: * * * 1 WDYR? Of the following organisms, which is the closest relative of the "Snowy Owl" (Bubo scandiacus)? a) barn owl (Tyto alba) b) saw whet owl

More information

May 10, SWBAT analyze and evaluate the scientific evidence provided by the fossil record.

May 10, SWBAT analyze and evaluate the scientific evidence provided by the fossil record. May 10, 2017 Aims: SWBAT analyze and evaluate the scientific evidence provided by the fossil record. Agenda 1. Do Now 2. Class Notes 3. Guided Practice 4. Independent Practice 5. Practicing our AIMS: E.3-Examining

More information

NAME: DATE: SECTION:

NAME: DATE: SECTION: NAME: DATE: SECTION: MCAS PREP PACKET EVOLUTION AND BIODIVERSITY 1. Which of the following observations best supports the conclusion that dolphins and sharks do not have a recent common ancestor? A. Dolphins

More information

Lecture 11 Wednesday, September 19, 2012

Lecture 11 Wednesday, September 19, 2012 Lecture 11 Wednesday, September 19, 2012 Phylogenetic tree (phylogeny) Darwin and classification: In the Origin, Darwin said that descent from a common ancestral species could explain why the Linnaean

More information

COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST

COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST Big Idea 1 Evolution INVESTIGATION 3 COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST How can bioinformatics be used as a tool to determine evolutionary relationships and to

More information

A. Pulse-field gel of hummingbird genomic DNA. B. Bioanalyzer plot of hummingbird SMRTbell library

A. Pulse-field gel of hummingbird genomic DNA. B. Bioanalyzer plot of hummingbird SMRTbell library A. Pulse-field gel of hummingbird genomic DNA 1: Sheared gdna: 35 kb & 40 kb 2: BluePippin sizeselected library (17 kb cut-off) 3: Original gdna B. Bioanalyzer plot of hummingbird SMRTbell library 5kb

More information

Evolution as Fact. The figure below shows transitional fossils in the whale lineage.

Evolution as Fact. The figure below shows transitional fossils in the whale lineage. Evolution as Fact Evolution is a fact. Organisms descend from others with modification. Phylogeny, the lineage of ancestors and descendants, is the scientific term to Darwin's phrase "descent with modification."

More information

From Slime to Scales: Evolution of Reptiles. Review: Disadvantages of Being an Amphibian

From Slime to Scales: Evolution of Reptiles. Review: Disadvantages of Being an Amphibian From Slime to Scales: Evolution of Reptiles Review: Disadvantages of Being an Amphibian Gelatinous eggs of amphibians cannot survive out of water, so amphibians are limited in terms of the environments

More information

Biology 1B Evolution Lecture 11 (March 19, 2010), Insights from the Fossil Record and Evo-Devo

Biology 1B Evolution Lecture 11 (March 19, 2010), Insights from the Fossil Record and Evo-Devo Biology 1B Evolution Lecture 11 (March 19, 2010), Insights from the Fossil Record and Evo-Devo Extinction Important points on extinction rates: Background rate of extinctions per million species per year:

More information

What is the evidence for evolution?

What is the evidence for evolution? What is the evidence for evolution? 1. Geographic Distribution 2. Fossil Evidence & Transitional Species 3. Comparative Anatomy 1. Homologous Structures 2. Analogous Structures 3. Vestigial Structures

More information

Comparing DNA Sequences Cladogram Practice

Comparing DNA Sequences Cladogram Practice Name Period Assignment # See lecture questions 75, 122-123, 127, 137 Comparing DNA Sequences Cladogram Practice BACKGROUND Between 1990 2003, scientists working on an international research project known

More information

Characteristics of a Reptile. Vertebrate animals Lungs Scaly skin Amniotic egg

Characteristics of a Reptile. Vertebrate animals Lungs Scaly skin Amniotic egg Reptiles Characteristics of a Reptile Vertebrate animals Lungs Scaly skin Amniotic egg Characteristics of Reptiles Adaptations to life on land More efficient lungs and a better circulator system were develope

More information

17.2 Classification Based on Evolutionary Relationships Organization of all that speciation!

17.2 Classification Based on Evolutionary Relationships Organization of all that speciation! Organization of all that speciation! Patterns of evolution.. Taxonomy gets an over haul! Using more than morphology! 3 domains, 6 kingdoms KEY CONCEPT Modern classification is based on evolutionary relationships.

More information

Animal Diversity wrap-up Lecture 9 Winter 2014

Animal Diversity wrap-up Lecture 9 Winter 2014 Animal Diversity wrap-up Lecture 9 Winter 2014 1 Animal phylogeny based on morphology & development Fig. 32.10 2 Animal phylogeny based on molecular data Fig. 32.11 New Clades 3 Lophotrochozoa Lophophore:

More information

Biology. Slide 1of 50. End Show. Copyright Pearson Prentice Hall

Biology. Slide 1of 50. End Show. Copyright Pearson Prentice Hall Biology 1of 50 2of 50 Phylogeny of Chordates Nonvertebrate chordates Jawless fishes Sharks & their relatives Bony fishes Reptiles Amphibians Birds Mammals Invertebrate ancestor 3of 50 A vertebrate dry,

More information

8/19/2013. Topic 5: The Origin of Amniotes. What are some stem Amniotes? What are some stem Amniotes? The Amniotic Egg. What is an Amniote?

8/19/2013. Topic 5: The Origin of Amniotes. What are some stem Amniotes? What are some stem Amniotes? The Amniotic Egg. What is an Amniote? Topic 5: The Origin of Amniotes Where do amniotes fall out on the vertebrate phylogeny? What are some stem Amniotes? What is an Amniote? What changes were involved with the transition to dry habitats?

More information

Biology Slide 1 of 50

Biology Slide 1 of 50 Biology 1 of 50 2 of 50 What Is a Reptile? What are the characteristics of reptiles? 3 of 50 What Is a Reptile? What Is a Reptile? A reptile is a vertebrate that has dry, scaly skin, lungs, and terrestrial

More information

REPTILES. Scientific Classification of Reptiles To creep. Kingdom: Animalia Phylum: Chordata Subphylum: Vertebrata Class: Reptilia

REPTILES. Scientific Classification of Reptiles To creep. Kingdom: Animalia Phylum: Chordata Subphylum: Vertebrata Class: Reptilia Scientific Classification of Reptiles To creep Kingdom: Animalia Phylum: Chordata Subphylum: Vertebrata Class: Reptilia REPTILES tetrapods - 4 legs adapted for land, hip/girdle Amniotes - animals whose

More information

Introduction to phylogenetic trees and tree-thinking Copyright 2005, D. A. Baum (Free use for non-commercial educational pruposes)

Introduction to phylogenetic trees and tree-thinking Copyright 2005, D. A. Baum (Free use for non-commercial educational pruposes) Introduction to phylogenetic trees and tree-thinking Copyright 2005, D. A. Baum (Free use for non-commercial educational pruposes) Phylogenetics is the study of the relationships of organisms to each other.

More information

AP Lab Three: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

AP Lab Three: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST AP Biology Name AP Lab Three: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST In the 1990 s when scientists began to compile a list of genes and DNA sequences in the human genome

More information

Cover Page. The handle holds various files of this Leiden University dissertation.

Cover Page. The handle  holds various files of this Leiden University dissertation. Cover Page The handle http://hdl.handle.net/1887/19952 holds various files of this Leiden University dissertation. Author: Vonk, Freek Jacobus Title: Snake evolution and prospecting of snake venom Date:

More information

2 nd Term Final. Revision Sheet. Students Name: Grade: 11 A/B. Subject: Biology. Teacher Signature. Page 1 of 11

2 nd Term Final. Revision Sheet. Students Name: Grade: 11 A/B. Subject: Biology. Teacher Signature. Page 1 of 11 2 nd Term Final Revision Sheet Students Name: Grade: 11 A/B Subject: Biology Teacher Signature Page 1 of 11 Nour Al Maref International School Riyadh, Saudi Arabia Biology Worksheet (2 nd Term) Chapter-26

More information

COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST

COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST In this laboratory investigation, you will use BLAST to compare several genes, and then use the information to construct a cladogram.

More information

Animal Diversity III: Mollusca and Deuterostomes

Animal Diversity III: Mollusca and Deuterostomes Animal Diversity III: Mollusca and Deuterostomes Objectives: Be able to identify specimens from the main groups of Mollusca and Echinodermata. Be able to distinguish between the bilateral symmetry on a

More information

1 Describe the anatomy and function of the turtle shell. 2 Describe respiration in turtles. How does the shell affect respiration?

1 Describe the anatomy and function of the turtle shell. 2 Describe respiration in turtles. How does the shell affect respiration? GVZ 2017 Practice Questions Set 1 Test 3 1 Describe the anatomy and function of the turtle shell. 2 Describe respiration in turtles. How does the shell affect respiration? 3 According to the most recent

More information

1 In 1958, scientists made a breakthrough in artificial reproductive cloning by successfully cloning a

1 In 1958, scientists made a breakthrough in artificial reproductive cloning by successfully cloning a 1 In 1958, scientists made a breakthrough in artificial reproductive cloning by successfully cloning a vertebrate species. The species cloned was the African clawed frog, Xenopus laevis. Fig. 1.1, on page

More information

What are taxonomy, classification, and systematics?

What are taxonomy, classification, and systematics? Topic 2: Comparative Method o Taxonomy, classification, systematics o Importance of phylogenies o A closer look at systematics o Some key concepts o Parts of a cladogram o Groups and characters o Homology

More information

Fig Phylogeny & Systematics

Fig Phylogeny & Systematics Fig. 26- Phylogeny & Systematics Tree of Life phylogenetic relationship for 3 clades (http://evolution.berkeley.edu Fig. 26-2 Phylogenetic tree Figure 26.3 Taxonomy Taxon Carolus Linnaeus Species: Panthera

More information

LABORATORY EXERCISE 7: CLADISTICS I

LABORATORY EXERCISE 7: CLADISTICS I Biology 4415/5415 Evolution LABORATORY EXERCISE 7: CLADISTICS I Take a group of organisms. Let s use five: a lungfish, a frog, a crocodile, a flamingo, and a human. How to reconstruct their relationships?

More information

LABORATORY EXERCISE 6: CLADISTICS I

LABORATORY EXERCISE 6: CLADISTICS I Biology 4415/5415 Evolution LABORATORY EXERCISE 6: CLADISTICS I Take a group of organisms. Let s use five: a lungfish, a frog, a crocodile, a flamingo, and a human. How to reconstruct their relationships?

More information

Geo 302D: Age of Dinosaurs LAB 4: Systematics Part 1

Geo 302D: Age of Dinosaurs LAB 4: Systematics Part 1 Geo 302D: Age of Dinosaurs LAB 4: Systematics Part 1 Systematics is the comparative study of biological diversity with the intent of determining the relationships between organisms. Humankind has always

More information

Fishes, Amphibians, Reptiles

Fishes, Amphibians, Reptiles Fishes, Amphibians, Reptiles Section 1: What is a Vertebrate? Characteristics of CHORDATES Most are Vertebrates (have a spinal cord) Some point in life cycle all chordates have: Notochord Nerve cord that

More information

Vertebrates. skull ribs vertebral column

Vertebrates. skull ribs vertebral column Vertebrates skull ribs vertebral column endoskeleton in cells working together tissues tissues working together organs working together organs systems Blood carries oxygen to the cells carries nutrients

More information

Presence and Absence of COX8 in Reptile Transcriptomes

Presence and Absence of COX8 in Reptile Transcriptomes Presence and Absence of COX8 in Reptile Transcriptomes Emily K. West, Michael W. Vandewege, Federico G. Hoffmann Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology Mississippi

More information

6. The lifetime Darwinian fitness of one organism is greater than that of another organism if: A. it lives longer than the other B. it is able to outc

6. The lifetime Darwinian fitness of one organism is greater than that of another organism if: A. it lives longer than the other B. it is able to outc 1. The money in the kingdom of Florin consists of bills with the value written on the front, and pictures of members of the royal family on the back. To test the hypothesis that all of the Florinese $5

More information

Comparative Zoology Portfolio Project Assignment

Comparative Zoology Portfolio Project Assignment Comparative Zoology Portfolio Project Assignment Using your knowledge from the in class activities, your notes, you Integrated Science text, or the internet, you will look at the major trends in the evolution

More information

VERTEBRATE READING. Fishes

VERTEBRATE READING. Fishes VERTEBRATE READING Fishes The first vertebrates to become a widespread, predominant life form on earth were fishes. Prior to this, only invertebrates, such as mollusks, worms and squid-like animals, would

More information

Dynamic evolution of venom proteins in squamate reptiles. Nicholas R. Casewell, Gavin A. Huttley and Wolfgang Wüster

Dynamic evolution of venom proteins in squamate reptiles. Nicholas R. Casewell, Gavin A. Huttley and Wolfgang Wüster Dynamic evolution of venom proteins in squamate reptiles Nicholas R. Casewell, Gavin A. Huttley and Wolfgang Wüster Supplementary Information Supplementary Figure S1. Phylogeny of the Toxicofera and evolution

More information

Vertebrates. Vertebrates are animals that have a backbone and an endoskeleton.

Vertebrates. Vertebrates are animals that have a backbone and an endoskeleton. Vertebrates Vertebrates are animals that have a backbone and an endoskeleton. The backbone replaces the notochord and contains bones called vertebrae. An endoskeleton is an internal skeleton that protects

More information

BioSci 110, Fall 08 Exam 2

BioSci 110, Fall 08 Exam 2 1. is the cell division process that results in the production of a. mitosis; 2 gametes b. meiosis; 2 gametes c. meiosis; 2 somatic (body) cells d. mitosis; 4 somatic (body) cells e. *meiosis; 4 gametes

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2006

Bio 1B Lecture Outline (please print and bring along) Fall, 2006 Bio 1B Lecture Outline (please print and bring along) Fall, 2006 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #4 -- Phylogenetic Analysis (Cladistics) -- Oct.

More information
