Figure S1
A. Pulsed-field gel of hummingbird genomic DNA. 1: Sheared gDNA (35 kb and 40 kb). 2: BluePippin size-selected library (17 kb cut-off). 3: Original gDNA.
B. Bioanalyzer plot of hummingbird SMRTbell library (labeled sizes: 5 kb, 7 kb, 10 kb, 17 kb). Blue: SMRTbell library before size selection. Red: SMRTbell library after BluePippin size selection (17 kb cut-off). Green: control DNA ladder.
Figure S2
Zebra finch and Anna's hummingbird. A, B. Read length distribution. C, D. Insert length distribution.
Figure S3
Protein length (axis 0 to 1000) for CEGMA versus BUSCO genes.
Eukaryota gene set: N = 144, W = 5683, P < 0.001.
Aves gene set: N = 177, W = 13376, P < 0.001.
Figure S4
Brain diagrams of (a) songbird and (b) human, with the regions cerebrum, thalamus, midbrain, hindbrain, and cerebellum labeled.
Songbird (a) labels: HVC, RA, DM, Av, NIf, L2, XII, MO, MAN, aDLM, Area X, trachea & syrinx muscles.
Human (b) labels: Am, larynx muscles, PAG, aT, Broca's, LMC/LSC, aSt, A1-L4.
Legend: Pallium/Cortex, Striatum, Pallidum; Hyperpallium, Mesopallium, Nidopallium, Arcopallium.
Figure S5
A. EGR1 Sanger vs PacBio zebra finch assemblies. Sanger assembly (on Chr13) versus PacBio assembly (contig 405, rev. comp.). Annotations: gaps (100 Ns) in Sanger reference; GC-rich promoter region.
B. EGR1 Illumina vs PacBio hummingbird assemblies. Illumina assembly (scaffold 414) versus PacBio assembly (contig 251). Annotations: beginning of EGR1 gene; 544 N gap in Illumina assembly; 1987 N gap in Illumina assembly; erroneous duplication in Illumina assembly.
Figure S6
A. Raw PacBio SMRT genome reads against EGR1 PacBio assembly (1 kb scale).
B. Raw PacBio IsoSeq mRNA read against EGR1 assemblies.
Figure S7
Hummingbird vs zebra finch EGR1 region. PacBio hummingbird assembly (contig 251) versus PacBio zebra finch assembly (contig 405, rev. comp.). Annotations: regions of low to no homology between species; EGR1 gene; GC-rich region.
Figure S8
A. DUSP1 Sanger vs PacBio zebra finch assemblies. Sanger reference (on Chr13) versus PacBio assembly (contig 32, rev. comp.). Annotations: erroneous R sequence placement in Sanger-based assembly; gaps (100 Ns) in Sanger reference; erroneous tandem duplication in Sanger-based assembly from each haplotype; microsatellite repeats.
B. DUSP1 Illumina vs PacBio hummingbird assemblies. Illumina hummingbird assembly (scaffold 56, rev. comp.) versus PacBio hummingbird assembly (contig 11, rev. comp.). Annotations: beginning of DUSP1 gene; erroneous tandem duplication in Illumina-based assembly; 1005 Ns gap in Illumina assembly.
Figure S9
A. Apparent amino acid errors due to base call errors in the Sanger zebra finch reference (labeled residues: Phe, Glu, Lys).
zebra_finch        QGLLHTVVLLDYRSADLEVPQRDSSMLFTLRLQFWHKN----
b_crowned_manakin  QGLFHTVVLLDERSADLDAPKRDSTMLLALGTLCREARGARI
c_flycatcher       QGLFHSVVLLDERSADLEAPKRDSTVLLALGTLCREARGARI
white_t_sparrow    QGLFHTVVLLDERSADLEMPKRDSTMLLALGTLCREARGARI
starling           QGLFHTVVLLDERSADLEVPKRDSTMLLALGTLCREARGARI
Great_tit          QGLFHTVVLLDERSADLEVPKRDSTMLLALGTLCREARGARI
ground_tit         QGLFHTVVLLDERSADLEVPKRDSTMLLALGTLCREARGARI
                   ***:*:***** *****: *:***::*::*..
B. PacBio zebra finch DUSP1 reads have the same base calls as other species.
Figure S10
A. Haplotype differences in DUSP1 microsatellite repeats. PacBio zebra finch secondary haplotype (contig 32_022, rev. comp.) versus PacBio zebra finch primary haplotype (contig 32, rev. comp.). Labeled lengths: 796 bp; 720 bp; 160 bp (8 repeat units); 220 bp (11 repeat units).
B. Species differences in DUSP1 microsatellite repeats. PacBio hummingbird assembly (contig 11, rev. comp.) versus PacBio zebra finch assembly (contig 32, rev. comp.). Labeled lengths: 1100 bp; 270 bp.
Figure S11
A. DUSP1 PacBio vs single clone Sanger-based zebra finch assemblies. PacBio assembly (contig 32) versus Sanger-based single clone (AB574425.1). No support for the erroneous R sequence placement or tandem duplications found in the original Sanger-based reference (Fig. S8A). Repeat sequences that vary between haplotypes and individuals: 10 vs 11 repeats; ~320 bp vs 720 bp.
B. DUSP1 PacBio vs single clone Sanger-based hummingbird assemblies. PacBio hummingbird assembly (contig 11) versus Sanger-based single clone (AB574427.1). No support for the erroneous tandem duplication in the original Illumina-based reference (Fig. S8B).
Figure S12
A. DUSP1
B. FOXP2
Figure S13
A. FOXP2 Sanger vs PacBio zebra finch assemblies. Sanger assembly (Chr1A), 9 gaps (100 Ns each), versus PacBio assembly (contig 5).
B. FOXP2 Illumina vs PacBio hummingbird assemblies. Illumina assembly (scaffold 125, rev. comp.), 22 gaps (1.4% of gene missing, 5993 Ns), versus PacBio assembly (contig 110).
Figure S14
A. Predicted FOXP2 protein alignment across assemblies and species.
B. 88 Ns in the middle of exon 6 in Illumina assembly, making 2 exons.
C. 1 contiguous exon, PacBio assembly (1 SNP relative to mRNA).
Figure S15
A. Correction of homonucleotide and large segment in FOXP2 locus. Region in 1st intron: the Illumina scaffold has 462 bp of additional sequence adjacent to the long T homonucleotide stretch. There is no read support for this stretch of sequence in the PacBio data.
B. Large deletion in one haplotype of FOXP2 locus. PacBio secondary haplotype (contig 110_009) versus PacBio primary haplotype (contig 110).
Figure S16
A. SLIT1 Sanger vs PacBio zebra finch assemblies. Finch Sanger reference, 14 gaps (100 Ns each), versus PacBio assembly (contig 110).
B. SLIT1 Illumina vs PacBio hummingbird assemblies. Illumina assembly (scaffold 522), 8 gaps (5.3% of gene missing, 3320 Ns), versus PacBio assembly (contig 23). Beginning of gene: 495 Ns in Illumina versus 567 bp in PacBio (76% GC).
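Several of the figures above summarize assembly gaps as a gap count, a total number of Ns, and a percentage of the gene missing (for example Figures S13 and S16). The following is a minimal sketch of how such a summary could be computed from a gene-region sequence string; it is not the authors' pipeline. The function name `summarize_n_gaps` and the toy sequence are hypothetical and chosen only for illustration.

```python
import re


def summarize_n_gaps(seq: str) -> dict:
    """Summarize runs of N (gap placeholders) in a nucleotide sequence.

    Hypothetical helper for illustration; returns the number of N runs,
    the total number of N bases, and the percentage of the sequence
    that those N bases make up.
    """
    # Each maximal run of N/n characters counts as one gap.
    runs = [m.end() - m.start() for m in re.finditer(r"[Nn]+", seq)]
    total_n = sum(runs)
    percent = 100.0 * total_n / len(seq) if seq else 0.0
    return {"gap_count": len(runs), "total_n": total_n, "percent_missing": percent}


# Toy sequence (made up, not taken from either assembly):
# 100 bases, a 100 N gap, 200 bases, a 50 N gap, 100 bases.
toy = "ACGT" * 25 + "N" * 100 + "GGCC" * 50 + "N" * 50 + "TTAA" * 25
summary = summarize_n_gaps(toy)
print(summary["gap_count"], summary["total_n"], round(summary["percent_missing"], 1))
```

Applied to the sequence of a gene region extracted from a scaffold, the same function would report values of the kind shown in the figure annotations, assuming gaps are encoded as runs of N.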