Introduction

Identification and Management of Loss of Function Alleles Impacting Fertility
Jerry Taylor, taylorjerr@missouri.edu
University of Missouri
2014 Applied Reproductive Strategies in Beef Cattle Conference, October 9, 2014, Stillwater, OK

L1 Dominette 01449
Jerry and I am an NGS addict

Whole Genome Sequencing

Mendelian Loci
Fanconi Syndrome (FS) is a generalized proximal tubule reabsorption deficiency, characterized by metabolic acidosis, amino aciduria, glucosuria, and phosphaturia.
An autosomal recessive, adult-onset FS occurs in Basenjis.
22 cases and 37 controls
[Figure: Basenji Fanconi Syndrome pedigree; LOD score (0 to 10) plotted against CFA3 position (0 to 90 Mb)]
Homozygosity mapping restricted the FS locus to a 2.7 Mb region of CFA3 containing 27 genes

Sequence Affected Dogs
Compare the sequence of 1 affected dog to those of NC other dogs with different diseases
But what does it mean to sequence a genome?
What sequencing technology? (Read length; single-end, paired-end, or mate-pair; error rate)
What kind of sequencing libraries? (Fragment size)
What depth of sequence coverage? (Average number of reads covering a random base)
How do I analyze the data?

Affected (NA = 1) vs Controls (NC = 84)
The causal variant is in the sequence
This ignores regions of homozygosity
Align to reference assembly
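The comparison of one affected dog against the controls, restricted to the mapped interval, can be sketched as a simple filter. This is a minimal illustration and not the pipeline used in the talk: it assumes genotypes are coded as counts of the non-reference allele (0, 1, 2), that a recessive causal variant is homozygous in the affected dog, and that no control is homozygous for it. The function name, the coordinates, and the toy genotypes are all hypothetical.

```python
from typing import Dict, List, Tuple

# A variant is keyed by (chromosome, position in bp).
# Genotypes are counts of the non-reference allele: 0, 1 or 2.
Variant = Tuple[str, int]


def candidate_variants(
    affected: Dict[Variant, int],
    controls: Dict[Variant, List[int]],
    region: Tuple[str, int, int],
) -> List[Variant]:
    """Keep variants inside `region` that are homozygous non-reference in the
    affected animal and homozygous non-reference in none of the controls."""
    chrom, start, end = region
    hits = []
    for (v_chrom, pos), genotype in sorted(affected.items()):
        if v_chrom != chrom or not (start <= pos <= end):
            continue  # outside the mapped interval
        if genotype != 2:
            continue  # a recessive causal allele should be homozygous
        # Variants never seen in a control are treated as absent from controls.
        if any(g == 2 for g in controls.get((v_chrom, pos), [])):
            continue  # an unaffected control is homozygous, so exclude
        hits.append((v_chrom, pos))
    return hits


# Toy data: coordinates and genotypes are made up for illustration.
affected = {
    ("CFA3", 1_500_000): 2,
    ("CFA3", 2_000_000): 2,
    ("CFA3", 2_400_000): 1,
    ("CFA5", 2_000_000): 2,
}
controls = {
    ("CFA3", 1_500_000): [0, 1, 0],
    ("CFA3", 2_000_000): [2, 0, 0],
    ("CFA3", 2_400_000): [0, 0, 0],
}
region = ("CFA3", 1_000_000, 3_700_000)  # hypothetical 2.7 Mb interval

candidates = candidate_variants(affected, controls, region)
print(candidates)
```

In this toy input only the variant at 1,500,000 bp survives: the others are heterozygous in the affected dog, homozygous in a control, or outside the interval.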
Dog Disease Mutations Occur in Regions of Homozygosity by Descent

LOF Mutations in Dogs
SAG - causes progressive retinal atrophy in Basenjis
SERAC1 - distinct mutations cause Canine Multiple System Degeneration in Chinese Crested and Kerry Blue Terriers
KCNJ10 - causes Progressive Spinocerebellar Ataxia and Myokymia and seizures in Jack Russell Terriers
RBM20 - causes dilated cardiomyopathy in Giant and Standard Schnauzers
CLN8 - causes neuronal ceroid lipofuscinosis in Australian Shepherds
PIGN - causes paroxysmal dyskinesia in Soft Coated Wheaten Terriers
MFSD8 - causes neuronal ceroid lipofuscinosis in Chinese Crested dogs

I Got to Sequence My Own Dogs
You are a nobody in genomics until you sequence your own dog...

LOF Mutations in Cows
197 sequenced animals
At least two animals with variant
Essential = Gene essential for life in mouse
These are probably all lethals
Some of these are going to be lethals

LOF Essential Genes in Angus

Relationship to Heifer Pregnancy EPD
Carry same number of lethals but have very different effects on fertility due to how common they are in the population
Range 5 to 23
Average = 12.04
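The point that bulls carrying the same number of lethals can differ in their effect on fertility can be illustrated with a short calculation. This is a sketch under stated assumptions, not a result from the talk: the bull is heterozygous at each locus, dams are drawn at random from a population in which the lethal allele has frequency q, loci are independent, and every homozygote is lost. Under those assumptions the chance of a homozygous embryo at one locus is q/2. The allele frequencies below are hypothetical.

```python
from typing import Iterable


def embryo_loss_for_bull(carried_allele_freqs: Iterable[float]) -> float:
    """Expected fraction of a carrier bull's embryos lost to homozygous lethals.

    Each entry is the population frequency q of one lethal allele the bull
    carries in heterozygous form. The bull transmits the allele with
    probability 1/2 and a random dam's gamete carries it with probability q,
    so the per-locus loss is q / 2 (assumes independent loci).
    """
    survive = 1.0
    for q in carried_allele_freqs:
        survive *= 1.0 - q / 2.0
    return 1.0 - survive


# Two hypothetical bulls, each carrying 12 lethals (the Angus average).
rare_carrier = [0.01] * 12    # all carried alleles are rare
common_carrier = [0.10] * 12  # all carried alleles are common

loss_rare = embryo_loss_for_bull(rare_carrier)
loss_common = embryo_loss_for_bull(common_carrier)
print(f"rare alleles:   {loss_rare:.3f}")
print(f"common alleles: {loss_common:.3f}")
```

Both bulls carry 12 lethals, but the one whose alleles are common in the cow population loses a much larger share of embryos.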
Impact of Lethals
If N genes have lethal LOF alleles, each at a frequency of q:
On average each bull would carry 2qN lethals (N=86 Angus; Average = 12; Range 5-23)
Average human carries ~20 lethals (so qN ~ 10, e.g. N = 1000 and q = 0.01)
The frequency of lost fertilizations due to homozygous lethals is ~ \(1 - (1 - q^2)^N\)
[Figure: Proportion of fertilizations lost for N lethal genes each at a frequency of q such that 2.5 < qN < 20 (qN = 10)]

How Do We Test if LOF Alleles Are Lethal?
Make a genotyping chip and put all candidates on it (Practical limit is 200,000 variants)
Genotype 10,000 heifers (We have ~7,000 collected)
THIS IS GOING TO COST ~$400,000
See which ones NEVER turn up as homozygotes (out of the 200,000 we tested)

Another Approach
Impute 50K Data to Genome Sequence
3,570 Angus Bulls - FindHap
[Figure: Pair of chromosomes; variable positions on 50K; variable positions in sequence]
If I know EXT's 50K genotypes, can I estimate his genome sequence?

Imputation
GWAS in 3,570 Angus Bulls
The accuracy of imputation depends on:
How closely related the sequenced bulls are to the genotyped bulls
We have been sequencing bulls with the largest numbers of registered descendants in each breed
With 99 sequenced bulls the average accuracy of imputation is >90% in 3,570 registered animals
We should be sequencing more bulls in every breed

We are beginning to assemble a lot of 50K data
AAA: >60,000
ASA/CSA: >24,000
AHA: >6,000
RAAA: >3,800
NALF: >3,400
AICA: >1,200

If we can accurately impute 50K to sequence data
We can test embryonic lethals in every breed with sequence data FOR FREE
We can identify all carriers FOR FREE
We can use mate selection software to avoid mating carriers
We can also use this data to hunt for large effect genes
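The arithmetic on the Impact of Lethals slide, and the logic of looking for variants that never appear as homozygotes, can be sketched in a few lines. The first two functions restate the slide's expressions 2qN and 1 - (1 - q^2)^N. The third is an added assumption that is not on the slide: it treats genotypes as being in Hardy-Weinberg proportions, so a non-lethal allele at frequency q is homozygous in an animal with probability q^2. All function names are hypothetical.

```python
def expected_lethals_per_animal(q: float, n_genes: int) -> float:
    """Average number of lethal alleles carried per animal: 2 * q * N."""
    return 2.0 * q * n_genes


def fertilizations_lost(q: float, n_genes: int) -> float:
    """Fraction of fertilizations lost to homozygous lethals: 1 - (1 - q^2)^N."""
    return 1.0 - (1.0 - q ** 2) ** n_genes


def prob_no_homozygotes(q: float, n_animals: int) -> float:
    """Chance of seeing zero homozygotes among n_animals if the allele were
    NOT lethal, assuming Hardy-Weinberg proportions (an assumption added here)."""
    return (1.0 - q ** 2) ** n_animals


# The slide's human example: N = 1000 genes, q = 0.01.
carried = expected_lethals_per_animal(0.01, 1000)
lost = fertilizations_lost(0.01, 1000)

# How informative is a missing homozygote class in 10,000 heifers?
p_rare = prob_no_homozygotes(0.01, 10_000)
p_common = prob_no_homozygotes(0.05, 10_000)

print(f"lethals carried per animal: {carried:.1f}")
print(f"fertilizations lost:        {lost:.3f}")
print(f"P(no homozygotes | q=0.01): {p_rare:.3f}")
print(f"P(no homozygotes | q=0.05): {p_common:.2e}")
```

Under these assumptions, the absence of homozygotes among 10,000 heifers is strong evidence of lethality for alleles at a frequency of a few percent, but much weaker evidence for alleles near 1%, where homozygotes would often be missing by chance alone.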
Impute Angus Variants
GWAS on Imputed Sequence Data

BTA7 @ 93,218,452 bp (92.75 to 94.25 Mb)
13,800 variable biallelic loci detected in 94 sequenced Angus
4,835 with MAF 5% (13 on BovineSNP50)

BTA20 @ 4,713,452 bp (4.00 to 5.30 Mb)
7,406 variable biallelic loci detected in 94 sequenced Angus
3,060 with MAF 5% (28 on BovineSNP50)

Impute 50K data to sequence data with FindHap
Only 2.5% of animals have complete data
95.5% of genotypes are missing
>90% accuracy
[Figure label: Mutant]

Last of the Cattle Mendelians?
Grosz and MacNeil (1999) linkage mapped the Hereford Spotted locus to a 21 Mb region of BTA6 (64.5-85.5 Mb)
Not within an annotated gene

Hereford Coat Color
Lynsey Whitacre utilized BovineHD data (770K SNPs) in 811 Herefords to refine and annotate the sweep region found by Holly Ramey

Hereford Coat Color
Lynsey created 2 Mb de novo sequence assemblies for two Hereford and two Angus bulls for the genomic region harboring KIT and compared them for the 415 kb selective sweep region
She found 292 SNPs (5 in KIT) and 33 small indels predicted to be fixed for different alleles between Angus and Hereford
But none were in coding regions
Isn't this weird?
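The search for variants fixed for different alleles between Angus and Hereford can be sketched as follows. This is an illustrative filter only, not the assembly comparison described on the slide: it assumes each animal's genotype at a site is coded as a count of the non-reference allele (0, 1, 2), and the site names and genotypes are made up.

```python
from typing import Dict, List


def fixed_differences(
    breed_a: Dict[str, List[int]],
    breed_b: Dict[str, List[int]],
) -> List[str]:
    """Return sites where every animal in one breed is homozygous reference (0)
    and every animal in the other breed is homozygous non-reference (2)."""
    fixed = []
    for site in sorted(set(breed_a) & set(breed_b)):
        a, b = breed_a[site], breed_b[site]
        if not a or not b:
            continue  # no genotypes to compare at this site
        a_ref = all(g == 0 for g in a)
        a_alt = all(g == 2 for g in a)
        b_ref = all(g == 0 for g in b)
        b_alt = all(g == 2 for g in b)
        if (a_ref and b_alt) or (a_alt and b_ref):
            fixed.append(site)
    return fixed


# Toy genotypes for two animals per breed (hypothetical values).
angus = {"site1": [0, 0], "site2": [0, 1], "site3": [2, 2], "site4": [0, 0]}
hereford = {"site1": [2, 2], "site2": [2, 2], "site3": [0, 0], "site4": [0, 0]}

fixed = fixed_differences(angus, hereford)
print(fixed)
```

With only two animals per breed, a site passing this filter is predicted to be fixed rather than shown to be fixed, which matches the slide's wording.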
Structural Variants
Nucleotide resolution depth of coverage data was generated for 14 Hereford and 10 Angus animals individually sequenced to ~30X

15 Indicine & 3 Bison
[Figure panel labels: BRAHMAN, NELORE, GIR, BISON]

Structural Variation Causes Spotted
Structural variation reduces transcriptional activity of KIT:
White face phenotype is dominant
Allelic effects on rate of transcription behave additively
The face is the last place that migrating melanocytes arrive

But which duplication causes the phenotype?
The upstream duplication interrupts a long-range KIT enhancer (~100 kb away in other species)
The intronic duplication increases the size of the unprocessed transcript by ~110 kb

Upstream Duplication
Simmental are the only other breed to have the upstream duplication

Beefmaster
¼ Shorthorn, ¼ Hereford, ½ Brahman
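Depth of coverage can flag duplications because a duplicated segment collects more reads than the rest of the genome. The sketch below is one simple way to do this and is not the method used in the talk: it averages per-base depth in fixed windows, divides by the median window depth, and reports windows whose ratio exceeds a threshold. The window size, the 1.5 threshold, and the toy depth profile are all assumptions chosen for illustration.

```python
from statistics import mean, median
from typing import List, Tuple


def window_means(depth: List[int], window: int) -> List[float]:
    """Mean depth in consecutive non-overlapping windows (partial tail dropped)."""
    return [
        mean(depth[i:i + window])
        for i in range(0, len(depth) - window + 1, window)
    ]


def duplicated_windows(
    depth: List[int], window: int, min_ratio: float = 1.5
) -> List[Tuple[int, float]]:
    """Return (window start, depth ratio) for windows whose mean depth is at
    least `min_ratio` times the median window depth."""
    means = window_means(depth, window)
    if not means:
        return []
    baseline = median(means)
    if baseline == 0:
        return []  # no usable coverage to normalise against
    return [
        (i * window, m / baseline)
        for i, m in enumerate(means)
        if m / baseline >= min_ratio
    ]


# Toy per-base depth: ~30X background with a 10-base segment at 60X.
depth = [30] * 40 + [60] * 10 + [30] * 30
calls = duplicated_windows(depth, window=10)
print(calls)
```

Real data would need much larger windows and correction for GC content and mappability, none of which is modelled here.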
The End