Bayesian Analysis of Population Mixture and Admixture

Eric C. Anderson
Interdisciplinary Program in Quantitative Ecology and Resource Management
University of Washington, Seattle, WA, USA

Jonathan K. Pritchard
Department of Statistics
University of Oxford, UK

This research was supported by NSF BIR 9807747.

Overview
1. A motivating problem: Felis sylvestris in Scotland
2. A model for population mixture
3. A model for population admixture
   - Block updating Gibbs sampler
   - A Baum et al. type of computation
4. Simultaneous mixture/admixture analysis
5. Results

Felis sylvestris: History and Genetic Data
Data provided by Mark A. Beaumont (University of Reading, UK):
230 wild-living cats genotyped at 8 microsatellite loci

Genetics Background
- Each cell has many pairs of chromosomes.
- Very precise locations in the genome may be reliably found and analyzed. Such a location is called a LOCUS (plural = LOCI).
- Genetic variants at a locus are known as alleles.
- Each individual has two copies of genetic material at a locus, which determine its single-locus genotype.
- The probability that an individual carries a particular allele at a locus depends on how frequent that allele is in the population.
- For an individual from a population in equilibrium, the alleles carried are independent of one another within and between loci.
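The independence statement above implies that a multilocus genotype probability is a product over loci of allele-frequency terms. The following is a minimal sketch of that calculation, not code from the talk. The function name `genotype_prob`, the data layout, and the example allele names and frequencies are all illustrative assumptions.

```python
from math import prod

def genotype_prob(genotype, freqs):
    """Probability of a multilocus genotype, assuming alleles are independent
    within and between loci (the equilibrium assumption stated above).

    genotype: list of (allele_a, allele_b) pairs, one pair per locus
    freqs:    list of dicts, one per locus, mapping allele -> frequency
    """
    per_locus = []
    for (a, b), p in zip(genotype, freqs):
        # Genotypes are unordered, so a heterozygote can arise in two orders.
        factor = 1.0 if a == b else 2.0
        per_locus.append(factor * p[a] * p[b])
    return prod(per_locus)

# Hypothetical two-locus example; the alleles and frequencies are made up.
freqs = [{"A": 0.7, "B": 0.3}, {"C": 0.5, "D": 0.5}]
cat = [("A", "B"), ("C", "C")]
print(genotype_prob(cat, freqs))
```

The same function evaluated with the allele frequencies of each source population gives the two genotype likelihoods that a mixture analysis compares.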

Model For Genetic Mixture
[Diagram: Population 0 (allele freqs θ_0) contributes to the mixture in proportion π; Population 1 (allele freqs θ_1) contributes in proportion 1 − π.]
Using a sample from the mixture, the goals are to:
1. Estimate the allele frequencies in Populations 0 and 1
2. Estimate the mixing proportion π
3. For each individual in the sample, compute the posterior probability that it is from Population 0 or 1

Goals 1 and 2 would be made very easy if we could observe for each cat a variable z_i:

\[
z_i = \begin{cases} 0 & \text{if the } i\text{th cat is from Pop. 0} \\ 1 & \text{if the } i\text{th cat is from Pop. 1} \end{cases}
\]

Of course, we do not know z_i; it is a latent variable. However, if we knew the allele frequencies and the mixing proportions, we could compute the probability distribution for z_i given the i-th cat's multilocus genotype:

\[
P(z_i = 0 \mid \theta_0, \theta_1, \pi, \mathrm{gtyp}_i) = \frac{\pi \, P(\mathrm{gtyp}_i \mid \theta_0, z_i = 0)}{\pi \, P(\mathrm{gtyp}_i \mid \theta_0, z_i = 0) + (1 - \pi) \, P(\mathrm{gtyp}_i \mid \theta_1, z_i = 1)}
\]

Taking Dirichlet priors for θ_0, θ_1 and π, the inclusion of the variables z_i makes Gibbs sampling straightforward in this model.
Bayesian inference following Diebolt & Robert (1994)
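The posterior probability above can be evaluated directly once the two genotype likelihoods are available. The sketch below is illustrative only: it assumes the log-likelihoods log P(gtyp_i | θ_0, z_i = 0) and log P(gtyp_i | θ_1, z_i = 1) have already been computed, and the function names `prob_pop0` and `sample_z` are hypothetical.

```python
import math
import random

def prob_pop0(loglik0, loglik1, pi):
    """P(z_i = 0 | theta_0, theta_1, pi, gtyp_i), given the two genotype
    log-likelihoods and a mixing proportion pi strictly between 0 and 1."""
    a = math.log(pi) + loglik0
    b = math.log(1.0 - pi) + loglik1
    m = max(a, b)  # subtract the max before exponentiating to avoid underflow
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))

def sample_z(loglik0, loglik1, pi, rng=random):
    """One Gibbs draw of z_i from its full conditional distribution."""
    return 0 if rng.random() < prob_pop0(loglik0, loglik1, pi) else 1

# Hypothetical likelihood values, chosen only for illustration.
print(prob_pop0(math.log(0.2), math.log(0.1), pi=0.25))
```

Working with log-likelihoods matters in practice because a product over many loci can underflow in ordinary floating point.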

A Schematic of Genetic Admixture
[Diagram: schematic of admixture between populations, drawn along a time axis.]
This requires a different probability model with different latent variables.

Latent Data, q_i and w, for the Admixture Model
Pritchard et al. (2000)
q_i ~ Beta(α, α)
For the i-th cat:
- Each gene copy comes from Pop 0, independently, with probability q_i (and from Pop 1 with probability 1 − q_i).
- The t-th gene copy in the i-th cat gets w_it = 0 or 1 (flags in diagram).
[Diagram: Pop 0 "Gene Pool" and Pop 1 "Gene Pool", with 0/1 flags on the gene copies of one cat.]
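The latent data for one cat can be simulated in a few lines. This is a sketch under stated assumptions: the coding w_it = 0 for a copy from Pop 0 and 1 for a copy from Pop 1 is assumed here, the function name `simulate_cat` is hypothetical, and 16 gene copies corresponds to two copies at each of the 8 loci in the data set.

```python
import random

def simulate_cat(alpha, n_gene_copies, rng):
    """Draw q_i ~ Beta(alpha, alpha), then one origin flag per gene copy.

    Assumed coding: flag 0 means the copy came from Pop 0 (probability q_i),
    flag 1 means it came from Pop 1 (probability 1 - q_i).
    """
    q = rng.betavariate(alpha, alpha)
    w = [0 if rng.random() < q else 1 for _ in range(n_gene_copies)]
    return q, w

rng = random.Random(1)  # fixed seed so the sketch is repeatable
q, w = simulate_cat(alpha=0.5, n_gene_copies=16, rng=rng)
print(q, w)
```

Small values of α push q_i toward 0 or 1, so most cats look nearly pure; large values concentrate q_i near 1/2.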

Hierarchical Structure of the Admixture Model
[Diagram: directed graph with the following levels]
- α: rectangular hyperprior on (0, A)
- q_1, q_2, ..., q_n
- w_1, w_2, ..., w_n
- gtyp_1, gtyp_2, ..., gtyp_n
- θ_0, θ_1: independent Dirichlet priors
Allows straightforward Gibbs sampling for θ, w, and q
Metropolis-Hastings update for α (slow mixing)
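Two of the Gibbs updates follow from standard conjugacy and can be sketched briefly. Given the flags w for one cat, the full conditional of q_i is Beta(α + n_0, α + n_1), where n_0 and n_1 count the copies assigned to each pool. Given allele counts assigned to one pool at one locus, the full conditional of the allele frequencies is Dirichlet(prior + counts). The function names, the coding of w (0 for Pop 0), and the use of a single symmetric Dirichlet parameter `prior` are assumptions made for this illustration.

```python
import random

def update_q(w, alpha, rng):
    """Draw q_i from Beta(alpha + n0, alpha + n1) given the origin flags w."""
    n0 = w.count(0)          # copies assigned to Pop 0 (assumed coding)
    n1 = len(w) - n0         # copies assigned to Pop 1
    return rng.betavariate(alpha + n0, alpha + n1)

def update_theta(allele_counts, prior, rng):
    """Draw one locus's allele frequencies from Dirichlet(prior + counts).

    A Dirichlet draw is built from independent gamma draws, normalized to
    sum to 1. `prior` is an assumed symmetric Dirichlet parameter.
    """
    draws = [rng.gammavariate(prior + c, 1.0) for c in allele_counts]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(7)
print(update_q([0, 0, 1, 0, 1, 1, 0, 0], alpha=1.0, rng=rng))
print(update_theta([12, 3, 0, 5], prior=1.0, rng=rng))
```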

Eliminating the q_i's
- After integrating out q_i, the w_it within the i-th cat have a labelled beta-binomial distribution with parameters (α, α).
- This has an interpretation as a Pólya-Eggenberger urn scheme.
- This, in turn, has a Markov chain interpretation.
[Diagram: a chain of 0/1 flags along the gene copies of one cat.]
Forward-Backward algorithms for Hidden Markov Chains allow:
- Joint updating of the w_it's from their full conditional distribution within the i-th cat
- Better-mixing Metropolis updates for α
- Efficient calculation of P(gtyp_i | α, θ)
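The urn and Markov chain interpretations can be checked numerically. The sketch below is one way such a computation might be organized; the slides do not give the authors' actual recursions. It makes two simplifying assumptions that are not from the talk: flag 0 denotes Pop 0, and each gene copy t comes with a pair `emis[t] = (e0, e1)` giving the probability of its observed allele under the Pop 0 and Pop 1 allele frequencies. All function names are hypothetical.

```python
import math
from itertools import product

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def seq_prob(w, alpha):
    """P(w_i1, ..., w_iT | alpha) with q_i integrated out (labelled beta-binomial)."""
    n0 = w.count(0)
    n1 = len(w) - n0
    return math.exp(log_beta(alpha + n0, alpha + n1) - log_beta(alpha, alpha))

def urn_prob(w, alpha):
    """The same probability built copy by copy as a Polya urn."""
    p, n0 = 1.0, 0
    for t, flag in enumerate(w):
        p0 = (alpha + n0) / (2.0 * alpha + t)  # chance the next copy is from Pop 0
        p *= p0 if flag == 0 else 1.0 - p0
        n0 += flag == 0
    return p

def forward_lik(emis, alpha):
    """Sum over all flag sequences of P(w | alpha) * prod_t emis[t][w_t].

    The Markov state is the number of Pop 0 copies seen so far, so the cost
    grows quadratically in the number of copies rather than exponentially.
    """
    f = [1.0]  # f[n0]: mass with n0 Pop 0 copies among the copies processed
    for t, (e0, e1) in enumerate(emis):
        g = [0.0] * (len(f) + 1)
        for n0, mass in enumerate(f):
            p0 = (alpha + n0) / (2.0 * alpha + t)
            g[n0 + 1] += mass * p0 * e0
            g[n0] += mass * (1.0 - p0) * e1
        f = g
    return sum(f)

def brute_force_lik(emis, alpha):
    """Reference value: explicit sum over all 2^T flag sequences."""
    total = 0.0
    for w in product((0, 1), repeat=len(emis)):
        total += seq_prob(list(w), alpha) * math.prod(e[k] for e, k in zip(emis, w))
    return total

# Hypothetical emission probabilities for four gene copies.
emis = [(0.2, 0.5), (0.7, 0.1), (0.3, 0.3), (0.9, 0.4)]
print(forward_lik(emis, alpha=0.8), brute_force_lik(emis, alpha=0.8))
```

With 16 gene copies per cat the brute-force sum has 65,536 terms, while the forward pass touches only a few hundred state transitions, which is the efficiency gain the slide refers to.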

Simultaneous Mixture/Admixture Analysis
If possible, we would like to separate our sample into two groups:
- Pure individuals in a mixture governed by π
- Admixed individuals with admixture proportions governed by α
But we don't know for certain which individuals are Pure and which are Admixed.
Different partitions of the sample into the Pure and the Admixed groups correspond to different models that we must average over.

ADG for Simultaneous Mixture/Admixture Analysis
[Diagram: directed graph with two branches sharing θ_0, θ_1]
- ADMIXED branch: α with rectangular hyperprior on (0, A); w_1, w_2, ..., w_na; gtyp_1, gtyp_2, ..., gtyp_na
- PURE branch: π with rectangular prior on (0, 1); z_1, z_2, z_3, z_4, ..., z_np; gtyp_1, gtyp_2, gtyp_3, gtyp_4, ..., gtyp_np
The model in the diagram corresponds to one partition of the cats in the sample into Pure and Admixed groups.
Green (1995) describes reversible-jump methodology for general sampling over such partitions of data.
However, since we are able to integrate out the q_i's, we may employ Gibbs sampling over the partitions.

This gives us:
- Very fast mixing between partitions of the data (cats mix appropriately quickly between Pure and Admixed groups)
- Rao-Blackwellized Monte Carlo estimates of the posterior probability that a sampled cat is Pure or Admixed
And allows inference of other interesting quantities:
- The proportion of Pure/Admixed cats in the population from which the sample was drawn
- The proportion of Pure Sylvestris cats in the population
- The proportion of Pure housecats
- The allele frequencies in the two putative gene pools
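A Rao-Blackwellized estimate averages the conditional probability of being Pure over MCMC sweeps, rather than averaging the sampled 0/1 group labels. The sketch below illustrates the idea under loudly stated assumptions: the parameter `xi` (prior proportion of Pure cats) and the three likelihood inputs are hypothetical names, and the slides do not spell out the exact form of the conditional the authors used.

```python
def prob_pure(lik_pop0, lik_pop1, lik_admixed, pi, xi):
    """Conditional probability that one cat is Pure, by Bayes' rule.

    lik_pop0, lik_pop1: genotype likelihoods under pure Pop 0 / Pop 1 origin
    lik_admixed:        genotype likelihood under the admixture model,
                        with q_i integrated out
    pi:                 mixing proportion among Pure cats
    xi:                 assumed prior proportion of Pure cats (hypothetical)
    """
    pure = xi * (pi * lik_pop0 + (1.0 - pi) * lik_pop1)
    admixed = (1.0 - xi) * lik_admixed
    return pure / (pure + admixed)

def rao_blackwell_mean(cond_probs):
    """Average the per-sweep conditional probabilities for one cat."""
    return sum(cond_probs) / len(cond_probs)

# Hypothetical per-sweep values for one cat; in a real run the likelihoods,
# pi, and xi would change from sweep to sweep with the current parameter draws.
sweeps = [
    prob_pure(0.20, 0.10, 0.15, pi=0.5, xi=0.5),
    prob_pure(0.22, 0.08, 0.12, pi=0.6, xi=0.55),
    prob_pure(0.18, 0.11, 0.16, pi=0.4, xi=0.6),
]
print(rao_blackwell_mean(sweeps))
```

Averaging probabilities instead of indicator draws typically gives a lower-variance estimate from the same number of sweeps.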

Results for Scottish Cats I
[Figure: two posterior density plots, each on a horizontal axis from 0 to 1. One shows the posterior density of the proportion of cats that are PURE; the other shows the posterior density of the proportion of PURE cats that are putatively F. sylvestris.]

Results for Scottish Cats II
[Figure: plot for the 230 wild-living cats. Horizontal axis: Posterior Prob(PURE), from 0 to 1. Vertical axis: Posterior Prob("F. sylvestris" | PURE), from 0 to 1.]
Note: Known house cats, if included, cluster with the others on the bottom right half.
Also: Estimated allele frequencies for the Non-Sylvestris gene pool are very close to those of English housecats.

Summary
- Genetic mixture model
- Pritchard et al.'s genetic admixture model
- Novel computations that improve MCMC in the admixture model
- Simultaneous consideration of mixture and admixture models
- Example dataset: Felis sylvestris in Scotland
  - 43% to 81% of the cats may be of pure origin
  - Between 6% and 31% of those may be feral housecats
  - Individuals may be classified on the basis of their posterior probability of being pure or admixed.
We anticipate that these methods will be widely useful in studying natural populations.