
Stat 20 Midterm Exam
Instructor: Tessa Childers-Day
12 July 2012

Please write your name and student ID below, and circle your section. With your signature, you certify that you have not observed poor or dishonest conduct on the part of your classmates. You also certify that you have not been a party to poor or dishonest conduct, and that the work on this exam is solely your own.

Name:
Student ID:
Signature:
Date:
Section: 101 (2pm-3pm) 102 (3pm-4pm)

Answer the questions in the spaces provided. There are questions on the front and back of each page. This midterm covers the material from Lectures 1 through 13, and Homeworks 1 through 6.

Show your work, including labeling quantities (such as z-scores). The clearer that your work is, the easier it is to award partial or full credit. If you do not show your work, you will not receive credit.

You are welcome to leave your answers as fractions. If you use decimals, please round all answers to two significant figures, and hold your rounding until the final calculation.

Question   Points   Score
1          8
2          2
3          6
4          3
5          4
6          20
7          7
8          5
9          5
Total:     60

Stat 20 Midterm Exam, Page 2 of 11, 12 July 2012

1. I am interested in exploring rates of condom usage by college students in the United States, including factors that may influence usage. I want to design a survey and collect student responses. Please address each of the following issues, being as specific as possible.

(a) (2 points) (Survey or Census) A fellow investigator believes that we should not do a survey, and instead should perform a census (questioning all college students). How would you explain why a survey is preferable to a census in this case? Or is the other investigator correct?

(b) (3 points) (Sample Design) I propose to create a list of all colleges in the United States, by state. From each state, I will choose the 2 largest colleges. From each college, I will sort students alphabetically, and mail every 100th student a survey. What kind of sampling plan is this? Comment on strengths and weaknesses of this plan, including what sort of biases, if any, I should worry about.

(c) (3 points) (Survey Design) Recalling that the purpose of the study is to explore rates of condom usage and factors that influence usage, I decide to ask how often a student uses a condom, when engaging in intercourse. Is this all I should ask about, or are there other variables that I should collect data about? If there are other variables, please give several examples. How should questions be asked? What sort of biases, if any, should I worry about?

2. (2 points) The table below could be used to generate a histogram showing the distribution of the scores on a Stat 20 quiz (though the histogram is not shown here). There were 8 possible points, and due to partial credit, no students scored a 0. The table below gives the intervals and heights of the bars. Intervals contain the right endpoint but not the left. Fill in the height for the missing interval.

score (in points):      0-3   3-4   4-5   5-6   6-8
height (% per point):   3.33  17  19  21
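The principle behind Question 2 is that the bar areas of a histogram (width times height in % per point) add up to 100%. The sketch below illustrates that idea only. The function name `missing_height` and the toy bin edges and heights are hypothetical and are not the exam's table.

```python
def missing_height(edges, heights):
    """Return the one unknown bar height (in % per unit).

    edges   -- bin edges, e.g. [0, 2, 4, 8] for bins 0-2, 2-4, 4-8
    heights -- bar heights in % per unit, with None for the unknown bar
    """
    widths = [right - left for left, right in zip(edges, edges[1:])]
    unknown = [i for i, h in enumerate(heights) if h is None]
    if len(unknown) != 1:
        raise ValueError("exactly one height must be unknown")
    # Area of a bar = width * height; all areas together must total 100%.
    known_area = sum(w * h for w, h in zip(widths, heights) if h is not None)
    return (100 - known_area) / widths[unknown[0]]


# Hypothetical toy data (NOT the exam table): known areas are 2*10 + 4*5 = 40.
toy_edges = [0, 2, 4, 8]
toy_heights = [10, None, 5]
print(missing_height(toy_edges, toy_heights))  # 30.0
```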

3. Suppose that, on any given day, if the temperature reaches 70 °F (or above), I open my office window. If the temperature does not reach 70 °F, I do not open my office window. All temperatures are measured in degrees Fahrenheit. I collect data on 100 days of temperatures, as well as the status of my office window. I record the information about the window as open or closed. Later, I decide to arbitrarily assign numeric codes to window status, coding 1 if the window is open and 0 if the window is closed. A sample of 5 days of data are shown below.

Temperature   Window Status   Window Status (coded)
71            open            1
69            closed          0
66            closed          0
66            closed          0
70            open            1

Circle the correct answer:

(a) (1 point) The level of measurement for temperature is
A. Nominal   B. Ordinal   C. Interval   D. Ratio

(b) (1 point) The level of measurement for window status is
A. Nominal   B. Ordinal   C. Interval   D. Ratio

(c) (1 point) The level of measurement for window status (coded) is
A. Nominal   B. Ordinal   C. Interval   D. Ratio

(d) (3 points) I tell my office-mate, "Since we always open the window if the temperature is 70 or higher, knowing the temperature perfectly predicts the status of our window. Thus, the correlation between temperature and window status is +1." She disagrees, claiming to have calculated the correlation, and found that it is 0.83. Is she right, am I right, or are we both wrong? How do you explain this contradiction? (Hint: Drawing a picture may help.)

4. (3 points) I own 10 pairs of socks. They can be described by two attributes: coloring and pattern. 4 pairs are brightly colored, while 6 pairs are striped. 3 pairs are both brightly colored and striped. What is the chance I wear brightly colored or striped socks three days in a row? (Assume that all of my socks are clean at the beginning of the first day, and that I neither do laundry nor re-wear socks.)

5. Below is a histogram containing the number of drivers killed in the UK for each month, from January 1969 through December 1984.

[Figure: Histogram of Driver Deaths per Month in UK: 1969-1984. Horizontal axis: Number of Drivers Killed (60 to 200). Vertical axis: Density (0.000 to 0.015).]

(a) (2 points) Sketch and label the approximate locations of the mean, median, and mode on the histogram. No calculations are necessary.

(b) (2 points) There are 4 boxplots shown. One of them corresponds to the same data set as the histogram in (a) (the other three do not). Indicate which boxplot corresponds to the histogram in (a). (Labels are shown below the corresponding boxplot.)

[Figure: four boxplots, labeled Boxplot #1 through Boxplot #4, each titled "Boxplot of Driver Deaths per Month in UK: 1969-1984". Axis: Number of Drivers Killed (60 to 200).]

6. Measurements of the weight of 44 snowy plover eggs (along with the weight of the 44 chicks after they are hatched) were collected by BLSS: The Berkeley Interactive Statistical System of Abrahams and Rizzardi. The weight of the egg and of the newly hatched chick are measured in grams (gm). A scatterplot of chick weight versus egg weight is shown below, along with means, SDs, and the correlation coefficient. Assume that the scatterplot is football shaped, and thus that egg weight and chick weight have approximately normal histograms.

[Figure: scatterplot "Chick Weight vs Egg Weight". Vertical axis: Chick Weight (gm). Horizontal axis: Egg Weight (gm).]

Egg Weight:    mean = 8.63 gm   sd = 0.48 gm
Chick Weight:  mean = 6.15 gm   sd = 0.41 gm
correlation:   r = 0.85

(a) (3 points) Fill in the blank: About 40% of the chicks have weights between 5.8 grams and ____ grams.

(b) (2 points) One of the chicks is at the 75th percentile for weight. Predict the weight of the egg it hatched from. Include units.

(c) (2 points) The prediction in (b) is likely to be off by how much? Include units.

(d) (4 points) Among all chicks at the 75th percentile for weight, approximately what percent hatched from eggs heavier than 75% of all eggs?
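Parts (b) through (d) rest on the regression method for football-shaped data: the predicted z-score is r times the given z-score, the typical error is sqrt(1 - r^2) times the SD of the predicted variable, and the values within a vertical strip are roughly normal around the prediction. A minimal sketch follows, assuming the summary statistics above (8.63, 0.48, 6.15, 0.41, r = 0.85) and using Python's `statistics.NormalDist` in place of a normal table. The function names are illustrative, and exact normal quantiles will differ slightly from table look-ups.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from Question 6 (grams).
EGG_MEAN, EGG_SD = 8.63, 0.48
R = 0.85
STD_NORMAL = NormalDist()  # mean 0, SD 1


def predicted_egg_weight(chick_percentile):
    """Regression prediction of egg weight for a chick at a given percentile."""
    z_chick = STD_NORMAL.inv_cdf(chick_percentile)
    return EGG_MEAN + R * z_chick * EGG_SD


def rms_error_for_egg():
    """Typical size of the error when predicting egg weight from chick weight."""
    return sqrt(1 - R ** 2) * EGG_SD


def share_of_eggs_above(chick_percentile, egg_percentile):
    """Among chicks at chick_percentile, the fraction whose eggs exceed
    the given percentile of all eggs (normal approximation within the strip)."""
    strip = NormalDist(predicted_egg_weight(chick_percentile), rms_error_for_egg())
    cutoff = EGG_MEAN + STD_NORMAL.inv_cdf(egg_percentile) * EGG_SD
    return 1 - strip.cdf(cutoff)


print(round(predicted_egg_weight(0.75), 2))       # about 8.91 gm
print(round(rms_error_for_egg(), 2))              # about 0.25 gm
print(round(share_of_eggs_above(0.75, 0.75), 2))  # about 0.42
```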

(e) (2 points) Using the regression method, and predicting chick weight from egg weight, we find that eggs weighing 8 grams produce chicks weighing 5.69 grams, on average. True or false, and explain your answer: using the regression method, and predicting egg weight from chick weight, we would find that chicks weighing 5.69 grams hatch from eggs weighing 8 grams, on average.

(f) (4 points) Somehow, the data on chicks weighing more than 6.2 grams is lost. The correlation between egg weight and chick weight for the remaining data points will be:
A. About the same (r ≈ 0.85)
B. Somewhat less (r < 0.85)
C. Somewhat greater (r > 0.85)
Circle the correct choice, and explain why it is correct.

(g) (3 points) One of your classmates observes that chicks that hatch from small eggs tend to be larger than the regression line predicts, while chicks that hatch from large eggs tend to be smaller than the regression line predicts. They attribute this to small eggs having thin shells, and large eggs having thick shells (i.e., the weight is due not to the chick, but to the shell). Does this make sense? Or is something else at work here?
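The figure quoted in part (e) can be checked numerically, and the same check shows what happens when the prediction is run in the other direction. This is a small sketch under the summary statistics of Question 6. The two function names are illustrative, not part of the exam.

```python
# Summary statistics from Question 6 (grams).
EGG_MEAN, EGG_SD = 8.63, 0.48
CHICK_MEAN, CHICK_SD = 6.15, 0.41
R = 0.85


def chick_from_egg(egg_weight):
    """Regression of chick weight on egg weight: z_chick = r * z_egg."""
    z_egg = (egg_weight - EGG_MEAN) / EGG_SD
    return CHICK_MEAN + R * z_egg * CHICK_SD


def egg_from_chick(chick_weight):
    """Regression of egg weight on chick weight: z_egg = r * z_chick."""
    z_chick = (chick_weight - CHICK_MEAN) / CHICK_SD
    return EGG_MEAN + R * z_chick * EGG_SD


forward = chick_from_egg(8.0)
backward = egg_from_chick(forward)
print(round(forward, 2))   # 5.69, matching the figure quoted in (e)
print(round(backward, 2))  # 8.17, not 8: the second line is a different line
```

The round trip multiplies the z-score by r twice, so it lands closer to the mean than the starting value whenever |r| < 1.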

7. There are 20 coins in a jar. Of these, 8 are quarters, 5 are dimes, 3 are nickels, and 4 are pennies. 8 coins are drawn at random, without replacement, from the jar.

(a) (2 points) What is the chance that the first coin is a quarter and the second coin is a dime?

(b) (2 points) What is the chance that the fourth coin is a quarter and the eighth coin is a dime?

(c) (3 points) What is the chance that the last two coins are of the same denomination?
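When drawing without replacement, any two positions in the draw behave like the first two, so chances for "the fourth and eighth" or "the last two" can be computed as if they were the first and second draws. The sketch below does that with exact fractions and, as a sanity check on the symmetry claim, brute-forces a shorter draw of 4 coins (an assumption made only to keep the enumeration small). All names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

COUNTS = {"quarter": 8, "dime": 5, "nickel": 3, "penny": 4}
TOTAL = sum(COUNTS.values())  # 20 coins


def chance_two_positions(kind_a, kind_b):
    """Chance that one fixed position shows kind_a and another shows kind_b."""
    n_a = COUNTS[kind_a]
    # If both positions need the same kind, one coin of that kind is used up.
    n_b = COUNTS[kind_b] - (1 if kind_a == kind_b else 0)
    return Fraction(n_a, TOTAL) * Fraction(n_b, TOTAL - 1)


def chance_same_denomination():
    """Chance that two fixed positions show coins of the same denomination."""
    return sum(chance_two_positions(kind, kind) for kind in COUNTS)


def enumerate_positions(pos_a, kind_a, pos_b, kind_b, draws=4):
    """Brute-force check over every ordered draw of `draws` distinct coins."""
    coins = [kind for kind, n in COUNTS.items() for _ in range(n)]
    hits = total = 0
    for order in permutations(coins, draws):
        total += 1
        hits += order[pos_a] == kind_a and order[pos_b] == kind_b
    return Fraction(hits, total)


print(chance_two_positions("quarter", "dime"))         # 2/19
print(enumerate_positions(1, "quarter", 3, "dime"))    # 2/19 again, by symmetry
print(chance_same_denomination())                      # 47/190
```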

8. A fair die is rolled 5 times. Find the chances of the following events.

(a) (2 points) All 6s

(b) (3 points) At least two 6s

9. A fair die is rolled 10 times. Find the chances of the following events.

(a) (2 points) Exactly five 5s

(b) (3 points) At most one 6 in the 10 rolls, given there are no 6s in the first five rolls
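Questions 8 and 9 are binomial settings: independent rolls, each with chance 1/6 of the face of interest. A minimal sketch with exact fractions follows. The helper `binom_pmf` is an illustrative name, and the treatment of 9(b) assumes the usual independence argument, namely that once the first five rolls are known to contain no 6s, only the last five rolls matter.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, k, p):
    """Chance of exactly k successes in n independent trials with chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p = Fraction(1, 6)  # chance of any particular face on a fair die

# Question 8: five rolls.
all_sixes = binom_pmf(5, 5, p)
at_least_two_sixes = 1 - binom_pmf(5, 0, p) - binom_pmf(5, 1, p)

# Question 9: ten rolls.
exactly_five_fives = binom_pmf(10, 5, p)
# Given no 6s in the first five rolls, "at most one 6 in ten rolls"
# reduces to "at most one 6 in the last five rolls".
at_most_one_six_given = binom_pmf(5, 0, p) + binom_pmf(5, 1, p)

print(all_sixes)              # 1/7776
print(at_least_two_sixes)     # 763/3888
print(exactly_five_fives)     # 21875/1679616
print(at_most_one_six_given)  # 3125/3888
```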