# MATH-220: Assignment #3

Question 1: The World Bank collects data on many variables related to world development for countries throughout the world. Two of these are Internet use (in number of users per 100 people) and life expectancy (in years). The data file is provided separately.

1. Make a scatterplot of life expectancy (the response variable) versus internet use. Describe the relationship. Is there an overall pattern? Do you see any deviation from that pattern?

2. Compute the correlation coefficient R between life expectancy and internet use.

3. A friend looks at the scatterplot and concludes that using the internet will increase the length of your life. Would you agree with her? Explain your answer.

4. Make a scatterplot of life expectancy versus internet use, but this time use different symbols for European and non-European countries. Do you wish to modify your answer to part 3? Explain.

Minitab Instructions: a) To obtain the scatterplot:

• Choose ‘Graph → Scatterplot’

• Click on the ‘Simple’ option

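• In the first row of the ‘variables’ table enter the response for the ‘Y’ variable and the explanatory variable for the ‘X’ variable, and press Enter

Copy and paste the graph into your Word file. Describe what you see.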

b) To find the correlation coefficient:

• Choose ‘Stat → Basic statistics → Correlation’

• Select Life Expectancy and Internet Use for the Variables box

• Deselect the ‘Display P-value’ box and press Enter

The correlation coefficient appears in the Session window.
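For readers who want to check the Minitab output by hand, the following is a minimal sketch of the correlation calculation in plain Python. The function name `pearson_r` and the five data pairs are illustrative assumptions; the numbers are placeholders and are not taken from the World Bank data file.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder values, NOT taken from the World Bank data file.
internet_use = [5.0, 20.0, 45.0, 70.0, 85.0]
life_expectancy = [55.0, 62.0, 70.0, 76.0, 80.0]

r = pearson_r(internet_use, life_expectancy)
print(f"r = {r:.3f}")
```

Replacing the two placeholder lists with the columns from the data file should give a value close to the one shown in the Session window, up to rounding.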

c) To obtain the scatterplot with different symbols for European and non-European countries:

• Choose ‘Graph → Scatterplot’

• Click on the ‘With groups’ option

• In the first row of the ‘variables’ table enter the response for the ‘Y’ variable and the explanatory variable for the ‘X’ variable

• Enter the Region variable in the ‘Categorical variables for groups’ box and press Enter
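The ‘With groups’ option amounts to splitting the points by a categorical variable before plotting, so that each group can get its own symbol. A minimal sketch of that splitting step in plain Python follows. The helper `split_by_group`, the labels "Europe" and "Other", and all numbers are assumptions for illustration; the actual coding of the Region variable in the data file may differ.

```python
from collections import defaultdict


def split_by_group(x, y, group):
    """Collect (x, y) points into one list per category label."""
    groups = defaultdict(list)
    for xi, yi, label in zip(x, y, group):
        groups[label].append((xi, yi))
    return dict(groups)


# Hypothetical placeholder rows, NOT taken from the World Bank data file.
internet_use = [5.0, 20.0, 45.0, 70.0, 85.0, 90.0]
life_expectancy = [55.0, 62.0, 70.0, 76.0, 80.0, 81.0]
region = ["Other", "Other", "Other", "Europe", "Europe", "Europe"]

groups = split_by_group(internet_use, life_expectancy, region)
for label, points in groups.items():
    # Each group's points would be drawn with its own symbol.
    print(label, points)
```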

Question 2: Old Faithful Geyser in Yellowstone National Park is renowned, among other things, for the regularity of its eruptions. The eruption durations (X, in minutes) and the subsequent intervals before the next eruption (Y, in minutes) are provided in a separate file.

1. Make a scatterplot of the interval variable versus the duration variable. Describe the relationship. Is there an overall pattern? Do you see any deviation from that pattern?

2. Find the correlation coefficient R between interval and duration. What would happen to the value of R if both the interval and the duration were measured in hours instead of minutes?

3. Find the equation of the regression line for predicting interval from duration. In simple language, what is the slope of the line telling us?

4. Add the regression line to the scatterplot.

5. Find the percent of variation in the interval variable that is explained by the model. Does the regression model provide a good fit?

6. Make a residual plot from the linear regression model you constructed above. Discuss the appropriateness of the model.

7. Use the equation of the regression line to predict the subsequent interval before the next eruption for an eruption that lasted 5 minutes. How confident are you that the prediction is accurate?

Minitab Instructions:

a) Proceed as in question 1-a).

b) Proceed as in question 1-b).

c) and d) To obtain the equation of the regression line and to plot it on the scatterplot:

• Choose ‘Stat → Regression → Fitted line plot’

• Select the appropriate variable for the ‘Response’ box and press Tab

• Select the appropriate variable for the ‘Predictors’ box and press Enter

Paste the graph into your Word file. You will note that the equation of the regression line is printed on the graph, along with the value of R² (ignore the adjusted R²). For ease of reference, rewrite the equation separately in your Word document. What is the slope telling us?
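The equation and R² printed on the fitted line plot can be reproduced with the usual least-squares formulas. The sketch below is one way to do that in plain Python. The function name `fit_line` and the five data pairs are illustrative assumptions; the numbers are placeholders, not the Old Faithful data.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares) / (total sum of squares)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1 - ss_resid / ss_total


# Hypothetical placeholder values, NOT taken from the Old Faithful data file.
duration = [1.8, 2.0, 3.5, 4.0, 4.5]
interval = [55.0, 58.0, 75.0, 80.0, 86.0]

a, b, r2 = fit_line(duration, interval)
print(f"interval = {a:.2f} + {b:.2f} * duration, R-squared = {100 * r2:.1f}%")
```

The slope `b` is the change in the predicted interval, in minutes, for each additional minute of eruption duration.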

f) To make a residual plot:

• Choose ‘Stat → Regression → Regression’

• Choose the interval variable for the Response box and press Tab

• Choose the duration variable for the Predictors box

• Click ‘Graphs’ to open the Graphs dialog

• Move the cursor to the ‘Residuals vs. the variables:’ box and choose the duration variable (the explanatory variable) for that box.

• Press Enter twice
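A residual is the observed response minus the value predicted by the fitted line, and the residual plot shows these differences against the explanatory variable. The following sketch computes them in plain Python. The helper names and the data are illustrative assumptions; the numbers are placeholders, not the Old Faithful data.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Observed y minus fitted y for each point."""
    intercept, slope = least_squares(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical placeholder values, NOT taken from the Old Faithful data file.
duration = [1.8, 2.0, 3.5, 4.0, 4.5]
interval = [55.0, 58.0, 75.0, 80.0, 86.0]

res = residuals(duration, interval)
for d, e in zip(duration, res):
    # These (duration, residual) pairs are the points of the residual plot.
    print(f"duration = {d:.1f}  residual = {e:+.2f}")
```

For a least-squares line with an intercept, the residuals sum to zero apart from rounding, so the question to ask of the plot is whether they scatter around zero without a curved or fan-shaped pattern.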

Question 3: One of the most dangerous contaminants deposited over European countries following the Chernobyl accident of April 1986 was radioactive cesium. To study cesium transfer from contaminated soil to plants, researchers collected soil samples and samples of mushroom mycelia from 17 wooded locations in Umbria, Central Italy, from August 1986 to November 1989. Measured concentrations of cesium in the soil and in the mushrooms (in Bq/kg, where Bq, or becquerel, is a unit of radioactivity) are given in a separate data file.

1. Construct a scatterplot using Y = concentration in mushrooms and X = concentration in soil. Describe the relationship between the two variables.

2. Fit a linear model and report the correlation coefficient.

3. Exclude sample number 17 and repeat parts 1 and 2.

4. What is the effect of case 17 on the linear model and the correlation coefficient?

Minitab instructions: Combine the instructions from questions 1 and 2.
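Comparing the results with and without one case comes down to computing the same statistic on two versions of the data. The sketch below shows the mechanics in plain Python on made-up numbers. The function `pearson_r` and the six data pairs are assumptions for illustration only; the size and direction of the change in the cesium data may be quite different.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder values, NOT the cesium measurements.
# The last pair plays the role of a case far from the rest of the data.
soil = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
mushrooms = [2.0, 1.0, 4.0, 3.0, 5.0, 30.0]

r_with = pearson_r(soil, mushrooms)
r_without = pearson_r(soil[:-1], mushrooms[:-1])  # drop the last case
print(f"with the case:    r = {r_with:.3f}")
print(f"without the case: r = {r_without:.3f}")
```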

Question 4: (Paper and pencil and/or Excel)

Read the setup of Problem 2.170, p. 160 (7e) in the textbook:

1. Find the conditional distributions of the field of study variable for each region.

2. Construct the bar graphs of the three conditional distributions on the same page (Excel does this very nicely).

3. Provide a brief description of the relationship between field of study and region.
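A conditional distribution of field of study for one region is obtained by dividing each count in that region by the region's total. The following sketch does this in plain Python for a two-way table stored as nested dictionaries. The function name, the labels "Region A" to "Region C" and "Field 1" to "Field 3", and all counts are placeholders, not the values from the textbook problem.

```python
def conditional_distributions(table):
    """For each region, return each field's share of that region's total, in percent."""
    result = {}
    for region, counts in table.items():
        total = sum(counts.values())
        result[region] = {field: 100 * c / total for field, c in counts.items()}
    return result


# Hypothetical counts: labels and numbers are placeholders, NOT textbook data.
counts = {
    "Region A": {"Field 1": 30, "Field 2": 50, "Field 3": 20},
    "Region B": {"Field 1": 10, "Field 2": 25, "Field 3": 15},
    "Region C": {"Field 1": 40, "Field 2": 40, "Field 3": 120},
}

dist = conditional_distributions(counts)
for region, shares in dist.items():
    print(region, {field: round(p, 1) for field, p in shares.items()})
```

Each region's percentages add up to 100, which is a quick check before drawing the bar graphs.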