1. For this question, use the data set “teamval.dta”, posted on the course web site. I have used this data set in class demonstrations, but as a reminder, each of the 28 observations refers to a major league baseball team as of 1994. Some of the variables in the data are:
WORTH = value of the team, in millions (how much it would cost to buy the team);
WIN = team winning percentage over the past three years, measured on a scale from 0 to 1;
POP = population of the city in which the team is located, in thousands;
ATTEND = home attendance in the previous year, in tens of thousands.
Estimate the regression that relates the value of the team to the population of the city in which it is located and the attendance in the previous year.
(The Stata command is “reg worth pop attend”.)
a. Interpret the coefficient of pop. Is the sign of the coefficient what you would expect? Explain.
b. Interpret the coefficient of attend. Is the sign of the coefficient what you would expect? Explain.
c. Estimate the regression again, except this time add “win” as an independent variable. The coefficient on win will have a negative sign. Does this mean that you should conclude from this regression that winning lowers the value of a major league baseball team? Explain.
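As a minimal sketch of the Stata session for question 1, assuming teamval.dta has been saved in the current working directory and that the variable names are lowercase (as in the command given above), the two regressions could be run like this. The final correlate command is optional and is included only because it may help when thinking about part (c).

```stata
* Load the team-value data (assumes the file is in the working directory)
use teamval.dta, clear

* Parts (a) and (b): team value on city population and last year's attendance
regress worth pop attend

* Part (c): add winning percentage as a third independent variable
regress worth pop attend win

* Optional: pairwise correlations among the three independent variables
correlate pop attend win
```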
2. Use Stata and the data in discrim.dta (on the website) to answer the questions below. Each observation in this sample is a zip code area. There are variables that measure the average prices (in dollars) in the zip code area of various fast food items, along with average characteristics of the people who live in the zip code area. (Descriptions of each variable can be seen in the “variables” window of Stata). The idea is to see whether fast food restaurants charge higher prices in areas with a larger concentration of minorities.
a. What are the average, minimum, and maximum values of prpblck in this sample? Describe the meaning of these summary statistics.
b. Consider the following population regression model, explaining the price of an entrée (a burger or chicken sandwich) in terms of prpblck and income:
pentree = β0 + β1prpblck + β2income + u
If there is discrimination against blacks, what will be the sign of β1?
c. Using Stata, estimate the population model in part (b).
Interpret the estimated coefficient on prpblck. (Note: Your answer should specify the units the variables are measured in. It should not simply say “a one unit increase”. Also, since prpblck is measured on a zero to one scale, the only possible “one unit” increase would be from zero to one. Keep this in mind when you answer this question.)
d. The variable pentree is measured in dollars. Suppose that it had been measured in cents instead. What would the OLS estimate of β1 have been in this case?
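The commands for parts (a) through (d) might look like the sketch below. It assumes discrim.dta is in the current working directory. The name pentree_cents is a hypothetical variable created only for illustration and is not part of the data set.

```stata
* Load the fast-food price data (assumes the file is in the working directory)
use discrim.dta, clear

* Part (a): mean, standard deviation, minimum, and maximum of prpblck
summarize prpblck

* Part (c): price of an entree on proportion black and income
regress pentree prpblck income

* Part (d): re-express the entree price in cents and re-estimate
* (pentree_cents is a hypothetical name chosen for this example)
generate pentree_cents = 100*pentree
regress pentree_cents prpblck income
```

Comparing the two regression tables side by side is one way to check the answer to part (d) against what the algebra predicts.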
e. Now estimate this population model:
lpfries = β0 + β1prpblck + β2lincome + β3prppov + u
where lpfries is the log of the price of a small order of fries, lincome is the log of the income variable, and prppov is the proportion of people living in the restaurant’s zip code who are below the poverty line. Note that because the dependent variable and the income variable are both in log form, the elasticity of the price of fries with respect to income is constant.
Based on this regression, what would be the predicted impact on the price of fries of a 10% increase in the median income of the restaurant’s zip code, other things equal?
f. In the poorest zip code in the sample, prppov is about .42. For the average zip code, prppov is .07. Based on the results of the model estimated for part (e), fill in the blank in the statement below (your answer should include a number):
“Other things equal, the price of fries would be _______________________ in the average zip code area, compared to the poorest zip code area.”
g. You can calculate the correlation coefficient between prppov and lincome by typing the command “corr prppov lincome”. What is the correlation between these two variables?
h. Estimate the model:
lpfries = β0 + β1prpblck + β2lincome + u
You will see that the OLS coefficient on lincome in this model is still positive, but smaller than the OLS coefficient on lincome estimated in part (e). Explain why this has occurred.
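One possible way to run parts (e), (g), and (h) in a single session and view the two sets of estimates side by side is sketched below. It assumes discrim.dta is in the current working directory. The stored-estimate names m_full and m_short are hypothetical labels chosen for this example.

```stata
* Load the fast-food price data (assumes the file is in the working directory)
use discrim.dta, clear

* Part (e): log price of fries on prpblck, log income, and proportion in poverty
regress lpfries prpblck lincome prppov
estimates store m_full

* Part (g): correlation between prppov and lincome
correlate prppov lincome

* Part (h): same model with prppov left out
regress lpfries prpblck lincome
estimates store m_short

* Show coefficients and standard errors from both models in one table
estimates table m_full m_short, b(%9.4f) se
```

Stata drops observations with missing values separately for each regression, so the two models may not use exactly the same sample. Check the number of observations reported for each before comparing the coefficients on lincome.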