#delimit ;
set more 1;
clear;
cap log close;
log using h:/1433/whales.log, replace;
set memory 10m;
use h:/1433/whales.dta, clear;

***********************************************************************;
*
* WHALES.DO
*
* An introduction to Stata.
*
* edited - jmwilder 9/18/01
*
***********************************************************************;

/* describing the data */
d;

/* of note: the variable names are terrible. We'll rename them to be
   a little more descriptive. */
rename gs GLships;
rename gw GLwhales;
rename ds DSships;
rename dw DSwhales;

/* summarizing the data */
summ;
summ if war != 1;
bysort war: summ;
/* equivalently,
   sort war;
   by war: summ; */

/* or, even better, we can introduce the tabulate command */
tab war;
tab war, missing;
tab year if war == 1; /* important! == tests equality; a single = assigns (as in gen) */
tab war, summ(GLships);

/* Alternatively, if we are interested in looking at the data itself,
   we can use the list command. I use it only for the smallest of tasks
   because it is dominated by the browse command; however, output from
   browse cannot be sent to a log file. */
list year GLships GLwhales if war == 1;

/* Histograms and scatterplots can also be useful. The first line draws
   a histogram of GLships; bin(20) sets the number of bars. The second
   line draws a scatterplot of GLships against GLwhales. */
graph GLships, bin(20) saving(gph1, replace);
graph GLships GLwhales, saving(gph2, replace);

************************ Empirical Analysis ***************************;

/* Having taken a look at the data, I turn to the questions I wanted to
   ask in the first place:

   1. Does technology improve over time?
   2. What are the returns to scale? Does sending an additional ship to
      a fishing area crowd the others and reduce their yields?
   3. Do fishermen respond with particular zeal to a good year by sending
      out a large fleet of ships the following year? */

/* To start on the first question, I generate the average yield per
   ship in each year.
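
   A quick check worth running before dividing (a sketch, not part of
   the original analysis): Stata returns a missing value for any
   division by zero, so a year with no ships sent to an area gets a
   missing yield. Counting such years shows how many observations each
   ratio will lose. The condition >= . catches every missing value,
   because Stata sorts missing values above all numbers:

      count if GLships == 0 | GLships >= .;
      count if DSships == 0 | DSships >= .;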
*/
gen GLw_s = GLwhales/GLships;
gen DSw_s = DSwhales/DSships;

/* now I look at how the two series trend over time */
graph GLw_s DSw_s year, connect(ll) saving(gph3, replace);

/* That doesn't look good. connect(ll) joins the points in the order in
   which the observations are stored, and Stata has no way of knowing
   that the series is naturally ordered along the x axis. We need to
   sort first. */
sort year;
graph GLw_s DSw_s year, connect(ll) saving(gph4, replace);

/* There doesn't seem to be much evidence that yields per ship rise over
   time. But let's run some regressions anyway, to get a feel for how
   the procedures are implemented. I'll focus my attention on Greenland. */

/* Say I want to fit a quadratic trend. Then I need a linear trend and a
   squared term, and I'll create each. */
sort year;
gen trend = _n;
gen trend2 = trend*trend;

/* This wasn't strictly necessary, because the variable year already
   provides a linear time trend. But I wanted to use it to introduce
   explicit subscripting via _n, because it is among the most useful
   tools in Stata. Used creatively, it can replace the explicit loops
   one might write in C (and it is much faster).

   _n is the observation number. It is not intrinsic to a particular
   observation; it depends on how the data are sorted. _N is the total
   number of observations. Using the by prefix resets _n within each
   group _and_ gives each group its own _N.

   Suppose we have hospital blood pressure data for individual patients
   over time: for each patient id at date t, we observe the patient's
   blood pressure. To count the total readings per patient:

      sort id date;
      by id: gen readings = _N;

   To number each reading for a particular patient (1, 2, 3, ...):

      sort id date;
      by id: gen number = _n;

   There is much more that can be done with this subscripting, and we'll
   see some examples shortly. For now, I return to the regression
   suggested above. Note that Stata recognizes that war does not vary
   across the observations used in the estimation (observations with
   missing values are dropped) and drops it owing to collinearity with
   the constant term.
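
   One way to confirm this directly (a sketch, not part of the original
   analysis): after a regression, e(sample) marks the observations that
   were actually used. If war is constant in the estimation sample, the
   table below shows only one value:

      regress GLw_s trend trend2 war;
      tab war if e(sample);
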
   When Stata drops a variable, always make sure you know why: it is
   often a sign that you made a mistake in constructing it. */
regress GLw_s trend trend2 war;

/* is the quadratic trend jointly significant? */
test trend trend2;

/* So there is very little evidence for my technology hypothesis. Perhaps
   the whale population was being depleted at the same time, masking any
   effect. Now I turn to the question about returns to scale: do
   additional ships tend to crowd out production? */
regress GLw_s GLships;

/* Very little evidence of this either: the number of ships sent doesn't
   reduce yields. Now I turn to the final question: do fishermen seem to
   respond to big yields in the prior year when deciding how many ships
   to send out in the current year? To answer this, we regress GLships
   at time t on GLw_s at time t-1. We can create such lags with the
   explicit subscripting introduced above. This assumes the data hold
   exactly one observation per year with no gaps, since otherwise
   observation _n-1 is not the previous year. I also control for the
   number of ships sent the previous year. */
sort year;
gen lagGLw_s = GLw_s[_n-1];
gen lagGLships = GLships[_n-1];
regress GLships lagGLw_s lagGLships trend trend2;

/* The R-squared went up, but still nothing. Note that a regression of
   the change in the number of ships on the other regressors (omitting
   lagged ships) would be equivalent to constraining the coefficient on
   lagged ships to equal one. */

/* I might need the predicted values or the residuals from the most
   recent regression. The respective commands to construct these
   variables are: */
predict GLs_hat;
predict GLs_resid, resid;

***********************************************************************;
set more 0;
*save h:/14.33/whalenew, replace;
log close;
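
/* Appendix (a sketch, not run above). The equivalence noted after the
   lag regression can be checked directly. dGLships is a hypothetical
   new variable. Two regressions should give the same point estimates
   for lagGLw_s, trend, trend2 and the constant:

     - regress the change in ships on the other regressors, and
     - run the levels regression with cnsreg, constraining the
       coefficient on lagGLships to equal one.

   To run these lines, uncomment them and move them before log close.

      gen dGLships = GLships - lagGLships;
      regress dGLships lagGLw_s trend trend2;
      constraint define 1 lagGLships = 1;
      cnsreg GLships lagGLw_s lagGLships trend trend2, constraints(1);
*/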