#delimit;

set more 1;
clear;
cap log close;
log using h:/1433/whales.log, replace;
set memory 10m;
use h:/1433/whales.dta, clear;

***********************************************************************;
*
* WHALES.DO
*
* An introduction to stata.
*
* edited - jmwilder 9/18/01
*
***********************************************************************;

/* describing the data */

d;

/* of note: the variable names are terrible.  We'll rename them to be a 
little more descriptive.  */

rename gs GLships;
rename gw GLwhales;
rename ds DSships;
rename dw DSwhales;

/* summarizing the data */

summ;
summ if war != 1;
bysort war: summ; /* equivalently, sort war; by war: summ; */

/* or even better, we can introduce the tabulate command */

tab war;
tab war, missing;
tab year if war == 1; /* important! use == to test equality, = to assign */
tab war, summ(GLships);

/* Alternatively, if we are interested in looking at the data itself, we 
can use the list command.  I use it only for the smallest of tasks because 
it is dominated by the 'browse' command.  However, output from 'browse' 
cannot be sent to a logfile. */

list year GLships GLwhales if war == 1;

/* Histograms and scatterplots can also be useful.  The first line below 
draws a histogram, where bin(20) sets the number of bars.  The second line 
draws a scatterplot. */

graph GLships, bin(20) saving(gph1,replace); 
graph GLships GLwhales, saving(gph2, replace);

************************ Empirical Analysis ***************************;

/* Having taken a look at the data, I turn to the questions I wanted to 
ask in the first place:

1. Does technology improve over time?
2. What are the returns to scale?  Does sending an additional ship to a 
fishing area crowd the others and reduce their yields?
3. Do fishermen respond with particular zeal to a good year by sending 
out a large fleet of ships? */

/* To start to get at the first question, I want to generate the average 
yield per ship in each year. */

gen GLw_s = GLwhales/GLships;
gen DSw_s = DSwhales/DSships;

/* now I look at how the two series trend over time */

graph GLw_s DSw_s year, connect(ll) saving(gph3,replace);

/* that doesn't look good because STATA had no idea that the series was 
naturally ordered along the x axis.  we'll need to sort first. */

sort year;
graph GLw_s DSw_s year, connect(ll) saving(gph4,replace);

/* There doesn't seem to be much evidence that technology improves over 
time.  But let's do some regressions just to get a feel for how to 
implement the procedures.  I'll focus my attention on Greenland. */

/* Let's say I want to fit a quadratic, then I'll need a linear trend 
and a squared term.  I'll create each. */

sort year;
gen trend = _n;
gen trend2 = trend*trend;

/* This wasn't exactly necessary because we already have a linear time 
trend with the variable year.  But I wanted to use it to introduce 
explicit subscripting via '_n' because it is among the most useful tools 
in STATA.  Using it creatively can avoid looping as one might in C, and 
it is much faster.

'_n' represents the observation number.  This is not something intrinsic 
to a particular observation but depends on how the data are sorted.  _N 
gives the total number of observations.  Use of the 'by' command resets 
_n for each category _and_ gives each category its own _N.  Assume we 
have hospital blood pressure data for individual patients over time.  
For each patient id at time t, I observe the patient's blood pressure.

To calculate the total observations per patient:

sort id date;
by id: gen readings = _N;

To number each reading within a particular patient (1 for the first, 2 for 
the second, and so on):

sort id date;
by id: gen number = _n;

But there is much more that can be done with this subscripting.  We'll 
see some examples in a second.  For now, I return to the regression I 
suggested above.  Note that STATA recognizes that 'war' does not vary 
across the observations used in the regression (observations with missing 
values are dropped) and drops it owing to collinearity with the constant 
term.  When STATA drops variables, always make sure you know why, because 
it is often a sign that you made a mistake in constructing them. */
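
/* Before running the regression, a minimal sketch of the by-group 
subscripting described above, applied to the whale data rather than to the 
hypothetical patient data.  The names war_obs and war_total are introduced 
here for illustration only.  Note that observations with missing war form 
their own by-group. */

sort war year;
by war: gen war_obs = _n;    /* position of each year within its war group */
by war: gen war_total = _N;  /* number of years in that war group */
list year war war_obs war_total if war == 1;
sort year;                   /* restore the time ordering used below */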

regress GLw_s trend trend2 war;
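
/* A quick check of the claim above, as a sketch: e(sample) marks the 
observations used in the most recent estimation, so this tabulation shows 
which values of war actually entered the regression.  If the note above is 
right, only one value should appear. */

tab war if e(sample);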

/* is the quadratic significant? */

test trend trend2;

/* So there is very little evidence of my technology theory.  Perhaps 
the whale population is being depleted at the same time, and we are 
unable to see any effect.  Now I ask the question regarding returns to 
scale: do additional ships tend to crowd out production?  */

regress GLw_s GLships;

/* Very little evidence of this.  The number of ships sent doesn't 
decrease yields.  Now I turn to the final question: do fishermen seem 
to respond to big yields in the prior year when deciding how many 
ships to send out in the current year?  To answer this, we want to 
regress GLships at time t on GLw_s at time t-1.  We can create such lags
by virtue of the explicit subscripting introduced above.  I control for 
the number of ships sent the previous year. */

sort year;
gen lagGLw_s = GLw_s[_n-1];
gen lagGLships = GLships[_n-1];
regress GLships lagGLw_s lagGLships trend trend2;

/* the R2 went up, but still nothing.  Note that a regression on the 
change in the number of ships would be equivalent to constraining the 
coefficient of lagged ships to equal one. */

/* I might need the predicted values or the residuals from the regression 
above.  predict works from the most recent estimation.  The respective 
commands to construct these variables are: */

predict GLs_hat;
predict GLs_resid, resid;
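
/* A minimal sketch of the equivalence noted above, placed here so that 
the predict commands still refer to the main regression.  It assumes the 
lagged variables created earlier.  The name dGLships and constraint 
number 1 are introduced here for illustration only.  The coefficients on 
the remaining regressors should match across the two fits. */

gen dGLships = GLships - lagGLships;
regress dGLships lagGLw_s trend trend2;

constraint define 1 lagGLships = 1;
cnsreg GLships lagGLw_s lagGLships trend trend2, constraints(1);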

***********************************************************************;

set more 0;
*save h:/14.33/whalenew, replace;

log close;