Geography 391 – Work Exercise 1 – Intro to Minitab

 

This exercise introduces you to the Minitab statistical analysis package and some of its capabilities.  It will also help you get familiar with setting up statistical analyses of random variables that have a spatial component (i.e., they correspond to discrete spatial entities and can be mapped).  You'll need:

 

 

Intro

 

This activity is intended to give you an introduction to Minitab.  In general, I expect you to use the resources you have been given to solve simple problems.  Rather than explain how individual commands and modules work, I have provided a set of tools that will let you teach yourself (with our help) how to use Minitab 15.

 

Data Files

 

For our activities we will begin with a simple data set derived from the 2000 census Summary File (SF) tables.  The principal table is tgr17031sf1grp.dbf, in the zip folder cook census 2000.  Download and unzip this folder.  To open the table in Minitab, choose File > Open Worksheet, select the ‘dbf’ file type, and navigate to your folder.  The file contains raw values from Census 2000 for over 4,000 census subdivisions (block groups) in Cook County, Illinois.  It also includes some median values (well labeled) and variables with the census's own creative labels (e.g., MARHH_CHD = married households with children), plus some useful normalizing variables (e.g., POP2000, HOUSEHOLDS, and HSE_UNITS).

 

The zip folder also contains some other useful files, including documentation for the SF1 files.
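To illustrate what the normalizing variables mentioned above are for, here is a minimal Python sketch (Python is not required for this exercise).  It turns a raw count into a share of households so that block groups of different sizes can be compared.  The rows are made-up numbers, not census values, and the helper name share is only for illustration.

```python
def share(count, total):
    """Return count as a fraction of total, or None when total is zero."""
    if total == 0:
        return None  # an empty block group has no meaningful share
    return count / total

# Made-up block-group rows (NOT real census values).
rows = [
    {"MARHH_CHD": 120, "HOUSEHOLDS": 480},
    {"MARHH_CHD": 30, "HOUSEHOLDS": 600},
    {"MARHH_CHD": 0, "HOUSEHOLDS": 0},
]

# Share of households that are married households with children.
shares = [share(r["MARHH_CHD"], r["HOUSEHOLDS"]) for r in rows]
print(shares)
```

Block groups with zero households are returned as None rather than dividing by zero; how you treat such rows in your own analysis is up to you.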

 

Tasks

 

This work exercise consists of several “tasks” for you to complete.  Each task will typically produce some form of “output”; the type of output I expect will normally be obvious from the task definition.  Please complete all tasks, format your output into a single well-organized MS Word document, and turn it in to me on the due date.  Late work may be penalized fairly depending on circumstances.

 

Task I

 

Generate descriptive statistics for the “Household” variables in Cook County (these are columns 28 through 35).  Make sure to include all commonly used descriptive statistics, including at least mean, SE mean, StDev, quartiles, minimum, median, and maximum.  Copy the table that is displayed into your Word document and comment on any distinctive, unusual, or odd characteristics of these variables in a brief paragraph.
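If it helps to see what these descriptive statistics are computing, the following is a minimal Python sketch using only the standard library (Python is not required for this exercise).  The sample values are made up, not census values, and the function name describe is only for illustration.  Quartile conventions differ between packages, so the quartiles here may not match Minitab's output exactly.

```python
import math
import statistics

def describe(values):
    """Return the summary statistics listed in Task I for one column."""
    n = len(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "N": n,
        "Mean": statistics.mean(values),
        "SE Mean": stdev / math.sqrt(n),  # standard error of the mean
        "StDev": stdev,
        "Minimum": min(values),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": max(values),
    }

# Made-up household counts for a few block groups (NOT real census values).
sample = [120, 340, 215, 80, 410, 295, 150, 505]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:8s} {value:10.3f}")
```

Note that SE mean is simply StDev divided by the square root of N, which is why it shrinks as the number of block groups grows.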

 

Task II

 

Generate histograms for each of these variables (from Task I above).  Use the option to include multiple graphs in a single panel, with separate scales for each graph.  Generate the histograms “with fit”, which superimposes a normal curve on the data distribution.  Copy the histogram panel into your Word document and comment on any distinctive, unusual, or odd characteristics revealed by these graphs in a brief paragraph.
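As a rough illustration of what a histogram with a fitted normal curve compares, here is a minimal Python sketch (Python is not required for this exercise).  It assumes the fitted curve is a normal distribution with the sample mean and sample standard deviation, and it uses made-up values and an arbitrary bin width of 100; Minitab chooses its own bins.

```python
from statistics import NormalDist

# Made-up household counts (NOT real census values).
values = [120, 340, 215, 80, 410, 295, 150, 505, 260, 230]

# Normal distribution with the sample mean and sample standard deviation.
fit = NormalDist.from_samples(values)

bin_width = 100  # arbitrary choice for this sketch
bins = {}
for v in values:
    left = (v // bin_width) * bin_width  # left edge of the bin holding v
    bins[left] = bins.get(left, 0) + 1

for left in sorted(bins):
    mid = left + bin_width / 2
    # Count the fitted curve would predict for this bin (approximate).
    expected = len(values) * bin_width * fit.pdf(mid)
    print(f"{left:4d}-{left + bin_width:4d}  observed {bins[left]:2d}  "
          f"normal fit {expected:5.2f}")
```

Large gaps between the observed counts and the fitted values (for example, a long right tail) are the kind of feature worth commenting on.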

 

Task III

  1. Perform a “matrix plot” of all of the age variables (not including the median age).  These are columns 17 through 24.  Use the "Matrix of Plots" option "Simple".  Explore the "Matrix Options" to set it up with the plots appearing in the lower left part of the matrix and the variable labels as "Boundary".  In the ‘Data View...’ option, set the type of regression to linear and fit the intercept.  Copy the graphic into your Word document and comment in a brief paragraph on which age categories seem to be more or less related (correlated).  (Remember that each variable on the diagonal is paired with itself and so is perfectly correlated.)
  2. Select two pairs of age variables and perform a simple (i.e., one y variable and one x variable) linear regression on each pair.  Choose one pair that you think is an example of highly correlated age groups and a second pair that is an example of age groups with a low correlation.  Report the estimated regression equation and the R-squared for each of these analyses.  In a brief paragraph, give a hypothesized explanation of the results.
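For reference, the regression equation and R-squared asked for in item 2 can be sketched in a few lines of Python (Python is not required for this exercise).  This is an ordinary least-squares fit with an intercept, written with the standard library only.  The x and y values are made up and stand in for two hypothetical age columns, and the function name simple_regression is only for illustration.

```python
import statistics

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b1 = sxy / sxx                     # slope
    b0 = mean_y - b1 * mean_x          # intercept
    r_squared = sxy ** 2 / (sxx * syy) # squared correlation of x and y
    return b0, b1, r_squared

# Made-up counts for two hypothetical age columns (NOT real census values).
x = [10, 20, 30, 40, 50]
y = [12, 25, 31, 44, 52]

b0, b1, r2 = simple_regression(x, y)
print(f"y = {b0:.2f} + {b1:.3f} x")
print(f"R-squared = {100 * r2:.1f}%")
```

With one x variable, R-squared is just the square of the correlation between x and y, which is why the highly correlated pair in the matrix plot should also give the higher R-squared.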

 

These tasks should be completed and returned to me by the start of class Tuesday, April 20.