Geography 243 - Lab 4
Supervised Classification, Chicagoland
We have been studying the spectral responses of different types of land cover and have seen that the signatures of each of these cover types were unique. Land covers, then, may be identified and differentiated from each other by their unique spectral response patterns. This is the logic behind image classification. Many kinds of maps, including land-cover, soils, and bathymetric maps, may be developed from the classification of remotely sensed imagery. For this activity we will work with a set of images in this zip file that cover the greater Chicago region. The file is fairly large, so be patient with the download. All of the previous warnings about copying and saving files apply in this lab activity: make sure to save early and often. The relevant part of the text can be found on pages 545-568.
There are two main methods of image classification: supervised and unsupervised. With supervised classification, the user develops the spectral signatures of known categories such as urban and forest, and then the software assigns each pixel in the image to the cover type to which its signature is most similar. With unsupervised classification, the software groups pixels into categories of similar signatures, and then the user identifies what cover types those categories represent.
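The steps for supervised classification may be summarized as follows: (1) locate representative examples of each cover type in the image (the training sites); (2) digitize polygons around each training site, assigning a unique identifier to each cover type; (3) create a spectral signature for each cover type from the pixels within its training sites; and (4) classify the entire image by comparing each pixel's signature with the known signatures.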
Take your time completing this lab activity, don't rush... we'll have plenty of time to finish this over the next two weeks. It's important that you get this. Respond to questions 1-5, located throughout the lab activity, in a separate MS Word document. When you have finished, print one of your three classifications and hand it in with the answers to the questions.
We will begin by creating the training sites. The area we will classify is a windowed area that covers most of the Chicago region. In the Lab 4 folder there is a six (6) channel set of images from the Landsat 7 ETM+ sensor. The set includes channels 1-5 and 7. The date of these images is September 11, 2001. This dataset will produce an analysis that the city could use to update its land use maps, or that will present a snapshot of the city as it existed at that moment.
The training sites created in this exercise will be based on your knowledge of land cover types across the metropolis. This is your city and you know locations shown in the images that typify the types of land cover we are trying to classify. A good tool to use during this phase is Google Earth. (Normally you would precede classification with fieldwork, and you will want to consider fieldwork if you choose this technique for your final project).
Each known land-cover type will be assigned a unique integer identifier, and one or more training sites will be identified for each.
A) Below you will find a list of the 13 land cover types identified in the field, along with a unique integer identifier that will signify each cover type. While the training sites can be digitized in any order, you may not skip any number in the series, so if you have 13 different land-cover classes, as we do here, your identifiers must be 1 to 13.
The suggested order (to fit the IDRISI qualitative default palette with automatic display) is:
1 - water 1 (the cleanest water you can find in Lake Michigan)
2 - water 2 (moderately turbid water in Lake Michigan)
3 - water 3 (very turbid water in Lake Michigan)
(on 1-3 above work on the assumption that darker, less reflective water is cleaner - can you think what this assumption might be based on?)
4 - low density residential (the outer suburbs)
5 - medium density residential (inner suburbs and peripheral residential areas in the city)
6 - high density residential (Gold Coast, South Loop, West Loop)
(on 4-6 above the key word is density - in the suburbs that means lot size - as we move into the inner city we really are looking at the number of stories in residential buildings)
7 - light industrial (e.g. Elk Grove Village)
8 - heavy industrial (e.g. along the Stevenson)
9 - commercial/retail (Old Orchard perhaps? Woodfield Mall?)
10 - urban grassland (find a golf course)
11 - forest preserve (easy to find)
12 - agricultural 1 (a distinctive Ag land cover)
13 - agricultural 2 (another one different from the one above)
B) Display the image called CHI_P023R031_7T20010911_Z16_NN10 using the Grey 256 palette and autoscale the image. You can actually use any of the images for this process - you may even want to create a color composite that will be used for selecting training sites only. Use the onscreen digitizing feature of IDRISI for Windows to digitize polygons around your training sites. Start by digitizing a polygon that is representative of water (probably in Lake Michigan). This is done as follows:
Window in close around the section of Lake Michigan on the right side of the image. Then select the on-screen digitizing icon.
Create a vector file called TRSITES, choose to create polygons, and enter the feature identifier (1) you chose for water 1.
Your cursor will now appear as the digitize icon when in the image. Select the area that represents clear water. Move the cursor to a starting point for the boundary of your training site and press the left mouse button. Then move the cursor to the next point along the boundary and press the left mouse button again (you will see the boundary line start to form). The training site polygon should enclose a homogeneous area of the cover type, so avoid including the shoreline in this water polygon. Continue digitizing until just before you have finished the boundary and then press the right mouse button. This will finish the digitizing for that training site and ensure that the boundary closes perfectly. (The finished polygon is displayed with a black line, which may be difficult to see in the deep water areas of the image.) If you made a mistake and do not want to save that polygon, click the delete icon.
This will delete the last training site you digitized. Deleting and redoing a site is possible, but it can sometimes be frustrating. Best results are achieved when you are able to work through the digitizing of the training sites without interruption. Window back out, then window in on your next training site area, referring to the online image. Select the on-screen digitizing icon again. Enter an identifier for that new site. Keep the same identifier if you want to digitize another polygon around the same cover type. Otherwise, enter a new identifier.
Any number of training sites, or polygons with the same ID, may be created for each cover type. In total, however, there should be an adequate sample of pixels for each cover type for statistical characterization. A general rule of thumb is that the number of pixels in each training set (i.e., all the training sites for a single land cover class) should not be less than ten times the number of bands. Thus, in this exercise, where we will use 6 bands, we should aim to have no fewer than 60 pixels per training set. Using larger training sets than this can be beneficial, however, as long as the sites are relatively homogeneous.
C) Continue until you have training sites digitized for each of the 13 different land cover classes. Then select the save icon.
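The rule of thumb above is simple arithmetic, and a short Python sketch can make the check explicit. The function names and the pixel totals below are hypothetical, made up for illustration; they are not part of IDRISI.

```python
def min_training_pixels(n_bands, factor=10):
    """Smallest acceptable pixel count per training set under the 10x rule."""
    return factor * n_bands


def undersampled_classes(pixel_counts, n_bands):
    """Return the class IDs whose training sets fall short of the minimum."""
    minimum = min_training_pixels(n_bands)
    return [cid for cid, count in sorted(pixel_counts.items()) if count < minimum]


# Hypothetical pixel totals per class ID (all polygons of a class added together).
counts = {1: 240, 2: 58, 3: 75, 4: 60}

print(min_training_pixels(6))            # 60
print(undersampled_classes(counts, 6))   # [2]
```

In this made-up case class 2 would need another polygon or a larger one before the signatures are created.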
This will save the vector file TRSITES and then redisplay it.
After you have a training site vector file you are ready for the third step in the process, which is to create the signature files. Signature files contain statistical information about the reflectance values of the pixels within each training set.
D) Run MAKESIG from the Analysis/Image Processing/Signature Development menu. Choose Vector as the training site file type and enter TRSITES as the file defining the training sites. Indicate that 6 bands of imagery will be processed, and six input name boxes will automatically appear. Enter the names of the bands that you will analyze: CHI_P023R031_7T20010911_Z16_NN10 (Blue band), CHI_P023R031_7T20010911_Z16_NN20 (Green band), CHI_P023R031_7T20010911_Z16_NN30 (Red band), CHI_P023R031_7T20010911_Z16_NN40 (IR band), CHI_P023R031_7T20010911_Z16_NN50 (IR band) and CHI_P023R031_7T20010911_Z16_NN70 (IR band). Select "enter signature file names" and then enter the names for each of the land-cover categories in the input boxes with the proper identifier labels, in ascending order. These land-cover class names will be used as signature file names, so they must follow some conventions: no spaces, no special characters such as "/", and 8 characters or fewer. Make sure the "create signature group file" box is checked.
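To give a feel for what "statistical information about the reflectance values" means, here is a minimal Python sketch that summarizes a handful of training pixels band by band. It is only an illustration: the function name, the dictionary layout, and the three-band pixel values are assumptions, and the actual contents and format of an IDRISI signature file may differ.

```python
import statistics


def make_signature(pixels):
    """Summarize training pixels (one tuple of band values per pixel), band by band."""
    bands = list(zip(*pixels))  # transpose: one sequence of values per band
    return {
        "min": [min(b) for b in bands],
        "max": [max(b) for b in bands],
        "mean": [statistics.mean(b) for b in bands],
        "stdev": [statistics.stdev(b) for b in bands],
    }


# Hypothetical values for three pixels in three bands (not taken from the Chicago images).
water_pixels = [(20, 15, 5), (22, 17, 7), (24, 16, 6)]
sig = make_signature(water_pixels)

for name, values in sig.items():
    print(name, values)
```

The minimum, maximum, and mean values are the same kind of numbers that SIGCOMP plots in the steps that follow.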
E) When MAKESIG is finished, use the Idrisi File Explorer to view signature files and check that all of your signatures were created. If you forgot any, repeat the process described above to create a new training site vector file (for the forgotten cover type only) and run MAKESIG again.
F) Run SIGCOMP from the Analysis/Image Processing/Signature Development menu. Choose to view 13 signature files, enter the names of all the signature files (or just enter the handy signature group file name, which is the same as your signature file), and choose to display their means.
1. Of the six bands of imagery, which bands appear to differentiate vegetative covers the best?
G) Run SIGCOMP again, but this time choose to view only 2 signatures and enter your names for the WATER1 and FOREST signature files. Indicate that you want to view their maximum, minimum and mean values. Notice that the reflectance values of these signatures overlap to varying degrees across the bands. This is a source of spectral confusion between cover types.
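The overlap you see in the SIGCOMP plot can also be checked numerically. The sketch below, in Python, reports the bands in which the minimum-to-maximum ranges of two signatures overlap. The function name and the three-band numbers are hypothetical and only serve to show the idea.

```python
def overlapping_bands(sig_a, sig_b):
    """Return the indices of bands where the two min-max ranges overlap."""
    overlaps = []
    ranges = zip(sig_a["min"], sig_a["max"], sig_b["min"], sig_b["max"])
    for band, (lo_a, hi_a, lo_b, hi_b) in enumerate(ranges):
        # Two closed intervals overlap when each one starts before the other ends.
        if lo_a <= hi_b and lo_b <= hi_a:
            overlaps.append(band)
    return overlaps


# Hypothetical three-band ranges for two cover types.
water = {"min": [18, 12, 3], "max": [30, 25, 9]}
forest = {"min": [25, 20, 60], "max": [40, 38, 95]}

print(overlapping_bands(water, forest))  # [0, 1]
```

In this made-up case the two covers overlap in the first two bands but are cleanly separated in the third, so the third band is the one that tells them apart.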
2. Which of the two signatures has the most variation in reflectance values (widest range of values) in all the bands? Why?
H) We can learn more about these signature files using HISTO. Run HISTO from the toolbar and select the input file type as a "signature file". Select the WATER1 signature file and look for evidence of a normal (bell-shaped) curve, which would suggest that a homogeneous set of pixels was selected to represent deep water. If the signature is strongly bimodal (having two peaks), this would suggest that two cover types, with unique spectral response patterns, have been included in the training site. Extreme outliers that do not seem to belong to the main curve also suggest that the training site is not homogeneous.
For the WATER1 signature, view the histograms of each of the other bands in turn. Choose the bands by pressing the right arrow under the band name. You may compare several histograms at the same time and then close each histogram window as you are finished looking at it. Notice how the variability in reflectance values changes throughout the bands. Now choose to use the WATER signature file with HISTO in the same fashion and view the histogram for CHI_P023R031_7T20010911_Z16_NN40.
3. What is the shape of this histogram?
I) Now that we have signature files for all of our categories, we are ready for the last step in the classification process: classifying the images based on these signature files. Each pixel in the study area has a value in each of the 6 bands of imagery. These values form a unique signature which can be compared to each of the signature files we just created. The pixel is then assigned to the cover type that has the most similar signature. There are several different statistical techniques that can be used to evaluate how similar signatures are to each other. These statistical techniques are called classifiers. We will create classified images with three of the hard classifiers that are available in IDRISI for Windows.
The first classifier we will use is a minimum distance to means classifier. This classifier calculates the distance of a pixel's reflectance values to the spectral mean of each signature file, and then assigns the pixel to the category with the closest mean. There are two choices on how to calculate distance with this classifier. The first calculates the Euclidean, or raw, distance from the pixel's reflectance values to each category's spectral mean. For an illustration of how this works see figure 7.43 in Lillesand and Kiefer.
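The raw-distance rule described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the MINDIST implementation; the function name, the two-band class means, and the test pixels are all hypothetical.

```python
import math


def classify_min_distance(pixel, class_means):
    """Assign the pixel to the class whose spectral mean is closest (Euclidean distance)."""
    best_class, best_dist = None, math.inf
    for class_id, mean in class_means.items():
        dist = math.dist(pixel, mean)  # straight-line distance in band space
        if dist < best_dist:
            best_class, best_dist = class_id, dist
    return best_class


# Hypothetical two-band means for class 1 (water 1) and class 11 (forest preserve).
means = {1: (20, 10), 11: (40, 90)}

print(classify_min_distance((25, 30), means))  # 1
print(classify_min_distance((38, 70), means))  # 11
```

Note that only the means are used here; the spread of values within each training set plays no part in the raw-distance decision.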
J) All of the classifiers we will explore in this exercise may be found in the Analysis/Image Processing/Hard Classifiers menu.
Run MINDIST (the minimum distance to means classifier) and indicate that you will use a group file to specify signatures. Enter its name, SIGS, and the signature names will appear in the corresponding input boxes in the order specified in the group file. Choose the raw distance option and an infinite maximum search distance. Call the output file RAW and enter a descriptive title for the new image. Continue to the next dialog box, and keep all bands selected for analysis. Examine the resulting land cover image.
We will try the minimum distance to means classifier again, but this time using the second kind of distance calculation -- normalized distances. In this case, the classifier will evaluate the standard deviations of reflectance values about the mean -- creating contours of standard deviations. It then assigns a given pixel to the closest category in terms of standard deviations (Z-scores).
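One way to picture the normalized option is to divide each band difference by that class's standard deviation before measuring distance. The Python sketch below does exactly that. It is an assumption about the general technique, not a statement of the exact formula MINDIST uses, and the class names and numbers are hypothetical.

```python
import math


def normalized_distance(pixel, mean, stdev):
    """Distance from the class mean, measured in standard deviation units per band."""
    return math.sqrt(sum(((p - m) / s) ** 2 for p, m, s in zip(pixel, mean, stdev)))


def classify_normalized(pixel, signatures):
    """Pick the class with the smallest normalized distance."""
    return min(signatures, key=lambda name: normalized_distance(pixel, *signatures[name]))


def classify_raw(pixel, signatures):
    """Pick the class with the smallest raw Euclidean distance, for comparison."""
    return min(signatures, key=lambda name: math.dist(pixel, signatures[name][0]))


# Hypothetical two-band signatures given as (mean, standard deviation) pairs.
signatures = {
    "tight": ((50, 50), (2, 2)),     # very homogeneous training set
    "broad": ((70, 70), (15, 15)),   # highly variable training set
}
pixel = (58, 58)

print(classify_raw(pixel, signatures))         # tight
print(classify_normalized(pixel, signatures))  # broad
```

The same pixel changes class because it lies 4 standard deviations from the homogeneous class but less than 1 from the variable one, even though its raw distance to the homogeneous class is smaller.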
K) To illustrate this method, run MINDIST again. Choose the normalized standard deviation units, and call the result MRNSTD. Enter a descriptive title for the image being created.
4. Compare the two results. How would you describe the effect of standardizing the distances with the minimum distance to means classifier?
The next classifier we will use is the maximum likelihood classifier. Here, the distribution of reflectance values in a training site is described by a probability density function, developed on the basis of Bayesian statistics. The theory behind this classifier is shown in figures 7.46 and 7.47 in Lillesand and Kiefer. This classifier evaluates the probability that a given pixel will belong to a category and classifies the pixel to the category with the highest probability of membership.
L) Run MAXLIKE. Indicate the name of the signature group file (SIGS). Choose to classify all pixels, and elect to give equal weight (i.e., to use equal probabilities) to each class. Call the result MAX. Press the Next button and enter a descriptive title for the output image. Press Continue again and choose to retain all bands and press OK. Maximum likelihood is the slowest of the techniques, but if the training sites are good, it tends to be the most accurate.
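The maximum likelihood idea can be sketched in Python for the simplest case of two bands, where the covariance matrix can be inverted by hand. This is an illustration under stated assumptions: a Gaussian model per class, equal prior probabilities as chosen above, two bands instead of six, and hypothetical training pixels. It is not the MAXLIKE implementation.

```python
import math


def fit_signature(pixels):
    """Mean vector and 2x2 sample covariance for two-band training pixels."""
    n = len(pixels)
    m1 = sum(p[0] for p in pixels) / n
    m2 = sum(p[1] for p in pixels) / n
    s11 = sum((p[0] - m1) ** 2 for p in pixels) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in pixels) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in pixels) / (n - 1)
    return (m1, m2), (s11, s12, s22)


def log_likelihood(pixel, signature):
    """Gaussian log-likelihood of the pixel, dropping the constant shared by all classes."""
    (m1, m2), (s11, s12, s22) = signature
    det = s11 * s22 - s12 * s12
    d1, d2 = pixel[0] - m1, pixel[1] - m2
    # Squared Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    mahalanobis_sq = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * (math.log(det) + mahalanobis_sq)


def classify_max_likelihood(pixel, signatures):
    """With equal priors, the most probable class is the one with the highest likelihood."""
    return max(signatures, key=lambda name: log_likelihood(pixel, signatures[name]))


# Hypothetical two-band training pixels for two cover types.
training = {
    "water": [(20, 10), (22, 12), (24, 11), (21, 9), (23, 13)],
    "forest": [(35, 80), (40, 95), (38, 85), (42, 90), (36, 88)],
}
signatures = {name: fit_signature(pixels) for name, pixels in training.items()}

print(classify_max_likelihood((23, 12), signatures))  # water
print(classify_max_likelihood((39, 86), signatures))  # forest
```

Because the covariance term captures how the bands vary together, this approach follows the elongated shape of each class's pixel cloud, which is also why it depends so heavily on well-chosen training sites.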
Finally, we will look at the parallelepiped classifier. This classifier creates 'boxes' using standard deviation units (Z-scores) or minimum and maximum reflectance values within the training sites. The theory behind this is described in figures 7.44 and 7.45 in Lillesand and Kiefer. If a given pixel falls within a signature's 'box,' or band space, it is assigned to that category. This is the fastest of the classifiers, and the option with Min/Max values was used as a quick-look classifier when computer speed was still an issue. It is prone, however, to incorrect classifications. Due to the correlation of information in the spectral bands, pixels tend to cluster into cigar- or zeppelin-shaped clouds. The 'boxes' can become too encompassing and capture pixels that probably should be assigned to other categories.
M) Run PIPED and enter the name of the signature group file to input signature filenames. Choose the Min/Max option and call the image PIPEORIG, with a descriptive title. Press the Next button and retain all bands. Note the zero-value pixels in the output image. These pixels did not fit within the Min/Max range of any training set and were thus assigned a category of zero.
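The Min/Max box test, including the zero category for pixels that fit no box, can be sketched in Python as follows. The function name, the two-band boxes, and the rule for overlapping boxes (first match wins) are assumptions made for illustration; PIPED may resolve overlaps differently.

```python
def classify_parallelepiped(pixel, boxes):
    """Return the first class ID whose min/max box contains the pixel, or 0 if none does."""
    for class_id, (mins, maxs) in boxes.items():
        if all(lo <= value <= hi for value, lo, hi in zip(pixel, mins, maxs)):
            return class_id
    return 0  # the pixel falls outside every training set's range


# Hypothetical two-band boxes, given as (minimums, maximums) for each class ID.
boxes = {
    1: ((18, 5), (30, 15)),      # water 1
    11: ((30, 70), (45, 100)),   # forest preserve
}

print(classify_parallelepiped((25, 10), boxes))  # 1
print(classify_parallelepiped((40, 85), boxes))  # 11
print(classify_parallelepiped((35, 40), boxes))  # 0
```

The third pixel sits between the two boxes, so it is left unclassified, which is what the zero-value pixels in PIPEORIG represent.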
N) Run PIPED again, with the default Z-score option, using the signature group file SIGS again and call this new image PIPEDST. Enter a descriptive title and retain all bands.
5. How much did using standard deviations instead of minimum and maximum values affect the parallelepiped classification?
O) Compare each of the classifications you created: MAX, RAW, MRNSTD, PIPEORIG, and PIPEDST. To do this, display all of them with the Qualitative 256 palette. You may wish to use a smaller expansion factor to fit them all on the screen.
As a final note, consider the following. If your training sites are very good, the Maximum Likelihood classifier should produce the best result. However, when training sites are not well defined, it often performs very poorly. In these cases, the Minimum Distance classifier with the standardized distances option often performs much better. The Parallelepiped classifier with the standard deviation option also performs rather well and is the fastest of the considered classifiers.
When you have answered the 5 questions above, print one of your classifications formatted as a C-size document using the Designjet plotter and submit it with your Word document. Aces... This activity is due at the start of class Thursday, May 21.
This lab activity incorporated parts of an early version of the Idrisi student tutorial.