Geog 258: Maps and GIS

March 3, 2006

Spatial data quality

 

Reading: Ch10

 


Table of contents

 

Why bother with spatial data quality?

Clarifying terms related to accuracy

Components of spatial data quality

Assessing spatial data quality

 


Why is spatial data quality an issue?

 

Examples

 

John bought a new land parcel for $10,000. The price was determined by the area of the parcel, which was calculated from a (paper) land parcel map. When he measured the parcel with a GPS receiver, he found that the parcel map overstates the area of his land.

 

Jennifer is doing research on land use change. According to her first study, which used land use/land cover data from USGS, urban land use increased by 20% in the study area. She then repeated the same study with a different data set, and the result was that urban land use increased by only 5%. Which is right? Which data should she use?

 

Geospatial data is increasingly used in diverse applications, influencing decisions related to spatial information such as climate change, disaster management, business geographics and so on.

 

Erroneous data → Information → Decision

Erroneous data will nullify any rigorous analysis methods or effective presentation

What are the consequences of critical decisions (e.g. disaster preparedness) based on inaccurate data?

 


Terms related to accuracy

 

Accuracy: closeness to the truth; adherence to reality

Error: inverse of accuracy

 

Precision

1) Storage precision: level of detail with which a value is recorded (e.g. number of decimal places)

2) Statistical precision: related to the variability among repeated measurements

 

Resolution: minimum distance that can be recorded (e.g. pixel size)

Map scale: ratio of map distance to ground distance

 

Accuracy and (statistical) precision

 

Components of error

1) Systematic error: the whole data set is “biased” by a uniform amount

2) Random error: each measurement has some inherent deviation 

 

 

Accuracy is calculated from total error (systematic plus random); it is the closeness of an observation to the true value

Precision is calculated from random error
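A minimal sketch of how the two error components can be separated, assuming we have repeated measurements of one quantity whose true value is known. The readings and the function name `error_summary` are hypothetical and only for illustration; bias stands in for systematic error, the standard deviation of the errors for random error (precision), and RMSE for total error (accuracy).

```python
import math
import statistics

def error_summary(measurements, true_value):
    """Split total error into a systematic part (bias) and a random part (spread)."""
    errors = [m - true_value for m in measurements]
    bias = statistics.fmean(errors)        # systematic error: uniform offset
    spread = statistics.pstdev(errors)     # random error: variability (precision)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))  # total error
    return bias, spread, rmse

# Hypothetical repeated GPS readings (metres) of a point whose true easting is 100.0
readings = [102.1, 101.9, 102.0, 102.2, 101.8]
bias, spread, rmse = error_summary(readings, 100.0)
print(f"bias={bias:.3f} m, spread={spread:.3f} m, rmse={rmse:.3f} m")
```

In this made-up case the readings are tightly clustered (high precision) but all sit about 2 m off the true value (low accuracy). With these definitions, RMSE squared equals bias squared plus spread squared.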

 

Accuracy and others

 

Higher storage precision ∝ higher accuracy

Higher resolution ∝ higher accuracy

Larger map scale ∝ higher accuracy

 

Questions: which can be considered more accurate in general?

1) Spatial resolution of satellite image: 10 meter or 1 km

2) Map scale of topographic map: 1:1000 or 1:100,000?
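One way to think about question 2 is to convert a fixed drawing error on the map into a ground distance. The sketch below does that; the 0.5 mm plotting error is an assumed figure for illustration, not a value from the reading, and `ground_distance_m` is a hypothetical helper name.

```python
def ground_distance_m(map_distance_mm, scale_denominator):
    """Ground distance in metres for a distance measured on the map in millimetres."""
    return map_distance_mm * scale_denominator / 1000.0

# Assumed plotting error of 0.5 mm on the paper map
for denominator in (1_000, 100_000):
    error = ground_distance_m(0.5, denominator)
    print(f"1:{denominator:,} -> {error} m on the ground")
```

The same half millimetre on paper covers a much larger ground distance on the smaller-scale map, which is why larger map scale tends to go with higher positional accuracy.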

 


Components of spatial data quality

 

Another critical skill in GIS or map use is picking the data that meets the needs of a particular application

 

Is accuracy really enough for determining the fitness-for-use of data?

Data can be perfectly accurate and still lack the required information (completeness), so that it cannot answer the intended questions, or exhibit internal contradictions (consistency). Highly accurate data (say, very high resolution data) can also be overkill for the intended use of the data.

 

 

             | Space | Time | Attribute
Accuracy     |       |      |
Consistency  |       |      |
Completeness |       |      |

where

Column: components of geographic information

Row: components of data quality

 

· Accuracy: lack of discrepancy between measurements and the values considered true

· Consistency: whether given components conform to logical rules

· Completeness: whether what’s required is encoded in the data (i.e. whether anything is missing)

 

Spatial (positional) inaccuracy: how far a location in the test data deviates from its true location. Where do you obtain the true value? Use “well-defined points” (e.g. bench marks, geodetic control points). For lines and areas, error is a mixture of positional error and generalization error.
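A common way to summarize positional inaccuracy is the root mean square error (RMSE) of the offsets between test locations and matching well-defined points. The sketch below assumes matched (x, y) pairs in the same projected coordinate system; the coordinates and the name `horizontal_rmse` are hypothetical.

```python
import math

def horizontal_rmse(test_points, control_points):
    """RMSE of horizontal offsets between matched (x, y) pairs, in coordinate units."""
    if len(test_points) != len(control_points) or not test_points:
        raise ValueError("need two non-empty lists of equal length")
    squared = [
        (xt - xc) ** 2 + (yt - yc) ** 2
        for (xt, yt), (xc, yc) in zip(test_points, control_points)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical digitized locations vs. surveyed control points (metres)
test = [(1003.0, 2004.0), (1500.0, 2500.0)]
control = [(1000.0, 2000.0), (1500.0, 2500.0)]
print(round(horizontal_rmse(test, control), 3))
```

This only covers point features. For lines and areas the offsets also contain generalization error, so a point-based RMSE would understate or misdescribe their error.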

Temporal inaccuracy: lack of agreement between encoded temporal coordinate and true temporal coordinate; different from “datedness”

Attribute inaccuracy: how far a value in the test data deviates from the true value (e.g. the land use value in the map doesn’t correspond to the real land use on the ground)
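For categorical attributes such as land use, one widely used summary is an error matrix (cross-tabulation of mapped vs. reference classes) and the share of sample sites that agree. This is a swapped-in illustration, not a method named in these notes; the class labels and sample values are hypothetical.

```python
from collections import Counter

def error_matrix(mapped, reference):
    """Count (reference class, mapped class) pairs over matched sample sites."""
    if len(mapped) != len(reference) or not mapped:
        raise ValueError("need two non-empty lists of equal length")
    return Counter(zip(reference, mapped))

def overall_accuracy(matrix):
    """Fraction of sample sites where the mapped class equals the reference class."""
    total = sum(matrix.values())
    correct = sum(n for (ref, mp), n in matrix.items() if ref == mp)
    return correct / total

# Hypothetical sample sites: class on the map vs. class observed on the ground
mapped = ["residential", "commercial", "residential", "forest", "forest"]
reference = ["residential", "residential", "residential", "forest", "water"]
matrix = error_matrix(mapped, reference)
print(overall_accuracy(matrix))
```

The off-diagonal cells of the matrix show which classes are confused with each other, which is often more useful than the single overall figure.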

 

Spatial inconsistency: e.g. a digitized polygon representing a lake is not closed

Temporal inconsistency: e.g. violation of the constraint that only one event can occur at a given location at a given time

Attribute inconsistency: e.g. “King” appears in the State column (it should be in the County column)
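Consistency checks are tests against logical rules, so they are easy to automate. The sketch below shows two such rules under simple assumptions: a polygon ring is a list of (x, y) vertices that must end where it starts, and an attribute must come from a list of allowed values. The function names, the sample lake outline, and the allowed state codes are hypothetical.

```python
def ring_is_closed(ring, tolerance=0.0):
    """True if the ring has at least 4 vertices and its last vertex matches the first."""
    if len(ring) < 4:
        return False
    (x0, y0), (xn, yn) = ring[0], ring[-1]
    return abs(x0 - xn) <= tolerance and abs(y0 - yn) <= tolerance

def invalid_domain_values(records, field, allowed):
    """Return the records whose value for `field` is not in the allowed set."""
    return [r for r in records if r[field] not in allowed]

# Hypothetical digitized lake outline that was never closed
lake = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(ring_is_closed(lake))                  # unclosed outline
print(ring_is_closed(lake + [lake[0]]))      # same outline, closed

# Hypothetical attribute table with a county name in the State column
rows = [{"State": "WA", "County": "King"}, {"State": "King", "County": ""}]
print(invalid_domain_values(rows, "State", {"WA", "OR"}))
```

Neither check says anything about accuracy: a closed lake polygon can still be in the wrong place, and a valid state code can still be the wrong state.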

 

Spatial incompleteness: some features are missing

Temporal incompleteness: some temporal events are missing

Attribute incompleteness: some attributes are missing
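Completeness can only be measured against a statement of what is required. A minimal sketch, assuming the requirement is given as a list of expected feature IDs and a list of required attribute fields (all names and values below are hypothetical):

```python
def missing_features(dataset_ids, reference_ids):
    """Feature IDs that the reference list expects but the data set lacks."""
    return sorted(set(reference_ids) - set(dataset_ids))

def attribute_completeness(records, required_fields):
    """Share of required attribute values that are actually filled in."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1 for r in records for f in required_fields if r.get(f) not in (None, "")
    )
    return filled / total

# Hypothetical lake layer compared against a reference inventory
print(missing_features(["L1", "L3"], ["L1", "L2", "L3"]))

# Hypothetical attribute table with one empty value
lakes = [{"name": "Union", "area_ha": 235.0}, {"name": "", "area_ha": 12.5}]
print(attribute_completeness(lakes, ["name", "area_ha"]))
```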

 

One common issue is related to the specification of reality

 

Accuracy is measured relative to a “true” value, which must itself be defined (e.g. in land use classification, an area labeled residential is not 100% residential, but rather a mixture of residential and commercial; it is necessary to define what qualifies as residential)

 

Completeness is measured relative to a given definition of features (e.g. some lake features can be said to be missing, but what counts as a lake in the first place?)

 

See Box 10.2 (p. 195)

 


Different approaches to assessing data quality

 

1) Top-down approach: conformance to standards (e.g. NMAS, the National Map Accuracy Standards); if data products fail the standards, they cannot be published

2) Bottom-up approach: data producers inform users of the content and quality of the data (e.g. through metadata), and data users determine the fitness-for-use of the data

3) Interactive approach: a market approach based on interactive feedback between producers and users
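As an illustration of the top-down approach, the sketch below tests a set of checked points against the NMAS horizontal rule: no more than 10% of tested points may be in error by more than 1/30 inch at publication scale for maps larger than 1:20,000, or 1/50 inch for maps at 1:20,000 or smaller. The function names and the sample errors are hypothetical, and the vertical part of the standard is not covered.

```python
INCH_IN_METRES = 0.0254

def nmas_horizontal_tolerance_m(scale_denominator):
    """Allowed ground error in metres for a well-defined point at this map scale."""
    # Scales larger than 1:20,000 have a denominator below 20,000
    map_tolerance_in = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return map_tolerance_in * INCH_IN_METRES * scale_denominator

def passes_nmas_horizontal(errors_m, scale_denominator):
    """True if at most 10% of tested points exceed the tolerance."""
    if not errors_m:
        raise ValueError("need at least one tested point")
    tolerance = nmas_horizontal_tolerance_m(scale_denominator)
    exceeding = sum(1 for e in errors_m if e > tolerance)
    return exceeding / len(errors_m) <= 0.10

# Hypothetical horizontal errors (metres) at ten test points on a 1:24,000 map
errors = [3.0, 5.0, 8.0, 15.0, 2.0, 4.0, 6.0, 7.0, 9.0, 1.0]
print(round(nmas_horizontal_tolerance_m(24_000), 3))
print(passes_nmas_horizontal(errors, 24_000))
```

The result is pass or fail for the whole product, which is what distinguishes this approach from the bottom-up one, where the error figures themselves are reported and the user decides.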

 

How do I obtain data quality information?

 

Geospatial data in the public domain comes with metadata, and the metadata contains data quality information.
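A rough sketch of pulling the data quality statements out of a metadata file, assuming the record is XML that follows the FGDC CSDGM short element names (`dataqual`, `attracc`, `logic`, `complete`, `posacc`). The sample record is made up for illustration, and other metadata standards (e.g. ISO 19115) use different element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated metadata record
SAMPLE = """<metadata>
  <dataqual>
    <attracc><attraccr>Land use codes checked against a field survey.</attraccr></attracc>
    <logic>All polygons are closed.</logic>
    <complete>Features smaller than 1 ha are omitted.</complete>
    <posacc><horizpa><horizpar>Tested against geodetic control points.</horizpar></horizpa></posacc>
  </dataqual>
</metadata>"""

# Paths assume the FGDC CSDGM layout described in the lead-in
PATHS = {
    "attribute accuracy": "dataqual/attracc/attraccr",
    "logical consistency": "dataqual/logic",
    "completeness": "dataqual/complete",
    "positional accuracy": "dataqual/posacc/horizpa/horizpar",
}

def data_quality_report(xml_text):
    """Map each quality component to its statement, or None if it is absent."""
    root = ET.fromstring(xml_text)
    report = {}
    for label, path in PATHS.items():
        text = root.findtext(path)
        report[label] = text.strip() if text and text.strip() else None
    return report

for label, statement in data_quality_report(SAMPLE).items():
    print(f"{label}: {statement}")
```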