Geog 258: Maps and GIS
March 3, 2006
Spatial data quality
Table of contents
Why bother with spatial data quality?
Clarifying terms related to accuracy
Components of spatial data quality
Assessing spatial data quality
Why is spatial data quality an issue?
Examples
John bought a new land parcel for $10,000. The price was determined by the area of the parcel, which was calculated from a paper land parcel map. When he measured the parcel using a GPS receiver, he found that the map overstated its area.
Jennifer is doing research on land use change. According to her results, urban land use increased by 20% in the study area; she used land use/land cover data from the USGS. She then repeated the same study with a different data set, and this time urban land use increased by only 5%. Which result is right? Which data should she use?
Geospatial data are increasingly used in diverse applications, informing decisions on spatial issues such as climate change, disaster management, business geographics, and so on.
Erroneous data → Information → Decision
Erroneous data will nullify even the most rigorous analysis methods or the most effective presentation.
What are the consequences of critical decisions (e.g. disaster preparedness) based on inaccurate data?
Terms related to accuracy
Accuracy: closeness to the truth; adherence to reality
Error: inverse of accuracy
Precision
1) Storage precision: amount of detail (e.g. number of decimal places)
2) Statistical precision: related to variability among repeated measurements
Resolution: minimum distance that can be recorded (e.g. pixel size)
Map scale: ratio of map distance to ground (earth) distance
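Because map scale is a representative fraction (map distance : ground distance), converting a distance measured on the map into a ground distance is a single multiplication. A minimal sketch; the function name `ground_distance_m` and the choice of millimetres on the map and metres on the ground are assumptions for illustration.

```python
def ground_distance_m(map_distance_mm: float, scale_denominator: float) -> float:
    """Ground distance (m) for a distance measured (mm) on a 1:scale_denominator map."""
    # Scale = map distance / ground distance = 1 / scale_denominator,
    # so ground distance = map distance * scale_denominator.
    ground_mm = map_distance_mm * scale_denominator
    return ground_mm / 1000.0  # millimetres -> metres


# 1 mm on the map covers far less ground on a larger-scale map
print(ground_distance_m(1, 1_000))    # 1.0
print(ground_distance_m(1, 100_000))  # 100.0
```

A 1:1,000 map is a larger scale than a 1:100,000 map: the fraction 1/1,000 is larger, and each millimetre on the map stands for less ground.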
Accuracy and (statistical) precision
Components of error
1) Systematic error: the whole data set is “biased” (offset by a uniform amount)
2) Random error: each measurement has some inherent deviation
Accuracy is calculated from total error (systematic plus random); it is the closeness of an observation to the true value.
Precision is calculated from random error alone.
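The split between systematic and random error can be made concrete with repeated measurements of a quantity whose true value is known. A minimal sketch, assuming the mean offset is used as the estimate of systematic error (bias), the population standard deviation as the measure of statistical precision, and root-mean-square error (RMSE) as the measure of total error; the measurement values are hypothetical.

```python
import math


def error_components(measurements, true_value):
    """Return (bias, standard deviation, RMSE) for repeated measurements."""
    n = len(measurements)
    mean = sum(measurements) / n
    bias = mean - true_value  # systematic error: uniform offset of the whole set
    # Spread around the mean: random error, i.e. (statistical) precision
    sd = math.sqrt(sum((m - mean) ** 2 for m in measurements) / n)
    # Spread around the true value: total error, i.e. accuracy
    rmse = math.sqrt(sum((m - true_value) ** 2 for m in measurements) / n)
    return bias, sd, rmse


# Hypothetical: four measurements of a 100.0 m baseline
result = error_components([102.0, 103.0, 101.0, 102.0], 100.0)
print(tuple(round(v, 3) for v in result))  # (2.0, 0.707, 2.121)
```

With these summaries RMSE² = bias² + SD², so a data set can be very precise (small SD) and still inaccurate when its bias is large, as in this example.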
Accuracy and other terms
Higher storage precision does not necessarily mean higher accuracy
Higher resolution does not necessarily mean higher accuracy
Larger map scale does not necessarily mean higher accuracy
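Storing more decimal places only records a value in more detail; it does not move the value closer to the truth. A minimal sketch with a hypothetical GPS latitude reading that is about 50 m off, assuming roughly 111 km per degree of latitude (an approximation; the true figure varies slightly with latitude).

```python
M_PER_DEG_LAT = 111_320  # approximate metres per degree of latitude (assumption)

true_lat = 47.6550000      # hypothetical true latitude (decimal degrees)
measured_lat = 47.6554521  # hypothetical measurement, about 50 m north of the truth


def error_after_rounding(measured: float, true: float, places: int) -> float:
    """Ground error (m) after storing `measured` with `places` decimal places."""
    stored = round(measured, places)
    return abs(stored - true) * M_PER_DEG_LAT


for places in (4, 6, 8):
    # The smallest recordable step shrinks tenfold with each decimal place,
    detail_m = 10 ** -places * M_PER_DEG_LAT
    # but the error relative to the true location stays near 50 m.
    error_m = error_after_rounding(measured_lat, true_lat, places)
    print(places, round(detail_m, 4), round(error_m, 1))
```

Eight decimal places can record steps of about a millimetre, yet the stored point is still tens of metres from where it should be.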
Questions: which can generally be considered more accurate?
1) Spatial resolution of a satellite image: 10 m or 1 km?
2) Scale of a topographic map: 1:1,000 or 1:100,000?
Components of spatial data quality
Another critical skill in GIS and map use is picking the data that meet the needs of a particular application.
Is accuracy really enough to determine the fitness-for-use of data?
Data can be perfectly accurate yet lack the information required (completeness), and thus be unable to answer the intended questions, or they can exhibit internal contradictions (consistency). Highly accurate data (say, very high resolution data) can also be overkill for the intended use.
             | Space | Time | Attribute |
Accuracy     |       |      |           |
Consistency  |       |      |           |
Completeness |       |      |           |
where the columns are the components of geographic information and the rows are the components of data quality.
· Accuracy: lack of discrepancy between measurements and the values considered true
· Consistency: whether the given components conform to logical rules
· Completeness: whether what is required is encoded in the data (i.e. whether anything is missing)
Spatial (positional) inaccuracy: how far locations in the test data deviate from their true locations. Where do you obtain true values? Use “well-defined points” (e.g. benchmarks, geodetic control points). For lines and areas, error is a mixture of positional error and generalization error.
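Positional error at well-defined points is often summarized with root-mean-square error (RMSE). A minimal sketch, assuming paired (x, y) coordinates for the same points in the test data and in a higher-accuracy reference source, both in metres; the coordinates are hypothetical, and a particular standard may prescribe a different statistic.

```python
import math


def horizontal_rmse(test_pts, ref_pts):
    """Return (RMSE_x, RMSE_y, RMSE_r) for paired (x, y) points in the same units."""
    if len(test_pts) != len(ref_pts) or not test_pts:
        raise ValueError("need the same, non-zero number of test and reference points")
    n = len(test_pts)
    sum_dx2 = sum((tx - rx) ** 2 for (tx, _), (rx, _) in zip(test_pts, ref_pts))
    sum_dy2 = sum((ty - ry) ** 2 for (_, ty), (_, ry) in zip(test_pts, ref_pts))
    rmse_x = math.sqrt(sum_dx2 / n)
    rmse_y = math.sqrt(sum_dy2 / n)
    rmse_r = math.sqrt(rmse_x ** 2 + rmse_y ** 2)  # radial (horizontal) RMSE
    return rmse_x, rmse_y, rmse_r


# Hypothetical control points: surveyed reference vs. digitized test positions
reference = [(1000.0, 2000.0), (1500.0, 2500.0), (1200.0, 2200.0)]
test = [(1003.0, 2004.0), (1497.0, 2496.0), (1200.0, 2205.0)]
print(horizontal_rmse(test, reference))
```

For these three points the radial RMSE comes out at about 5 m.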
Temporal inaccuracy: lack of agreement between the encoded temporal coordinate and the true temporal coordinate; this is different from “datedness”
Attribute inaccuracy: how far values in the test data deviate from true values (e.g. the land use value on the map does not correspond to the actual land use on the ground)
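For categorical attributes such as land use, accuracy is commonly checked by comparing map labels with labels observed on the ground at sample sites, tallied in an error (confusion) matrix. A minimal sketch; the sample labels are hypothetical, and overall accuracy is only one of several summaries that can be derived from the matrix.

```python
from collections import Counter


def attribute_accuracy(map_labels, ground_labels):
    """Return the confusion counts {(map, ground): n} and overall accuracy."""
    if len(map_labels) != len(ground_labels) or not map_labels:
        raise ValueError("need the same, non-zero number of map and ground labels")
    confusion = Counter(zip(map_labels, ground_labels))
    # Diagonal of the matrix: sites where map and ground agree
    correct = sum(n for (m, g), n in confusion.items() if m == g)
    return confusion, correct / len(map_labels)


# Hypothetical sample sites: land use on the map vs. land use found on the ground
on_map = ["residential", "residential", "commercial", "forest", "residential"]
on_ground = ["residential", "commercial", "commercial", "forest", "residential"]
confusion, overall = attribute_accuracy(on_map, on_ground)
print(overall)                                   # 0.8
print(confusion[("residential", "commercial")])  # 1
```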
Spatial inconsistency: e.g. digitized features representing a lake are not closed
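A check for the unclosed-lake case can be as simple as comparing the first and last vertices of each boundary. A minimal sketch, assuming each boundary is a list of (x, y) vertices and that a closed ring repeats its first vertex at the end (the convention used, for example, by GeoJSON polygon rings); the `tol` snapping tolerance is a hypothetical parameter.

```python
def ring_is_closed(coords, tol=0.0):
    """True if a boundary's last vertex meets its first vertex within `tol`."""
    if len(coords) < 4:
        # A closed ring needs at least three distinct vertices plus the repeated first one.
        return False
    (x0, y0), (xn, yn) = coords[0], coords[-1]
    return abs(x0 - xn) <= tol and abs(y0 - yn) <= tol


lake_ok = [(0, 0), (4, 0), (4, 3), (0, 0)]
lake_open = [(0, 0), (4, 0), (4, 3), (0.2, 0.1)]  # digitizer stopped short of the start
print(ring_is_closed(lake_ok))             # True
print(ring_is_closed(lake_open))           # False
print(ring_is_closed(lake_open, tol=0.5))  # True
```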
Temporal inconsistency: e.g. violating the constraint that only one event can occur at a given location at a given time
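The one-event-per-place-and-time rule can be tested by looking for overlapping time intervals at the same location. A minimal sketch, assuming each event is recorded as (event_id, location_id, start, end) with half-open intervals [start, end); the record layout and the sample events are hypothetical.

```python
from collections import defaultdict


def overlapping_events(events):
    """Return pairs of event ids that occupy the same location at the same time."""
    by_location = defaultdict(list)
    for event_id, location, start, end in events:
        by_location[location].append((start, end, event_id))

    conflicts = []
    for items in by_location.values():
        items.sort()  # order the events at this location by start time
        for i, (start_i, end_i, id_i) in enumerate(items):
            for start_j, _end_j, id_j in items[i + 1:]:
                if start_j >= end_i:
                    break  # later events start even later, so none overlap event i
                conflicts.append((id_i, id_j))
    return conflicts


# Hypothetical land use events on two parcels (years)
events = [
    ("a", "parcel-1", 2000, 2005),
    ("b", "parcel-1", 2004, 2010),  # overlaps "a" during 2004
    ("c", "parcel-1", 2010, 2015),
    ("d", "parcel-2", 2001, 2003),
]
print(overlapping_events(events))  # [('a', 'b')]
```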
Attribute inconsistency: e.g. “King” appears in the State column (it should be in the County column)
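Attribute consistency problems like this can be caught by checking each value against the domain of values allowed in its column. A minimal sketch; the records and the three-state domain are hypothetical (a real check would list every valid state).

```python
VALID_STATES = {"Washington", "Oregon", "Idaho"}  # hypothetical, deliberately incomplete domain


def rows_violating_domain(records, column, domain):
    """Return indices of records whose `column` value is not in the allowed domain."""
    return [i for i, rec in enumerate(records) if rec.get(column) not in domain]


records = [
    {"County": "King", "State": "Washington"},
    {"County": "Washington", "State": "King"},  # values entered in the wrong columns
]
print(rows_violating_domain(records, "State", VALID_STATES))  # [1]
```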
Spatial incompleteness: some features are missing
Temporal incompleteness: some temporal events are missing
Attribute incompleteness: some attributes are missing
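Each kind of incompleteness amounts to comparing what the data contain with what they should contain. A minimal sketch, assuming a reference list of feature ids, a list of expected survey years, and a set of required attribute fields are available; all names and values are hypothetical.

```python
def missing_items(expected, observed):
    """Items that should be present but are not (sorted for readable output)."""
    return sorted(set(expected) - set(observed))


def empty_attribute_counts(features, required_fields):
    """Count features whose required attribute is absent or blank."""
    return {
        field: sum(1 for f in features if f.get(field) in (None, ""))
        for field in required_fields
    }


lakes = [
    {"id": 1, "name": "Lake A", "area_ha": 104},
    {"id": 2, "name": "", "area_ha": 12},
]
print(missing_items([1, 2, 3], [f["id"] for f in lakes]))  # spatial: [3]
print(missing_items([2000, 2001, 2002], [2000, 2002]))     # temporal: [2001]
print(empty_attribute_counts(lakes, ["name", "area_ha"]))  # attribute: {'name': 1, 'area_ha': 0}
```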
One common issue relates to the specification of reality.
Accuracy is measured relative to a “true” value (e.g. in land use classification, a residential area is not 100% residential but rather a mixture of residential and commercial uses, so the requirements for “residential” must be defined).
Completeness is measured relative to a given definition of features (e.g. some lake features can be said to be missing; but what do you mean by a lake?)
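How reality is specified can decide whether a map label counts as correct at all. A minimal sketch, assuming a parcel is called residential when its residential share reaches a chosen threshold; the parcel and both thresholds are hypothetical.

```python
def land_use_label(residential_share: float, threshold: float) -> str:
    """Label a mixed-use parcel under a given definition of 'residential'."""
    return "residential" if residential_share >= threshold else "mixed"


share = 0.6  # hypothetical parcel: 60% residential, 40% commercial
print(land_use_label(share, threshold=0.5))   # residential
print(land_use_label(share, threshold=0.75))  # mixed
```

A map that labels this parcel residential is accurate under the first definition and inaccurate under the second, even though the ground has not changed.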
See
Different approaches to assessing data quality
1) Top-down approach: conformance to standards (e.g. NMAS, the U.S. National Map Accuracy Standards); if data products fail the standards, the data cannot be published
2) Bottom-up approach: data producers inform users about the content and quality of the data, and users determine the fitness-for-use of the data (e.g. metadata)
3) Interactive approach: a market approach based on interactive feedback between producers and users
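As an example of the top-down approach, the NMAS horizontal test allows no more than 10% of the well-defined points tested to be in error by more than 1/30 inch at publication scale for maps larger than 1:20,000, or 1/50 inch for maps at 1:20,000 or smaller. A minimal sketch of that horizontal test only (the vertical test is omitted), assuming the point errors have already been measured as ground distances in metres; the error values are hypothetical.

```python
INCH_M = 0.0254  # metres per inch


def nmas_tolerance_m(scale_denominator: float) -> float:
    """Ground tolerance (m) for the NMAS horizontal test on a 1:scale_denominator map."""
    # Scales larger than 1:20,000 (smaller denominators) use 1/30 inch; otherwise 1/50 inch.
    map_tolerance_in = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return map_tolerance_in * INCH_M * scale_denominator


def passes_nmas_horizontal(errors_m, scale_denominator: float) -> bool:
    """True if at most 10% of the tested points exceed the tolerance."""
    tolerance = nmas_tolerance_m(scale_denominator)
    n_exceeding = sum(1 for e in errors_m if e > tolerance)
    return n_exceeding * 10 <= len(errors_m)  # integer form of n_exceeding <= 0.1 * n


errors = [5.0] * 18 + [15.0, 20.0]  # hypothetical errors at 20 check points, 1:24,000 map
print(round(nmas_tolerance_m(24_000), 3))      # 12.192
print(passes_nmas_horizontal(errors, 24_000))  # True (2 of 20 exceed)
```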
How do I obtain data quality information?
Geospatial data (in the public domain) come with metadata, and the metadata contain data quality information.
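For U.S. public-domain data, metadata often follow the FGDC Content Standard for Digital Geospatial Metadata (CSDGM), whose data quality section has separate reports for attribute accuracy, logical consistency, completeness, and positional accuracy. A minimal sketch that pulls those reports out of a CSDGM-style XML record with Python's standard library; the element paths assume CSDGM short tag names and should be checked against the standard, and the sample record is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical CSDGM-style record (real files contain many more sections)
SAMPLE_METADATA = """<metadata>
  <dataqual>
    <attracc><attraccr>Land use codes checked against 200 field sites.</attraccr></attracc>
    <logic>Polygons tested for closure; no open rings found.</logic>
    <complete>Water bodies smaller than 1 ha are not included.</complete>
    <posacc><horizpa><horizpar>Tested against geodetic control points.</horizpar></horizpa></posacc>
  </dataqual>
</metadata>"""

# Assumed paths (relative to <metadata>) for each data quality report
REPORT_PATHS = {
    "attribute accuracy": "dataqual/attracc/attraccr",
    "logical consistency": "dataqual/logic",
    "completeness": "dataqual/complete",
    "horizontal positional accuracy": "dataqual/posacc/horizpa/horizpar",
}


def quality_reports(xml_text):
    """Return each data quality report's text, or '' when the element is absent."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(path, default="").strip() for name, path in REPORT_PATHS.items()}


for name, text in quality_reports(SAMPLE_METADATA).items():
    print(f"{name}: {text}")
```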