Textbook Notes
Important findings within the textbook.
Data Models
General
- entities: features we care about (roads, mountains, accident locations)
- spatial object: entity-object correspondence
- land cover: set of polygons
- characteristics of spatial detail subjectively chosen based on developer and needs
- spatial data model: objects in spatial database and their relationships
- representing and manipulating spatially referenced information
- thematic layers are cartographic objects
- coordinate systems:
- y-axis: “northing axis”; increases upward
- x-axis: “easting axis”; increases to the right
- spherical coordinates (non-Cartesian)
- larger area calculations on a 2-dimensional map can be distorted due to spherical nature of Earth
- spherical coordinate calculations increase accuracy and precision
- calculates via formulas using angles of rotation
- meridians: lines of equal longitude (east-west); think vertical
- parallels; lines of equal latitude; think horizontal
- convergence at the poles causes distortion due to the degree of latitude spanning a greater distance near there
- latitudes: usually preceded by N or S (based on equator)
- longitudes: usually preceded by E or W (based on Prime Meridian / Greenwhich Meridian)
- signed coordinates (instead of preceding cardinal direction)
- northern latitudes are positive
- southern latitudes are negative
- eastern longitudes are positive
- western longitudes are negative
Common Spatial Data Models
- discrete objects defined via vector data models
- uses discrete elements such as points, lines, and polygons to represent geometry of real-world entities
- discrete vector object examples: fields, roads, wetlands, cities
- points: small objects such as wells, buildings, ponds (again, what is subjectively chosen based on scale)
- lines: network objects such as roads and rivers, or to enclose polygons for are entities
- nodes: starting and ending points in a line
- vertices: intermediate points in a line or polygon
- conceptualization via grid cells is a raster data model
- splitting a map into rows and columns for “wall-to-wall” coverage
- depending on resolution, decent for representing continuous changes across a region
- raster-vector is possible, but can come at the cost of computation or accuracy prices
- Triangulated Irregular Network (TIN): combination of point, line, and area features (less common than raster and vector data models)
- attribute data: non-spatial characteristics
Vector Data Models (Discrete)
- uses groups of coordinates to represent entities
- lines can be a collection of smaller line segments or mathematical equations to represent geometric shape
- multi-part-features: a many-to-one relationship which is used to describe groupings (island, buildings, etc)
- see Forms of Feature Generalization on Generalization and the MAUP
- vector data has two common features:
- polygon inclusion:
- areas within a polygon that are different but still considered part of it
- imagine a polygon just representing a golf course, where the golf course also has fairways, roughs, ponds, pathways, etc.
- those other features are “inclusions”, but will often not be featured as their own polygon
- this is due to some set lower limit, known as a minimum mapping unit
- boundary generalization: incomplete representation of boundary locations
- sampling of the true curve, typically contains some deviation from the “true” curved boundary
- polygon inclusion:
Raster-Vector Model Conversion
Vector to Raster
- assign cell value for each position occupied by vector features
- points are usually assigned to the cell containing the point coordinate
- if cell size is too large, two or more points could fall in a single cell, leading to an ambiguous identification
- typically, cell size is chosen such that diagonal cell dimension is smaller than the distance between the two closest point features
- can also be used for lines (versus coordinate points), where a cell-line intersection assigns the cell
- can lead to wider approximation in adjacent cells taking on the same value
Raster to Vector
- identify set of continuous connected cells that form lines and apply smoothing algorithm
Maps, Data Entry, Editiing, and Output
- graticular lines: lines representing constant latitude and longitude
- lines may be curved depending on projection and scale
- accuracy vs. resolution (position accuracy vs. spatial resolution):
- accuracy: how close a value is to the truth
- resolution: smallest feature is individually resolved
- many image data have average resolution smaller than their average accuracy
- scale: distance on a screen or map to the distance on Earth (reality)
- can be unitless
- large vs. small:
- large scale maps: closer to reality (i.e. zoomed-in); more detailed; larger features; smaller area; trends towards less geometric error
- small scale maps: further from reality (i.e. zoomed-out); less detailed; smaller features; larger area; trends towards more geometric error
More on Data Models
Triangulated Irregular Networks (TINs)
- commonly used to represent terrain heights
- can be more complex than raster models, but often more efficient at storing terrain data
- higher variable terrain requires more points
Object Data Model
- often more logical: portrays a user’s view of the real world, entities of interest, and relationships between them
- example: pipe models
- a pipe model could contain a valve class with child classes of valves which inherit the valve attributes, but have unique attributes themselves
- there is no widely accepted definition of what constitutes an object model
3-D Data Models
- attribute data of heights (i.e. buildings)
- can use a voxel (volume element) which is essentially a cube version of a raster model
- reality mesh: 3-D TIN with image based texture surface
Multiple Models
- model decision comes down to choosing the best model for the task
- raster and TIN models are often called:
- DEMs: digital elevation models
- DTMs: digital terrain models
- analyzes are often more difficult using contour lines, thus are stored as raster or TIN models
Projections
- standard sets of projections have been developed from:
- lambert conformal conic
- transverse mercator
- state plane coordinate system (SPCS)
- states can contain multiple zones, and different projection parameters to limit distortion
- most states have adopted zones such that distortion is less than a single part per 10,000
- application from lambert conformal conic and transverse mercator
- the transverse mercator’s distortion increases with distance from a central meridian: good for states with long North-South axis
- the lambert conformal conic’s distortion increases with distance from a middle parallel: good for states with long East-West axis
- Universal Transverse Mercator Coordinate System (UTM)
- divides the Earth into zones which are 6 degree wide
- common for data or study areas spanning large regions
- Other common projections:
- maps good for visualization but not great for spatial analysis: tradeoff between continuous map surface and distortion
- distortion can be reduced via cutting / interrupting, depending on study area
- homosoline projections are useful when studying area breaks such as continents or oceans
- conversion between coordinate systems:
- inverse and forward projection equations
- careful when converting projections using different datums
- requires a datum transformation (a change in geographic coordinates)
- may not be an automatic change in GIS software
- requires a datum transformation (a change in geographic coordinates)
- coordinate system identifiers
- many societies/industries share specific coordinate systems (datums)
- oil
- standardized for north american GIS systems
- many societies/industries share specific coordinate systems (datums)
Topology
- node and line snapping:
- positional errors are inevitable when data are manually digitized
- undershoots: nodes and lines don’t quite reach line / other node
- overshoots: lines cross over existing nodes or lines
- positional errors are inevitable when data are manually digitized
- node / line snapping:
- automatically setting nearby points to have the same coordinates (becomes “magnetic”)
- based on snap tolerance or snap distance
- snap tolerance needs to be set carefully (should be smaller than the desired positional accuracy)
- automatically setting nearby points to have the same coordinates (becomes “magnetic”)
- line smoothing and thinning:
- smooth via spline functions to interpolate curves, which can also densify the set
- data can also be digitized with too many points, and using functions, redundant vertices can be removed without sacrificing spatial accuracy
- many methods use a perpendicular weed distance to accomplish this
- Lang Method: spanning line connects two nonadjacent vertices in a line, and any points closer than the “weed” distance are marked for removal
- many methods use a perpendicular weed distance to accomplish this
- scan digitizing: skeletonizing lines which are thick on the scan, and then are thinned (think raster square de-buffer)
- slivers: small gaps or overlaps along shared polygon boundaries
- creates spurious data nd doesn’t accurately represent adjacency
- slivers result in an additional entry in the attribute table, thus adds storage and processing overhead without increasing the value of the dataset
- if this is numerous, it can be a problem
- can be created while digitizing with too small of a tolerance, but topological digitizing accounts for this by building from an existing edge, thus no line is repeated twice
- features common to several layers:
- given multiple layers or images, the layers rarely line up precisely
- can be solved via redrafting, which is time and processing expensive
- preferably, a “master” boundary can be used via the highest accuracy composite of the avilable data sets
- might lead to only cosmic improvements, however
- given multiple layers or images, the layers rarely line up precisely