Textbook Notes

Important findings within the textbook.

Data Models

General

  • entities: features we care about (roads, mountains, accident locations)
  • spatial object: entity-object correspondence
    • land cover: set of polygons
  • characteristics of spatial detail subjectively chosen based on developer and needs
  • spatial data model: objects in spatial database and their relationships
    • representing and manipulating spatially referenced information
  • thematic layers are cartographic objects
  • coordinate systems:
    • y-axis: “northing axis”; increases upward
    • x-axis: “easting axis”; increases to the right
  • spherical coordinates (non-Cartesian)
    • larger area calculations on a 2-dimensional map can be distorted due to spherical nature of Earth
    • spherical coordinate calculations increase accuracy and precision
    • calculated via formulas using angles of rotation
    • meridians: lines of equal longitude; they run north-south (think vertical) and measure east-west position
    • parallels: lines of equal latitude; they run east-west (think horizontal)
    • meridians converge at the poles, so a degree of longitude spans a shorter ground distance at higher latitudes; this causes distortion near the poles
    • latitudes: usually preceded by N or S (based on equator)
    • longitudes: usually preceded by E or W (based on Prime Meridian / Greenwich Meridian)
    • signed coordinates (instead of preceding cardinal direction)
      • northern latitudes are positive
      • southern latitudes are negative
      • eastern longitudes are positive
      • western longitudes are negative
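
The signed-coordinate convention and the spherical calculations above can be sketched in a few lines. This is a minimal illustration, not the textbook's method: the function names are made up for these notes, the Earth is assumed to be a perfect sphere with a mean radius of 6371 km, and the haversine formula is one common choice of great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def to_signed(value, hemisphere):
    """Convert an unsigned degree value plus N/S/E/W letter to a signed value."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    # southern latitudes and western longitudes are negative
    return -abs(value) if hemisphere in ("S", "W") else abs(value)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two signed lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude covers less ground at higher latitudes,
# because the meridians converge toward the poles.
at_equator = great_circle_km(0, 0, 0, 1)
at_60_north = great_circle_km(60, 0, 60, 1)
```

Comparing `at_equator` with `at_60_north` shows the convergence effect: under the spherical assumption, the same one-degree step in longitude covers roughly half the distance at 60 degrees north.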

Common Spatial Data Models

  • discrete objects defined via vector data models
    • uses discrete elements such as points, lines, and polygons to represent geometry of real-world entities
    • discrete vector object examples: fields, roads, wetlands, cities
    • points: small objects such as wells, buildings, ponds (again, what is subjectively chosen based on scale)
    • lines: network objects such as roads and rivers, or to enclose polygons for area entities
      • nodes: starting and ending points in a line
      • vertices: intermediate points in a line or polygon
  • conceptualization via grid cells is a raster data model
    • splitting a map into rows and columns for “wall-to-wall” coverage
    • depending on resolution, decent for representing continuous changes across a region
  • raster-vector conversion is possible, but can come at a cost in computation or accuracy
  • Triangulated Irregular Network (TIN): combination of point, line, and area features (less common than raster and vector data models)
  • attribute data: non-spatial characteristics

Vector Data Models (Discrete)

  • uses groups of coordinates to represent entities
  • lines can be a collection of smaller line segments or mathematical equations to represent geometric shape
  • multi-part features: a many-to-one relationship used to describe groupings (islands, buildings, etc.)
  • vector data has two common features:
    • polygon inclusion:
      • areas within a polygon that are different but still considered part of it
      • imagine a polygon just representing a golf course, where the golf course also has fairways, roughs, ponds, pathways, etc.
      • those other features are “inclusions”, but will often not be featured as their own polygon
      • this is due to some set lower limit, known as a minimum mapping unit
    • boundary generalization: incomplete representation of boundary locations
      • sampling of the true curve, typically contains some deviation from the “true” curved boundary

Raster-Vector Model Conversion

Vector to Raster

  • assign cell value for each position occupied by vector features
  • points are usually assigned to the cell containing the point coordinate
  • if cell size is too large, two or more points could fall in a single cell, leading to an ambiguous identification
    • typically, cell size is chosen such that diagonal cell dimension is smaller than the distance between the two closest point features
  • can also be used for lines (versus coordinate points), where a cell-line intersection assigns the cell
    • can make the line appear wider than it is, since adjacent cells touched by the line all take on the same value
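
The point case above can be sketched as follows. This is a rough illustration under stated assumptions: `max_cell_size` and `rasterize_points` are hypothetical helper names, cells are square, and rows are counted upward from a lower-left origin (many raster formats count rows downward from the top instead).

```python
import math
from itertools import combinations


def max_cell_size(points):
    """Cell size at which the cell diagonal equals the closest point spacing."""
    closest = min(math.dist(p, q) for p, q in combinations(points, 2))
    # diagonal of a square cell = size * sqrt(2), so size = diagonal / sqrt(2)
    return closest / math.sqrt(2)


def rasterize_points(points, cell_size, origin=(0.0, 0.0)):
    """Group points by the (row, col) of the cell that contains them."""
    cells = {}
    for x, y in points:
        col = int((x - origin[0]) // cell_size)
        row = int((y - origin[1]) // cell_size)
        cells.setdefault((row, col), []).append((x, y))
    return cells


wells = [(1.0, 1.0), (4.0, 5.0), (9.0, 2.0)]

# Too coarse: every well lands in the same cell, so identity is ambiguous.
coarse = rasterize_points(wells, cell_size=10.0)

# Slightly under the limit: the diagonal is shorter than the closest pair,
# so no two wells can share a cell.
fine = rasterize_points(wells, cell_size=0.99 * max_cell_size(wells))
```

With the coarse grid all three wells collapse into one cell, while the finer grid gives each well its own cell.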

Raster to Vector

  • identify sets of contiguous, connected cells that form lines, then apply a smoothing algorithm

Maps, Data Entry, Editing, and Output

  • graticule lines: lines representing constant latitude and longitude
    • lines may be curved depending on projection and scale
  • accuracy vs. resolution (position accuracy vs. spatial resolution):
    • accuracy: how close a value is to the truth
    • resolution: size of the smallest feature that can be individually resolved
    • many image data have average resolution smaller than their average accuracy
  • scale: ratio of a distance on a screen or map to the corresponding distance on Earth (reality)
    • can be unitless
    • large vs. small:
      • large scale maps: closer to reality (i.e. zoomed-in); more detailed; larger features; smaller area; trends towards less geometric error
      • small scale maps: further from reality (i.e. zoomed-out); less detailed; smaller features; larger area; trends towards more geometric error
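
A quick numeric check of the large vs. small distinction, as a minimal sketch. The function names and the two example scales (1:24,000 and 1:1,000,000) are chosen only for illustration.

```python
def representative_fraction(scale_denominator):
    """Unitless scale as a fraction, e.g. 1:24,000 -> 1/24000."""
    return 1.0 / scale_denominator


def ground_distance_m(map_distance_cm, scale_denominator):
    """Ground distance in meters for a map distance given in centimeters."""
    return map_distance_cm * scale_denominator / 100.0


large_scale = representative_fraction(24_000)     # zoomed-in map
small_scale = representative_fraction(1_000_000)  # zoomed-out map

# 1 cm on each map covers very different ground distances.
one_cm_large = ground_distance_m(1, 24_000)     # 240 m
one_cm_small = ground_distance_m(1, 1_000_000)  # 10,000 m
```

The "large" in large scale refers to the fraction: 1/24,000 is a bigger number than 1/1,000,000, even though the map covers a smaller area.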

More on Data Models

Triangulated Irregular Networks (TINs)

  • commonly used to represent terrain heights
  • can be more complex than raster models, but often more efficient at storing terrain data
  • more variable terrain requires more points

Object Data Model

  • often more logical: portrays a user’s view of the real world, entities of interest, and relationships between them
  • example: pipe models
    • a pipe model could contain a valve class with child classes of valves which inherit the valve attributes, but have unique attributes themselves
  • there is no widely accepted definition of what constitutes an object model
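
The pipe example maps naturally onto class inheritance. The sketch below is only one possible reading of it: the class names (`Valve`, `GateValve`, `CheckValve`) and every attribute are invented for illustration and do not come from the textbook.

```python
from dataclasses import dataclass


@dataclass
class Valve:
    """Parent class: attributes shared by every valve (hypothetical fields)."""
    valve_id: str
    diameter_mm: float
    is_open: bool = True

    def describe(self):
        state = "open" if self.is_open else "closed"
        return f"{type(self).__name__} {self.valve_id} ({self.diameter_mm} mm, {state})"


@dataclass
class GateValve(Valve):
    """Child class: inherits the valve attributes and adds its own."""
    turns_to_close: int = 0


@dataclass
class CheckValve(Valve):
    """Child class with a different unique attribute."""
    flow_direction: str = "downstream"


gate = GateValve("V1", 150.0, turns_to_close=12)
check = CheckValve("V2", 100.0, is_open=False)
```

Both child objects answer to `describe()` and carry `valve_id` and `diameter_mm` from the parent, while `turns_to_close` and `flow_direction` exist only on their own class.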

3-D Data Models

  • attribute data of heights (e.g. buildings)
  • can use a voxel (volume element) which is essentially a cube version of a raster model
  • reality mesh: 3-D TIN with image based texture surface

Multiple Models

  • model decision comes down to choosing the best model for the task
  • raster and TIN models are often called:
    • DEMs: digital elevation models
    • DTMs: digital terrain models
    • analyses are often more difficult using contour lines, so elevation data are usually stored as raster or TIN models

Projections

  • standard sets of projections have been developed from:
    • Lambert conformal conic
    • transverse Mercator
  • state plane coordinate system (SPCS)
    • states can contain multiple zones, and different projection parameters to limit distortion
    • most states have adopted zones such that distortion is less than a single part per 10,000
    • application of the Lambert conformal conic and transverse Mercator
      • the transverse Mercator’s distortion increases with distance from a central meridian: good for states with a long North-South axis
      • the Lambert conformal conic’s distortion increases with distance from a middle parallel: good for states with a long East-West axis
  • Universal Transverse Mercator Coordinate System (UTM)
    • divides the Earth into zones which are 6 degrees of longitude wide
    • common for data or study areas spanning large regions
  • Other common projections:
    • maps good for visualization but not great for spatial analysis: tradeoff between continuous map surface and distortion
    • distortion can be reduced via cutting / interrupting, depending on study area
      • homolosine projections are useful when the breaks can be placed outside the study area, such as interrupting the oceans to study continents (or the reverse)
  • conversion between coordinate systems:
    • inverse and forward projection equations
    • careful when converting projections using different datums
      • requires a datum transformation (a change in geographic coordinates)
        • may not be an automatic change in GIS software
  • coordinate system identifiers
    • many societies/industries share specific coordinate systems (datums)
      • oil
      • standardized for North American GIS systems
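
The 6-degree UTM zones noted above follow a simple rule that can be computed from a signed longitude. This sketch assumes the standard numbering (zone 1 starts at 180 W and zones increase eastward) and ignores the special-case zones around Norway and Svalbard; the function names are made up for these notes.

```python
import math


def utm_zone(lon_deg):
    """UTM zone number (1-60) for a signed longitude in degrees."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # zone 1 starts at 180 W; each zone is 6 degrees of longitude wide
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)  # exactly 180 E is folded into zone 60


def central_meridian(zone):
    """Signed longitude of a zone's central meridian, in degrees."""
    return -183.0 + 6.0 * zone


zone = utm_zone(-93.0)            # a longitude in the central United States
meridian = central_meridian(zone)
```

Because transverse Mercator distortion grows with distance from the central meridian, keeping each zone only 6 degrees wide keeps every location within 3 degrees of its zone's central meridian.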

Topology

  • node and line snapping:
    • positional errors are inevitable when data are manually digitized
      • undershoots: nodes and lines don’t quite reach line / other node
      • overshoots: lines cross over existing nodes or lines
  • node / line snapping:
    • automatically setting nearby points to have the same coordinates (becomes “magnetic”)
      • based on snap tolerance or snap distance
    • snap tolerance needs to be set carefully (should be smaller than the desired positional accuracy)
  • line smoothing and thinning:
    • smooth via spline functions to interpolate curves, which can also densify the set
    • data can also be digitized with too many points, and using functions, redundant vertices can be removed without sacrificing spatial accuracy
      • many methods use a perpendicular weed distance to accomplish this
        • Lang Method: spanning line connects two nonadjacent vertices in a line, and any points closer than the “weed” distance are marked for removal
  • scan digitizing: skeletonizing lines which are thick on the scan, and then are thinned (think raster square de-buffer)
  • slivers: small gaps or overlaps along shared polygon boundaries
    • creates spurious data and doesn’t accurately represent adjacency
    • slivers result in an additional entry in the attribute table, thus adds storage and processing overhead without increasing the value of the dataset
      • if slivers are numerous, this can be a problem
    • can be created while digitizing with too small of a tolerance, but topological digitizing accounts for this by building from an existing edge, so no shared line is digitized twice
  • features common to several layers:
    • given multiple layers or images, the layers rarely line up precisely
      • can be solved via redrafting, which is time and processing expensive
      • preferably, a “master” boundary can be used via the highest accuracy composite of the available data sets
        • might lead to only cosmetic improvements, however
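
Snapping and vertex thinning from the notes above can be sketched in a few lines. These are simplified illustrations with hypothetical function names: `snap` moves a point onto the nearest existing node within the snap tolerance, and `thin` is a basic perpendicular weed-distance filter that checks one vertex at a time. It is not the Lang method itself, which tests longer spanning lines.

```python
import math


def snap(point, existing_nodes, tolerance):
    """Return the nearest existing node if within tolerance, else the point."""
    if not existing_nodes:
        return point
    nearest = min(existing_nodes, key=lambda n: math.dist(n, point))
    return nearest if math.dist(nearest, point) <= tolerance else point


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def thin(line, weed_distance):
    """Drop vertices lying within weed_distance of the line that joins
    the last kept vertex to the following vertex."""
    if len(line) < 3:
        return list(line)
    kept = [line[0]]
    for i in range(1, len(line) - 1):
        if perpendicular_distance(line[i], kept[-1], line[i + 1]) > weed_distance:
            kept.append(line[i])
    kept.append(line[-1])
    return kept


nodes = [(10.0, 5.0), (0.0, 0.0)]
snapped = snap((10.02, 5.0), nodes, tolerance=0.05)   # close enough: snaps
unsnapped = snap((10.5, 5.0), nodes, tolerance=0.05)  # too far: unchanged

digitized = [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)]
thinned = thin(digitized, weed_distance=0.1)
```

In the example the nearly collinear vertex at (1, 0.01) is weeded out while the vertices that define the bend are kept, which is the sense in which redundant points can be removed without sacrificing spatial accuracy.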