Raster Data
This section deals with raster data and its operations.
Raster Data Introduction
Raster data is composed of squares, called grid cells. These are analogous to to pixels in remote sensing images and computer graphics.
Resolution
The distance that one side of a grid cell represents on the ground.
Fine vs. Coarse <-> High vs. Low: finer resolution data have smaller grid cells and high precision with greater cost in data storage than coarser data.
The Mixed Pixel Problem and Storage / Resolution Tradeoff
- Want the smallest detectable feature (i.e. phenomenon of interest) to be twice as large as the resolution in areal units.
- Finer resolution requires more data storage: decreasing the cell size by one half quadruples the resolution, thus directly quadruples the storage space required.
Continuous vs. Discrete Features
- Continuous Features: numerical study value in each cell
- Discrete Features: single number representation in each cell
Data Type and Bit Depth
Integers, floating points, binary, etc. require different storage. The values are then assigned to each grid cell.
Bit Basics
- 1 bit: 2 values: [0: 1]: \(2^1\)
- 2 bit: 4 values: [0: 3]: \(2^2\)
- 4 bit: 16 values: [0: 15]: \(2^4\)
- n bit: \(2^n\) values
Signed and Unsigned Values
- Unsigned 8-bit: [0, 255]
- Signed 8-bit: [-128, 127]
Floating Point Values
Uses large number values ranges!
No Data Values
2-bits aren’t exactly used in GIS, as the NO DATA type has a reserved value, usually a large number.
Converting Between Bits
Converting between bit-types doesn’t have a direct estimate like the storage / resolution tradeoff does.
Encoding and Conversion Effects
Encoding real world entities in a raster data model has implications. And converting between vector to raster and vice-versa does as well.
Presence / Absence Coding Method
- 1: object is present
- 0: object is not present
- Uses the anywhere rule, which has an implication of the mixed pixel problem
Any Cell vs. Near Cell Rules
- Generalization and approximation with implications of thickness and connectivity.
Vector Lines to Raster
- Should be done in different layers (i.e. different road types)
Vector Polygons to Raster
- Boundaries: can lose precision
- Efficiency: can have inefficient data storage
Attribute Tables
- each cell will have a count / frequency
- “carefully” produce estimates, such as area, due to the implications!
Attribute Tables for Discrete vs. Continuous
- Discrete: example is landcover where a cell value has a direct analogy
- Continuous: example is elevation layer, where each value has a direct analogy, WHICH DOESN’T MAKE SENSE!
Vector Point Values to Raster
- Interpolation: measured points of continuous phenomenon to make estimates at non-measured location through interpolation.
Raster to Vector
- Essentially does cell-center point and apply smoothing and connectivity rules
- Have to consider precision and shape implications.
Raster Processing
Registration, snapping, and resampling are considered in this subsection.
- Geo-registration and co-registration is problematic when pixels don’t line up, thus the data structure needs to be changed when comparing raster layers through resampling.
Resampling - Nearest Neighbors
Takes the cell value of the nearest neighbor.
Resampling - Bilinear Interpolation
- Distance weighted averaging
- Two-directional
- Simplest way to estimate heights
Resampling - Majority Rule
- gives majority of label cells as the new aggregation
Resampling Implications
- larger jumps should use careful consideration
- think about generalization (over-generalization) during the process
- DISCRETE VS CONTINUOUS:
- bilinear resampling is better for continuous data
- don’t use bilinear resampling for discrete / categorical data
- will just create classes that don’t exist
Data Compression
The goal of data compression to reduce storage size on disk with lossless methods. Efficient for homogenous data, i.e. discrete or categorical. Will actually increase storage size on disk if used on continuous data.
Run Length Encoding
- dictionary type encoding which produces {value: count} combinations for each row
Value Point Encoding
- Initial Table: Scans rows for number of values
- Secondary Table: Sums the total number of values for each unique discrete value
Quadtree Encoding
- Partitioning of heterogeneous space into quarter sections that are homogeneous, repeated until homogeneity of a quarter is reached.
- Decent for zooming properties
- Complex, not efficient for higher heteorogenous data
Raster Analysis
Map Algebra
- Cell-by-cell combination of raster data (i.e. addition, subtraction, multiplication, and logical operators)
- Normally, resolutions are aligned before applying functions
- Objects: datasets, layers, values
- Actions: performed on objects (operators and functions)
- Qualifiers: parameters determining the conduction of a function
- Mimics functions from vector operations
Function Types in Map Algebra
- Local: cell-by-cell
- Neighborhood (focal): neighborhood analysis
- Block: whole block of cells
- Zonal: within homogeneous areas, uses zonal analysis/statistics
- Global: incorporation of the full dataset
- Creates uniform definition of entitites in raster data
Moving Windows
- Positioned over input raster
- Defines the input for an operation to be applied
- Result associated with center and written to the output
- Focal: overlapping
- Block: jumping
- Kernel: constraints for a window size, shape, and function
Slope is a focal function because it is inherently a difference function (requires difference between two cell values).
Margin Erosion
- Loss of margins in the original raster dataset (width of cells from the center cell away)
- Solutions are to enlarge the study area or use kernel modifications at the corners.
Applications
- Edge Detection (i.e. concentration changes, discover contrasts and differences)
- Noise Filtering (smoothing): can remove spikes or outliers
Careful with smoothing! This removes variations, creating highly autocorrelated data.
Zonal and Block Function Information
- stats retained for every cell within the region dependent on stat used
- concept of homogeneity
- Still creates a new raster layer!
- always creating a rectangle with raster data and applying no data case
- Size of cell is retained, as compared to resampling!