Gridding module
The AEIC.gridding module converts simulated trajectory data into
three-dimensional gridded emissions inventories. This is the final stage of the
AEIC pipeline: trajectory stores produced by aeic run are binned onto a
latitude/longitude/altitude grid and written to a CF-compliant NetCDF4 file
with one variable per chemical species.
Gridding uses a map-reduce architecture so that it can be parallelized across many machines. In the map phase, each parallel worker processes a slice of the trajectories and writes an intermediate zarr file. In the reduce phase, a single process sums all the zarr slices and writes the final NetCDF output.
The trajectories-to-grid command
The aeic trajectories-to-grid CLI command drives the gridding pipeline. It is
invoked with either --mode map or --mode reduce.
Command-line options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| --input-store | Path | Yes | | Input trajectory store |
| --grid-file | Path | Yes | | Grid definition file (TOML) |
| --mode | map or reduce | Yes | | Processing mode |
| --map-prefix | String | Yes | | Path prefix for intermediate zarr files |
| --mission-db-file | Path | No | | Mission database file (SQLite). Required in reduce mode and when --filter-file is given |
| --filter-file | Path | No | | Trajectory filter definition file (TOML) |
| --output-file | Path | No | | Final NetCDF output path. Required in reduce mode |
| | | No | | Output time resolution |
| --slice-count | Integer | No | | Number of parallel processing slices (map phase only) |
| --slice-index | Integer | No | | Zero-based index of the slice to process (map phase only) |
Map mode
Map mode (--mode map) processes a subset of trajectories from the input store
and accumulates their emissions onto the grid. The output is a single zarr
file named {map-prefix}-{slice-index:05d}.zarr.
A minimal single-process invocation:
aeic trajectories-to-grid \
--input-store output/trajectories.aeic-store \
--grid-file src/AEIC/data/grids/era5.toml \
--mode map \
--map-prefix output/grid/map-slice
This produces output/grid/map-slice-00000.zarr.
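The naming rule above can be sketched in a few lines of Python. The helper name slice_path is hypothetical and is not part of the AEIC API; it only illustrates how the prefix and the zero-padded slice index combine.

```python
def slice_path(map_prefix: str, slice_index: int) -> str:
    """Return the zarr path for one map slice (index zero-padded to 5 digits)."""
    if slice_index < 0:
        raise ValueError("slice_index must be non-negative")
    return f"{map_prefix}-{slice_index:05d}.zarr"


# The default slice index of a single-process run is assumed to be 0 here.
print(slice_path("output/grid/map-slice", 0))
```

Zero-padding keeps the slice files in index order when they are listed lexicographically.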
Reduce mode
Reduce mode (--mode reduce) discovers all zarr slice files matching the
{map-prefix}-NNNNN.zarr pattern, validates that they were all produced with
the same grid and filter settings, sums them, and writes the final NetCDF
output.
aeic trajectories-to-grid \
--input-store output/trajectories.aeic-store \
--grid-file src/AEIC/data/grids/era5.toml \
--mission-db-file oag-2019.sqlite \
--mode reduce \
--map-prefix output/grid/map-slice \
--output-file output/inventory.nc
The --mission-db-file option is required in reduce mode because the mission
database is queried to determine the inventory time range (earliest and latest
scheduled departure timestamps).
Grid definition files
Grid definitions are TOML files with three sections: [latitude],
[longitude], and [altitude]. Several built-in grid definitions are shipped
in src/AEIC/data/grids/.
Horizontal axes
The [latitude] and [longitude] sections share the same structure:
| Key | Type | Default | Description |
|---|---|---|---|
| resolution | float | | Cell width in degrees |
| range | [float, float] | [-90, 90] for latitude, [-180, 180] for longitude | Domain extent in degrees |
Cell edges are placed at range[0], range[0] + resolution, ... and cell
centers fall at the midpoints (i.e. at range[0] + 0.5 * resolution,
range[0] + 1.5 * resolution, etc.).
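As an illustration of the edge and center placement, here is a minimal sketch in plain Python. The function names are hypothetical, and the sketch assumes the extent is a whole multiple of the resolution; it is not taken from the AEIC source.

```python
def cell_edges(start: float, stop: float, resolution: float) -> list[float]:
    """Edges at start, start + resolution, ..., stop (n_cells + 1 values)."""
    n_cells = round((stop - start) / resolution)
    return [start + i * resolution for i in range(n_cells + 1)]


def cell_centers(start: float, stop: float, resolution: float) -> list[float]:
    """Centers at the midpoint of each pair of adjacent edges (n_cells values)."""
    n_cells = round((stop - start) / resolution)
    return [start + (i + 0.5) * resolution for i in range(n_cells)]


# A 1-degree latitude axis over the default extent of -90 to 90 degrees.
edges = cell_edges(-90.0, 90.0, 1.0)
centers = cell_centers(-90.0, 90.0, 1.0)
print(len(edges), len(centers), centers[0], centers[-1])
```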
Vertical axis
The [altitude] section is discriminated by its mode field.
Height mode (mode = "height") defines a uniform vertical grid in meters:
| Key | Type | Description |
|---|---|---|
| mode | "height" | Selects uniform height grid |
| resolution | float | Cell height in meters |
| range | [float, float] | Altitude extent in meters |
ISA pressure mode (mode = "isa_pressure") defines a non-uniform vertical
grid using explicit pressure levels:
| Key | Type | Description |
|---|---|---|
| mode | "isa_pressure" | Selects ISA pressure level grid |
| levels | list of floats | Pressure levels in hPa |
When using isa_pressure mode, trajectory altitudes (in meters) are converted
to ISA pressure values before binning. The output NetCDF stores pressure levels
in descending order (following the ERA5 convention).
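For orientation, the altitude-to-pressure conversion can be sketched with the textbook International Standard Atmosphere formulas. This is an assumption about the general technique, not a copy of AEIC's implementation: the function name isa_pressure_hpa is hypothetical, and the sketch covers only the troposphere and the isothermal layer up to about 20 km.

```python
import math

P0_HPA = 1013.25          # ISA sea-level pressure
T0_K = 288.15             # ISA sea-level temperature
LAPSE_K_PER_M = 0.0065    # tropospheric temperature lapse rate
G = 9.80665               # standard gravity, m/s^2
R_AIR = 287.05287         # specific gas constant of dry air, J/(kg K)
H_TROPOPAUSE_M = 11000.0  # top of the ISA troposphere


def isa_pressure_hpa(altitude_m: float) -> float:
    """ISA pressure in hPa for a geopotential altitude in meters (0 to ~20 km)."""
    exponent = G / (R_AIR * LAPSE_K_PER_M)
    if altitude_m <= H_TROPOPAUSE_M:
        # Troposphere: temperature falls linearly with altitude.
        return P0_HPA * (1.0 - LAPSE_K_PER_M * altitude_m / T0_K) ** exponent
    # Isothermal layer above the tropopause: pressure decays exponentially.
    t_trop = T0_K - LAPSE_K_PER_M * H_TROPOPAUSE_M
    p_trop = P0_HPA * (t_trop / T0_K) ** exponent
    return p_trop * math.exp(-G * (altitude_m - H_TROPOPAUSE_M) / (R_AIR * t_trop))


for altitude in (0.0, 5000.0, 11000.0, 12000.0):
    print(altitude, round(isa_pressure_hpa(altitude), 1))
```

Because pressure decreases monotonically with altitude, binning in pressure space preserves the vertical ordering of a trajectory, only reversed.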
Examples
A 1-degree uniform height grid (basic-1x1.toml):
[longitude]
resolution = 1.0
[latitude]
resolution = 1.0
[altitude]
mode = "height"
resolution = 500
range = [0, 15500]
An ERA5-compatible pressure level grid (era5.toml):
[latitude]
resolution = 1.0
[longitude]
resolution = 1.0
[altitude]
mode = "isa_pressure"
levels = [500, 450, 400, 350, 300, 250, 225, 200, 175, 150, 125, 100]
Filtering trajectories
By default, all trajectories in the input store are gridded. To process only a
subset, provide a --filter-file (TOML) along with --mission-db-file. The
filter file defines conditions that are combined with AND logic to select
matching flights from the mission database.
aeic trajectories-to-grid \
--input-store output/trajectories.aeic-store \
--mission-db-file oag-2019.sqlite \
--filter-file my-filter.toml \
--grid-file src/AEIC/data/grids/era5.toml \
--mode map \
--map-prefix output/grid/map-slice
Filter files use the same Filter model
described in the missions documentation. Available filter fields include
distance ranges, seat capacity, airport codes, countries, continents, and
geographic bounding boxes.
Note
The --filter-file option is only supported in map mode. In reduce mode,
the filter metadata embedded in each zarr slice is used for consistency
validation instead.
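The AND combination of filter conditions can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the Flight record, its fields, and select_flight_ids are stand-ins for illustration and do not reflect the real Filter model or the mission database schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Flight:
    """Hypothetical stand-in for one mission-database row."""
    flight_id: int
    distance_km: float
    origin: str


Condition = Callable[[Flight], bool]


def select_flight_ids(flights: list[Flight], conditions: list[Condition]) -> list[int]:
    """Keep only flights that satisfy every condition (AND logic)."""
    return [f.flight_id for f in flights if all(cond(f) for cond in conditions)]


flights = [
    Flight(1, 800.0, "BOS"),
    Flight(2, 5600.0, "BOS"),
    Flight(3, 6100.0, "LHR"),
]
conditions = [
    lambda f: f.distance_km >= 1000.0,  # distance range condition
    lambda f: f.origin == "BOS",        # airport code condition
]
print(select_flight_ids(flights, conditions))  # [2]
```

With AND logic, adding a condition can only shrink the selection, and an empty condition list selects every flight, which matches the unfiltered default.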
Map-reduce architecture
Map phase
The map phase iterates over trajectories and accumulates per-segment emissions onto the grid:
1. Load trajectories from the store, either sequentially (iter_range) or by flight ID (iter_flight_ids when a filter is active).
2. Split at the antimeridian using Trajectory.dateline_split(). Trajectories that cross the +/-180 degree longitude line are split into sub-trajectories at the crossing point, with emissions distributed proportionally.
3. Extract segments. Each sub-trajectory of N points yields N-1 segments. For ISA pressure grids, segment altitudes are converted from meters to hPa.
4. Batch and dispatch. Segments are accumulated into batches of 1000 before being passed to the Numba-jitted voxel traversal kernel. The kernel uses a fast path for segments that stay within a single grid cell (the common case) and an Amanatides-Woo 3D traversal for segments that cross multiple cells.
5. Write zarr. The accumulated grid array (shape: latitude x longitude x altitude x species) is written to a zarr file with grid and filter metadata attached as attributes.
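The segment extraction and batching steps can be sketched as follows. The function names and the tuple-based point representation are assumptions made for illustration; the real map phase works on Trajectory objects and hands each batch to a Numba kernel, which is not reproduced here.

```python
from itertools import islice
from typing import Iterable, Iterator

Point = tuple[float, float, float]  # (latitude, longitude, vertical coordinate)
Segment = tuple[Point, Point]


def segments(points: list[Point]) -> list[Segment]:
    """Pair consecutive points: N points yield N-1 segments."""
    return list(zip(points[:-1], points[1:]))


def batched(items: Iterable[Segment], size: int = 1000) -> Iterator[list[Segment]]:
    """Yield lists of at most `size` items; the last batch may be shorter."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


# A synthetic track of 2501 points produces 2500 segments in three batches.
track = [(0.0, float(i) * 0.01, 10000.0) for i in range(2501)]
print([len(batch) for batch in batched(segments(track))])  # [1000, 1000, 500]
```

Batching amortizes the per-call overhead of the compiled kernel over many segments.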
Reduce phase
The reduce phase combines all map outputs into a single NetCDF file:
1. Discover all zarr files matching the {map-prefix}-NNNNN.zarr pattern and validate that slice indices are contiguous starting from 0.
2. Validate that all slices have identical array shapes and consistent grid and filter metadata.
3. Accumulate by summing all slice arrays into a single float32 grid.
4. Query the mission database for the inventory time range.
5. Write the final CF-compliant NetCDF4 file via OutputGrid.
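The discovery, validation, and summation steps can be sketched without zarr by working on file names and flat lists. The helpers slice_indices and sum_slices are hypothetical and only mirror the logic described above; the real reduce phase reads zarr arrays and also compares grid and filter metadata.

```python
import re


def slice_indices(filenames: list[str], prefix_name: str) -> list[int]:
    """Extract slice indices from names like '<prefix>-00003.zarr' and check contiguity."""
    pattern = re.compile(re.escape(prefix_name) + r"-(\d{5})\.zarr$")
    indices = sorted(
        int(match.group(1)) for name in filenames if (match := pattern.match(name))
    )
    if not indices:
        raise ValueError("no slice files found")
    if indices != list(range(len(indices))):
        raise ValueError(f"slice indices are not contiguous from 0: {indices}")
    return indices


def sum_slices(slices: list[list[float]]) -> list[float]:
    """Element-wise sum of equally shaped (here: flattened) slice arrays."""
    if len({len(values) for values in slices}) != 1:
        raise ValueError("slices must all have the same shape")
    return [sum(values) for values in zip(*slices)]


names = ["map-slice-00000.zarr", "map-slice-00001.zarr", "notes.txt"]
print(slice_indices(names, "map-slice"))       # [0, 1]
print(sum_slices([[1.0, 2.0], [3.0, 4.0]]))    # [4.0, 6.0]
```

The contiguity check catches a map job that failed silently, since a missing slice would otherwise yield an inventory that is too small without any error.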
Parallel execution
The --slice-count and --slice-index options split the trajectory store into
roughly equal chunks for parallel processing. Each map worker receives a
different --slice-index (0-based) and writes its own zarr file. After all map
jobs complete, a single reduce job combines them.
The slice calculation divides the N missions (or trajectories) into
slice-count groups of ceil(N / slice-count) items each; the last slice is
clamped so that it ends at N and holds only the remainder.
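A minimal sketch of this calculation, assuming half-open index ranges; the function name slice_bounds is hypothetical and the code is not taken from the AEIC source.

```python
def slice_bounds(n_items: int, slice_count: int, slice_index: int) -> tuple[int, int]:
    """Return the half-open range [start, stop) of items handled by one slice."""
    if slice_count < 1 or not 0 <= slice_index < slice_count:
        raise ValueError("slice_index must satisfy 0 <= slice_index < slice_count")
    chunk = (n_items + slice_count - 1) // slice_count  # integer ceil(N / slice_count)
    start = min(slice_index * chunk, n_items)
    stop = min(start + chunk, n_items)                  # clamp the last slice to N
    return start, stop


# 1005 items in 10 slices: nine slices of 101 items, the last one of 96.
print(slice_bounds(1005, 10, 0))  # (0, 101)
print(slice_bounds(1005, 10, 9))  # (909, 1005)
```

In this sketch, a slice count that is large relative to N can leave trailing slices empty (for example N = 5 with 4 slices gives sizes 2, 2, 1, 0).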
Example with GNU parallel
Generate a task list and run 10 concurrent workers:
# Generate map tasks
NJOBS=100
for i in $(seq 0 $((NJOBS-1))); do
echo "aeic trajectories-to-grid \
--input-store output/trajectories.aeic-store \
--grid-file src/AEIC/data/grids/era5.toml \
--mode map \
--map-prefix output/grid/map-slice \
--slice-count $NJOBS \
--slice-index $i"
done > tasks.txt
# Run with 10 concurrent workers
parallel -j 10 < tasks.txt
# Reduce all slices into the final output
aeic trajectories-to-grid \
--input-store output/trajectories.aeic-store \
--grid-file src/AEIC/data/grids/era5.toml \
--mission-db-file oag-2019.sqlite \
--mode reduce \
--map-prefix output/grid/map-slice \
--output-file output/inventory.nc
SLURM job arrays
For HPC clusters, the same pattern maps naturally to SLURM job arrays. Set
--slice-count to the array size and --slice-index to $SLURM_ARRAY_TASK_ID
(0-based). Run the reduce step as a dependent job that waits for all map
tasks to complete.
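One way to wire this up is a small Python wrapper that each array task runs. The sketch below only builds the command line from SLURM_ARRAY_TASK_ID; the function name build_map_command is hypothetical, the paths are the ones used in the examples on this page, and the array is assumed to have been submitted with task IDs 0 through slice-count minus 1.

```python
import os
import shlex
from collections.abc import Mapping


def build_map_command(slice_count: int, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Build the map-mode command for the current SLURM array task."""
    slice_index = int(environ["SLURM_ARRAY_TASK_ID"])
    if not 0 <= slice_index < slice_count:
        raise ValueError("array task ID is outside 0..slice_count-1")
    return [
        "aeic", "trajectories-to-grid",
        "--input-store", "output/trajectories.aeic-store",
        "--grid-file", "src/AEIC/data/grids/era5.toml",
        "--mode", "map",
        "--map-prefix", "output/grid/map-slice",
        "--slice-count", str(slice_count),
        "--slice-index", str(slice_index),
    ]


# Simulate array task 7 of 100 by passing an explicit environment mapping.
command = build_map_command(100, {"SLURM_ARRAY_TASK_ID": "7"})
print(shlex.join(command))
```

Inside a real job, the returned list could be passed to subprocess.run(command, check=True) so that a failing map task marks its array element as failed.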
Output format
The reduce phase writes a CF-compliant NetCDF4 file with the following structure.
Dimensions
| Dimension | Size | Description |
|---|---|---|
| time | unlimited | Time steps (1 for annual output) |
| latitude | grid-dependent | Number of latitude bins |
| longitude | grid-dependent | Number of longitude bins |
| altitude or pressure_level | grid-dependent | Number of vertical bins |
| | 2 | Vertex count for bounds variables |
Coordinate variables
- time: seconds since 1970-01-01 UTC. For annual output, contains a single value at the start of the inventory period.
- latitude: cell center values in degrees north, with lat_bnds bounds.
- longitude: cell center values in degrees east, with lon_bnds bounds.
- altitude (height grids): cell center values in meters, positive = 'up', with altitude_bnds bounds.
- pressure_level (ISA pressure grids): pressure values in hPa, positive = 'down', stored in descending order (ERA5 convention).
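As a worked example of the time encoding, the sketch below converts a timezone-aware timestamp to seconds since 1970-01-01 UTC. The helper to_epoch_seconds is hypothetical, and the 2019 start date is only an illustrative inventory period, chosen to match the oag-2019.sqlite name used in the command examples.

```python
from datetime import datetime, timezone


def to_epoch_seconds(timestamp: datetime) -> int:
    """Seconds since 1970-01-01 UTC for a timezone-aware timestamp."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return int(timestamp.timestamp())


# Start of an inventory period beginning on 1 January 2019 (UTC).
print(to_epoch_seconds(datetime(2019, 1, 1, tzinfo=timezone.utc)))  # 1546300800
```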
Species variables
One variable per chemical species (e.g. co2, h2o, nox), with dimensions
(time, vertical, latitude, longitude). Values are in grams, stored as
float32 with zlib compression (level 4, shuffle enabled).
Reproducibility provenance
The output file includes a _reproducibility group containing two subgroups:
- trajectory_generation: provenance from the input trajectory store (AEIC version, git state, configuration, files accessed, sampling parameters).
- gridding: provenance from the gridding run (AEIC version, git state, grid definition, filter expression, input paths, number of slices).
API reference
Grid classes
- class AEIC.gridding.grid.Grid(*, latitude, longitude, altitude)
- Parameters:
latitude (LatitudeGrid)
longitude (LongitudeGrid)
altitude (Annotated[HeightGrid | ISAPressureGrid, FieldInfo(annotation=NoneType, required=True, discriminator='mode')])
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class AEIC.gridding.grid.LatitudeGrid(*, resolution, range=(-90.0, 90.0))
- Parameters:
resolution (float)
range (tuple[float, float])
- class AEIC.gridding.grid.LongitudeGrid(*, resolution, range=(-180.0, 180.0))
- Parameters:
resolution (float)
range (tuple[float, float])
- class AEIC.gridding.grid.HeightGrid(*, mode, resolution, range)
- Parameters:
mode (Literal['height'])
resolution (float)
range (tuple[float, float])
- property edges: ndarray
N+1 bin edge values synthesized from level midpoints, with outer boundaries extended symmetrically.
- property levels: ndarray
Bin center values (meters).
Output
- class AEIC.gridding.output.OutputGrid(grid, species, accum, min_ts, n_slices, input_store, mission_db_file, traj_repro, traj_comments, filter_json)
Accumulated gridded emissions ready to be written to a NetCDF file.
accum is a float32 array with shape (nlat, nlon, nalt, nspecies) produced by the map phase. write() converts it to a CF-compliant NetCDF4 inventory file with coordinate variables, per-species emissions variables, and structured reproducibility provenance groups.
- Parameters:
- write(output_file)
Write the accumulated gridded data to a NetCDF file.
Always passes keepweakref=True per the netcdf4-python issue #1444 workaround used elsewhere in this repo. The accumulator is laid out as (lat, lon, alt, species) and each species slab is permuted to (alt, lat, lon) when written.
- Parameters:
output_file (Path)
- Return type:
None
Command functions
- AEIC.commands.trajectories_to_grid.map_phase(ntrajs, species, traj_iter, grid, map_output, filter_expr=None)
- Parameters:
ntrajs (int)
species (list[Species])
traj_iter (Generator[Trajectory])
grid (Grid)
map_output (str)
filter_expr (Filter | None)
- AEIC.commands.trajectories_to_grid.reduce_phase(grid, species, map_prefix, output_file, mission_db_file, input_store, traj_repro, traj_comments)
Combine map-phase zarr slice files into a single gridded NetCDF file.
The map phase writes one bare zarr array per slice, named {map_prefix}-NNNNN.zarr (where NNNNN is the zero-padded slice index), each with shape (nlat, nlon, nalt, nspecies) and dtype f4. This function discovers all such slice files under map_prefix, sums them into a single accumulator, queries the mission database for the inventory time range, and writes a NetCDF file containing one variable per species plus CF-style coordinate variables.
Grid and filter metadata are read from the zarr slice attributes (written during the map phase) and cross-validated for consistency.