
stacpkg

stacpkg is an Apache Arrow-native CLI and library for turning a selected set of STAC Items into a reproducible package. It supports both package distribution and asset relocation: packages can be pushed and pulled through OCI registries, and referenced assets can be relocated to new locations and recorded back into STAC metadata. Through assets.lock.parquet, external asset references become verifiable assets: STAC Assets with locked locations and object facts that can be checked, relocated, and handed over later.

At a high level, a package captures three things: the selected STAC Items, the asset evidence needed to make external references checkable, and any optional context files such as reports, licenses, or bundled asset bytes. That package can then be inspected, transferred through OCI, or used as the starting point for relocating assets.

The canonical package is a directory:

stacpkg.pkg/
  items.parquet
  assets.lock.parquet
  <optional content>      # included files or bundled asset bytes

items.parquet holds STAC metadata for the selected Items. assets.lock.parquet holds the package asset lock. It turns referenced STAC Assets into verifiable assets by recording each asset's location and, when available, object facts such as size, ETag, and last-modified time. Keeping these two tables separate is the main design choice: STAC GeoParquet is item-oriented (one row per Item), while asset facts are asset-oriented (one row per asset).
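To illustrate the item-to-asset split, here is a minimal sketch that derives one asset-lock row per asset from a STAC Item. The column names (`item_id`, `asset_key`, `href`, `size`, `etag`, `last_modified`) and the helper `lock_rows_from_item` are hypothetical and are not stacpkg's actual schema or API. Object facts are left empty here because they would come from the storage source.

```python
import json

# Hypothetical column names; the real assets.lock.parquet schema may differ.
LOCK_COLUMNS = ("item_id", "asset_key", "href", "size", "etag", "last_modified")


def lock_rows_from_item(item: dict) -> list[dict]:
    """Return one asset-lock row per asset in a STAC Item (facts left empty)."""
    rows = []
    for key, asset in sorted(item.get("assets", {}).items()):
        rows.append({
            "item_id": item["id"],
            "asset_key": key,
            "href": asset["href"],
            # Object facts would be filled in later, e.g. from an HTTP HEAD
            # response or object-store metadata, when the source provides them.
            "size": None,
            "etag": None,
            "last_modified": None,
        })
    return rows


item = json.loads("""{
  "type": "Feature", "stac_version": "1.0.0", "id": "scene-001",
  "geometry": null, "properties": {"datetime": "2024-01-01T00:00:00Z"},
  "links": [],
  "assets": {
    "visual": {"href": "https://example.com/scene-001/visual.tif"},
    "B04": {"href": "https://example.com/scene-001/B04.tif"}
  }
}""")

for row in lock_rows_from_item(item):
    print(row["asset_key"], row["href"])
```

One Item with two assets yields two lock rows. This is why asset facts do not fit naturally as columns of an item-oriented table.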

Why Packaging?

STAC searches, catalog exports, and external asset locations can change over time. A package records the selected Items, asset references, and optional context in one place.

Reproducible packaging has two basic choices: include asset bytes in the package, or keep asset references and make those references verifiable. Spatiotemporal assets, such as Earth Observation imagery, are often too large or too widely distributed to embed directly, so stacpkg keeps the package compact and records the asset facts needed to audit, validate, relocate, and publish alternate references. These locked rows are the package's verifiable assets. When distribution requires a self-contained package, assets can also be bundled.
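Checking a verifiable asset means comparing the locked facts with facts observed now. The following is a minimal sketch under stated assumptions. The helpers `observe_local` and `verify` are hypothetical. The sketch uses a local file and an MD5 hex digest as a stand-in for an ETag. Object-store ETags match the content MD5 only in some cases (for example, simple single-part uploads), so a real checker would compare the ETag reported by the store rather than recompute it.

```python
import hashlib
import os
import tempfile


def observe_local(path: str) -> dict:
    """Collect object facts for a local file: size and an MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {"size": os.path.getsize(path), "etag": digest.hexdigest()}


def verify(lock_row: dict, observed: dict) -> list[str]:
    """Return the names of mismatched facts; facts the lock did not record are skipped."""
    problems = []
    for fact in ("size", "etag"):
        expected = lock_row.get(fact)
        if expected is not None and observed.get(fact) != expected:
            problems.append(fact)
    return problems


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "B04.tif")
    with open(path, "wb") as f:
        f.write(b"example bytes")
    locked = observe_local(path)              # facts recorded at lock time
    with open(path, "ab") as f:
        f.write(b" changed")                  # the object drifts afterwards
    print(verify(locked, observe_local(path)))  # ['size', 'etag']
```

Skipping facts that were not recorded reflects the "when available" wording above. A lock row with only a size can still catch truncated or replaced objects.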

Why Relocate Assets?

Some workflows need more than a cache: they must truly relocate assets. Common reasons include:

  • long-term durability when upstream assets might move or be deleted;
  • faster or data-local processing near compute;
  • lower egress costs and fewer repeated provider reads;
  • controlled credential boundaries and recipient-specific access;
  • governance, data residency, and audit requirements;
  • cross-party handover, offline delivery, and provider outage isolation.

stacpkg can relocate assets and record relocated locations in asset lock rows. It can project those locations back into STAC metadata, either updating the main references or keeping the original references visible through STAC alternate asset hrefs.
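The projection step can be sketched on a plain STAC Item dictionary. This sketch assumes one reading of the two modes: the asset's main `href` is updated to the relocated location, and optionally the previous href is kept under the asset's `alternate` map from the STAC Alternate Assets extension. The function name `project_relocation` and the alternate name `"original"` are hypothetical. A complete implementation would also list the extension's schema in the Item's `stac_extensions`.

```python
import copy


def project_relocation(item: dict, asset_key: str, new_href: str,
                       keep_original: bool = True) -> dict:
    """Return a copy of a STAC Item whose asset points at its relocated href.

    With keep_original=True, the previous href is kept under the asset's
    "alternate" map (STAC Alternate Assets extension) with the hypothetical
    alternate name "original". The input Item is not modified.
    """
    out = copy.deepcopy(item)
    asset = out["assets"][asset_key]
    old_href = asset["href"]
    asset["href"] = new_href
    if keep_original and old_href != new_href:
        asset.setdefault("alternate", {})["original"] = {"href": old_href}
    return out


item = {
    "type": "Feature", "stac_version": "1.0.0", "id": "scene-001",
    "geometry": None, "properties": {"datetime": "2024-01-01T00:00:00Z"},
    "links": [],
    "assets": {"B04": {"href": "https://example.com/scene-001/B04.tif"}},
}
moved = project_relocation(item, "B04", "s3://archive-bucket/scene-001/B04.tif")
print(moved["assets"]["B04"]["href"])
print(moved["assets"]["B04"]["alternate"]["original"]["href"])
```

Returning a copy keeps the packaged Items unchanged, so the same package can be projected differently for different recipients.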

When packages move through OCI, stacpkg publishes typed layers rather than a separate metadata file: application/vnd.stacpkg.items.v1.parquet, application/vnd.stacpkg.asset-lock.v1.parquet, application/vnd.stacpkg.files.v1+zip for optional directories, and application/vnd.stacpkg.asset.v1 or application/vnd.stacpkg.asset.v1+zip for materialized asset bytes.
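The mapping from package content to layer media types can be written as a small planning step. The media type strings below are the ones listed above. The entry tuple `(relative_path, is_dir, is_asset)`, the example paths, and the helper `plan_layers` are hypothetical. This sketch does not show how stacpkg actually identifies bundled asset bytes, and it rejects loose optional files because no layer type is documented for them.

```python
# Media types as listed in the stacpkg documentation.
ITEMS = "application/vnd.stacpkg.items.v1.parquet"
ASSET_LOCK = "application/vnd.stacpkg.asset-lock.v1.parquet"
FILES_ZIP = "application/vnd.stacpkg.files.v1+zip"
ASSET = "application/vnd.stacpkg.asset.v1"
ASSET_ZIP = "application/vnd.stacpkg.asset.v1+zip"


def plan_layers(entries):
    """Map (relative_path, is_dir, is_asset) package entries to (path, media type)."""
    layers = []
    for path, is_dir, is_asset in entries:
        if path == "items.parquet":
            media_type = ITEMS
        elif path == "assets.lock.parquet":
            media_type = ASSET_LOCK
        elif is_asset:
            # Materialized asset bytes: a single file, or a zipped directory.
            media_type = ASSET_ZIP if is_dir else ASSET
        elif is_dir:
            # Optional context directories travel as zipped file layers.
            media_type = FILES_ZIP
        else:
            raise ValueError(f"no documented layer type for loose file {path!r}")
        layers.append((path, media_type))
    return layers


entries = [
    ("items.parquet", False, False),
    ("assets.lock.parquet", False, False),
    ("reports", True, False),
    ("assets/scene-001/B04.tif", False, True),
]
for path, media_type in plan_layers(entries):
    print(f"{media_type}  {path}")
```

Because every layer carries its own media type, a puller can tell items, asset locks, context files, and asset bytes apart from the manifest alone, without a separate metadata file.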

Start with Create STAC Package for a guided package creation example, then see Relocate Assets for an example relocation workflow.

Table Pipeline Model

The core package path is deliberately table-oriented:

STAC JSON / STAC GeoParquet
        |
        v
Arrow RecordBatch stream
        |
        v
Parquet items and asset-lock tables

CLI commands compose as Arrow IPC pipelines first. Parquet files enter and leave pipelines through explicit from-parquet and to-parquet adapter commands. These adapters preserve RecordBatch flow: from-parquet iterates over Parquet batches, and to-parquet writes incoming IPC batches as they arrive, without loading the whole file or stream as one Arrow table. build takes its package directory as a positional argument and reads the items stream from stdin.
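The batch-at-a-time behavior can be illustrated without Arrow. In this pure-Python analogue, lists of dicts stand in for RecordBatches, and the hypothetical `read_batches` and `write_batches` stand in for from-parquet and to-parquet. Each stage is a generator, so only one batch is in flight at a time. With pyarrow, comparable adapters would likely use `pyarrow.parquet.ParquetFile.iter_batches` and `ParquetWriter.write_batch`, but this is an illustration of the principle, not stacpkg's code.

```python
from itertools import islice
from typing import Iterable, Iterator

Batch = list[dict]  # stand-in for an Arrow RecordBatch


def read_batches(rows: Iterable[dict], batch_size: int) -> Iterator[Batch]:
    """Stand-in for from-parquet: yield fixed-size batches lazily."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def explode_assets(batches: Iterable[Batch]) -> Iterator[Batch]:
    """A pipeline stage: turn each batch of Items into a batch of asset rows."""
    for batch in batches:
        yield [
            {"item_id": item["id"], "asset_key": key, "href": asset["href"]}
            for item in batch
            for key, asset in item["assets"].items()
        ]


def write_batches(batches: Iterable[Batch], sink: list) -> int:
    """Stand-in for to-parquet: write each batch as it arrives; return the row count."""
    written = 0
    for batch in batches:
        sink.append(batch)
        written += len(batch)
    return written


items = (
    {"id": f"scene-{i:03d}", "assets": {"B04": {"href": f"https://example.com/{i}/B04.tif"}}}
    for i in range(5)
)
sink = []
total = write_batches(explode_assets(read_batches(items, batch_size=2)), sink)
print(total, [len(b) for b in sink])  # 5 [2, 2, 1]
```

Since the input is itself a generator, no stage ever holds all five Items at once. The same property keeps memory bounded when the source is a large Parquet file.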

Current Scope

  • Read STAC Item and ItemCollection JSON.
  • Read and write STAC GeoParquet-style items tables.
  • Create one asset lock row per STAC asset, with object facts for later validation.
  • Pass STAC items and asset-lock tables through Arrow IPC pipelines.
  • Materialize pipeline results explicitly as STAC GeoParquet tables, asset-lock Parquet tables, or package directories.
  • Relocate asset bytes from source locations to destination locations.
  • Project relocated hrefs back into STAC Item metadata.
  • Build package directories with items tables, asset locks, and optional content.
  • Inspect packages as YAML, JSON, or Markdown.
  • Push and pull packages through OCI registries with oras-py.