
Asset Lock

assets.lock.parquet records verifiable assets for a package: STAC Asset references with locked locations and store facts. One row names one locked STAC Asset location and the facts needed to check or move that asset later. "Verifiable" does not require bundled bytes; it means the package has enough recorded evidence to validate, relocate, and hand over referenced assets intentionally.

Schema

The current asset-lock schema version is v1. stacpkg writes a stable, wide schema for every asset-lock table so files from different operations can be concatenated and streamed through the same Arrow contract. item_id and asset_key are required; the other columns are nullable because different operations populate different facts.

| Column | Arrow type | Required | Meaning |
| --- | --- | --- | --- |
| item_id | string | Yes | STAC Item id. |
| asset_key | string | Yes | STAC Asset key inside that item. |
| store_type | string | No | Obstore storage type name: file, s3, gs, az, http, or https. |
| store_container | string | No | Store container such as an S3 bucket, GCS bucket, Azure container, or HTTP origin. |
| store_endpoint_url | string | No | Optional object-store endpoint URL, such as https://s3.amazonaws.com or a MinIO endpoint, used when the bucket name alone is ambiguous. |
| key | string | No | Object key, path, or package-relative file path inside the store container. |
| size_bytes | int64 | No | Full object size in bytes, copied from STAC file:size, observed from object metadata, or observed after relocation. |
| etag | string | No | Backend object validator observed by metadata probing or relocation operations. |
| last_modified | string | No | Backend object last-modified timestamp observed by metadata probing or relocation operations, serialized as an ISO-8601 string when reported. |
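To make the row shape concrete, here is a minimal sketch of one v1 row as a plain Python dataclass. The class name `AssetLockRow` and the sample values are hypothetical and are not part of stacpkg; only the column names, their types, and their nullability come from the table above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AssetLockRow:
    """Illustrative in-memory view of one v1 asset-lock row (hypothetical, not stacpkg's API)."""

    item_id: str                             # required
    asset_key: str                           # required
    store_type: Optional[str] = None         # file, s3, gs, az, http, or https
    store_container: Optional[str] = None    # bucket, container, or HTTP origin
    store_endpoint_url: Optional[str] = None
    key: Optional[str] = None
    size_bytes: Optional[int] = None         # int64 in the Arrow schema
    etag: Optional[str] = None
    last_modified: Optional[str] = None      # ISO-8601 string when reported


# Sample values are invented for illustration only.
row = AssetLockRow(
    item_id="item-001",
    asset_key="visual",
    store_type="s3",
    store_container="example-bucket",
    key="scenes/item-001/visual.tif",
    size_bytes=1048576,
)

# Every column is always present; facts that were never observed stay None.
record = asdict(row)
```

Keeping all nine columns in every record, with `None` for unknown facts, mirrors the wide-schema idea: rows produced by different operations share one shape.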

Validation results are not asset-lock columns. asset-lock validate computes valid and errors dynamically and prints JSON lines.

Assets whose key is metadata are skipped by default because these sidecar objects often repeat item or asset metadata already preserved in items.parquet. Use --include-metadata-assets to include them with the rest of the lock. An explicit --asset-keys metadata filter also includes them.
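The selection rule above can be sketched as a small predicate. The function name and parameters are hypothetical stand-ins for the CLI flags. The sketch also assumes that an explicit key filter takes precedence over the default skip and excludes every key it does not name, which the text does not spell out.

```python
from typing import Iterable, Optional

METADATA_KEY = "metadata"


def should_lock_asset(
    asset_key: str,
    include_metadata_assets: bool = False,
    asset_keys: Optional[Iterable[str]] = None,
) -> bool:
    """Decide whether an asset key is written to the lock (illustrative only)."""
    if asset_keys is not None:
        # ASSUMPTION: an explicit filter selects exactly the named keys,
        # so naming "metadata" includes it without any extra flag.
        return asset_key in set(asset_keys)
    if asset_key == METADATA_KEY:
        # Default behaviour: sidecar metadata assets are skipped unless requested.
        return include_metadata_assets
    return True
```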

Deferred fields such as media type, checksums, provider object identity, and content type are intentionally outside the active schema. Draft checksum semantics are captured in ADR-004: Asset-Lock Checksum Facts and STAC Projection. That draft plans to store checksum facts in the asset lock first and then project them to STAC file:checksum.

Locations

Asset locations are structured instead of stored as a single href. The storage type names follow obstore (file, s3, gs, az, http, https), and store_endpoint_url can record the S3-compatible endpoint that makes a bucket/key location unambiguous. Credentials stay outside the lock, including bucket-scoped S3 runtime variables such as STACPKG_S3_ACCESS_KEY_ID_<BUCKET>.
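As a rough illustration of how structured fields relate to a conventional URL, the sketch below joins store_type, store_container, and key for bucket-style stores only. The helper `location_to_url` is hypothetical; how stacpkg reconstructs hrefs for file, http, and https locations is not described here, so those cases are deliberately left unsupported.

```python
BUCKET_STYLE_STORES = {"s3", "gs", "az"}


def location_to_url(store_type: str, store_container: str, key: str) -> str:
    """Join structured location fields into a scheme://container/key URL (illustrative)."""
    if store_type not in BUCKET_STYLE_STORES:
        # ASSUMPTION: only bucket-style stores are handled in this sketch.
        raise ValueError(f"unsupported store_type for this sketch: {store_type!r}")
    return f"{store_type}://{store_container}/{key.lstrip('/')}"


url = location_to_url("s3", "example-bucket", "scenes/item-001/visual.tif")
```

The endpoint URL and credentials never appear in the joined string, which is one reason the lock keeps them as separate concerns.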

When store_endpoint_url is empty, S3 operations fall back to runtime endpoint configuration from STACPKG_S3_ENDPOINT_<BUCKET>, STACPKG_S3_ENDPOINTS_JSON, AWS_ENDPOINT_URL, or AWS_ENDPOINT.
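The fallback can be pictured as a lookup chain. The sketch below makes several assumptions that the text does not confirm: that the variables are consulted in the order listed, that the `<BUCKET>` suffix is the bucket name uppercased with non-alphanumeric characters replaced by underscores, and that STACPKG_S3_ENDPOINTS_JSON holds a JSON object mapping bucket names to endpoint URLs. The function names are hypothetical.

```python
import json
import re
from typing import Mapping, Optional


def _bucket_suffix(bucket: str) -> str:
    # ASSUMPTION: the real normalisation rule for <BUCKET> may differ.
    return re.sub(r"[^A-Za-z0-9]", "_", bucket).upper()


def resolve_s3_endpoint(
    store_endpoint_url: Optional[str],
    bucket: str,
    env: Mapping[str, str],
) -> Optional[str]:
    """Return the endpoint to use for a bucket, or None if nothing is configured."""
    if store_endpoint_url:
        return store_endpoint_url  # a locked endpoint wins over runtime configuration

    per_bucket = env.get(f"STACPKG_S3_ENDPOINT_{_bucket_suffix(bucket)}")
    if per_bucket:
        return per_bucket

    raw = env.get("STACPKG_S3_ENDPOINTS_JSON")
    if raw:
        # ASSUMPTION: a JSON object of the form {"bucket-name": "https://endpoint"}.
        mapping = json.loads(raw)
        if mapping.get(bucket):
            return mapping[bucket]

    return env.get("AWS_ENDPOINT_URL") or env.get("AWS_ENDPOINT")
```

Passing the environment as a mapping (for example `os.environ`) keeps the function easy to test without mutating process state.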

STAC Mapping

items enrich writes lock size facts back to STAC Assets using the STAC File Info extension:

| Asset lock | STAC Asset field |
| --- | --- |
| size_bytes | file:size |

The active lock does not write checksum facts back to STAC metadata. Planned checksum support will extend this mapping with file_checksum to file:checksum after checksum facts become part of the asset-lock schema.
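A minimal sketch of the size mapping on a plain STAC Asset dictionary follows. The function name `apply_size_fact` is hypothetical, and the sketch only sets the asset field; declaring the File Info extension on the item is left out.

```python
import copy
from typing import Any, Dict, Optional


def apply_size_fact(asset: Dict[str, Any], size_bytes: Optional[int]) -> Dict[str, Any]:
    """Return a copy of a STAC Asset with file:size set from the lock (illustrative)."""
    enriched = copy.deepcopy(asset)
    if size_bytes is not None:
        # size_bytes in the lock maps to file:size in the File Info extension.
        enriched["file:size"] = int(size_bytes)
    return enriched


asset = {"href": "s3://example-bucket/scenes/item-001/visual.tif"}
enriched = apply_size_fact(asset, 1048576)
```

A null size_bytes leaves the asset unchanged, so rows without a locked size never overwrite an existing file:size in this sketch.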

When items enrich --alternate-key is used, reconstructed hrefs are written through the STAC Alternate Assets extension. If alternate hrefs are a mapped view of existing lock locations, create that mapped lock first with asset-lock relocate --dry-run, then pass it to items enrich.
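For orientation, the Alternate Assets extension stores extra locations under an `alternate` object on the asset, keyed by name, each with its own `href`. The sketch below assumes that the value passed to --alternate-key becomes that name; the helper `add_alternate_href` is hypothetical.

```python
import copy
from typing import Any, Dict


def add_alternate_href(asset: Dict[str, Any], alternate_key: str, href: str) -> Dict[str, Any]:
    """Return a copy of a STAC Asset with one alternate location added (illustrative)."""
    enriched = copy.deepcopy(asset)
    # ASSUMPTION: alternate_key names the entry inside the asset's "alternate" object.
    enriched.setdefault("alternate", {})[alternate_key] = {"href": href}
    return enriched


primary = {"href": "https://example.com/scenes/item-001/visual.tif"}
with_alt = add_alternate_href(primary, "s3", "s3://example-bucket/scenes/item-001/visual.tif")
```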


Validation

asset-lock validate compares a lock row with the current asset at its structured location. It validates against assets.lock.parquet; upload and download transfer correctness remains the responsibility of the object-store library.

Validation compares size_bytes, etag, and last_modified when those facts are locked and the backend reports comparable current values. A missing nullable fact is skipped.
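The comparison rule can be sketched as follows. The function `validate_row` is hypothetical, and the observed facts are passed in as a plain dictionary rather than probed from a store. Only `valid` and `errors` are named in this document; the other output fields and the error message format are assumptions, and comparing last_modified by string equality is a simplification.

```python
import json
from typing import Any, Dict, List, Mapping

COMPARED_FACTS = ("size_bytes", "etag", "last_modified")


def validate_row(locked: Mapping[str, Any], observed: Mapping[str, Any]) -> Dict[str, Any]:
    """Compare locked facts with currently observed facts (illustrative only)."""
    errors: List[str] = []
    for fact in COMPARED_FACTS:
        want = locked.get(fact)
        have = observed.get(fact)
        if want is None or have is None:
            # Not locked, or not reported by the backend: nothing to compare.
            continue
        if want != have:
            errors.append(f"{fact}: locked {want!r}, observed {have!r}")
    return {
        "item_id": locked.get("item_id"),      # ASSUMPTION: identifying fields in output
        "asset_key": locked.get("asset_key"),
        "valid": not errors,
        "errors": errors,
    }


locked_row = {"item_id": "item-001", "asset_key": "visual", "size_bytes": 1048576, "etag": None}
result = validate_row(locked_row, {"size_bytes": 1048000, "etag": '"abc"'})
print(json.dumps(result))  # one JSON line per validated row
```

In this sample the size mismatch is reported, while the etag is skipped because it was never locked.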

ETag Handling

etag is stored as a backend object validator. It is not treated as a portable file checksum. S3 multipart ETags and weak HTTP ETags are validators with store-specific meaning, so byte-level checksum semantics are deferred.
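To show why an ETag cannot be read as a content hash, the sketch below recognizes two common shapes: weak HTTP validators, which start with `W/`, and S3 multipart ETags, which look like 32 hex digits followed by a hyphen and a part count. The function `classify_etag` is hypothetical and is not part of stacpkg; everything else is treated as opaque.

```python
import re

# S3 multipart uploads typically report "<32 hex digits>-<part count>", often quoted.
_S3_MULTIPART = re.compile(r'^"?[0-9a-fA-F]{32}-\d+"?$')


def classify_etag(etag: str) -> str:
    """Label an ETag by shape only; no shape implies a portable checksum."""
    if etag.startswith("W/"):
        return "weak-http"      # semantically equivalent content, not byte-identical
    if _S3_MULTIPART.match(etag):
        return "s3-multipart"   # derived from per-part digests, not the whole object
    # Even an MD5-shaped value is store-specific, so it is treated as opaque.
    return "opaque"
```

Because every branch still ends in a validator with store-specific meaning, equality comparison against the same backend is the only safe use in this sketch.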