ADR-003: Asset Lock Row Structure

Status: Accepted
Date: 2026-05-17
Implementation: Implemented in active asset-lock schema v1 as of 0.1.0.

Context

assets.lock.parquet is the package asset-lock table. One row names one STAC Asset location and stores the facts needed to write STAC hrefs back out, move the asset, and validate the asset later.

The schema must stay stable when a package moves between local directories, object stores, and OCI registries.

A single href string is convenient for display, but it mixes several things together: store type, bucket or container, object key, endpoint, and sometimes credentials or signed query parameters. That makes validation and relocation harder.

Decision

Use flat, structured asset-lock rows. Each row has:

  • identity fields;
  • location fields;
  • nullable object facts.

Nullable means a column may be null when an operation cannot observe that fact. The physical schema is identical across operations; each operation fills only the facts it can observe.

Required identity fields:

item_id
asset_key

Location fields:

store_type
store_container
store_endpoint_url
key

store_type uses obstore storage type names: file, s3, gs, az, http, or https. These names describe the kind of storage. They are not rclone remote names or config section names.

store_endpoint_url may record the service endpoint needed to make S3-compatible bucket/key locations unambiguous. Credentials and runtime auth configuration stay outside assets.lock.parquet.

Current object facts that can be carried between stores:

size_bytes
etag
last_modified
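
The row layout above can be sketched as a Python dataclass. This is an illustration only, not the package's actual schema definition: the class name AssetLockRow, the Python types, and the choice to treat store_container and store_endpoint_url as optional are assumptions. Only the column names and the store_type values come from this ADR.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Storage type names listed in this ADR.
STORE_TYPES = frozenset({"file", "s3", "gs", "az", "http", "https"})


@dataclass(frozen=True)
class AssetLockRow:
    """Hypothetical in-memory view of one asset-lock row."""

    # Required identity fields.
    item_id: str
    asset_key: str

    # Location fields. Optionality here is an assumption.
    store_type: str
    store_container: Optional[str]
    store_endpoint_url: Optional[str]
    key: str

    # Nullable object facts: None means "not observed by this operation".
    size_bytes: Optional[int] = None
    etag: Optional[str] = None
    last_modified: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Reject anything that is not a storage type name,
        # such as an rclone remote name or a config section name.
        if self.store_type not in STORE_TYPES:
            raise ValueError(f"unknown store_type: {self.store_type!r}")
```

Keeping the object facts optional with a None default mirrors the rule that the schema stays the same while each operation fills only what it can observe.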

asset-lock derive --no-probe-metadata normally fills only size_bytes, and only when STAC file:size is present. Default metadata probing and relocation operations can fill object facts such as ETag and last modified time when the storage service reports them.
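
As a rough sketch of the no-probe case, the function below fills object facts from a STAC Asset dictionary alone. It is not the tool's implementation; the function name facts_without_probe is hypothetical. It assumes the asset is a plain dict that may carry the STAC file extension field file:size.

```python
from typing import Any, Dict, Optional


def facts_without_probe(asset: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return object facts knowable from STAC metadata alone (hypothetical helper)."""
    size = asset.get("file:size")
    return {
        # Filled only when the STAC asset declares file:size.
        "size_bytes": int(size) if size is not None else None,
        # Without probing the store, these facts cannot be observed.
        "etag": None,
        "last_modified": None,
    }
```

The three keys are always present, so the row shape does not change between probing and non-probing runs; only the values differ.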

Do not store a main href column. Rebuild hrefs from store_type, store_container, store_endpoint_url, and key when projecting back to STAC or when calling current object-store APIs.
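
A minimal sketch of rebuilding an href from the four location fields follows. The ADR does not spell out the exact href form per store type, so the conventions here are assumptions: the function name build_href is hypothetical, file keys are treated as absolute paths, http and https use store_container as the host, and an S3-compatible endpoint produces a path-style URL.

```python
from typing import Optional


def build_href(
    store_type: str,
    store_container: Optional[str],
    store_endpoint_url: Optional[str],
    key: str,
) -> str:
    """Rebuild a display href from structured location fields (hypothetical)."""
    if store_type == "file":
        # Assumption: key holds an absolute filesystem path.
        return "file:///" + key.lstrip("/")
    if store_container is None:
        raise ValueError(f"store_container is required for {store_type!r}")
    if store_type == "s3" and store_endpoint_url:
        # Assumption: path-style addressing on the S3-compatible endpoint.
        return f"{store_endpoint_url.rstrip('/')}/{store_container}/{key}"
    if store_type in {"s3", "gs", "az", "http", "https"}:
        # For http/https, store_container is assumed to hold the host.
        return f"{store_type}://{store_container}/{key}"
    raise ValueError(f"unknown store_type: {store_type!r}")


print(build_href("s3", "bucket", None, "a/b.tif"))
print(build_href("s3", "bucket", "https://minio.example.com/", "a/b.tif"))
```

Because the href is derived on demand, a relocation only has to rewrite the location fields; no stored href can fall out of sync with them.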

This ADR defines the active row structure only. Draft checksum rules for file_checksum are covered separately in ADR-004.

Alternatives Considered

  • Keep only href: Simple to display, but hard to check and move because storage type, container, key, endpoint, and possible query secrets are mixed together.
  • Use rclone remote names: Familiar to some users, but it would tie a package to local runtime configuration. A package should describe the object location without depending on a user's rclone config.
  • Store validation results in the row: Easy to read later, but validation results are point-in-time observations. The lock should store facts about the expected object, not the result of a past command.

Consequences

The lock format is easier to validate and relocate because the storage fields are explicit. It also avoids tying the package to rclone-style config or remote names.

Existing STAC hrefs are still accepted at package boundaries and normalized into the structured fields.
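
One possible shape for that normalization step is sketched below using only the standard library. The function name split_href and the mapping rules are assumptions, not the package's actual behavior. In particular, this sketch discards the query string so that signed parameters never reach the lock, and it leaves store_endpoint_url unset because an href alone does not always reveal the endpoint.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def split_href(href: str) -> Dict[str, Optional[str]]:
    """Split a STAC href into structured location fields (hypothetical)."""
    parts = urlsplit(href)
    scheme = parts.scheme.lower()
    if scheme == "file":
        # Assumption: keep the absolute path as the key; no container.
        return {
            "store_type": "file",
            "store_container": None,
            "store_endpoint_url": None,
            "key": parts.path,
        }
    if scheme in {"s3", "gs", "az", "http", "https"}:
        # The authority part becomes the container (bucket, container, or host).
        # The query string, which may hold signed parameters, is dropped.
        return {
            "store_type": scheme,
            "store_container": parts.netloc,
            "store_endpoint_url": None,
            "key": parts.path.lstrip("/"),
        }
    raise ValueError(f"unsupported href scheme: {scheme!r}")


print(split_href("s3://bucket/a/b.tif"))
```

Dropping the query string is a deliberate simplification; a real implementation may need to keep non-secret query parameters for plain http and https hrefs.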

Commands that need a display href or STAC href must rebuild it from the structured fields.