Fairspec Catalog

Authors Evgeny Karev
Profile https://fairspec.org/profiles/latest/catalog.json

Fairspec Catalog is a simple replication format for syncing a catalog of Fairspec Datasets. Dynamic search, sorting, and similar capabilities are not goals of this specification.

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

A Fairspec Catalog is a JSON Lines resource in which each line represents a Fairspec Dataset and its last updated time. The dataset locations MUST be unique within a catalog, and the datasets MUST be sorted by last updated time in descending order.

A top-level descriptor MUST be an array of Dataset objects.

For example:

{"loc": "https://example.com/dataset1.json", "upd": "2023-10-01T00:00:00Z"}
{"loc": "https://example.com/dataset3.json", "upd": "2023-09-01T00:00:00Z"}
{"loc": "https://example.com/dataset2.json", "upd": "2023-08-01T00:00:00Z"}

Each catalog entry points to a Fairspec Dataset. It MUST have the following properties (all required):

The loc property is the URI of the Fairspec Dataset descriptor. It MUST be a JSON Schema URI.

For example:

{
"loc": "https://example.com/dataset.json"
}

The upd property is the last updated time of the dataset. It MUST be a JSON Schema date-time (this format requires a timezone component).

For example:

{
"upd": "2023-10-01T00:00:00Z"
}
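
As a rough illustration of the per-entry rules, the Python sketch below checks that both properties are present, that loc looks like an absolute URI, and that upd carries a timezone. The function name `validate_entry` is hypothetical, and the scheme check is only an approximation of JSON Schema URI validation, not a full implementation of it.

```python
from datetime import datetime
from urllib.parse import urlparse


def validate_entry(entry: dict) -> datetime:
    """Check one catalog entry and return its parsed timestamp."""
    missing = {"loc", "upd"} - set(entry)
    if missing:
        raise ValueError(f"missing properties: {sorted(missing)}")
    if not isinstance(entry["loc"], str) or not isinstance(entry["upd"], str):
        raise ValueError("loc and upd must be strings")
    # Approximate URI check: an absolute URI has a scheme such as "https".
    if not urlparse(entry["loc"]).scheme:
        raise ValueError("loc must be an absolute URI")
    # Rewrite a trailing "Z" so that Python versions before 3.11 can parse it.
    updated = datetime.fromisoformat(entry["upd"].replace("Z", "+00:00"))
    if updated.tzinfo is None:
        raise ValueError("upd must include a timezone component")
    return updated
```

A timestamp such as "2023-10-01T00:00:00" (no offset and no Z) is rejected by this sketch, which mirrors the timezone requirement stated above.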

Fairspec Catalog does not support extension.

Fairspec Catalog is designed to be used in a streaming manner. Because datasets are sorted by their last updated time in descending order, clients SHOULD read the catalog line by line and stop reading once they encounter a dataset with a timestamp that has already been seen. This allows efficient synchronization without processing the entire catalog file.
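
The streaming approach might be sketched in Python as follows. This is a minimal illustration, not a reference client: the names `new_entries` and `last_synced` are hypothetical, and the sketch assumes the client remembers the newest timestamp from its previous sync and treats any entry at or before that time as already seen.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator


def parse_time(value: str) -> datetime:
    # Rewrite a trailing "Z" so that Python versions before 3.11 can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_entries(lines: Iterable[str], last_synced: datetime) -> Iterator[dict]:
    """Yield entries updated after last_synced, stopping at the first older one."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines (an assumption of this sketch)
        entry = json.loads(line)
        if parse_time(entry["upd"]) <= last_synced:
            break  # everything from here on was seen in a previous sync
        yield entry
```

Because `lines` can be any iterable of strings, such as an open file or a streamed HTTP response body, the remainder of the catalog is never read once the loop breaks.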