Fairspec Dataset
| Authors | Evgeny Karev |
|---|---|
| Profile | https://fairspec.org/profiles/latest/dataset.json |
Fairspec Dataset is a simple JSON-based format for describing a single dataset and its resources. It is compatible with DataCite for metadata and with JSON Schema for structured data.
Language
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.
Descriptor
A Fairspec Dataset is a JSON resource that MUST be an object compatible with the Dataset structure outlined below.
Dataset
A top-level descriptor object describing an individual dataset. It MAY have the following properties (all optional unless otherwise stated):
$schema
It MUST be an External Path to one of the officially published Fairspec Dataset profiles or to a Fairspec Dataset Extension profile. The default value is https://fairspec.org/profiles/latest/dataset.json.
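As a minimal sketch in Python, a consumer could determine the effective profile by falling back to the default when $schema is absent. The helper name resolve_profile is hypothetical and not part of any Fairspec library.

```python
from typing import Any

# Default profile stated by this specification.
DEFAULT_PROFILE = "https://fairspec.org/profiles/latest/dataset.json"


def resolve_profile(descriptor: dict[str, Any]) -> str:
    """Return the profile URL of a descriptor, using the default if $schema is missing."""
    profile = descriptor.get("$schema", DEFAULT_PROFILE)
    # $schema is an External Path, so only HTTP(S) URLs are accepted here.
    if not isinstance(profile, str) or not profile.startswith(("http://", "https://")):
        raise ValueError("$schema must be an HTTP or HTTPS URL")
    return profile
```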
For example, for version X.Y.Z of the profile:
{ "$schema": "https://fairspec.org/profiles/X.Y.Z/dataset.json" }
resources
A list of resources. It MUST be an array where each item MUST be a Resource.
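A shallow structural check for this rule might look like the following Python sketch. It only verifies that resources is an array of objects and leaves full Resource validation to the published JSON Schema profile. Treating a missing resources property as an empty list is an assumption, and check_resources is a hypothetical name.

```python
from typing import Any


def check_resources(descriptor: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the resources of a descriptor after a shallow structural check."""
    # Assumption: an absent resources property is treated as an empty list.
    resources = descriptor.get("resources", [])
    if not isinstance(resources, list):
        raise TypeError("resources must be an array")
    for index, resource in enumerate(resources):
        # Each item must be a Resource, i.e. a JSON object.
        if not isinstance(resource, dict):
            raise TypeError(f"resources[{index}] must be a Resource object")
    return resources
```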
For example, for a single resource:
{ "resources": [ { "data": "https://example.com/file.csv" } ] }
For multiple resources:
{ "resources": [ { "data": "https://example.com/file1.csv", "dialect": { "format": "csv", "delimiter": ";" }, "integrity": { "type": "sha256", "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" } }, { "data": "https://example.com/file2.json" } ] }
<datacite>
Dataset supports all the properties defined in DataCite Metadata Schema 4.6.
For example, for a dataset with a DOI and some other properties:
{ "doi": "10.1234/5678", "titles": [ { "title": "My Dataset" } ], "creators": [ { "name": "John Doe", "nameType": "Personal" } ] }
Resource
A resource within a dataset. It MAY have the following properties (all optional unless otherwise stated):
data
Data or content of the resource. It MUST be one of the following:
- a Path to a single file
- an array of Paths to multiple files
- a JSON object
- an array of JSON objects
When multiple files are provided, they MUST all share the same format, and their contents MUST be physically concatenable for binary formats and logically concatenable for textual formats (i.e., combining them should produce a valid single file of that format).
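The accepted forms of data can be told apart by their JSON types. The Python sketch below shows one way to do it. classify_data and its return labels are hypothetical, and rejecting empty or mixed arrays is an assumption, since this section does not address them.

```python
from typing import Any


def classify_data(data: Any) -> str:
    """Classify a resource's data value into one of the accepted forms."""
    if isinstance(data, str):
        return "path"  # a single Internal or External Path
    if isinstance(data, dict):
        return "object"  # an inline JSON object
    if isinstance(data, list) and data:
        if all(isinstance(item, str) for item in data):
            return "paths"  # multiple files that must share one format
        if all(isinstance(item, dict) for item in data):
            return "objects"  # an inline array of JSON objects
    # Assumption: empty arrays and arrays mixing strings and objects are invalid.
    raise ValueError("unsupported data value")
```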
For example, for a single internal file:
{ "data": "file.csv" }
For multiple external files:
{ "data": [ "https://example.com/file1.csv", "https://example.com/file2.csv" ] }
For a JSON object:
{ "data": { "name": "John Doe", "age": 30 } }
For a JSON array of objects:
{ "data": [ { "name": "John Doe", "age": 30 }, { "name": "Jane Doe", "age": 25 } ] }
name
An optional name for the resource. It MUST be a string consisting of alphanumeric characters and underscores. If provided, it can be used to reference the resource within the dataset. For example, resource names are used in Foreign Keys specified in Fairspec Schema.
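A name check can be a single regular expression. This Python sketch assumes that "alphanumeric" means the ASCII letters A to Z (either case) and the digits 0 to 9; is_valid_name is a hypothetical helper.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(name: object) -> bool:
    """Return True if name is a non-empty string of alphanumerics and underscores."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None
```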
For example:
{ "name": "measurements" }
dialect
It MUST be a Path to a Dialect or a Dialect object. The Dialect MUST be compatible with the Fairspec File specification. If present, Data MUST be compatible with the provided dialect.
For example, for a file with CSV format:
{ "dialect": { "format": "csv", "delimiter": ";" } }
textual
A boolean indicating whether the file is text-based. When true, the file MUST be UTF-8 encoded.
For example:
{ "data": "document.md", "textual": true }
integrity
The integrity check of the file. It MUST be a JSON object with the following properties:
type
The type of the integrity check. It MUST be one of the following values:
- md5
- sha1
- sha256
- sha512
hash
The hash of the file. It MUST be a string.
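Verifying an integrity object amounts to recomputing the digest and comparing it. The Python sketch below uses the standard hashlib module. It assumes the hash is a hexadecimal string, as in the example in this section, and compares it case-insensitively. check_integrity and compute_hash are hypothetical names.

```python
import hashlib
from typing import Any

# The integrity types allowed by this specification.
SUPPORTED_TYPES = {"md5", "sha1", "sha256", "sha512"}


def compute_hash(content: bytes, hash_type: str) -> str:
    """Return the hexadecimal digest of content for a supported integrity type."""
    if hash_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported integrity type: {hash_type}")
    return hashlib.new(hash_type, content).hexdigest()


def check_integrity(content: bytes, integrity: dict[str, Any]) -> bool:
    """Return True if content matches the integrity object."""
    # Assumption: hashes are hex-encoded; compare case-insensitively.
    expected = integrity["hash"].lower()
    return compute_hash(content, integrity["type"]) == expected
```

The example hash used in this document, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, is the SHA-256 digest of empty content, so check_integrity(b"", ...) returns True for it. For large files, feeding the hash object in chunks with update() avoids loading the whole file into memory.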
For example, for a file with a SHA-256 hash:
{ "integrity": { "type": "sha256", "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" } }
dataSchema
It MUST be a Path to a Fairspec Data Schema or a Fairspec Data Schema object. If present, Data MUST be a JSON document that is compatible with the provided schema.
For example, as an External Path:
{ "dataSchema": "https://example.com/schema.json" }
For example, as an object:
{ "dataSchema": { "type": "object", "properties": { "name": { "type": "string" }, "age": { "type": "integer" } }, "required": ["name", "age"] } }
tableSchema
It MUST be a Path to a Fairspec Table Schema or a Fairspec Table Schema object. If present, Data MUST be a table that is compatible with the provided schema.
For example, as an External Path:
{ "tableSchema": "https://example.com/schema.json" }
For example, as an object:
{ "tableSchema": { "required": ["name", "age"], "properties": { "name": { "type": "string" }, "age": { "type": "integer" } } } }
<datacite>
Resource supports all the properties defined in DataCite Metadata Schema 4.6.
For example, for a resource with a geolocation:
{ "geoLocations": [ { "geoLocationPoint": { "pointLongitude": 12.34, "pointLatitude": 56.78 }, "geoLocationBox": { "westBoundLongitude": 12.34, "eastBoundLongitude": 56.78, "southBoundLatitude": 12.34, "northBoundLatitude": 56.78 } } ] }
Common
Common properties shared by multiple entities in the descriptor.
Path
It MUST be a string representing a file location, in one of the following forms:
Internal Path
It MUST be a string representing a relative path, using Unix-style forward slashes (/) as path separators. The path is resolved relative to the location of the dataset descriptor file. When accessing files, Unix-style paths MUST be converted to the appropriate platform-specific format (e.g., converting / to \ on Windows).
An Internal Path MUST point to a file in the same directory as the descriptor file or in one of its subdirectories. Files outside of the descriptor directory are not supported.
The path MUST NOT contain any of the following:
- Absolute path indicators (starting with / or ~)
- Directory traversal sequences (..)
- Windows-style backslashes (\); only forward slashes (/) are allowed as separators
- Windows drive letters (C:, D:, etc.)
- URI schemes (://)
The path MAY contain Unicode characters, spaces, and special characters in file and directory names.
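The Internal Path rules can be sketched in Python with the standard pathlib module. This is a hypothetical validator, not a reference implementation. It checks the leading characters for absolute paths and drive letters. It also treats only a whole .. segment as directory traversal, which is an interpretation of the rule. Resolution joins the forward-slash segments onto the descriptor's directory, so each platform uses its native separator.

```python
import re
from pathlib import Path, PurePosixPath

# Assumption: a drive letter is a single ASCII letter followed by a colon at the start.
DRIVE_LETTER = re.compile(r"^[A-Za-z]:")


def validate_internal_path(path: str) -> None:
    """Raise ValueError if path breaks one of the Internal Path rules."""
    if not path:
        raise ValueError("path must not be empty")
    if path.startswith(("/", "~")):
        raise ValueError("absolute paths are not allowed")
    if "\\" in path:
        raise ValueError("backslashes are not allowed; use forward slashes")
    if DRIVE_LETTER.match(path):
        raise ValueError("Windows drive letters are not allowed")
    if "://" in path:
        raise ValueError("URI schemes are not allowed")
    # Split on forward slashes; a ".." segment would escape the descriptor directory.
    if ".." in PurePosixPath(path).parts:
        raise ValueError("directory traversal is not allowed")


def resolve_internal_path(descriptor_file: str, path: str) -> Path:
    """Resolve an Internal Path relative to the directory of the descriptor file."""
    validate_internal_path(path)
    base = Path(descriptor_file).parent
    # Joining the segments onto a native Path applies the platform separator.
    return base.joinpath(*PurePosixPath(path).parts)
```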
For example:
{ "data": "measurements.csv" }
For example, with subdirectories:
{ "data": "data/experiments/results-2024.json" }
For example, with Unicode and special characters:
{ "data": "données/résultats (final).csv" }
External Path
It MUST be a string representing an HTTP or HTTPS URL to a remote file.
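A check for this rule can rely on the standard urllib.parse module, as in the Python sketch below. Requiring a non-empty host is an assumption beyond the text, and is_external_path is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_external_path(path: str) -> bool:
    """Return True if path is an HTTP or HTTPS URL with a host."""
    parsed = urlparse(path)
    # urlparse lowercases the scheme, so "HTTPS://..." is also accepted.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```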
For example:
{ "data": "https://example.com/datasets/measurements.csv" }
Extension
Fairspec Dataset has a simple yet powerful extension mechanism based on the JSON Schema standard. An extension is a domain-specific Fairspec Dataset flavour that enriches the standard with additional metadata properties and validation rules.
Creation
An extension is applied by setting the $schema property of the dataset descriptor to the URL of a custom JSON Schema profile. The profile instructs validators to check the descriptor against the JSON Schema rules defined by the extension. The extension’s schema MUST include the base Fairspec Dataset schema in its root allOf property.
Using JSON Schema features with custom profiles allows you to:
- Add new domain-specific properties
- Require existing properties to meet specific requirements
- Define expected resource types and their schemas
- Combine existing profiles as part of a high-level extension
Example
For example, a Spectroscopy Fairspec extension that requires spectral metadata:
{ "$schema": "https://spectroscopy.org/profiles/1.0.0/dataset.json", "resources": [ { "data": "spectrum.csv", "spectralRange": { "min": 400, "max": 4000, "unit": "cm-1" } } ] }
The extension profile would include the base Fairspec Dataset schema and add domain-specific requirements:
{ "$schema": "https://json-schema.org/draft/2020-12/schema", "title": "Fairspec Spectroscopy Profile", "allOf": [ { "$ref": "https://fairspec.org/profiles/1.0.0/dataset.json" }, { "$ref": "#/$defs/spectroscopyMixin" } ], "$defs": { "spectroscopyMixin": { "type": "object", "properties": { "resources": { "type": "array", "items": { "type": "object", "required": ["spectralRange"], "properties": { "spectralRange": { "type": "object", "required": ["min", "max", "unit"], "properties": { "min": { "type": "number" }, "max": { "type": "number" }, "unit": { "type": "string" } } } } } } } } } }