
Fairspec Dataset

Authors: Evgeny Karev
Profile: https://fairspec.org/profiles/latest/dataset.json

Fairspec Dataset is a simple JSON-based format for describing a single dataset and its resources. It is compatible with DataCite for metadata and with JSON Schema for structured data.

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

A Fairspec Dataset is a JSON resource that MUST be an object compatible with the Dataset structure outlined below.

A top-level descriptor object describing an individual dataset. It MAY have the following properties (all optional unless otherwise stated):

The $schema property is an External Path to one of the officially published Fairspec Dataset profiles or to a Fairspec Dataset Extension profile. Its default value is https://fairspec.org/profiles/latest/dataset.json.

For example, for version X.Y.Z of the profile:

{
"$schema": "https://fairspec.org/profiles/X.Y.Z/dataset.json"
}
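Before validating a descriptor, a consumer usually needs to know which profile it claims. The following Python sketch reads $schema and falls back to the documented default when it is absent. It then extracts the version segment from an official profile URL. The helper names are hypothetical. The sketch also assumes that official profile URLs always follow the https://fairspec.org/profiles/<version>/dataset.json pattern.

```python
import re
from typing import Optional

DEFAULT_PROFILE = "https://fairspec.org/profiles/latest/dataset.json"

# Assumed pattern for official profile URLs: "latest" or a version such as
# "1.0.0" in the path segment before "dataset.json".
OFFICIAL_PROFILE = re.compile(r"https://fairspec\.org/profiles/([^/]+)/dataset\.json")


def resolve_profile(descriptor: dict) -> str:
    """Return the descriptor's $schema, or the default profile if absent."""
    return descriptor.get("$schema", DEFAULT_PROFILE)


def official_version(profile: str) -> Optional[str]:
    """Return the version segment of an official profile URL.

    Returns None for any other URL, for example an extension profile.
    """
    match = OFFICIAL_PROFILE.fullmatch(profile)
    return match.group(1) if match else None
```

A descriptor without $schema therefore resolves to the "latest" profile.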

The resources property is a list of resources. It MUST be an array in which each item MUST be a Resource.

For example, for a single resource:

{
"resources": [
{
"data": "https://example.com/file.csv"
}
]
}

For multiple resources:

{
"resources": [
{
"data": "https://example.com/file1.csv",
"format": {
"name": "csv",
"delimiter": ";"
},
"integrity": {
"type": "sha256",
"hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
},
{
"data": "https://example.com/file2.json"
}
]
}
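The structural rule for resources can be checked before any per-resource validation takes place. This minimal Python sketch uses a hypothetical check_resources helper. It assumes resources is optional, as stated for all Dataset properties unless otherwise noted. It verifies only the array-of-objects shape and does not check the Resource rules themselves.

```python
from typing import Any, List


def check_resources(descriptor: dict) -> List[str]:
    """Return problems with the shape of the "resources" property."""
    if "resources" not in descriptor:
        return []  # assumption: the property is optional
    resources: Any = descriptor["resources"]
    if not isinstance(resources, list):
        return ["resources: must be an array"]
    errors: List[str] = []
    for index, item in enumerate(resources):
        # Each item must at least be a JSON object to be a Resource
        if not isinstance(item, dict):
            errors.append(f"resources[{index}]: must be an object")
    return errors
```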

Dataset supports all the properties defined in DataCite Metadata Schema 4.6.

For example, for a dataset with a DOI and some other properties:

{
"doi": "10.1234/5678",
"titles": [
{
"title": "My Dataset"
}
],
"creators": [
{
"name": "John Doe",
"nameType": "Personal"
}
]
}

A resource within a dataset. It MAY have the following properties (all optional unless otherwise stated):

The data property holds the data or content of the resource. It MUST be one of the following:

  • Path to a file
  • Array of Paths to files
  • Inline JSON object
  • Inline JSON array of objects

When multiple files are provided, they MUST all follow the same Format, and their contents MUST be concatenable: physically in the case of binary formats and logically in the case of textual formats (i.e., combining them should produce a valid single file of that format).

For example, for a single internal file:

{
"data": "file.csv"
}

For multiple external files:

{
"data": [
"https://example.com/file1.csv",
"https://example.com/file2.csv"
]
}

For a JSON object:

{
"data": {
"name": "John Doe",
"age": 30
}
}

For a JSON array of objects:

{
"data": [
{
"name": "John Doe",
"age": 30
},
{
"name": "Jane Doe",
"age": 25
}
]
}
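Because data accepts four different shapes, a reader usually has to branch on the value's JSON type. The hypothetical classify_data helper below sketches that dispatch in Python. The text above does not cover every case, so the sketch makes two assumptions: an empty array is rejected, and so is an array that mixes strings and objects.

```python
from typing import Any


def classify_data(data: Any) -> str:
    """Return which allowed form a Resource "data" value takes.

    Possible results: "path", "paths", "object", "objects".
    """
    if isinstance(data, str):
        return "path"  # a single Path (internal or external)
    if isinstance(data, dict):
        return "object"  # an inline JSON object
    # Assumption: an empty array, or one mixing strings and objects, is invalid.
    if isinstance(data, list) and data:
        if all(isinstance(item, str) for item in data):
            return "paths"
        if all(isinstance(item, dict) for item in data):
            return "objects"
    raise ValueError(
        "data must be a path, an array of paths, an object, or an array of objects"
    )
```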

The name property is an optional name for the resource. It MUST be a string consisting only of alphanumeric characters and underscores. If provided, it can be used to reference the resource within the dataset. For example, resource names are used by Foreign Keys specified in Fairspec Schema.

For example:

{
"name": "measurements"
}
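The naming rule translates directly into a regular expression. This Python sketch assumes that "alphanumeric" means ASCII letters and digits, because the text above does not say whether non-ASCII letters are allowed. It also assumes that a name must contain at least one character.

```python
import re

# Assumption: ASCII letters, digits, and underscores; at least one character.
RESOURCE_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_resource_name(name: str) -> bool:
    """Check a resource name against the allowed character set."""
    # fullmatch ensures the whole string matches, including no trailing newline
    return RESOURCE_NAME.fullmatch(name) is not None
```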

The dialect property MUST be a Path to a Dialect or an object containing the Dialect. The Dialect MUST be compatible with the Fairspec File specification. If present, Data MUST be compatible with the provided Dialect.

For example, for a file with CSV format:

{
"dialect": {
"format": "csv",
"delimiter": ";"
}
}
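To illustrate how an inline dialect might be applied, the Python sketch below parses CSV text with the standard csv module, taking the delimiter from the dialect object. It handles only the format and delimiter keys shown in the example. The comma fallback is Python's csv default, not a rule from the Fairspec File specification, and the helper name is hypothetical.

```python
import csv
import io
from typing import List


def read_csv_with_dialect(text: str, dialect: dict) -> List[List[str]]:
    """Parse CSV text using the delimiter from an inline dialect object."""
    if dialect.get("format") != "csv":
        raise ValueError("this sketch only handles the csv format")
    # Assumption: fall back to a comma when no delimiter is given
    delimiter = dialect.get("delimiter", ",")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```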

The textual property is a boolean indicating whether the file is text-based. When true, the file MUST be UTF-8 encoded.

For example:

{
"data": "document.md",
"textual": true
}
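One way to verify the UTF-8 requirement is to attempt a strict decode of the file's bytes. This is a minimal Python sketch with hypothetical helper names. It does not try to decide whether a file that is not declared textual happens to be text.

```python
from pathlib import Path


def is_valid_textual(content: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        content.decode("utf-8")  # "strict" error handling is the default
    except UnicodeDecodeError:
        return False
    return True


def check_textual_file(path: Path) -> bool:
    """Read a file declared "textual": true and check its encoding."""
    return is_valid_textual(path.read_bytes())
```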

The integrity property describes an integrity check of the file. It MUST be a JSON object with the following properties:

type

The type of the integrity check. It MUST be one of the following values:

  • md5
  • sha1
  • sha256
  • sha512

hash

The hash of the file, computed with the algorithm given in type. It MUST be a string.

For example, for a file with a SHA-256 hash:

{
"integrity": {
"type": "sha256",
"hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
}
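Verifying an integrity object means recomputing the digest with the named algorithm and comparing the result with hash. The Python sketch below uses hashlib, whose hashlib.new constructor accepts all four names listed above. It reads the stream in chunks so that large files need not fit in memory. Treating hash as a case-insensitive hexadecimal string is an assumption based on the example, and the helper names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

# The four integrity types listed above are all valid hashlib algorithm names.
SUPPORTED_TYPES = {"md5", "sha1", "sha256", "sha512"}


def compute_hash(stream: BinaryIO, algorithm: str, chunk_size: int = 65536) -> str:
    """Hash a binary stream chunk by chunk and return the hex digest."""
    if algorithm not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported integrity type: {algorithm!r}")
    digest = hashlib.new(algorithm)
    # iter() with a sentinel keeps reading until read() returns b"" (EOF)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_integrity(stream: BinaryIO, integrity: dict) -> bool:
    """Compare a stream's digest with an integrity object's hash.

    Assumption: hash is a hexadecimal string; case is ignored.
    """
    return compute_hash(stream, integrity["type"]) == integrity["hash"].lower()


def check_integrity_bytes(content: bytes, integrity: dict) -> bool:
    """Convenience wrapper for in-memory content."""
    return check_integrity(io.BytesIO(content), integrity)
```

For a file on disk, open it in binary mode (open(path, "rb")) and pass the file object to check_integrity. Note that the hash in the examples above is the SHA-256 digest of empty input, so it only matches an empty file.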

The dataSchema property MUST be a Path to a Fairspec Data Schema or an object containing one. If present, Data MUST be a JSON document compatible with the provided schema.

For example, as an external path:

{
"dataSchema": "https://example.com/schema.json"
}

For example, as an object:

{
"dataSchema": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
}
},
"required": ["name", "age"]
}
}
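Full validation against a data schema should be done with a complete JSON Schema validator. To illustrate what "compatible" means for the example schema, the Python sketch below checks only the type, properties, and required keywords and ignores everything else in JSON Schema. The check_data helper is hypothetical.

```python
from typing import Any, Dict, List

# Python types for the JSON Schema "type" values handled by this sketch.
JSON_TYPES: Dict[str, Any] = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def check_data(data: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return error messages for a small subset of JSON Schema.

    Only "type", "properties", and "required" are handled. Unlike full JSON
    Schema, "integer" here rejects floats such as 30.0.
    """
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types
        is_bool_as_number = isinstance(data, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(data, JSON_TYPES[expected]):
            return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in data:
                errors.extend(check_data(data[key], subschema, f"{path}.{key}"))
    return errors
```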

The tableSchema property MUST be a Path to a Fairspec Table Schema or an object containing one. If present, Data MUST be a table compatible with the provided schema.

For example, as an external path:

{
"tableSchema": "https://example.com/schema.json"
}

For example, as an object:

{
"tableSchema": {
"required": ["name", "age"],
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
}
}
}
}
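When tabular data is given inline as an array of row objects, compatibility with a table schema like the one above can be approximated row by row. This Python sketch is hypothetical and covers only two checks: required columns, and the string, integer, number, and boolean column types. Errors are reported with row indices. Other Fairspec Table Schema features are not covered.

```python
from typing import Any, Dict, List, Tuple

# Python types accepted for each column type handled by this sketch.
COLUMN_TYPES: Dict[str, Tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
}


def check_table(rows: List[Dict[str, Any]], table_schema: Dict[str, Any]) -> List[str]:
    """Check row objects against required columns and simple column types."""
    errors: List[str] = []
    required = table_schema.get("required", [])
    columns = table_schema.get("properties", {})
    for index, row in enumerate(rows):
        for name in required:
            if name not in row:
                errors.append(f"row {index}: missing column {name!r}")
        for name, column in columns.items():
            allowed = COLUMN_TYPES.get(column.get("type", ""))
            if name not in row or allowed is None:
                continue  # absent values and unhandled types are skipped
            value = row[name]
            # bool is a subclass of int in Python, so reject it for non-boolean columns
            wrong_bool = isinstance(value, bool) and column["type"] != "boolean"
            if wrong_bool or not isinstance(value, allowed):
                errors.append(f"row {index}: column {name!r} expected {column['type']}")
    return errors
```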

Resource supports all the properties defined in DataCite Metadata Schema 4.6.

For example, for a resource with a geolocation:

{
"geoLocations": [
{
"geoLocationPoint": {
"pointLongitude": 12.34,
"pointLatitude": 56.78
},
"geoLocationBox": {
"westBoundLongitude": 12.34,
"eastBoundLongitude": 56.78,
"southBoundLatitude": 12.34,
"northBoundLatitude": 56.78
}
}
]
}

Common properties shared by multiple entities in the descriptor.

A Path is a string representing a file location. It MUST be either an Internal Path or an External Path, as described below.

An Internal Path MUST be a string representing a relative path, using Unix-style forward slashes (/) as path separators. The path is resolved relative to the location of the dataset descriptor file. When files are accessed, Unix-style paths MUST be converted to the appropriate platform-specific format (e.g., converting / to \ on Windows).

An Internal Path MUST point to a file in the same directory as the descriptor file or in one of its subdirectories. Files outside the descriptor directory are not supported.

The path MUST NOT contain any of the following:

  • Absolute path indicators (starting with / or ~)
  • Directory traversal sequences (..)
  • Windows-style backslashes (\) - only forward slashes (/) are allowed as separators
  • Windows drive letters (C:, D:, etc.)
  • URI schemes (://)

The path MAY contain Unicode characters, spaces, and special characters in file and directory names.

For example:

{
"data": "measurements.csv"
}

For example, with subdirectories:

{
"data": "data/experiments/results-2024.json"
}

For example, with Unicode and special characters:

{
"data": "données/résultats (final).csv"
}
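The Internal Path rules above can be enforced with a few string checks, and resolution against the descriptor location maps naturally onto Python's pathlib. In this sketch, joining the POSIX path segments onto a native Path performs the platform-specific separator conversion. The helper names are hypothetical, and two interpretations are assumptions. First, .. is treated as a forbidden path segment, so a name like file..csv is allowed. Second, a drive letter is only detected at the start of the string.

```python
import re
from pathlib import Path, PurePosixPath

# Assumption: a drive letter is only detected at the start of the string.
_DRIVE_LETTER = re.compile(r"[A-Za-z]:")


def is_valid_internal_path(path: str) -> bool:
    """Check a string against the Internal Path rules listed above."""
    if not path:
        return False  # assumption: an empty string is not a file location
    if path.startswith(("/", "~")):
        return False  # absolute path indicators
    if "\\" in path or "://" in path:
        return False  # backslash separators and URI schemes
    if _DRIVE_LETTER.match(path):
        return False  # Windows drive letters such as C:
    # Assumption: ".." is forbidden as a path segment, so "file..csv" is allowed.
    return ".." not in PurePosixPath(path).parts


def resolve_internal_path(descriptor_file: Path, path: str) -> Path:
    """Resolve an Internal Path against the descriptor's directory.

    Joining the POSIX segments onto a native Path converts "/" separators
    into the platform-specific form (for example backslashes on Windows).
    """
    if not is_valid_internal_path(path):
        raise ValueError(f"invalid internal path: {path!r}")
    return descriptor_file.parent.joinpath(*PurePosixPath(path).parts)
```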

An External Path MUST be a string representing an HTTP or HTTPS URL to a remote file.

For example:

{
"data": "https://example.com/datasets/measurements.csv"
}
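An External Path can be told apart from an Internal Path by parsing the string as a URL. The Python sketch below uses urllib.parse.urlparse. Requiring a non-empty host is an assumption of this sketch rather than an explicit rule above, and the helper name is hypothetical.

```python
from urllib.parse import urlparse


def is_external_path(path: str) -> bool:
    """Return True if path is an HTTP or HTTPS URL with a host.

    urlparse lowercases the scheme, so "HTTPS://..." is accepted as well.
    """
    parsed = urlparse(path)
    # Assumption: a URL without a host (e.g. "https:///file.csv") is rejected.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```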

Fairspec Dataset has a simple yet powerful extension mechanism based on the JSON Schema standard. An extension is a domain-specific Fairspec Dataset flavour that enriches the standard with additional metadata properties and validation rules.

An extension is applied by setting the $schema property of the dataset descriptor to the URL of a custom JSON Schema profile. The descriptor is then validated against the JSON Schema rules defined by the extension. The extension’s schema MUST include the base Fairspec Dataset schema in its root allOf property.

Using JSON Schema features with custom profiles allows you to:

  • Add new domain-specific properties
  • Require existing properties to meet specific requirements
  • Define expected resource types and their schemas
  • Combine existing profiles as part of a high-level extension

For example, a Spectroscopy Fairspec extension that requires spectral metadata:

{
"$schema": "https://spectroscopy.org/profiles/1.0.0/dataset.json",
"resources": [
{
"data": "spectrum.csv",
"spectralRange": {
"min": 400,
"max": 4000,
"unit": "cm-1"
}
}
]
}

The extension profile would include the base Fairspec Dataset schema and add domain-specific requirements:

{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Fairspec Spectroscopy Profile",
"allOf": [
{ "$ref": "https://fairspec.org/profiles/1.0.0/dataset.json" },
{ "$ref": "#/$defs/spectroscopyMixin" }
],
"$defs": {
"spectroscopyMixin": {
"type": "object",
"properties": {
"resources": {
"type": "array",
"items": {
"type": "object",
"required": ["spectralRange"],
"properties": {
"spectralRange": {
"type": "object",
"required": ["min", "max", "unit"],
"properties": {
"min": { "type": "number" },
"max": { "type": "number" },
"unit": { "type": "string" }
}
}
}
}
}
}
}
}
}
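Fully validating an extended descriptor requires a JSON Schema 2020-12 validator. One narrower rule can be checked directly: the extension's schema must include the base Fairspec Dataset schema in its root allOf. This Python sketch checks that rule. It assumes that "including the base schema" means a $ref to an official profile URL matching the https://fairspec.org/profiles/<version>/dataset.json pattern. The helper name is hypothetical.

```python
import re
from typing import Any, Dict

# Assumed form of a base profile reference: an official profile URL with
# "latest" or a version in the path.
BASE_PROFILE = re.compile(r"https://fairspec\.org/profiles/[^/]+/dataset\.json")


def extends_base_dataset(extension_schema: Dict[str, Any]) -> bool:
    """Return True if the root allOf contains a $ref to a base Dataset profile."""
    all_of = extension_schema.get("allOf", [])
    if not isinstance(all_of, list):
        return False
    return any(
        isinstance(entry, dict)
        and BASE_PROFILE.fullmatch(str(entry.get("$ref", ""))) is not None
        for entry in all_of
    )
```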