Fairspec Dialect

Authors Evgeny Karev
Profile https://fairspec.org/profiles/latest/dialect.json

Fairspec Dialect is a simple JSON-based format for describing a file’s format options and features.

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

A Fairspec Dialect is a JSON resource that MUST be an object compatible with the Dialect structure outlined below.

A top-level descriptor object describing a file dialect. It MAY have the following properties (all optional unless otherwise stated):

An external path to one of the officially published Fairspec Dialect profiles. The default value is https://fairspec.org/profiles/latest/dialect.json.

For example for version X.Y.Z of the profile:

{
"$schema": "https://fairspec.org/profiles/X.Y.Z/dialect.json"
}

The file format of the dialect. It MUST be a string.

For example for a CSV file:

{
"dialect": {
"format": "csv"
}
}

An optional human-readable title for the format.

For example:

{
"dialect": {
"title": "My custom format"
}
}

An optional detailed description of the format.

For example:

{
"dialect": {
"title": "My custom format",
"description": "You can open this file with OpenOffice"
}
}

A format for comma-separated values files. It MUST have format set to "csv". It MUST be utf-8 encoded. Empty cells (,,) are null values.

Metadata example:

{
"dialect": {
"format": "csv"
}
}

Data example:

name,age,city
Alice,30,New York
Bob,25,London
Charlie,35,Tokyo

Format properties:

A format for tab-separated values files. It MUST have format set to "tsv". It MUST be utf-8 encoded. Empty cells (two consecutive tabs) are null values.

Metadata example:

{
"dialect": {
"format": "tsv",
"nullSequence": ["NA", "N/A", ""]
}
}

Data example:

name	age	city
Alice	30	New York
Bob	25	London
Charlie	35	Tokyo

Format properties:

A format for JSON array files. It MUST have format set to "json".

Metadata example:

{
"dialect": {
"format": "json",
"jsonPointer": "/data/items"
}
}

Data example:

[
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 25, "city": "London"},
{"name": "Charlie", "age": 35, "city": "Tokyo"}
]

Format properties:

A format for JSON Lines files (newline-delimited JSON). It MUST have format set to "jsonl".

Metadata example:

{
"dialect": {
"format": "jsonl",
"rowType": "object"
}
}

Data example:

{"name": "Alice", "age": 30, "city": "New York"}
{"name": "Bob", "age": 25, "city": "London"}
{"name": "Charlie", "age": 35, "city": "Tokyo"}

Format properties:

A format for Microsoft Excel files. It MUST have format set to "xlsx". Empty cells are null values.

Metadata example:

{
"dialect": {
"format": "xlsx",
"sheetNumber": 2
}
}

Data example:

<binary data>

Format properties:

A format for OpenDocument Spreadsheet files. It MUST have format set to "ods". Empty cells are null values.

Metadata example:

{
"dialect": {
"format": "ods",
"sheetName": "Data Sheet"
}
}

Data example:

<binary data>

Format properties:

A format for Apache Parquet files. It MUST have format set to "parquet".

Metadata example:

{
"dialect": {
"format": "parquet"
}
}

Data example:

<binary data>

A format for Apache Arrow files. It MUST have format set to "arrow".

Metadata example:

{
"dialect": {
"format": "arrow"
}
}

Data example:

<binary data>

A format for SQLite database files. It MUST have format set to "sqlite".

Metadata example:

{
"dialect": {
"format": "sqlite"
}
}

Data example:

<binary data>

Format properties:

A format for custom data. It MUST have format set to a value that is not one of the formats above.

Metadata example:

{
"dialect": {
"format": "custom",
"title": "Custom format",
"description": "Custom format description"
}
}

Data example:

<binary data>

The name of the custom format. It MUST be a string.

For example:

{
"dialect": {
"name": "custom",
"title": "Custom format",
"description": "Custom format description"
}
}

The delimiter property MUST be a string of one character length. This property specifies the character that separates fields in the data file.

For example:

{
"dialect": {
"format": "csv",
"delimiter": ";"
}
}

For a file like:

id;name;price
1;apple;1.50
2;orange;2.00
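
As a minimal sketch of how a reader might honour this property, the following uses Python's standard csv module. The helper name read_rows and the comma fallback when the property is omitted are assumptions for illustration, not part of the specification.

```python
import csv
import io


def read_rows(text, dialect):
    """Parse delimited text into rows using the dialect's delimiter."""
    # Assumption: fall back to a comma when "delimiter" is not given.
    delimiter = dialect.get("delimiter", ",")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


dialect = {"format": "csv", "delimiter": ";"}
rows = read_rows("id;name;price\n1;apple;1.50\n2;orange;2.00\n", dialect)
print(rows)
```

Every cell is returned as a string; type conversion is outside the scope of the dialect.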

The lineTerminator property MUST be a string. This property specifies the character sequence that terminates rows in the file. Common values are \n (Unix), \r\n (Windows), and \r (old Mac).

For example:

{
"dialect": {
"format": "csv",
"lineTerminator": "\r\n"
}
}
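
One way to apply this property is to split the raw text on the terminator before parsing cells. The sketch below is deliberately naive: split_rows is a hypothetical helper, the "\n" fallback is an assumption, and it does not handle terminators that appear inside quoted cells.

```python
def split_rows(text, dialect):
    """Split raw text into row strings using the dialect's lineTerminator."""
    # Assumption: fall back to "\n" when "lineTerminator" is not given.
    terminator = dialect.get("lineTerminator", "\n")
    rows = text.split(terminator)
    # A trailing terminator yields one empty string at the end; drop it.
    if rows and rows[-1] == "":
        rows.pop()
    return rows


dialect = {"format": "csv", "lineTerminator": "\r\n"}
print(split_rows("id,name\r\n1,apple\r\n2,orange\r\n", dialect))
```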

The quoteChar property MUST be a string of one character length. This property specifies the character used to quote a data cell that contains the delimiter.

For example:

{
"dialect": {
"format": "csv",
"quoteChar": "'"
}
}

For a file like:

id,name
1,'apple,red'
2,'orange,citrus'
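
The quoting behaviour can be sketched with Python's standard csv module, which accepts a quotechar argument. The helper name read_quoted and the double-quote fallback are assumptions for illustration.

```python
import csv
import io


def read_quoted(text, dialect):
    """Parse CSV text, treating the dialect's quoteChar as the quote character."""
    # Assumption: fall back to a double quote when "quoteChar" is not given.
    quote_char = dialect.get("quoteChar", '"')
    return list(csv.reader(io.StringIO(text), quotechar=quote_char))


dialect = {"format": "csv", "quoteChar": "'"}
rows = read_quoted("id,name\n1,'apple,red'\n2,'orange,citrus'\n", dialect)
print(rows)
```

The quoted cells keep their embedded commas, so each data row still has two fields.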

The nullSequence property MUST be a string or an array of strings. This property specifies one or more sequences that represent missing values in the data.

For example with a single sequence:

{
"dialect": {
"format": "csv",
"nullSequence": "NA"
}
}

For example with multiple sequences:

{
"dialect": {
"format": "csv",
"nullSequence": ["NA", "N/A", "null", ""]
}
}

For a file like:

id,name,notes
1,apple,fresh
2,orange,NA
3,banana,N/A
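
A minimal sketch of applying this property after the rows have been parsed: cells that match any configured sequence become None. The function name normalize_nulls is hypothetical.

```python
def normalize_nulls(rows, null_sequence):
    """Replace cells equal to any null sequence with None."""
    # The property may be a single string or an array of strings.
    if isinstance(null_sequence, str):
        sequences = {null_sequence}
    else:
        sequences = set(null_sequence)
    return [[None if cell in sequences else cell for cell in row] for row in rows]


rows = [["1", "apple", "fresh"], ["2", "orange", "NA"], ["3", "banana", "N/A"]]
print(normalize_nulls(rows, ["NA", "N/A", "null", ""]))
```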

The headerRows property MUST be false or an array of positive integers starting from 1. This property specifies the row numbers that make up the header.

For example with a single header row:

{
"dialect": {
"format": "csv",
"headerRows": [1]
}
}

For example with multi-line headers:

{
"dialect": {
"format": "csv",
"headerRows": [1, 2]
}
}

For a file like:

fruit
id,name,price
1,apple,1.50
2,orange,2.00

This would produce headers: “fruit id”, “fruit name”, “fruit price”

For example with no headers:

{
"dialect": {
"format": "csv",
"headerRows": false
}
}

The headerJoin property MUST be a string. This property specifies the string used to join header rows into a single header when the file has a multi-line header.

For example:

{
"dialect": {
"format": "csv",
"headerRows": [1, 2],
"headerJoin": "_"
}
}

For a file like:

fruit
id,name,price
1,apple,1.50

This would produce headers: “fruit_id”, “fruit_name”, “fruit_price”
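
The joining step can be sketched as follows. This is an illustration under stated assumptions, not the reference algorithm: build_header is a hypothetical helper, the single-space default is inferred from the headerRows example without headerJoin, and a header row shorter than the others is assumed to extend its last cell across the remaining columns (which is how "fruit" ends up on every column here).

```python
def build_header(rows, header_rows, header_join=" "):
    """Combine the 1-based header rows into one list of column labels."""
    if header_rows is False:
        return None  # no header in the file
    selected = [rows[number - 1] for number in header_rows]
    width = max(len(row) for row in selected)
    labels = []
    for column in range(width):
        parts = []
        for row in selected:
            # Assumption: a short row's last cell spans the remaining columns.
            cell = row[column] if column < len(row) else (row[-1] if row else "")
            if cell:
                parts.append(cell)
        labels.append(header_join.join(parts))
    return labels


rows = [["fruit"], ["id", "name", "price"], ["1", "apple", "1.50"]]
print(build_header(rows, [1, 2], "_"))
```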

The commentRows property MUST be an array of positive integers starting from 1. This property specifies which rows are omitted from the data.

For example:

{
"dialect": {
"format": "csv",
"commentRows": [1, 5, 10]
}
}

For a file like:

id,name
# This is a comment row
1,apple
2,orange

With "commentRows": [2], the second row would be skipped.
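
A minimal sketch of this filtering, using 1-based row numbers as the property requires. The function name drop_comment_rows is hypothetical.

```python
def drop_comment_rows(lines, comment_rows):
    """Remove the rows whose 1-based number is listed in comment_rows."""
    skip = set(comment_rows)
    return [line for number, line in enumerate(lines, start=1) if number not in skip]


lines = ["id,name", "# This is a comment row", "1,apple", "2,orange"]
print(drop_comment_rows(lines, [2]))
```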

The commentPrefix property MUST be a string. This property specifies which rows are omitted from the data based on the row’s prefix.

For example:

{
"dialect": {
"format": "csv",
"commentPrefix": "#"
}
}

For a file like:

id,name
# This row is ignored
1,apple
# Another comment
2,orange

Rows starting with # will be skipped.
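
Prefix-based filtering can be sketched in a few lines. The function name drop_prefixed_rows is hypothetical, and the sketch compares the raw row text without trimming leading whitespace, which is an assumption.

```python
def drop_prefixed_rows(lines, comment_prefix):
    """Remove rows whose raw text starts with the comment prefix."""
    return [line for line in lines if not line.startswith(comment_prefix)]


lines = ["id,name", "# This row is ignored", "1,apple", "# Another comment", "2,orange"]
print(drop_prefixed_rows(lines, "#"))
```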

The columnNames property MUST be an array of strings. This property specifies explicit column names to use instead of deriving them from the file.

For example:

{
"dialect": {
"format": "csv",
"headerRows": false,
"columnNames": ["id", "name", "price"]
}
}

For a file without headers:

1,apple,1.50
2,orange,2.00
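
As a sketch, explicit column names can be paired with each row of a headerless file. The helper name read_records is hypothetical, and values are left as strings.

```python
import csv
import io


def read_records(text, column_names):
    """Pair each row of a headerless CSV with the explicit column names."""
    return [dict(zip(column_names, row)) for row in csv.reader(io.StringIO(text))]


records = read_records("1,apple,1.50\n2,orange,2.00\n", ["id", "name", "price"])
print(records)
```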

The jsonPointer property MUST be a string in JSON Pointer format (RFC 6901). This property specifies where the data is located in the document.

For example:

{
"dialect": {
"format": "json",
"jsonPointer": "/data/items"
}
}

For a JSON file like:

{
"metadata": { "version": "1.0" },
"data": {
"items": [
{ "id": 1, "name": "apple" },
{ "id": 2, "name": "orange" }
]
}
}
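
Resolving the pointer against the document can be sketched with a small RFC 6901 resolver. The function name resolve_pointer is hypothetical, and error handling is kept minimal.

```python
def resolve_pointer(document, pointer):
    """Return the value that an RFC 6901 JSON Pointer refers to."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 escapes: "~1" means "/", "~0" means "~" (decode in this order).
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


document = {
    "metadata": {"version": "1.0"},
    "data": {"items": [{"id": 1, "name": "apple"}, {"id": 2, "name": "orange"}]},
}
print(resolve_pointer(document, "/data/items"))
```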

The rowType property MUST be one of the following values: array, object. This property specifies whether the data items are arrays or objects.

For example with array of objects:

{
"dialect": {
"format": "json",
"rowType": "object"
}
}

For data like:

[
{ "id": 1, "name": "apple" },
{ "id": 2, "name": "orange" }
]

For example with array of arrays:

{
"dialect": {
"format": "json",
"rowType": "array",
"columnNames": ["id", "name"]
}
}

For data like:

[
[1, "apple"],
[2, "orange"]
]
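
Both row types can be brought to the same shape, a list of records keyed by column name. This is a sketch only: to_records is a hypothetical helper, and requiring columnNames for array rows is an assumption made for the illustration.

```python
def to_records(items, row_type, column_names=None):
    """Convert data items to a list of dicts keyed by column name."""
    if row_type == "object":
        return [dict(item) for item in items]
    if row_type == "array":
        # Assumption: array rows need explicit names to become records.
        if column_names is None:
            raise ValueError("columnNames is needed to label array rows")
        return [dict(zip(column_names, item)) for item in items]
    raise ValueError("rowType must be 'array' or 'object'")


print(to_records([[1, "apple"], [2, "orange"]], "array", ["id", "name"]))
```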

The sheetNumber property MUST be an integer. This property specifies the sheet number of a table in the spreadsheet file. If not provided, the first sheet is used.

For example:

{
"dialect": {
"format": "xlsx",
"sheetNumber": 2
}
}

This reads the second sheet from the spreadsheet.
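
Because sheet numbers start from 1, a reader needs to convert them to a zero-based index. A minimal sketch, assuming the sheet names have already been listed by whatever spreadsheet library is in use (select_sheet is a hypothetical helper):

```python
def select_sheet(sheet_names, dialect):
    """Return the sheet name chosen by the 1-based sheetNumber."""
    number = dialect.get("sheetNumber", 1)  # the first sheet when not provided
    if not isinstance(number, int) or number < 1 or number > len(sheet_names):
        raise ValueError("sheetNumber is out of range")
    return sheet_names[number - 1]


print(select_sheet(["Summary", "Data Sheet"], {"format": "xlsx", "sheetNumber": 2}))
```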

The sheetName property MUST be a string. This property specifies the sheet name of a table in the spreadsheet file.

For example:

{
"dialect": {
"format": "xlsx",
"sheetName": "Data Sheet"
}
}

The tableName property MUST be a string. This property specifies the name of the table in the database. If not provided, the first table is used (sorted by name in ascending order).

For example:

{
"dialect": {
"format": "sqlite",
"tableName": "measurements"
}
}
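
The fallback rule can be sketched with Python's standard sqlite3 module. The helper name pick_table and the in-memory tables are illustrative assumptions; the query reads table names from SQLite's sqlite_master catalog.

```python
import sqlite3


def pick_table(connection, dialect):
    """Return the dialect's tableName, or the first table sorted by name."""
    name = dialect.get("tableName")
    if name is not None:
        return name
    row = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name ASC LIMIT 1"
    ).fetchone()
    return row[0] if row else None


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE measurements (id INTEGER)")
connection.execute("CREATE TABLE annotations (id INTEGER)")
print(pick_table(connection, {"format": "sqlite", "tableName": "measurements"}))
```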

Common properties shared by multiple entities in the descriptor.

It MUST be a string representing an HTTP or HTTPS URL to a remote file.

For example:

{
"data": "https://example.com/datasets/measurements.csv"
}
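
A minimal sketch of checking this constraint with Python's standard urllib.parse. The function name is_remote_path is hypothetical, and requiring a non-empty host is an assumption beyond the scheme check.

```python
from urllib.parse import urlparse


def is_remote_path(value):
    """Check that value is a string holding an HTTP or HTTPS URL."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    # Assumption: a usable URL also names a host.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_remote_path("https://example.com/datasets/measurements.csv"))
```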

Fairspec Dialect does not support extension.