Published on 11/10/2014 – Text version

TOML v0.3.0

Tom's Obvious, Minimal Language.

By Tom Preston-Werner.

Be warned, this spec is still changing a lot. Until it's marked as 1.0, you should assume that it is unstable and act accordingly.

Objectives

TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages.

Spec

TOML is case sensitive.
Whitespace means tab (0x09) or space (0x20).

Comment

Speak your mind with the hash symbol. They go from the symbol to the end of the line.

# I am a comment. Hear me roar. Roar.
key = "value" # Yeah, you can do this.

String

There are four ways to express strings: basic, multi-line basic, literal, and multi-line literal. All strings must contain only valid UTF-8 characters.

Basic strings are surrounded by quotation marks. Any Unicode character may be used except those that must be escaped: quotation mark, backslash, and the control characters (U+0000 to U+001F).

"I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."

For convenience, some popular characters have a compact escape sequence.

\b         - backspace       (U+0008)
\t         - tab             (U+0009)
\n         - linefeed        (U+000A)
\f         - form feed       (U+000C)
\r         - carriage return (U+000D)
\"         - quote           (U+0022)
\/         - slash           (U+002F)
\\         - backslash       (U+005C)
\uXXXX     - unicode         (U+XXXX)
\UXXXXXXXX - unicode         (U+XXXXXXXX)

Any Unicode character may be escaped with the \uXXXX or \UXXXXXXXX forms. Note that the escape codes must be valid Unicode code points.

Other special characters are reserved and, if used, TOML should produce an error.

ProTip™: You may notice that the above string specification is the same as JSON's string definition, except that TOML requires UTF-8 encoding. This is on purpose.

Sometimes you need to express passages of text (e.g. translation files) or would like to break up a very long string into multiple lines. TOML makes this easy. Multi-line basic strings are surrounded by three quotation marks on each side and allow newlines. If the first character after the opening delimiter is a newline (0x0A), then it is trimmed. All other whitespace remains intact.

# The following strings are byte-for-byte equivalent:
key1 = "One\nTwo"
key2 = """One\nTwo"""
key3 = """
One
Two"""

For writing long strings without introducing extraneous whitespace, end a line with a \. The \ will be trimmed along with all whitespace (including newlines) up to the next non-whitespace character or closing delimiter. If the first two characters after the opening delimiter are a backslash and a newline (0x5C0A), then they will both be trimmed along with all whitespace (including newlines) up to the next non-whitespace character or closing delimiter. All of the escape sequences that are valid for basic strings are also valid for multi-line basic strings.

# The following strings are byte-for-byte equivalent:
key1 = "The quick brown fox jumps over the lazy dog."

key2 = """
The quick brown \


  fox jumps over \
    the lazy dog."""

key3 = """\
       The quick brown \
       fox jumps over \
       the lazy dog.\
       """

Any Unicode character may be used except those that must be escaped: backslash and the control characters (U+0000 to U+001F). Quotation marks need not be escaped unless their presence would create a premature closing delimiter.

If you're a frequent specifier of Windows paths or regular expressions, then having to escape backslashes quickly becomes tedious and error prone. To help, TOML supports literal strings where there is no escaping allowed at all. Literal strings are surrounded by single quotes. Like basic strings, they must appear on a single line:

# What you see is what you get.
winpath  = 'C:\Users\nodejs\templates'
winpath2 = '\\ServerX\admin$\system32\'
quoted   = 'Tom "Dubs" Preston-Werner'
regex    = '<\i\c*\s*>'

Since there is no escaping, there is no way to write a single quote inside a literal string enclosed by single quotes. Luckily, TOML supports a multi-line version of literal strings that solves this problem. Multi-line literal strings are surrounded by three single quotes on each side and allow newlines. Like literal strings, there is no escaping whatsoever. If the first character after the opening delimiter is a newline (0x0A), then it is trimmed. All other content between the delimiters is interpreted as-is without modification.

regex2 = '''I [dw]on't need \d{2} apples'''
lines  = '''
The first newline is
trimmed in raw strings.
   All other whitespace
   is preserved.
'''

For binary data it is recommended that you use Base64 or another suitable ASCII or UTF-8 encoding. The handling of that encoding will be application specific.

Integer

Integers are whole numbers. Positive numbers may be prefixed with a plus sign. Negative numbers are prefixed with a minus sign.

Leading zeros are not allowed. Hex, octal, and binary forms are not allowed. Values such as "infinity" and "not a number" that cannot be expressed as a series of digits are not allowed.

64 bit (signed long) range expected (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).

Float

A float consists of an integer part (which may be prefixed with a plus or minus sign) followed by a fractional part and/or an exponent part. If both a fractional part and exponent part are present, the fractional part must precede the exponent part.

# fractional
+1.0
3.1415
-0.01

# exponent
5e+22
1e6
-2E-2

# both
6.626e-34

A fractional part is a decimal point followed by one or more digits.

An exponent part is an E (upper or lower case) followed by an integer part (which may be prefixed with a plus or minus sign).

64-bit (double) precision expected.

Boolean

Booleans are just the tokens you're used to. Always lowercase.

true
false

Datetime

Datetimes are RFC 3339 dates.

1979-05-27T07:32:00Z
1979-05-27T00:32:00-0700
1979-05-27T00:32:00.999999-0700

Array

Arrays are square brackets with other primitives inside. Whitespace is ignored. Elements are separated by commas. Data types may not be mixed.

[ 1, 2, 3 ]
[ "red", "yellow", "green" ]
[ [ 1, 2 ], [3, 4, 5] ]
[ [ 1, 2 ], ["a", "b", "c"] ] # this is ok
[ 1, 2.0 ] # note: this is NOT ok

Arrays can also be multiline. So in addition to ignoring whitespace, arrays also ignore newlines between the brackets. Terminating commas are ok before the closing bracket.

key = [
  1, 2, 3
]

key = [
  1,
  2, # this is ok
]

Table

Tables (also known as hash tables or dictionaries) are collections of key/value pairs. They appear in square brackets on a line by themselves. You can tell them apart from arrays because arrays are only ever values.

[table]

Under that, and until the next table or EOF are the key/values of that table. Keys are on the left of the equals sign and values are on the right. Keys start with the first character that isn't whitespace or [ and end with the last non-whitespace character before the equals sign. Keys cannot contain a # character. Key/value pairs within tables are not guaranteed to be in any specific order.

[table]
key = "value"

You can indent keys and their values as much as you like. Tabs or spaces. Knock yourself out. Why, you ask? Because you can have nested tables. Snap.

Nested tables are denoted by table names with dots in them. Name your tables whatever crap you please, just don't use #, ., [ or ].

[dog.tater]
type = "pug"

In JSON land, that would give you the following structure:

{ "dog": { "tater": { "type": "pug" } } }

You don't need to specify all the super-tables if you don't want to. TOML knows how to do it for you.

# [x] you
# [x.y] don't
# [x.y.z] need these
[x.y.z.w] # for this to work

Empty tables are allowed and simply have no key/value pairs within them.

As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it.

[a.b]
c = 1

[a]
d = 2

You cannot define any key or table more than once. Doing so is invalid.

# DO NOT DO THIS

[a]
b = 1

[a]
c = 2

# DO NOT DO THIS EITHER

[a]
b = 1

[a.b]
c = 2

All table names and keys must be non-empty.

# NOT VALID TOML
[]
[a.]
[a..b]
[.b]
[.]
 = "no key name" # not allowed

Array of Tables

The last type that has not yet been expressed is an array of tables. These can be expressed by using a table name in double brackets. Each table with the same double bracketed name will be an element in the array. The tables are inserted in the order encountered. A double bracketed table without any key/value pairs will be considered an empty table.

[[products]]
name = "Hammer"
sku = 738594937

[[products]]

[[products]]
name = "Nail"
sku = 284758393
color = "gray"

In JSON land, that would give you the following structure.

{
  "products": [
    { "name": "Hammer", "sku": 738594937 },
    { },
    { "name": "Nail", "sku": 284758393, "color": "gray" }
  ]
}

You can create nested arrays of tables as well. Just use the same double bracket syntax on sub-tables. Each double-bracketed sub-table will belong to the most recently defined table element above it.

[[fruit]]
  name = "apple"

  [fruit.physical]
    color = "red"
    shape = "round"

  [[fruit.variety]]
    name = "red delicious"

  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"

The above TOML maps to the following JSON.

{
  "fruit": [
    {
      "name": "apple",
      "physical": {
        "color": "red",
        "shape": "round"
      },
      "variety": [
        { "name": "red delicious" },
        { "name": "granny smith" }
      ]
    },
    {
      "name": "banana",
      "variety": [
        { "name": "plantain" }
      ]
    }
  ]
}

Attempting to define a normal table with the same name as an already established array must produce an error at parse time.

# INVALID TOML DOC
[[fruit]]
  name = "apple"

  [[fruit.variety]]
    name = "red delicious"

  # This table conflicts with the previous table
  [fruit.variety]
    name = "granny smith"