Datashape is a data layout language for array programming. It is designed to describe in-situ structured data without requiring transformation into a canonical form.

Like NumPy, datashape includes `shape` and `dtype`, but combines them
in a single type system.

Single named types in datashape are called `unit` types. They represent
either a dtype like `int32` or `datetime`, or a single dimension
like `var`. Dimensions and a single dtype are composed together in
a datashape type.

Datashape includes a variety of dtypes corresponding to C/C++ types, similar to NumPy.

Bit type | Description |
---|---|
bool | Boolean (True or False) stored as a byte |
int8 | Byte (-128 to 127) |
int16 | Two’s Complement Integer (-32768 to 32767) |
int32 | Two’s Complement Integer (-2147483648 to 2147483647) |
int64 | Two’s Complement Integer (-9223372036854775808 to 9223372036854775807) |
uint8 | Unsigned integer (0 to 255) |
uint16 | Unsigned integer (0 to 65535) |
uint32 | Unsigned integer (0 to 4294967295) |
uint64 | Unsigned integer (0 to 18446744073709551615) |
float16 | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
float32 | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
float64 | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
complex[float32] | Complex number, represented by two 32-bit floats (real and imaginary components) |
complex[float64] | Complex number, represented by two 64-bit floats (real and imaginary components) |
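
The integer ranges in the table follow directly from the bit widths. As a minimal sketch in plain Python (the helper name `int_range` is hypothetical and not part of datashape), the bounds can be recomputed like this:

```python
def int_range(bits, signed=True):
    """Return (min, max) for an integer type of the given bit width."""
    if signed:
        # Two's complement: one bit is used for the sign.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


for bits in (8, 16, 32, 64):
    print(f"int{bits}: {int_range(bits)}")
    print(f"uint{bits}: {int_range(bits, signed=False)}")
```

Each printed pair should agree with the corresponding row of the table above.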

Additionally, there are types which are not fully specified at the bit/byte level.

Type | Description |
---|---|
string | Variable length Unicode string. |
bytes | Variable length arrays of bytes. |
json | Variable length Unicode string which contains JSON. |
date | Dates in the proleptic Gregorian calendar. |
time | Times not attached to a date. |
datetime | Points in time, combination of date and time. |
units | Associates physical units with numerical values. |

Many Python types can be mapped to datashape types:

Python type | Datashape |
---|---|
int | int32 |
bool | bool |
float | float64 |
complex | complex[float64] |
str | string |
unicode | string |
datetime.date | date |
datetime.time | time |
datetime.datetime | `datetime` or `datetime[tz='<timezone>']` |
datetime.timedelta | `units['microsecond', int64]` |
bytes | bytes |
bytearray | bytes |
buffer | bytes |
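
To make the mapping concrete, here is a small sketch that encodes the table as a Python dictionary. The names `PYTHON_TO_DATASHAPE` and `datashape_name` are hypothetical and are not part of the datashape library. The sketch assumes Python 3, so the Python 2 only types `unicode` and `buffer` are left out.

```python
import datetime

# Hypothetical lookup table mirroring the mapping above.
PYTHON_TO_DATASHAPE = {
    bool: "bool",
    int: "int32",
    float: "float64",
    complex: "complex[float64]",
    str: "string",
    bytes: "bytes",
    bytearray: "bytes",
    datetime.date: "date",
    datetime.time: "time",
    datetime.datetime: "datetime",
    datetime.timedelta: "units['microsecond', int64]",
}


def datashape_name(value):
    """Look up the datashape name for a Python value by its exact type."""
    # type(value) is used instead of isinstance() because bool is a
    # subclass of int and datetime.datetime is a subclass of datetime.date.
    return PYTHON_TO_DATASHAPE[type(value)]


print(datashape_name(True))
print(datashape_name(3.5))
print(datashape_name(datetime.datetime(2020, 1, 1)))
```

Looking up the exact type keeps `True` from being reported as `int32` and keeps a `datetime.datetime` value from being reported as `date`.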

To Blaze, all strings are sequences of Unicode code points, following
in the footsteps of Python 3. The default Blaze string atom, simply
called “string”, is a variable-length string which can contain any
Unicode values. There is also a fixed-size variant compatible with
NumPy’s strings, like `string[16, "ascii"]`.

An asterisk (*) between two types signifies an array. A datashape
consists of 0 or more `dimensions` followed by a `dtype`.

For example, an integer array of size three is:

```
3 * int
```

In this type, 3 is a `fixed` dimension, which means it is a dimension
whose size is always as given. Other dimension types include `strided`
and `var`.

Comparing with NumPy, the array created by
`np.empty((2, 3), 'int32')` has datashape `2 * 3 * int32`.
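
The split between dimensions and dtype can be illustrated with a short sketch. The function `split_datashape` below is a hypothetical helper, not the datashape parser. It only handles flat types such as `2 * 3 * int32` and does not attempt records or parameterized types.

```python
def split_datashape(text):
    """Split 'd1 * d2 * dtype' into (dimensions, dtype)."""
    parts = [p.strip() for p in text.split("*")]
    # Everything before the last item is a dimension; the last is the dtype.
    *dims, dtype = parts
    # Fixed dimensions become ints; anything else (e.g. 'var') stays a string.
    dims = tuple(int(d) if d.isdigit() else d for d in dims)
    return dims, dtype


print(split_datashape("2 * 3 * int32"))  # ((2, 3), 'int32')
print(split_datashape("var * float64"))  # (('var',), 'float64')
print(split_datashape("int32"))          # ((), 'int32')
```

For `2 * 3 * int32`, the returned pair corresponds to the `shape` and `dtype` that NumPy reports separately.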

Record types are ordered struct dtypes which hold a collection of types keyed by labels. Records look similar to Python dictionaries, but the order in which the names appear is significant.

Example 1:

```
{
name : string,
age : int,
height : int,
weight : int
}
```

Example 2:

```
{
r: int8,
g: int8,
b: int8,
a: int8
}
```

Records are themselves type declarations, so they can be nested, but they cannot be self-referential:

Example 3:

```
{
a: { x: int, y: int },
b: { x: int, z: int }
}
```
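
Because field order matters, one way to model a record in Python is as an ordered list of `(name, type)` pairs, with a nested list standing in for a nested record. The sketch below uses that assumed representation; `render_record` is a hypothetical helper, not a datashape function.

```python
def render_record(fields):
    """Render [(name, type), ...] as datashape record text.

    A field type is either a string such as 'int' or another list of
    (name, type) pairs describing a nested record.
    """
    rendered = []
    for name, ftype in fields:
        if isinstance(ftype, list):
            # Recurse into nested records.
            ftype = render_record(ftype)
        rendered.append(f"{name}: {ftype}")
    return "{ " + ", ".join(rendered) + " }"


nested = [
    ("a", [("x", "int"), ("y", "int")]),
    ("b", [("x", "int"), ("z", "int")]),
]
print(render_record(nested))
# { a: { x: int, y: int }, b: { x: int, z: int } }
```

A finite list of pairs like this cannot contain itself, which mirrors the rule that records cannot be self-referential.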

While datashape is a very general type system, there are a number of patterns a datashape might fit in.

Tabular datashapes have just one dimension, typically `fixed` or
`var`, followed by a record containing only simple types, not
nested records. This can be intuitively thought of as data which
will fit in a SQL table:

```
var * { x : int, y : real, z : date }
```

Homogeneous datashapes are arrays that have a simple dtype, the kind of data typically used in numeric computations. For example, a 3D velocity field might look like:

```
100 * 100 * 100 * 3 * real
```
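
As a rough illustration of what such a datashape implies for storage, the sketch below counts the elements and bytes of the velocity field. It assumes that `real` is stored as a 64-bit float (8 bytes per element); that is an assumption of this example, not something stated above.

```python
import math

dims = (100, 100, 100, 3)  # the fixed dimensions of the velocity field
itemsize = 8               # assumption: `real` is a 64-bit float

count = math.prod(dims)    # total number of elements
nbytes = count * itemsize  # size of a dense, contiguous buffer

print(count)   # 3000000
print(nbytes)  # 24000000
```

Under that assumption, the field occupies 24 MB as a dense array.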

Type variables are a separate class of types that express free variables scoped within type signatures. Holding type variables as first order terms in the signatures encodes the fact that a term can be used in many concrete contexts with different concrete types.

For example, the type capable of expressing all square two-dimensional
matrices could be written as a datashape with type variable `A`,
constraining the two dimensions to be the same:

```
A * A * int32
```

A type capable of expressing rectangular arrays of integers, whose two dimensions may differ in length, can be written with two free type variables:

```
A * B * int32
```
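
The way a repeated type variable constrains dimensions can be sketched as a small matching routine. This is an illustrative sketch only: `match_dims` is a hypothetical helper, dimension patterns are modeled as tuples of ints (fixed dimensions) and strings (type variables), and it is not how datashape itself performs unification.

```python
def match_dims(pattern, shape):
    """Bind type variables in `pattern` to the sizes in `shape`.

    Returns a dict of bindings, or None if the shape does not fit.
    """
    if len(pattern) != len(shape):
        return None
    bindings = {}
    for dim, size in zip(pattern, shape):
        if isinstance(dim, int):
            # A fixed dimension must match exactly.
            if dim != size:
                return None
        elif bindings.setdefault(dim, size) != size:
            # The same variable was already bound to a different size.
            return None
    return bindings


print(match_dims(("A", "A"), (3, 3)))  # {'A': 3}
print(match_dims(("A", "A"), (3, 4)))  # None
print(match_dims(("A", "B"), (3, 4)))  # {'A': 3, 'B': 4}
```

The pattern `A * A` rejects a 3 by 4 shape because `A` cannot be bound to both sizes, while `A * B` accepts it.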

An option type represents data which may be there or not. This is like
data with `NA` values in R, or nullable columns in SQL.

For example, an optional int field:

```
option[int]
```

This indicates the presence or absence of an integer. For example, a
`6 * option[int]` array can model the Python data:

```
[1, 2, 3, None, None, 4]
```
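
A quick way to see what an option type permits is to check a Python list against it. The helper `fits_option_int` below is hypothetical and only sketches the idea for a single fixed dimension of `option[int]`.

```python
def fits_option_int(values, length):
    """Check that `values` fits `length * option[int]`."""
    if len(values) != length:
        return False
    # bool is excluded explicitly because it is a subclass of int in Python.
    return all(
        v is None or (isinstance(v, int) and not isinstance(v, bool))
        for v in values
    )


data = [1, 2, 3, None, None, 4]
print(fits_option_int(data, 6))      # True
print(fits_option_int(data, 5))      # False: the list has six elements
print(fits_option_int([1, "2"], 2))  # False: "2" is not an int
```

Missing entries are accepted as `None`, while values of any other non-integer type are rejected.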