Data Cooker JSON

Release 5.0, April 2025

Table of Contents

Top Level
Direction Object
··· DS Object
··· ··· Source Object
··· ··· Transform Object
··· ··· Destination Object
Parameters Object
Columns Object

Top Level

Each JSON file consists of multiple tasks, referred to as Directions, such as 'copy data from source location to compute cluster', 'copy results from cluster to storage', and so on. Each Direction has a unique name.

{
    "Task 1 Name" : Direction,
    "Task 2 Name" : Direction,
    ...
}

At each invocation, Data Cooker Dist executes only the task whose name is supplied via a command line switch.
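The selection rule above can be sketched in Python as follows. This is only an illustration of the lookup, not Data Cooker Dist's actual code: the task names "to_cluster" and "from_cluster", the helper `select_direction`, and the empty Direction arrays are all hypothetical.

```python
import json

# Hypothetical config text; the task names are made up for illustration.
CONFIG_TEXT = """
{
    "to_cluster": [],
    "from_cluster": []
}
"""

def select_direction(config_text, task_name):
    """Return the Direction (an array of DS) stored under task_name."""
    config = json.loads(config_text)
    if task_name not in config:
        raise KeyError(f"no Direction named {task_name!r} in the config")
    return config[task_name]

# Only the named task is picked; the other Direction is ignored.
direction = select_direction(CONFIG_TEXT, "to_cluster")
```

Note that Python's `json.loads` silently keeps the last value when a key is repeated, so a check for unique Direction names would need extra handling that this sketch omits.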

Direction Object

Each Direction is an Array of DS objects (see the Data Cooker SQL specification) that are independently copied from Source to Destination, with an optional chain of Transformations in between.

[
    DS,
    DS,
    ...
]

DS Object

A DS Object has the mandatory properties "name", "source", and "dest".

Optionally, a "transform" property (an Array of Transform objects) can be specified.

{
    "name" : "DS Name",
    "source" : Source,
    "transform" : [ Transform, Transform, ... ],
    "dest" : Destination
}

The DS "name" may be used by a Pluggable verbatim, so if that Pluggable works with a file system, the name should adhere to that file system's restrictions and conventions.
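As a rough illustration of the DS Object rules, the following Python sketch checks a DS for its mandatory properties and treats "transform" as an optional array. The helper `check_ds`, the DS name "events", the paths, and the adapter names "exampleInput", "exampleTransform", and "exampleOutput" are hypothetical and do not refer to Pluggables of any real distribution.

```python
MANDATORY_DS_KEYS = ("name", "source", "dest")

# Hypothetical DS; adapter names and paths are placeholders only.
EXAMPLE_DS = {
    "name": "events",
    "source": {"path": "/data/in/events", "adapter": "exampleInput"},
    "transform": [{"adapter": "exampleTransform"}],
    "dest": {"path": "/data/out/events", "adapter": "exampleOutput"},
}

def check_ds(ds):
    """Verify mandatory DS properties; return (name, number of transforms)."""
    missing = [key for key in MANDATORY_DS_KEYS if key not in ds]
    if missing:
        raise ValueError(f"DS is missing mandatory properties: {missing}")
    # "transform" is optional; an absent chain is treated as empty here.
    transform = ds.get("transform", [])
    if not isinstance(transform, list):
        raise ValueError('"transform" must be an array of Transform objects')
    return ds["name"], len(transform)

summary = check_ds(EXAMPLE_DS)
```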

Source Object

Mandatory properties are "path" (Pluggable-specific) and "adapter" (distribution-specific).

If omitted, the following defaults apply: "part_count" = 1, "part_by" = "HASHCODE", "wildcard" = false.

{
    "path" : String,
    "adapter" : "Input Storage Adapter Name",
    "params" : Parameters,
    "part_count" : Number,
    "part_by" : "HASHCODE" | "SOURCE" | "RANDOM",
    "wildcard" : Boolean,
    "columns" : Columns
}
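The defaulting rules for a Source Object can be sketched as below. This is a minimal illustration that assumes omitted properties simply take the documented defaults; the helper `normalize_source`, the path, and the adapter name "exampleInput" are hypothetical.

```python
SOURCE_DEFAULTS = {"part_count": 1, "part_by": "HASHCODE", "wildcard": False}
PART_BY_VALUES = {"HASHCODE", "SOURCE", "RANDOM"}

def normalize_source(source):
    """Check mandatory Source properties and fill in documented defaults."""
    for key in ("path", "adapter"):
        if key not in source:
            raise ValueError(f'Source is missing mandatory property "{key}"')
    # Explicitly supplied values override the defaults.
    result = {**SOURCE_DEFAULTS, **source}
    if result["part_by"] not in PART_BY_VALUES:
        raise ValueError(f'unsupported "part_by" value: {result["part_by"]!r}')
    return result

# Hypothetical Source that only overrides "part_count".
normalized = normalize_source(
    {"path": "/data/in/events", "adapter": "exampleInput", "part_count": 8}
)
```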

Transform Object

The only mandatory property is "adapter".

{
    "adapter" : "Transform-type Pluggable Name",
    "params" : Parameters,
    "columns" : Columns
}

Destination Object

Mandatory properties are "path" and "adapter".

{
    "path" : String,
    "adapter" : "Output Storage Adapter Name",
    "params" : Parameters,
    "columns" : Columns
}

Parameters Object

A hash map of the Parameters of a Pluggable. Each Pluggable has its own set of Parameters, or none at all. Parameter names are unique, and values may be Numbers, Strings, Booleans, or Arrays. They are documented in the distro docs.

{
    "Parameter 1 Name" : Any,
    "Parameter 2 Name" : Any,
    ...
}
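A simple type check for a Parameters Object might look like the sketch below. It assumes that the listed value kinds are the only ones allowed and that Array elements are themselves Numbers, Strings, Booleans, or Arrays; neither assumption is stated in this specification. The helper names and the parameter names "delimiter", "skip_header", and "fields" are hypothetical.

```python
def is_valid_param_value(value):
    """True for Numbers, Strings, Booleans, and Arrays of such values."""
    if isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_valid_param_value(item) for item in value)
    return False

def invalid_params(params):
    """Return the names of Parameters whose values have an unexpected type."""
    return [name for name, value in params.items()
            if not is_valid_param_value(value)]

# Hypothetical Parameters; real names come from the distro docs.
EXAMPLE_PARAMS = {"delimiter": ",", "skip_header": True, "fields": [1, 2, 3]}
problems = invalid_params(EXAMPLE_PARAMS)
```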

Columns Object

This object is optional for some Pluggables (in which case 'copy all available columns' semantics is implied) and mandatory for others, as reflected in the distro docs.

Columns are organized into Levels, each Level being an Array of column names. See the SQL specification for supported Level names.

{
    "Level 1 Name" : [ "Column 1 Name", "Column 2 Name", ... ],
    "Level 2 Name" : [ "Column 1 Name", "Column 2 Name", ... ],
    ...
}
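The shape of a Columns Object can be checked with a short Python sketch like the one below. The Level name "value" and the column names are placeholders chosen for illustration; the supported Level names are defined in the SQL specification, not here, and the helper `check_columns` is hypothetical.

```python
def check_columns(columns):
    """Ensure each Level maps to an array of column-name strings.

    Returns a mapping of Level name to the number of columns it lists.
    """
    for level, names in columns.items():
        if not isinstance(names, list):
            raise ValueError(f'Level "{level}" must be an array of column names')
        if not all(isinstance(name, str) for name in names):
            raise ValueError(f'Level "{level}" contains a non-string column name')
    return {level: len(names) for level, names in columns.items()}

# Hypothetical Columns object with a single placeholder Level.
EXAMPLE_COLUMNS = {"value": ["id", "lat", "lon"]}
column_counts = check_columns(EXAMPLE_COLUMNS)
```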