Release 5.0, April 2025
Top Level
Direction Object
··· DS Object
··· ··· Source Object
··· ··· Transform Object
··· ··· Destination Object
Parameters Object
Columns Object
Each JSON file consists of multiple tasks, referred to as Directions, in the sense of 'copy data from source location to compute cluster', 'copy results from cluster to storage', and so on. Each Direction has a unique name.
{ "Task 1 Name" : Direction, "Task 2 Name" : Direction, ... }
At each invocation, Data Cooker Dist executes only the task whose name is supplied via a command line switch.
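The selection of one task by name can be sketched as follows. This is only an illustration of the top-level structure: the task names, the empty Direction arrays, and the `select_direction` helper are hypothetical, and the actual command line switch of Data Cooker Dist is not modeled here.

```python
import json

# Hypothetical config text: two task names, each mapped to a Direction (left empty here).
CONFIG_TEXT = '{"to_cluster": [], "from_cluster": []}'


def select_direction(config_text, task_name):
    """Return the Direction stored under task_name; all other tasks are ignored."""
    config = json.loads(config_text)
    if task_name not in config:
        raise KeyError(f"no task named {task_name!r} in config")
    return config[task_name]


# Only the requested task is picked; "from_cluster" is never touched.
direction = select_direction(CONFIG_TEXT, "to_cluster")
```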
Each Direction is an Array of DS (see Data Cooker SQL specification) that are independently copied from Source to Destination, with an optional chain of Transformations in between.
[ DS, DS, ... ]
A DS Object has the mandatory properties "name", "source", and "dest". Optionally, a "transform" property (an array of Transform) can be specified.
{ "name" : "DS Name", "source" : Source, "transform" : [ Transform, Transform, ... ], "dest" : Destination }
DS "name" may be used by a Pluggable verbatim, so if the Pluggable works with some file system, the name should adhere to that file system's restrictions and conventions.
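The structural rules for a Direction and its DS Objects could be checked along these lines. This is a minimal sketch, not the validation Data Cooker Dist itself performs; the helper names `missing_ds_properties` and `validate_direction` are hypothetical.

```python
# Mandatory DS properties as listed above; "transform" is optional.
DS_MANDATORY = ("name", "source", "dest")


def missing_ds_properties(ds):
    """List the mandatory DS properties that are absent from ds."""
    return [key for key in DS_MANDATORY if key not in ds]


def validate_direction(direction):
    """Check that a Direction is an array of DS objects with all mandatory properties."""
    if not isinstance(direction, list):
        raise TypeError("a Direction must be an Array of DS")
    for index, ds in enumerate(direction):
        missing = missing_ds_properties(ds)
        if missing:
            raise ValueError(f"DS #{index} lacks mandatory properties: {missing}")
        if "transform" in ds and not isinstance(ds["transform"], list):
            raise TypeError(f'DS #{index}: "transform" must be an array')
    return direction
```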
For a Source Object, the mandatory properties are "path" (Pluggable-specific) and "adapter" (distribution-specific). Default values are: "part_count" = 1, "part_by" = "HASHCODE", "wildcard" = false.
{ "path" : String, "adapter" : "Input Storage Adapter Name", "params" : Parameters, "part_count" : Number, "part_by" : "HASHCODE" | "SOURCE" | "RANDOM", "wildcard" : Boolean, "columns" : Columns }
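How the documented defaults fill in an incomplete Source Object can be sketched as below. The `with_source_defaults` helper, the path, and the adapter name are hypothetical placeholders; only the default values and the three "part_by" options come from the text above.

```python
# Defaults and allowed "part_by" values as stated in the Source Object description.
SOURCE_DEFAULTS = {"part_count": 1, "part_by": "HASHCODE", "wildcard": False}
PART_BY_VALUES = {"HASHCODE", "SOURCE", "RANDOM"}


def with_source_defaults(source):
    """Return a copy of source with defaults applied to any omitted optional property."""
    for key in ("path", "adapter"):
        if key not in source:
            raise ValueError(f'Source lacks mandatory property "{key}"')
    resolved = {**SOURCE_DEFAULTS, **source}  # explicit values override defaults
    if resolved["part_by"] not in PART_BY_VALUES:
        raise ValueError(f'unsupported "part_by": {resolved["part_by"]!r}')
    return resolved


# Placeholder path and adapter name, not a real Input Storage Adapter.
example = with_source_defaults({"path": "/hypothetical/input", "adapter": "hypotheticalInput"})
```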
For a Transform Object, the only mandatory property is "adapter".
{ "adapter" : "Transform-type Pluggable Name", "params" : Parameters, "columns" : Columns }
For a Destination Object, the mandatory properties are "path" and "adapter".
{ "path" : String, "adapter" : "Output Storage Adapter Name", "params" : Parameters, "columns" : Columns }
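To show how the Source, Transform, and Destination Objects fit together, here is a sketch that assembles one task with a single DS and serializes it to JSON. Every name, path, and adapter identifier in it is a placeholder invented for illustration; real adapter names are listed in the distro docs.

```python
import json

# All names, paths, and adapter identifiers below are placeholders, not real Pluggables.
ds = {
    "name": "example_ds",
    "source": {"path": "/hypothetical/input", "adapter": "hypotheticalInput"},
    "transform": [{"adapter": "hypotheticalTransform"}],  # optional chain, applied in order
    "dest": {"path": "/hypothetical/output", "adapter": "hypotheticalOutput"},
}

# Top level: task name mapped to a Direction, which is an array of DS.
config = {"example_task": [ds]}

# The resulting text is what would be stored in the JSON file.
config_json = json.dumps(config, indent=2)
```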
The Parameters Object is a hash map of the Parameters of a Pluggable. Each Pluggable has its own set of Parameters, or none at all. Parameter names are unique, and values may be Numbers, Strings, Booleans, and even Arrays. They are documented in the distro docs.
{ "Parameter 1 Name" : Any, "Parameter 2 Name" : Any, ... }
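Because Parameter names must be unique, and many JSON parsers silently keep only the last of two identical names, a loader may want to reject duplicates explicitly. The sketch below does this with the standard `object_pairs_hook` argument of `json.loads`; the `load_parameters` helper and the parameter names are hypothetical, and whether Data Cooker Dist performs such a check is not stated in this document.

```python
import json


def reject_duplicate_names(pairs):
    """object_pairs_hook: build a dict, failing if a name repeats within one object."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate name: {name!r}")
        result[name] = value
    return result


def load_parameters(text):
    """Parse a Parameters Object, refusing repeated Parameter names."""
    return json.loads(text, object_pairs_hook=reject_duplicate_names)


# Hypothetical parameter names with Number, Array, and Boolean values.
params = load_parameters('{"limit": 10, "fields": ["a", "b"], "strict": true}')
```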
The Columns Object is optional for some Pluggables (in that case 'copy all available columns' semantics is implied) and mandatory for others, as reflected in the distro docs.
Columns are organized into Levels, each Level being an Array of column names. See the SQL specification for supported Level names.
{ "Level 1 Name" : [ "Column 1 Name", "Column 2 Name", ... ], "Level 2 Name" : [ "Column 1 Name", "Column 2 Name", ... ], ... }
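The shape of a Columns Object (Level names mapped to arrays of column names) can be checked with a sketch like the one below. The `validate_columns` helper is hypothetical and checks structure only; it does not know which Level names are supported, since those are defined in the SQL specification.

```python
def validate_columns(columns):
    """Check that every Level maps to an array of column-name strings."""
    if not isinstance(columns, dict):
        raise TypeError("Columns must be an object keyed by Level name")
    for level, names in columns.items():
        if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
            raise TypeError(f"Level {level!r} must map to an array of column names")
    return columns


# Placeholder Level and column names, mirroring the template above.
columns = validate_columns({"Level 1 Name": ["Column 1 Name", "Column 2 Name"]})
```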