Saturday, December 5, 2009

DataStage Parallel Job Compilation & OSH Script...

Parallel Job Compilation:
----------------------------------------------------------------------
DataStage Designer generates all code
Validates link requirements, mandatory stage options, transformer logic, etc.
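As a rough illustration of the validation step, the Python sketch below checks link counts and mandatory options for a few stage types. The `Stage` class, the `RULES` table and the stage-type names are hypothetical. Designer's real checks are far more extensive and are not exposed as an API like this.

```python
from dataclasses import dataclass, field

# Hypothetical rule table: required link counts and mandatory options per
# stage type. These rules are illustrative assumptions, not the checks
# DataStage Designer actually performs.
RULES = {
    "SequentialFileSource": {"inputs": 0, "outputs": 1, "options": {"file"}},
    "Sort": {"inputs": 1, "outputs": 1, "options": {"key"}},
    "SequentialFileTarget": {"inputs": 1, "outputs": 0, "options": {"file"}},
}


@dataclass
class Stage:
    name: str
    kind: str
    inputs: list = field(default_factory=list)   # input link names
    outputs: list = field(default_factory=list)  # output link names
    options: dict = field(default_factory=dict)  # option name -> value


def validate(stages):
    """Return error messages; an empty list means every stage passed."""
    errors = []
    for st in stages:
        rule = RULES.get(st.kind)
        if rule is None:
            errors.append(f"{st.name}: unknown stage type {st.kind}")
            continue
        # Link requirements: compare actual link counts with the rule.
        for direction in ("inputs", "outputs"):
            found = len(getattr(st, direction))
            if found != rule[direction]:
                errors.append(
                    f"{st.name}: expected {rule[direction]} {direction}, found {found}"
                )
        # Mandatory options: report any required option that was not set.
        for opt in sorted(rule["options"] - set(st.options)):
            errors.append(f"{st.name}: missing mandatory option '{opt}'")
    return errors


job = [
    Stage("src", "SequentialFileSource", outputs=["lnk_src"], options={"file": "in.txt"}),
    Stage("srt", "Sort", inputs=["lnk_src"], outputs=["lnk_sorted"]),
]
print(validate(job))  # ["srt: missing mandatory option 'key'"]
```

A job that produces any error message here would, in this simplified model, be rejected before any OSH is generated.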

Generates OSH representation of data flow and stages
Stages are representations of Framework "operators"

Generates transform code for each Transformer
The transform code is generated as C++ and then compiled into corresponding native operators.
==============================================================

Stage to Operator Mapping:

Sequential File
Source: Import operator
Target: Export operator

DataSet: copy operator

Sort (DataStage): tsort operator

Aggregator: group operator

Row Generator, Column Generator, Surrogate Key Generator: generator operator

Oracle
Source: oraread operator
Sparse Lookup: oralookup operator
Target Load: orawrite operator
Target Upsert: oraupsert operator

Lookup File Set
Target: lookup operator (-createOnly option)
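The mapping above can be kept as a simple lookup table. This Python sketch transcribes the list; the `(stage, role)` key layout and the `operator_for` helper are illustrative choices, not part of DataStage.

```python
# Stage-to-operator table transcribed from the list above.
# Keys are (stage, role); role is None when the stage maps to one operator
# regardless of how it is used. The key layout is an illustrative choice.
STAGE_TO_OPERATOR = {
    ("Sequential File", "Source"): "import",
    ("Sequential File", "Target"): "export",
    ("DataSet", None): "copy",
    ("Sort", None): "tsort",  # the DataStage Sort stage
    ("Aggregator", None): "group",
    ("Row Generator", None): "generator",
    ("Column Generator", None): "generator",
    ("Surrogate Key Generator", None): "generator",
    ("Oracle", "Source"): "oraread",
    ("Oracle", "Sparse Lookup"): "oralookup",
    ("Oracle", "Target Load"): "orawrite",
    ("Oracle", "Target Upsert"): "oraupsert",
    ("Lookup File Set", "Target"): "lookup -createOnly",
}


def operator_for(stage, role=None):
    """Return the operator for a stage, falling back to its role-independent entry."""
    op = STAGE_TO_OPERATOR.get((stage, role)) or STAGE_TO_OPERATOR.get((stage, None))
    if op is None:
        raise KeyError(f"no operator mapping for {stage!r} (role {role!r})")
    return op


print(operator_for("Sequential File", "Target"))  # export
print(operator_for("Aggregator"))                 # group
```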
===============================================================

Generated OSH Primer

Comment blocks introduce each operator
The order of operators in the script is determined by the order in which stages
were added to the canvas

OSH uses the familiar syntax of the UNIX shell
Operator name
Schema
Operator options ("-name value" format)
Input (indicated by n<, where n is the input number)
Output (indicated by n>, where n is the output number)
The script may also include modify operators
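To see how these elements combine, the sketch below assembles a single operator invocation from a name, options, inputs and outputs. It is a simplified, hypothetical formatter: real generated OSH also carries the schema, comment blocks and quoted dataset names, which are omitted here.

```python
def format_operator(name, options=None, inputs=(), outputs=()):
    """Build a one-line, OSH-style invocation (simplified, hypothetical layout).

    options: mapping of option name -> value, rendered as "-name value"
    inputs/outputs: dataset names; the position in each sequence is the port number n
    """
    parts = [name]
    for opt, value in (options or {}).items():
        parts.append(f"-{opt} {value}")
    # Inputs are written as "n< dataset", outputs as "n> dataset", numbered from 0.
    parts += [f"{n}< {ds}" for n, ds in enumerate(inputs)]
    parts += [f"{n}> {ds}" for n, ds in enumerate(outputs)]
    return " ".join(parts)


print(format_operator("tsort", {"key": "cust_id"}, inputs=["lnk_src"], outputs=["lnk_sorted"]))
# tsort -key cust_id 0< lnk_src 0> lnk_sorted
```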
For every operator, input and/or output datasets are
numbered sequentially starting from 0. For example:
op1 0> dst
op1 1> src
Virtual datasets are generated to connect operators.
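The numbering convention can be made concrete with a small parser for designators such as `op1 0> dst`. The regular expression and the `parse_designator` function are assumptions for illustration and handle only the one-designator-per-line form shown above.

```python
import re

# "<operator> <n><direction> <dataset>", e.g. "op1 0> dst".
# Assumes one designator per line, which is a simplification of real OSH.
DESIGNATOR = re.compile(r"^(?P<op>\S+)\s+(?P<port>\d+)(?P<dir>[<>])\s*(?P<ds>\S+)$")


def parse_designator(line):
    """Split a designator into (operator, direction, port number, dataset)."""
    m = DESIGNATOR.match(line.strip())
    if m is None:
        raise ValueError(f"not a port designator: {line!r}")
    direction = "input" if m["dir"] == "<" else "output"
    return m["op"], direction, int(m["port"]), m["ds"]


for line in ("op1 0> dst", "op1 1> src", "op2 0< dst"):
    print(parse_designator(line))
```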

Note: The actual execution order of operators is dictated by input/output designators, not by placement on the diagram.
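The point that data flow, not diagram placement, determines ordering can be illustrated with a topological sort over the input/output designators. This is a sketch using Python's `graphlib` (3.9+); the operator and dataset names are made up. The real parallel engine runs operators concurrently as a pipeline, so the sort only shows which operator must feed which.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execution_order(script_order, writes, reads):
    """Order operators by data dependencies instead of script position.

    script_order: operator names in the order they appear in the script
    writes: operator -> datasets it writes (its "n>" outputs)
    reads:  operator -> datasets it reads (its "n<" inputs)
    """
    # Map each dataset to the operator that produces it.
    producer = {ds: op for op, dss in writes.items() for ds in dss}
    # Each operator's predecessors are the producers of the datasets it reads.
    graph = {op: set() for op in script_order}
    for op, dss in reads.items():
        for ds in dss:
            graph.setdefault(op, set()).add(producer[ds])
    # static_order() emits every predecessor before the operators that depend on it.
    return list(TopologicalSorter(graph).static_order())


# Script order deliberately differs from the data flow.
order = execution_order(
    ["export_tgt", "tsort_sort", "import_src"],
    writes={"import_src": ["lnk_src"], "tsort_sort": ["lnk_sorted"]},
    reads={"tsort_sort": ["lnk_src"], "export_tgt": ["lnk_sorted"]},
)
print(order)  # ['import_src', 'tsort_sort', 'export_tgt']
```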
These datasets connect the OSH operators. They are "virtual datasets", that is, in-memory data flows.
Link names are used in the virtual dataset names, so it is good practice to give links meaningful names.
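Since link names end up in dataset names, a quick check for links that still carry default names can help. The sketch assumes Designer's default link names follow the pattern `DSLink` plus a number (for example `DSLink3`); treat that pattern as an assumption and adjust it to your environment.

```python
import re

# Assumption: links left with Designer's default names look like "DSLink3".
DEFAULT_LINK_NAME = re.compile(r"^DSLink\d+$")


def unnamed_links(link_names):
    """Return the link names that still match the default pattern, in input order."""
    return [name for name in link_names if DEFAULT_LINK_NAME.match(name)]


print(unnamed_links(["DSLink3", "lnk_customers_sorted", "DSLink12"]))
# ['DSLink3', 'DSLink12']
```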
============================================================
