Parallel Job Compilation:
----------------------------------------------------------------------
DataStage Designer generates all code:
  Validates link requirements, mandatory stage options, Transformer logic, etc.
  Generates an OSH representation of the data flow and stages
    Stages are representations of Framework "operators"
  Generates transform code for each Transformer
    Compiled into C++ and then into corresponding native operators
==============================================================
Stage to Operator Mapping:
Sequential File
  Source: import operator
  Target: export operator
Data Set: copy operator
Sort (DataStage): tsort operator
Aggregator: group operator
Row Generator, Column Generator, Surrogate Key Generator: generator operator
Oracle
  Source: oraread operator
  Sparse Lookup: oralookup operator
  Target Load: orawrite operator
  Target Upsert: oraupsert operator
Lookup File Set
  Target: lookup operator with the -createOnly option
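
As a quick reference, the mapping above can be held in a small lookup table. This is only a sketch in Python for study purposes, not anything DataStage itself provides: the STAGE_TO_OPERATOR table and the operator_for helper are made-up names, and the Lookup File Set entry reads the list above as the lookup operator with a -createOnly option.

```python
# Stage-to-operator mapping copied from the list above.
# Keys are (stage, usage); usage is None when the stage maps to a single operator.
STAGE_TO_OPERATOR = {
    ("Sequential File", "Source"): "import",
    ("Sequential File", "Target"): "export",
    ("Data Set", None): "copy",
    ("Sort", None): "tsort",
    ("Aggregator", None): "group",
    ("Row Generator", None): "generator",
    ("Column Generator", None): "generator",
    ("Surrogate Key Generator", None): "generator",
    ("Oracle", "Source"): "oraread",
    ("Oracle", "Sparse Lookup"): "oralookup",
    ("Oracle", "Target Load"): "orawrite",
    ("Oracle", "Target Upsert"): "oraupsert",
    # Assumption: read as the lookup operator run with -createOnly.
    ("Lookup File Set", "Target"): "lookup -createOnly",
}


def operator_for(stage, usage=None):
    """Return the operator listed for a stage, or None if it is not in the table."""
    return STAGE_TO_OPERATOR.get((stage, usage))


print(operator_for("Sequential File", "Source"))  # import
print(operator_for("Sort"))                       # tsort
```

The table only covers the stages listed in these notes; any other stage returns None.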
===============================================================
Generated OSH Primer:
Comment blocks introduce each operator
  The order of operators in the generated OSH is the order in which the stages were added to the canvas
OSH uses the familiar syntax of the UNIX shell:
  Operator name
  Schema
  Operator options ("-name value" format)
  Input (indicated by n<, where n is the input number)
  Output (indicated by n>, where n is the output number); may include modify
For every operator, input and/or output datasets are numbered sequentially starting from 0. E.g.:
  op1 0> dst
  op1 1> src
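
To make the n< / n> notation concrete, here is a small Python sketch that pulls the numbered inputs and outputs out of a single operator line. It is a simplification and not a real OSH parser: it assumes one operator per line with plain dataset names, it ignores schemas and options, and parse_ports and the "op2" line are hypothetical.

```python
import re

# A port designator: a number, then "<" (input) or ">" (output), then a dataset name.
PORT = re.compile(r"(\d+)([<>])\s*(\S+)")


def parse_ports(line):
    """Split one operator line into (operator name, inputs, outputs)."""
    name = line.split()[0]
    inputs, outputs = {}, {}
    for number, direction, dataset in PORT.findall(line):
        ports = inputs if direction == "<" else outputs
        ports[int(number)] = dataset
    return name, inputs, outputs


print(parse_ports("op1 0> dst"))         # ('op1', {}, {0: 'dst'})
print(parse_ports("op2 0< dst 0> out"))  # ('op2', {0: 'dst'}, {0: 'out'})
```

The second call shows how an output of one operator (dst) turns up as input 0 of another.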
Virtual datasets, that is, in-memory data flows, are generated to connect the OSH operators.
Note: The actual execution order of operators is dictated by the input/output designators, not by placement on the diagram.
Link names are used in the dataset names, so it is good practice to name links meaningfully.
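
The point about ordering can be sketched in Python: given which datasets each operator reads and writes, a data-flow order falls out of the connections alone, whatever order the operators were listed in. Everything in this example is assumed for illustration: the three-operator flow, the link names, the ".v" dataset names and the execution_order helper. In a real job the operators run as a pipeline, so read the result as a dependency order, not a strict sequence.

```python
from collections import deque

# Hypothetical flow, deliberately listed out of data-flow order.
# Each entry: operator name -> (input datasets, output datasets).
flow = {
    "export": (["lnk_sorted.v"], []),
    "tsort": (["lnk_customers.v"], ["lnk_sorted.v"]),
    "import": ([], ["lnk_customers.v"]),
}


def execution_order(flow):
    """Order operators so each one comes after the producers of its inputs."""
    producer = {ds: op for op, (_, outs) in flow.items() for ds in outs}
    # Per operator, count the inputs that still wait on an upstream operator.
    pending = {op: sum(ds in producer for ds in ins) for op, (ins, _) in flow.items()}
    ready = deque(op for op, count in pending.items() if count == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for ds in flow[op][1]:
            for consumer, (ins, _) in flow.items():
                if ds in ins:
                    pending[consumer] -= 1
                    if pending[consumer] == 0:
                        ready.append(consumer)
    return order


print(execution_order(flow))  # ['import', 'tsort', 'export']
```

Because the dataset names carry the link names, meaningful link names make a flow like this much easier to trace by eye.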
============================================================