Monday, January 25, 2010

WebSphere QualityStage

Use replacement stages to select the WebSphere DataStage stage that substitutes for each QualityStage 7 stage.

Sunday, January 24, 2010

DataStage Runtime Column Propagation

If you enable the RCP feature, stages in parallel jobs can handle undefined
columns that they encounter when the job is run, and propagate these
columns through to the rest of the job.

Enable Runtime Column Propagation for Parallel Jobs
If you enable this feature, stages in parallel jobs can handle undefined columns that they encounter when the job is run, and propagate these columns through to the rest of the job. This check box enables the feature; to actually use it you need to explicitly select the option on each stage.

If runtime column propagation is enabled in the DataStage Administrator, you can select the Runtime column propagation option on a stage to specify that columns encountered by a stage in a parallel job can be used even if they are not explicitly defined in the meta data. You should always ensure that runtime column propagation is turned on if you want to use schema files to define column meta data.
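For example, a minimal schema file sketch (the column names here are invented) that a Sequential File stage could point to through its Schema File property; with runtime column propagation turned on, any columns in the data beyond those defined here are propagated to the rest of the job:

record (
customer_id:int32;
customer_name:nullable string[max=50];
)

The schema format itself is covered in the Schema Files and Partial Schemas post further down.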

There are some special considerations when using runtime column propagation with certain stage types:

# Sequential File
# File Set
# External Source
# External Target


RCP Set at DataStage Administrator:
RCP Set at DataStage Stage Output:

Merge Stage: Ensure required column meta data has been specified (this may be done in another stage, or may be omitted altogether if you are relying on Runtime Column Propagation).

Lookup Stage: Ensure required column meta data has been specified (this may be omitted altogether if you are relying on Runtime Column Propagation).

Shared Container: When inserted in a job, a shared container instance already has meta data defined for its various links. This meta data must match that on the link that the job uses to connect to the container exactly in all properties. The Inputs page enables you to map meta data as required. The only exception to this is where you are using runtime column propagation (RCP) with a parallel shared container. If RCP is enabled for the job, and specifically for the stage whose output connects to the shared container input, then meta data will be propagated at run time, so there is no need to map it at design time.

For parallel shared containers, you can take advantage of runtime column propagation to avoid the need to map the meta data. If you enable runtime column propagation, then, when the job runs, meta data will be automatically propagated across the boundary between the shared container and the stage(s) to which it connects in the job.

Friday, January 15, 2010

ClearCase Commands

ClearCase Commands
-----------------------------------------------------------------

IBM Rational ClearCase is a powerful, elegant version control system.

Since its heritage is Unix, the manual is ... obscure. These battle-tested commands should get you up and running...

Before you dive in take note!

'cl' is the accepted shortform for 'ClearCase' / 'Cleartool' when entering package commands.

A period in a sample command line is not punctuation, it's meant to be typed!

Bon appetit.

Basics
01 To Get Help
02 To enter cleartool.
03 To exit cleartool.
04 To navigate inside cleartool.
05 To enter graphics mode (slow but gorgeous).

File Checkouts and Checkins
06 To Check out a file.
07 To cancel a checkout.
08 To check out a previous version of a file.
09 To find what you have checked out.
10 To find out what other files are checked out.
11 To Check in a file.
12 To checkin or checkout multiple files.

Labels
13 To list existing labels.
14 To observe the naming conventions for branches and labels.
15 To Make a label.
16 To label a version of a file.
17 To apply a label to the entire source tree.
18 To move an existing label down to the latest elements.
19 To see if a label is on the latest version of a file or on a branch.

File and Version Information
20 To change the comment for a revision.
21 To change the content description for a file.
22 To list the history (versions with comments) of a file.
23 To find out what files have been worked on since ....
24 To find out the file_type of a file.
25 To setup cleartool to send email on a file change.

Directory Operations
26 To add a new directory to the source tree.
27 To rename a directory in the /gl/vobs/netman/src tree.
28 To rename a vob directory or element.

File Operations : Add Move Rename
29 To Add a new file to an existing directory.
30 To use symbolic links to avoid duplicating an archive file in different directories.
31 To move an atria file from one directory to another.
32 To make a view private copy of a file.
33 To remove 'rm' a Cleartool archive file.
34 To look at a file that has been Cleartool 'rm'd'.

Branches
35 To List existing branches.
36 To Make a branch.
37 To merge a branch back to the mainline.

Releases
38 To build a release.
39 To obtain a directory listing of the source tree for a release.
40 To obtain a list of file changes since the last release.

Administrator
41 To remove (destroy) the latest version(s) of a file.
42 To look at your configspec.
43 To change your configspec.

ClearCase Commands : Basics

01. To get help
DOS bigots will appreciate the intuitive nature of the Unix equivalent of 'Help': 'man' (aka Manual Pages)
To get help on the entire command set enter:
cleartool man
To get help on a specific command enter:
cleartool man command
(eg: cleartool man co for help on the checkout command)
________________________________________

02. To enter Cleartool
cl
The prompt will change to cleartool.
________________________________________

03. To exit Cleartool.
exit
or
quit
________________________________________

04. To navigate inside Cleartool
Use the standard cd or cd .. commands to navigate the directory tree.

Warning! Be aware that if you exit from Cleartool back to the shell after having moved around in the directory tree within Cleartool, you will NOT be returned to the directory that you were in when you entered Cleartool from the shell.
The pwd (present working directory) command can help you keep track of where you are.
________________________________________
05. To enter graphics mode (slow but gorgeous).
cl xlsvtree filename
or
xcleartool
The xlsvtree graphics viewer is a powerful tool for viewing the version tree of a file. It is very useful for visual diffing of one version of the file against another, labelling versions or moving labels around.


ClearCase Commands : File Checkin Checkout
06. To check out a file.
cleartool checkout file.ext
or
cl co file.ext
cl co -c "comment" file.ext
End the comment with a period (.) on a line by itself.
Cleartool will prompt for a comment if you do not use the third format to supply one.
Although you may not be used to providing a description of the changes at checkout time, it is good practice to enter the reason you are checking out the file.
Other team members can then see who has the file out and why.
When you check the file back in this text will be presented as the default ci change description, which you can override. If you are forethoughtful about the comment most often it will not need to be re-typed when you do the checkin.
The file checked out stays in the same directory but the permission changes from read only to read/write.
________________________________________

07. To cancel a checkout.
cl unco file.ext
________________________________________

08. To check out a previous version of a file.
This should be:
cl co file.ext@@/main/version#
and it is a defect in the tool that this command is not implemented.
Instead, you will have to do a tap dance to get the version you want in your view:
#1. cl co file.ext
#2. cl exit to return to the shell
#3. pwd to make sure you are in the right directory at the shell level. If not, navigate to the right directory.
#4. cp file.ext@@/main/version# tmpfile.ext
You can then change tmpfile as required, cp/copy it to the atria directory with the correct name, enter cleartool again and check the file in. Exit cleartool to the shell to rm/delete the temporary file. Using a tmp_name will help ensure that you don't screw up and delete the atria archive file by mistake!
________________________________________

09. To find what you have checked out.
cl lsco -me -all
Remember to do this at the end of your session to make sure you have not forgotten to check back in files and directories.
Other team members cannot access the files for update until you have checked them back in, and will have to waste time diffing and merging your changes with theirs if they are forced to use private copies.
________________________________________

10. To find out what other files are checked out.
cl lsco -all
________________________________________

11. To check in a file.
cleartool checkin file.ext
or
cl ci -c "comment" file.ext
Cleartool will display the comment you entered at co time as the default. You may override this if you want to.
End the comment with a period (.) on a line by itself (or Ctrl-D).
Do NOT use -nc which means no comment.
Comment your versions!!!!
________________________________________

12. To checkin or checkout multiple files.
To check in, or check out multiple files at once with the same comment:
cl ci -c "comment" *
cl co -c "comment" *



ClearCase Commands : Labels
13. To list existing labels.
cl lstype -lbtype
________________________________________

14. To observe the naming conventions for branches and labels.
branch names: all lower case eg: forwarding, ip, gap
LABELS : ALL UPPER CASE eg: BETA_2.5, PRE_GAP_MERGE
________________________________________

15. To Make a label.
cl mklbtype BETA_1.1B
Note: The convention for labels is as above: all caps.
________________________________________

16. To label a version of a file.
cl mklabel -c "comment" -rec x.
Warning! Do not confuse these foolishly named operations:
mklbtype makes (creates) a new label.
whereas
mklabel ATTACHES the label made by mklbtype to a specific file.
________________________________________

17. To apply a label to the entire source tree.
This is typically done for alpha, beta, final releases so that the entire set of files with the label can be built.
#1 Use 'catcs/edcs' (Tip 43) to set up your configspec with
element * MyLabel
element * /main/LATEST
#2 mklbtype MyLabel and enter appropriate comments.
#3 Be in the top level (/src) directory.
To apply the label throughout the tree:
mklabel -recurse -c "X Release" MyLabel .
Wait a long time.
________________________________________

18. To move an existing label down to the latest elements.
mklabel -recurse -replace MY_LABEL .
will move existing label MY_LABEL down to the latest revision of each file in the directory, and below, from which you invoke this command.
'replace' is a poorly named option: it really means 'move-down'.
________________________________________

19. To see if a label is on the latest version of a file or on a branch.
cl find -version 'lbtype(LABELNAME) && version(/main/LATEST)' -print
or
cl find . -version 'version(/main/LATEST) && ! version(/main/GAP)' -print
You can do lots of mix and match to find what you want.


ClearCase Commands : File and Version Information
20. To change the comment for a revision.
Many revision control tools won't let you change a comment once committed, on grounds of theology (Tichy didn't permit it), or control freakieness (typically MIS installations).
Cleartool is realistic and accommodating, so you have a chance to correct and revise comments as needed.
chevent -c "comment" -rep file.name@@/main/version#
will replace an existing comment for revision n on the main trunk.
chevent -c "comment" -rep *
will replace the existing comments on the latest revisions of all files in the current directory.
________________________________________

21. To change the content description for a file.
When you set up a file initially on atria with mkelem, you can use the -c "comment" qualifier to record a description of the contents of the file.
To change this subsequently:
chevent -c "content description" -rep file.ext@@/main/version#
________________________________________

22. To list the history (versions with comments) of a file.
cl lshist -all -since 08:00 file.ext
will list all changes made to the file since the time specified.
________________________________________

23. To Find out what files have been worked on since a given date.
It is sometimes very useful to get a list of all the changes made to the source tree since a given date.
cl lshist -all -since 31-Jul-96.23:59

________________________________________

24. To find out the file_type of a file.
This is useful to confirm that a file is stored by Cleartool in an appropriate format.
Element type : Use with
text_file : Flat ascii text file
file : Any binary file that you don't mind Cleartool compressing for storage.
compressed_file : Binary file that's already compressed and you want Cleartool to store as-is.
To find this out enter:
describe file.ext
or
describe *
________________________________________

25. To setup cleartool to send email on a file change.
You may want to cause e-mail to be sent to email handle 'supercon' when a particular file has been updated.
This is particularly useful, for example, if you have patched the source in a commercial library and you want to be sure that when the library is upgraded your changes do not get overwritten, as the newest version of the library may or may not have corrected the error your patches previously corrected.
The e-mail notification acts as a helpful reminder to re-test for the bug which was encountered in the earlier version, and re-update and re-build the commercial library if the library vendor did not fix the problem.
How to do it:
#1: Get VOB access.
#2: Create an e-mail notification script in unix format and add it to atria. For example:

Courtesy Aric Kollmyer:

#!/bin/sh
# send the message body, up to the ENDMAIL marker, as a here-document
mailx -s "Cleartool warning: Reminder on zLib update" supercon <<ENDMAIL
New Version of dialog.cpp
create: $CLEARCASE_PN by: $CLEARCASE_USER
Comment string:
-------------
Someone has really done it now, they installed a new version of zLib!
This may cause problems as I made the following patches to:
dialog.cpp in the zDialogFrame::create() method to correct .....
So, you had better re-test with the new zApp libraries linked in and
verify that the new library fixes this problem
-Me Self (555) 123-4567 Quincy, CA meself@hotmail.com
ENDMAIL
#3: Register the script file as a trigger.
cl mktrtype -nc -element -postop checkout -exec script-pathname trigger-type-name

Example:

cl mktrtype -nc -element -postop checkout -exec
/gl/vobs/netman/src/zapp/source/glenayre.sh glenTrigger
#4: Attach the trigger to an existing element:

cl mktrigger -nattach trigger-type-name element
Example:

cl mktrigger -nattach glenTrigger dialog.cpp
The examples shown will send an e-mail to supercon when dialog.cpp is checked out.


ClearCase : Directory Operations

26. To add a new directory to the source tree.
Again a multi part operation as, to achieve transparency, directories are themselves vob (versioned) objects.
#1 First checkout the directory under which you want to create the new sub-directory.
cl co -c "Add directory xxx" .
The period indicates that you are checking out a directory.
#2 mkdir xxx
#3 Give the new directory privilege level 775
cl protect -chmod 775 .
#4 To verify the privilege level is 775
ls -adl .
The directory attribute string should be: drwxrwxr-x (7 7 5)
These last two steps will ensure that someone else can later manipulate the directory without having to come back to you to beg for the privilege!
________________________________________

27. To rename a directory in the /gl/vobs/netman/src tree.
Eg: /src/ui/thunk_ar
#1 Change to the directory you want to rename eg: /ui/thunk_ar and check it out:
cl co .
#2 Checkout the directory above the directory you want to rename : eg /ui
cl cd ..
cl co .
#3 Rename the directory by mv'ing it:
cl mv thunk_ar thunk_31
#4 Check in the directories:
cl ci .
cl cd thunk_31
cl ci .

________________________________________

28. To rename a vob directory or element.
Checkout the directory that contains the directory or element to be renamed.
Do
cl mv oldname newname
for the directory or element. Check it in.


ClearCase Commands : File Add Move Rename
29. To Add a new file to an existing directory.
This is a multi-part operation because, to achieve transparency, cleartool directories are themselves versioned. So ...
#1 First, you have to checkout the directory, because adding a file will change the directory. Enter a checkout comment which can also serve as the check in comment later.
cl co -c "Add file xxx" .
The period indicates that you are checking out a directory.
#2 Make an (empty) file 'element'
cl mkelem -c "file description" file.ext
NOTE! This assumes that the new element has an extension that is known to Cleartool. If not, and you get a gobbledegook message 'Can't pick element type from...', then you must specify an element type, eg:
cl mkelem -c "Config stuff" -eltype text_file cb.cfg
The file types are:

text_file : Flat ascii text file
file : Any binary file that you don't mind Cleartool compressing for storage.
compressed_file : Binary file that's already compressed and you want Cleartool to store as-is.
#3 Copy the source code to the new file.
#4 Checkin the file. This will be the first revision.
cl ci -c "comment" file.ext
#5 Check the directory back in.
cl ci .

The default comment you supplied in #1 above should be ok.
#6 You should check that you have set up the file with an appropriate storage type. See: Tip 24 'To find out the file type of a File'.
________________________________________

30. To use symbolic links to avoid duplicating an archive file in different directories.
Suppose we have directory A with archived file.ext eg: /../VOBS/.../A/file.ext and we want to have the same file in Directory B.
Rather than duplicate the file, we can place a link in directory /../VOBS/.../B to point to the 'real' archive file in directory A.
To setup the link:

Change to the B directory

Checkout the B directory: cl co -c "Link to file A" .

Change to the A directory

Enter: cl ln -slink /../vobs/.../A/file.ext /../vobs/.../B

Change to the B directory and do a ls -al to inspect the links.

________________________________________

31. To move an atria file from one directory to another.
Check out the source directory :
co .
Check out the target directory :
co .
In the source directory type :
mv file.ext /gl/vobs/netman/src/etc/etc/newfile.ext
________________________________________

32. To make a view private copy of a file.
co -unreserved
________________________________________

33. To Cleartool 'rm' an atria archive file.
First enter the ClearCase tool:
cl
Once INSIDE the ClearCase tool
rm filename
This will remove the archive from the view, but it is still accessible and can be viewed or retrieved if you need it again at some later time.
Warning! If you use rm filename outside Cleartool ie at the shell level the file is GONE!
________________________________________

34. To look at a file that has been Cleartool 'rm'd'.
more .@@/main/5/file.ext@@/main/LATEST
Replace '5' with the last version of the directory that contained the file.
Instead of more you can use less - see man less for info.


ClearCase Commands : Branches

35. To list existing branches.
cl lstype -brtype
________________________________________

36. To Make a branch.
#1 cl mkbrtype Eagle
and enter appropriate comments (doesn't matter where in the tree you are).
#2 mklbtype Eaglebase and enter appropriate comments.
#3 Be in the top level (/src) directory.
mklabel -recurse Eaglebase .
to apply the label throughout the tree.
Wait a long time.
#4 Edit your configspec to use the branch.
Note that if you use the graphical (xlsvtree) viewer to look at a file at this point, the new branch will not show up. No, you didn't do anything wrong. The new branch will not be created until you check the file out.
________________________________________

37. To merge a branch back to the mainline.
Finally, a chance to screw up bigtime!
#1 Use 'catcs' (Tip 42 : To look at your configspec) to ensure that your view is currently set to the main line.
#2 Change the directory to the root of the source tree.
#3 Label the entire source tree PRE_WHATEVER_MERGE so that if you screw up big time it will be easier for people to get back the pre-merge versions of the files.
#4 cl findmerge . -fver /main/BRANCH_NAME -merge -xmerge
This command will, for all the files on the branch:

present side-by-side Mainline-Branch-FinalMerged windows for all non-trivial merges back to the mainline.

Check out the file, and put the contents of the FinalMerged window in it.

Create a matching file.contrib file.
We're not going to try and verify the merge here and now. Click 'y' to accept the generated merge files, until all the files are processed.
#5 In the root directory you will find a FILExxxx log, listing all the files merged. Print it.
#6 Using the log printout as a control:

Use xlsvtree or something equivalent to diff the checked out file against the predecessor file on the main-line.

Verify the merged file is the way you want it and edit it if not.

Build the app to verify the changes are (syntactically) valid.

Regression testing the app would be nice.

Checkin the verified files to the mainline.

- rm the file.contrib files. Try not to rm the file object by mistake.
#7 Note! There is no such thing as a valid merged .dlg file!
Bring up the latest mainline and latest branch .dlg files in a resource editor and hand tool the changes!


ClearCase Commands : Releases
38. To build a release.
#1 Use catcs / edcs to set up your configspec with
element * LabelToBuildFrom
#2 IMPORTANT! Make sure that you have commented out with the '#' symbol:
#element * /main/LATEST
Failure to do this will mean that later file versions updated AFTER the labelled versions will get into the build!
________________________________________

39. To obtain a directory listing of the source tree for a release.
#1 Use Tip 13 command to get a list of existing labels.
Write down the label for the release you want eg 'RELEASE_2.02'.
Get it right!
#2 Use Tip 43 to add this line to your configspec:
element * RELEASE_2.02
before the existing:
element /gl/vobs/netman/src/docs/... /main/LATEST
element * /main/LATEST
Save the configuration, exit, and reply yes to confirm the configspec change.
#3 Open /install/bmp/splash.bmp or title.bmp in PaintBrush to confirm that the configspec change has worked and you are accessing the source version you require.
#4 Do a:
dir s: /s > release.xxx
to do a recursive directory listing of the s: source drive using the view set by the new configspec, and pipe the result to a file.
#5 Don't forget to change your configspec back to normal!
________________________________________

40. To obtain a list of file changes since the last release.
This is useful when you want to do a code review prior to a new release.
In the directory at the top of the src tree:
find . -version 'version(/main/LATEST) && !lbtype(RELEASE_3.0)' -exec 'echo $CLEARCASE_PN >> pooh.bar'
This will recursively examine the source tree and output to pooh.bar a list of the files whose head versions have no label, i.e. the changes since the last label was applied to all the files.


ClearCase Commands : Administrator

41. To remove (destroy) the latest version(s) of a file.
Warning! Don't do this without being sure you know what you are doing!
This command will chop off the specified version of a file and, if that version is not the latest one, the versions above it as well!
rmver file.ext@@/main/version#
________________________________________

42. To look at your configspec.
catcs
This will display the directives that Cleartool is currently using to filter files into your logical 's:' drive. eg:
# Setup to access the main branch
element * /main/LATEST

#Setup to build an entire labelled version
#and ONLY files labelled this way
#element * RELEASE_3.01
#(element * /main/LATEST must be commented out
#or files added/changed later than this will get included in the build)

# Setup to access the tigger branch
# element * /main/tigger/LATEST
# element * /main/tiggerBase -mkbranch tigger
# element * /main/LATEST -mkbranch tigger
________________________________________

43. To change your configspec.
edcs

Rational Clear Case

Connect to ClearCase
Before using Rational ClearCase commands, you must use one of the following methods to connect to ClearCase:

In the Rational Developer product, click the ClearCase > Connect to Rational ClearCase menu option
In the Rational Developer product, click the Connect to ClearCase icon

Use the ClearCase preference setting Automatically Connect to ClearCase at Startup to establish a ClearCase connection each time you start the Rational Developer product

Connect to Rational ClearCase
To work online with Rational ClearCase from the Rational Developer product, click ClearCase > Connect to Rational ClearCase or click the Connect to Rational ClearCase icon.

You must select this option to make the other ClearCase menu items available. The availability of other ClearCase menu options then varies according to your current selection in the Rational Developer product.

When this option is active, a check mark appears next to the option name and the icon is indented.

To disconnect from ClearCase, click this option again.

You can also set the ClearCase SCM Adapter Automatically connect to ClearCase at startup preference to establish the ClearCase connection.
Automatically connect to ClearCase on startup
This ClearCase SCM Adapter preference establishes a ClearCase connection each time you start a Rational Developer product.

By default, this option is off.





To check out files, edit them, and check them in:

In the Rational Developer product, select the files to check out, right-click, and click ClearCase > Check Out or click the Check Out icon.
In the Check Out Element(s) dialog box, click OK. The file names appear in the Rational Developer product with a checked-out ClearCase decoration next to them.

Edit and save the files in the Rational Developer product.

In the Rational Developer product, select the files to check in, right-click, and click ClearCase > Check In or click the Check In icon.
In the ClearCase Check In Element(s) dialog box, click OK.
By default, ClearCase does not check in files that have not changed. To change this behavior, click Advanced and select the Checkin even if identical option in the ClearCase - Checkin dialog box.

Rational Clear Case Command Reference

How to become a Master in Modify Stage

Modify Stage Purpose
With a Modify stage you can perform five classes of function, but one major limitation is that Modify stage functions are limited to a single argument - straight away arithmetic and concatenation are ruled out, for example. A short syntax sketch follows the list below.
The five classes of function are:

•null handling
•date/time manipulation
•date/time interval calculation (limited)
•string trimming and substring
•data type conversion
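As a hedged taste of the syntax, here are two specifications of the kind you type into the Modify stage Specification property (the column names are invented; int32_from_decimal is one of the documented type conversion functions):

order_qty:int32 = int32_from_decimal(order_qty_dec)
DROP unwanted_col

The first converts a decimal column to a 32-bit integer under a new name; the second drops a column from the output.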
Th Robinson had another great comment pointing out it is better than the Transformer for null handling:

1. I used to routinely get bitten by the transformer stage rejecting rows in which some fields were participating in derivations or stage variables and the dang things had NULLS in them.
2. Solving the NULLS problem with IF IsNull() for every single field being evaluated in some way can get overly complex and very messy.

Instead I put a Modify stage before the Transformer, call the stage MhandleNull, and use handle_null() for all fields being evaluated in the transformer. This simplifies the already cumbersome IF THEN ELSE syntax of the transformer and/or the stage variables.
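To make that concrete, a minimal sketch of the specifications such an MhandleNull Modify stage might contain (column names and replacement values are invented):

customer_name = handle_null(customer_name, 'UNKNOWN')
order_value = handle_null(order_value, 0)

The Transformer that follows can then treat these fields as never null, so the IF IsNull() clutter disappears.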

Modify versus Transformer

I've put together a table that compares the Modify Stage to the Transformer Stage to show the limitations - the Modify Stage is a highly specialist stage while Transformer is an all-rounder:

There are a lot of NOs in the Modify column! It is a specialized stage: it is very fast at converting metadata and, unlike the Transformer stage, it can use job parameters for just about anything. The last No is very important.

There are many wrong paths to Modify Stage enlightenment and very few beacons to light the way, so take a torch.

So I am learning how to use the Modify Stage through a test harness - I've built some independent DataStage jobs that can be used to test out Modify Stage syntax without having to recompile or change the job. The hardest part of debugging Modify Stage code is that you don't know which specification had the error and it is time consuming to keep changing the code, compiling and retesting. The test harness lets you isolate and test specifications one at a time.

Here are the three jobs.

Using Job Parameters

A Job Parameter is to DataStage what a Swiss Army Knife is to the camper

The examples shown below use job parameters to test different Modify Stage scenarios. There are three ways to set the value of job parameters (a short sketch follows this list):

- In the job properties before a compile: this is where you set the default values that come up when the job is run.

- In the Director using Set Job Parameters: this is a convenient place to change the default values created at design time as DataStage will remember and use these defaults between job runs.

- In the Director when a job is run: the set job properties window pops up when you run a job letting you change values, but these values are reset to default the next time you run the job so it's not a good place to put complex Modify test code while you are debugging.
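Putting this together, here is a hedged sketch of the test harness idea (the parameter name, column names and default value are invented): define a job parameter, set its default in the job properties or the Director, and reference it from the Modify stage Specification property:

Job parameter: pSpecification
Default value: out_qty:int32 = int32_from_decimal(in_qty)
Modify stage Specification property: #pSpecification#

Because the specification now lives in a parameter, you can change it from the Director between runs and retest without recompiling the job.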

Courtesy: Vincent McBurney
-----------------------------------------------------------------------------------
Learning the Modify Stage

The Modify stage is a metadata panel beater. It does just one thing: converts columns from one data type to another. It is quite restrictive: you cannot nest functions or use constants as values. Almost all the functions in the Modify stage are also available in the all-rounder Transformer stage.
The Transformer is an automatic, the Modify Stage is a manual

Transformers are easy to use, which is one of the reasons why DataStage has been successful; the Transformer is the most commonly used stage. When adding derivation / mapping / transformation code into the text entry windows there is a right mouse click menu for finding a list of transformation functions:

It all started a couple of AscentialWorld conferences ago when Ascential DataStage experts started telling people to avoid the parallel transformer if they wanted fast performance in DataStage jobs. If a parallel job is a racing car, the transformer stage is a roof rack and luggage.

Historically parallel jobs did not even have a transformer stage.

Back when the parallel engine was a product called Orchestrate there was a Transform function that worked much the same way as other parallel commands. It required a lot of manual coding. When Ascential turned Orchestrate into parallel jobs they took the transformer stage from Server Edition and sat it over the top of the parallel transform function. This is when the trouble began. The transformer stage and the Transform function didn't quite fit together, so each parallel transformer needs a C++ compilation, a kind of automated custom stage build, even in the latest versions. The upshot is that this made it slower than other parallel stages.

The Allrounder Stage
Most of the functions of a transformer can be performed by other stages.
Transformer constraints = filter stage
Transformer metadata conversion = Modify stage
Transformer copy of fields and multiple outputs = Copy stage
Transformer creation of new fields = Column Generator stage
Transformer counters = Surrogate Key Stage


As you can see, each of the parallel stages is highly specialised while the transformer is an allrounder. The specialised stages are faster because they do not carry as much overhead.
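As a hedged example of that specialisation, a Transformer constraint that keeps only high-value rows could instead become a Filter stage Where Clause (the column name and threshold are invented):

Where Clause: order_value > 1000

Rows that satisfy the predicate go down the output link and everything else is discarded, which is roughly what the equivalent Transformer constraint would do.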

Ease of Use versus Performance
Let's start with the Modify stage. The only people out there who enjoy using this stage are also experts at cryptic crosswords, or were part of the team that cracked the Enigma code in World War Two, or are members of Opus Dei looking for the DataStage equivalent of self-flagellation. With no friendly right mouse click menus, no red text highlighting syntax errors, no warnings or helpful messages on compile, there is simply no easy way to ensure you have correct Modify stage code without compiling and waiting for a (hopefully) helpful error message. Trial and error is the order of the day.

The Filter stage is not much better.

The Column Generator stage is just bizarre and involves navigating your way into the column properties like a trip through the pipe system on Prison Break.

The Transformer is hands down the best stage to write any type of transformation code in the entire DataStage product given the right mouse click menu, the red text marking for syntax errors and the grid layout.

Robustness
The transformer stage is one of the few stages that has a robust reject link for rows that fail transformation. The Modify and Filter stages lack this link, so if a piece of data defies logic the job will abort rather than drop a row down a reject link.

Schema Files and Partial Schemas

Schema Files and Partial Schemas
You can also specify the meta data for a stage in a plain text file known as a schema file. This is not stored in the DataStage Repository but you could, for example, keep it in a document management or source code control system, or publish it on an intranet site.
The format of schema files is described below.

Note: If you are using a schema file on an NLS system, the schema file needs to be in UTF-8 format. It is, however, easy to convert text files between two different maps with a DataStage job. Such a job would read data from a text file using a Sequential File stage and specifying the appropriate character set on the NLS Map page. It would write the data to another file using a Sequential File stage, specifying the UTF-8 map on the NLS Map page. Some parallel job stages allow you to use a partial schema. This means that you only need define column definitions for those columns that you are actually going to operate on. Partial schemas are also described below.

Remember that you should turn runtime column propagation on if you intend to use schema files to define column meta data.

Schemas are an alternative way for you to specify column definitions for the data used by parallel jobs. By default, most parallel job stages take their meta data from the Columns tab, which contains table definitions, supplemented, where necessary, by format information from the Format tab. For some stages, you can specify a property that causes the stage to take its meta data from the specified schema file instead. Some stages also allow you to specify a partial schema. This allows you to describe only those columns that a particular stage is processing and ignore the rest. The schema file is a plain text file; the following sections describe its format. A partial schema has the same format.

Schema Format
A schema contains a record (or row) definition. This describes each column (or field) that will be encountered within the record, giving column name and data type. The following is an example record schema:
record (
name:string[255];
address:nullable string[255];
value1:int32;
value2:int32;
date:date)
(The line breaks are there for ease of reading; you would omit these if you were defining a partial schema, for example record(name:string[255];value1:int32;date:date) is a valid schema.)
The format of each line describing a column is:
column_name:[nullability]datatype;
• column_name. This is the name that identifies the column. Names
must start with a letter or an underscore (_), and can contain only
alphanumeric or underscore characters. The name is not case sensitive.
The name can be of any length.
• nullability. You can optionally specify whether a column is allowed
to contain a null value, or whether this would be viewed as invalid.
If the column can be null, insert the word ’nullable’. By default
columns are not nullable.
You can also include ’nullable’ at record level to specify that all
columns are nullable, then override the setting for individual
columns by specifying ‘not nullable’. For example:
record nullable (
name:not nullable string[255];
value1:int32;
date:date)
• datatype. This is the data type of the column. This uses the internal
parallel engine data types, not SQL data types as used
on Columns tabs in stage editors.
You can include comments in schema definition files. A comment is
started by a double slash //, and ended by a newline.
The example schema corresponds to the following table definition as specified
on a Columns tab of a stage editor:

The following sections give special consideration for representing various
data types in a schema file.
Date Columns
The following examples show various different data definitions:
record (dateField1:date; ) // single date
record (dateField2[10]:date; ) // 10-element date vector
record (dateField3[]:date; ) // variable-length date vector
record (dateField4:nullable date;) // nullable date
(See "Complex Data Types" further down for information about vectors.)
Decimal Columns
To define a record field with data type decimal, you must specify the
column’s precision, and you may optionally specify its scale, as follows:
column_name:decimal[ precision, scale];
where precision is greater than or equal to 1 and scale is greater than or equal to 0 and less than precision. If the scale is not specified, it defaults to zero, indicating an integer value.
The following examples show different decimal column definitions:
record (dField1:decimal[12]; ) // 12-digit integer
record (dField2[10]:decimal[15,3]; )// 10-element
//decimal vector
record (dField3:nullable decimal[15,3];) // nullable decimal
Floating-Point Columns
To define floating-point fields, you use the sfloat (single-precision) or
dfloat (double-precision) data type, as in the following examples:
record (aSingle:sfloat; aDouble:dfloat; ) // float definitions
record (aSingle: nullable sfloat;) // nullable sfloat
record (doubles[5]:dfloat;) // fixed-length vector of dfloats
record (singles[]:sfloat;) // variable-length vector of sfloats
Integer Columns
To define integer fields, you use an 8-, 16-, 32-, or 64-bit integer data type
(signed or unsigned), as shown in the following examples:
record (n:int32;) // 32-bit signed integer
record (n:nullable int64;) // nullable, 64-bit signed integer
record (n[10]:int16;) // fixed-length vector of 16-bit
//signed integer
record (n[]:uint8;) // variable-length vector of 8-bit unsigned
//int
Raw Columns
You can define a record field that is a collection of untyped bytes, of fixed or variable length. You give the field data type raw. The definition for a raw field is similar to that of a string field, as shown in the following examples:
record (var1:raw[];) // variable-length raw field
record (var2:raw;) // variable-length raw field; same as raw[]
record (var3:raw[40];) // fixed-length raw field
record (var4[5]:raw[40];)// fixed-length vector of raw fields
You can specify the maximum number of bytes allowed in the raw field with the optional property max, as shown in the example below:
record (var7:raw[max=80];)
The length of a fixed-length raw field must be at least 1.
String Columns
You can define string fields of fixed or variable length. For variable-length strings, the string length is stored as part of the string as a hidden integer. The storage used to hold the string length is not included in the length of the string.
The following examples show string field definitions:
record (var1:string[];) // variable-length string
record (var2:string;) // variable-length string; same as string[]
record (var3:string[80];) // fixed-length string of 80 bytes
record (var4:nullable string[80];) // nullable string
record (var5[10]:string;) // fixed-length vector of strings
record (var6[]:string[80];) // variable-length vector of strings
You can specify the maximum length of a string with the optional property max, as shown in the example below:
record (var7:string[max=80];)
The length of a fixed-length string must be at least 1.
Time Columns
By default, the smallest unit of measure for a time value is seconds, but you can instead use microseconds with the [microseconds] option. The following are examples of time field definitions:
record (tField1:time; ) // single time field in seconds
record (tField2:time[microseconds];)// time field in
//microseconds
record (tField3[]:time; ) // variable-length time vector
record (tField4:nullable time;) // nullable time
Timestamp Columns
Timestamp fields contain both time and date information. In the time portion, you can use seconds (the default) or microseconds for the smallest unit of measure. For example:
record (tsField1:timestamp;)// single timestamp field in //seconds
record (tsField2:timestamp[microseconds];)// timestamp in //microseconds
record (tsField3[15]:timestamp; )// fixed-length timestamp //vector
record (tsField4:nullable timestamp;)// nullable timestamp
Vectors
Many of the previous examples show how to define a vector of a particular data type. You define a vector field by following the column name with brackets []. For a variable-length vector, you leave the brackets empty, and for a fixed-length vector you put the number of vector elements in the brackets. For example, to define a variable-length vector of int32, you would use a field definition such as the following one:
intVec[]:int32;
To define a fixed-length vector of 10 elements of type sfloat, you would use a definition such as:
sfloatVec[10]:sfloat;
You can define a vector of any data type, including string and raw. You cannot define a vector of a vector or tagged type. You can, however, define a vector of type subrecord, and that subrecord can include a tagged column or a vector. You can make vector elements nullable, as shown in the following record definition:
record (vInt[]:nullable int32;
vDate[6]:nullable date; )
In the example above, every element of the variable-length vector vInt will be nullable, as will every element of fixed-length vector vDate. To test whether a vector of nullable elements contains no data, you must check each element for null.
Subrecords
Record schemas let you define nested field definitions, or subrecords, by
specifying the type subrec. A subrecord itself does not define any storage;
instead, the fields of the subrecord define storage. The fields in a subrecord
can be of any data type, including tagged.
The following example defines a record that contains a subrecord:
record ( intField:int16;
aSubrec:subrec (
aField:int16;
bField:sfloat; );
)
In this example, the record contains a 16-bit integer field, intField, and a
subrecord field, aSubrec. The subrecord includes two fields: a 16-bit
integer and a single-precision float.
Subrecord columns of value data types (including string and raw) can be
nullable, and subrecord columns of subrec or vector types can have
nullable elements. A subrecord itself cannot be nullable.
You can define vectors (fixed-length or variable-length) of subrecords. The
following example shows a definition of a fixed-length vector of
subrecords:
record (aSubrec[10]:subrec (
aField:int16;
bField:sfloat; );
)
You can also nest subrecords and vectors of subrecords, to any depth of
nesting. The following example defines a fixed-length vector of
subrecords, aSubrec, that contains a nested variable-length vector of
subrecords, cSubrec:
record (aSubrec[10]:subrec (
aField:int16;
bField:sfloat;
cSubrec[]:subrec (
cAField:uint8;
cBField:dfloat; );
);
)
Subrecords can include tagged aggregate fields, as shown in the following
sample definition:
record (aSubrec:subrec (
aField:string;
bField:int32;
cField:tagged (
dField:int16;
eField:sfloat;
);
);
)
In this example, aSubrec has a string field, an int32 field, and a tagged
aggregate field. The tagged aggregate field cField can have either of two
data types, int16 or sfloat.
Tagged Columns
You can use schemas to define tagged columns (similar to C unions), with
the data type tagged. Defining a record with a tagged type allows each
record of a data set to have a different data type for the tagged column.
When your application writes to a field in a tagged column, DataStage
updates the tag, which identifies it as having the type of the column that
is referenced.
The data type of a tagged column can be of any data type except tagged
or subrec. For example, the following record defines a tagged subrecord
field:
record ( tagField:tagged (
aField:string;
bField:int32;
cField:sfloat;
) ;
)
In the example above, the data type of tagField can be one of the following:
a variable-length string, an int32, or an sfloat.
Partial Schemas
Some parallel job stages allow you to use a partial schema. This means that
you only need define column definitions for those columns that you are
actually going to operate on. The stages that allow you to do this are file
stages that have a Format tab. These are:
• Sequential File stage
• File Set stage
• External Source stage
• External Target stage
• Column Import stage
You specify a partial schema using the Intact property on the Format tab
of the stage together with the Schema File property on the corresponding
Properties tab. To use this facility, you need to turn Runtime Column
Propagation on, and provide enough information about the columns
being passed through to enable DataStage to skip over them as necessary.
In the file defining the partial schema, you need to describe the record and
the individual columns. Describe the record as follows:
• intact. This property specifies that the schema being defined is a
partial one. You can optionally specify a name for the intact schema
here as well, which you can then reference from the Intact property
of the Format tab.
• record_length. The length of the record, including record delimiter
characters.
• record_delim_string. String giving the record delimiter as an
ASCII string in single quotes. (For a single character delimiter, use
record_delim and supply a single ASCII character in single
quotes).
Describe the columns as follows:
• position. The position of the starting character within the record.
• delim. The column trailing delimiter, can be any of the following:
– ws to skip all standard whitespace characters (space, tab, and
newline) trailing after a field.
– end to specify that the last field in the record is composed of all
remaining bytes until the end of the record.
– none to specify that fields have no delimiter.
– null to specify that the delimiter is the ASCII null character.
– ASCII_char specifies a single ASCII delimiter. Enclose ASCII_char
in single quotation marks. (To specify multiple ASCII characters,
use delim_string followed by the string in single quotes.)
• text specifies the data representation type of a field as being text
rather than binary. Data is formatted as text by default. (Specify
binary if data is binary.)
Columns that are being passed through intact only need to be described in
enough detail to allow DataStage to skip them and locate the columns that
are to be operated on.
For example, say you have a sequential file defining rows comprising six fixed-width columns, and you are interested in the last two. You know that the first four columns together contain 80 characters. Your partial schema definition might appear as follows:
record { intact=details, record_delim_string = '\r\n' }
( colstoignore: string [80];
name: string [20] { delim=none };
income: uint32 { delim = ',', text };
)
Your stage would not be able to alter anything in a row other than the
name and income columns (it could also add a new column to either the
beginning or the end of a row).
-----------------------------------------------------------------------
Complex Data Types
Parallel jobs support three complex data types:
• Subrecords
• Tagged subrecords
• Vectors
When referring to complex data in DataStage column definitions, you can
specify fully qualified column names, for example:
Parent.Child5.Grandchild2
Subrecords
A subrecord is a nested data structure. The column with type subrecord
does not itself define any storage, but the columns it contains do. These
columns can have any data type, and you can nest subrecords one within
another. The LEVEL property on the Columns tab is used to specify the structure of subrecords within the column definitions.

Wednesday, January 13, 2010

IBM WebSphere Information Analyzer v8.0 - IBM 000-417

1. Which two connectors are automatically available when defining a data source to Information Analyzer? (Choose two.)
A. ODBCConnector
B. JDBCConnector
C. DataStageConnector
D. WebSphereMQConnector
Answer: AD
2. Which analysis builds a complete set of all the column pairs between primary key columns and remaining
selected columns in selected tables?
A. column analysis
B. foreign key analysis
C. cross table analysis
D. primary key analysis
Answer: B
3. Which three methods can be used to review the results of domain analysis? (Choose three.)
A. Examine each value individually.
B. Specify a range to determine the validity of the values.
C. Compare all of the values to the values in a reference table.
D. Specify a conditional test to determine the validity of the values.
E. Create an automated script to check the validity of values based on business requirements.
Answer: ABC
4. Given a review of a new column analysis, which review processes must be completed in sequence?
A. domain analysis, then format analysis
B. format analysis, then properties analysis
C. domain analysis, then rebuild inferences
D. properties analysis, then classification analysis
Answer: C
5. Which statement is true with regards to publishing analysis results?
A. Reference tables can be created then published and accessed using QualityStage.
B. Categories can be created then published and accessed using Business Glossary.
C. The accepted primary and foreign keys can be published then accessed using DataStage.
D. The candidate primary and foreign keys can be published then accessed using Business Glossary.
Answer: C
6. You are installing Information Analyzer and plan to use DB2 as the analysis database, which statement is
true?
A. The analysis database is automatically created at installation.
B. You must install the analysis database after you install the suite.
C. The analysis database DSN is automatically created at installation.
D. You must configure the analysis database before you install the suite.
Answer: A
7. Which two roles must be assigned to a user to complete the configuration of Information Analyzer?
(Choose two.)
A. data administrator
B. project administrator
C. database administrator
D. production administrator
Answer: AB
8. What must be installed prior to AuditStage?
A. .NET Framework
B. Information Server
C. Java Run-time Environment
D. Microsoft Data Access Components
Answer: D
9. What are two valid ways of connecting AuditStage to a database? (Choose two.)
A. JDBC
B. federation
C. Information Analyzer
D. direct database connection
Answer: BD
10. For a data quality assessment project, which three roles are involved throughout the assessment?
(Choose three.)
A. data modeler
B. data analyst
C. ETL developer
D. business analyst
E. business subject matter expert
Answer: BDE
11. When planning a data quality assessment project, application owners have expressed reluctance to
provide real-time access to their source systems using Information Analyzer.
Which two concepts should be discussed and understood by the assessment team? (Choose two.)
A. data standardization needed
B. data transformation rules needed
C. timeliness of data and its effect on structural metadata
D. timeliness of data and its effect on data assessment results
Answer: CD
12. A company has determined a need for profiling a centralized data warehouse for business intelligence
reporting.
During the initial client discovery session, which three initial questions should be addressed? (Choose
three.)
A. Which data marts are generated?
B. Which data sources are contributors?
C. Which underlying issues are suspected?
D. Which users are subject matter experts?
E. Which are the referential integrity violations?
Answer: BCD
13. A company has embarked on the implementation of a centralized data warehouse.
Data Quality Assessment will be addressed at which two points in the implementation project? (Choose
two.)
A. periodically during post-implementation
B. during the initial project plan development
C. during the analysis phase before actual design begins
D. during acceptance testing conducted after construction but before the warehouse is turned over for use
Answer: CD
14. What are two architectural components of AuditStage? (Choose two.)
A. client interface
B. parallel execution engine
C. internal or external database
D. common services connector
Answer: AC
15. Which architectural component applies to both Information Analyzer and AuditStage?
A. XMeta
B. domain
C. client interface
D. parallel engine
Answer: C
16. Which method should be used to estimate disk space to store frequency distributions for a given column
based on the volume of data to be profiled?
A. Multiply the number of distinct values by 2000.
B. Multiply the total number of columns and the total number of rows.
C. Multiply the number of rows by the sum of the column length and the maximum column format size plus
150.
D. Multiply the number of rows by the sum of the column length and the maximum number of formats plus
150.
Answer: C
17. You have an Information Analyzer installation in an environment where multiple servers and multiple
operating systems are available.
Which two configurations are valid? (Choose two.)
A. all layers on one RedHat Linux server
B. Domain and Repository layers on one RedHat Linux server, Engine layer on one AIX UNIX server
C. Domain and Repository layers on one RedHat Linux server, Engine layer on another RedHat Linux
server
D. Domain layer on one RedHat Linux server, Engine layer on one AIX UNIX server, Repository layer on
another AIX UNIX server
Answer: AC
18. Which three steps must be completed to make a data source available for analysis in Information
Analyzer? (Choose three.)
A. Define a host computer.
B. Import the source metadata.
C. Import a sample set of source records.
D. Define an ODBC connection to the data source.
E. Define an ODBC connection to the IADB repository.
Answer: ABD
19. Which three steps must be performed to tell Information Analyzer where data is located? (Choose
three.)
A. Define a host.
B. Define a project.
C. Define a data store.
D. Define a data connection.
E. Define analysis settings.
Answer: ACD
20. Which two statements are true when importing metadata describing a data source? (Choose two.)
A. Column definitions can be changed by the user during import.
B. Metadata can be discovered by using the Identify Next Level function.
C. Discovered metadata is automatically imported into the analysis database.
D. Metadata for table and column definitions can be discovered using the Identify All Levels function.
Answer: BD

IBM WebSphere Information Analyzer

IBM WebSphere Information Analyzer evaluates the content and structure of your data for consistency and quality. IBM WebSphere Information Analyzer helps you to assess the quality of your data by identifying inconsistencies, redundancies, and anomalies in your data at the column, table, and cross-table levels. IBM WebSphere Information Analyzer also makes inferences about the best choices regarding data structure. Inferences help you to learn more about the optimal structure of your data and what you can do to improve the quality of your data. In addition, IBM WebSphere Information Analyzer provides a mechanism called Baseline Analysis to help you assess whether a data quality procedure that you implemented has resulted in an improvement in data quality by comparing a prior version of analysis results with the current analysis results for a given data source.
The main functions provided by IBM WebSphere Information Analyzer are:
• Column Analysis
• Primary Key Analysis
• Foreign Key Analysis
• Cross Domain Analysis
• Publish Analysis Results
• Baseline Analysis
• Reports



Meta Data/Domain Integrity
•Column Analysis
•Completeness
•Consistency
•Pattern Consistency
•Translation table creation

Structural Integrity
•Table Analysis
•Key Analysis

Entity Integrity
•Duplicate Analysis
•Targeted Data Accuracy

Relational Integrity
•Cross-Table Analysis
•Redundancy Analysis

The data quality assessment (DQA) process consists of the following steps:

1. Prepare the data for assessment
Select the data sources to be investigated and analyzed.
2. Conduct data discovery
The data analyst (DA) and subject matter expert (SME) perform the investigation and analyses using tools such as
IBM WebSphere Information Analyzer and IBM WebSphere AuditStage. This
discovery involves checking metadata integrity, structural integrity, entity
integrity, relational integrity, and domain integrity.
3. Document data quality issues and decisions
After all information about data quality is known, you can make and
implement the appropriate data alignment and cleansing decisions.




Sunday, January 10, 2010

Balanced Hardware Configuration for Data Warehouse

Balanced Hardware Configuration
A properly sized and balanced hardware configuration is required to maximize data
warehouse performance. The following sections discuss important considerations in
achieving this balance:
■ How Many CPUs and What Clock Speed Do I Need?
■ How Much Memory Do I Need?
■ How Many Disks Do I Need?
■ How Do I Determine Sufficient I/O Bandwidth?


How Many CPUs and What Clock Speed Do I Need?
Central processing units (CPUs) provide the calculation capabilities in a data warehouse. You must have sufficient CPU power to perform the data warehouse operations. Parallel operations are more CPU-intensive than the equivalent serial
operations.

Use the estimated highest throughput as a guideline for the number of CPUs you
need. As a rough estimate, use the following formula:

< No. of CPUs > = < maximum throughput in MB/second > / 200 MB/second

This formula assumes that a CPU can sustain up to about 200 MB per second. For
example, if a system requires a maximum throughput of 1200 MB per second, then the
system needs 1200 / 200 = 6 CPUs. A single server with 6 CPUs can service this system, or a 2-node clustered system could be configured with 3 CPUs in each node.
----------------------------------------------------------------------------------
How Much Memory Do I Need?

Memory in a data warehouse is particularly important for processing
memory-intensive operations such as large sorts. Access to the data cache is less
important in a data warehouse because most of the queries access vast amounts of
data. Data warehouses do not have the same memory requirements as mission-critical
OLTP applications.

The number of CPUs provides a good guideline for the amount of memory you need.
Use the following simplified formula to derive the amount of memory you need from
the CPUs that you select:

< amount of memory in GB > = 2 * < No. of CPUs >

For example, a system with 6 CPUs needs 2 * 6 = 12 GB of memory. Most
standard servers fulfill this requirement.
----------------------------------------------------------------------------------
How Many Disks Do I Need?

A common mistake in data warehouse environments is to size the storage based on the
maximum capacity needed. Sizing that is based exclusively on storage requirements
will likely create a throughput bottleneck.

Use the maximum throughput you require to find out how many disk arrays you
need. Use the storage provider's specifications to find out how much throughput a
disk array can sustain. Note that storage providers measure in Gb per second, and
your initial throughput estimate is based on MB per second. An average disk
controller has a maximum throughput of 2 Gb per second, which translates to a
sustainable throughput of about (70% * 2 Gbit/s) /8 = 180 MB/s.
Use the following formula to determine the number of disk arrays you need:

< No. of disk controllers > = < throughput in MB/second > / < individual controller throughput in MB/second >

For example, our system with 1200 MB per second throughput requires at least 1200 /
180 = 7 disk arrays.
-------------------------------------------------------------------
How Do I Determine Sufficient I/O Bandwidth?

The end-to-end I/O system consists of more components than just the CPUs and disks.
A well-balanced I/O system must provide approximately the same bandwidth across
all components in the I/O system. These components include:
■ Host Bus Adapters (HBAs), the connectors between the server and the storage.
■ Switches, in between the servers and a Storage Area Network (SAN) or Network
Attached Storage (NAS).
■ Ethernet adapters for network connectivity (GigE NIC or InfiniBand). In an Oracle
RAC environment, you need an additional private port for the interconnect
between the nodes that you should not include when sizing the system for I/O
throughput. The interconnect must be sized separately, taking into account factors
such as internode parallel execution.
■ Wires that connect the individual components.
Each of the components has to be able to provide sufficient I/O bandwidth to ensure a
well-balanced I/O system. The initial throughput you estimated and the hardware
specifications from the vendors are the basis to determine the quantities of the
individual components you need. Translate the vendors' maximum throughput numbers, which are quoted in bits per second, into sustainable throughput in bytes per second; as in the disk controller example above, assume that roughly 70% of the rated throughput is sustainable and divide by 8 to convert bits to bytes.
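
As a rough illustration only, the three sizing formulas above can be collected into a short PL/SQL block. The 1200 MB/s throughput target, the 200 MB/s-per-CPU figure and the 180 MB/s-per-controller figure are simply the assumptions used in the examples above, not measured values:

DECLARE
target_throughput NUMBER := 1200; -- required throughput in MB/s (example value from above)
cpu_rate NUMBER := 200; -- MB/s one CPU is assumed to sustain
ctrl_rate NUMBER := 180; -- sustainable MB/s per disk controller (70% of 2 Gbit/s, divided by 8)
no_of_cpus NUMBER;
memory_gb NUMBER;
no_of_controllers NUMBER;
BEGIN
no_of_cpus := CEIL(target_throughput / cpu_rate);
memory_gb := 2 * no_of_cpus;
no_of_controllers := CEIL(target_throughput / ctrl_rate);
dbms_output.put_line('CPUs needed: ' || no_of_cpus);
dbms_output.put_line('Memory needed (GB): ' || memory_gb);
dbms_output.put_line('Disk controllers needed: ' || no_of_controllers);
END;
/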

Friday, January 8, 2010

What is a Trigger?

What is a Trigger?

A trigger is a PL/SQL block that is fired when a DML statement (INSERT, UPDATE, or DELETE) is executed on a database table. A trigger fires automatically when its associated DML statement is executed.

Syntax of Triggers

The Syntax for creating a trigger is:
CREATE [OR REPLACE ] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF }
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
WHEN (condition)
BEGIN
--- sql statements
END;

•CREATE [OR REPLACE ] TRIGGER trigger_name - This clause creates a trigger with the given name or overwrites an existing trigger with the same name.
•{BEFORE | AFTER | INSTEAD OF } - This clause indicates when the trigger should fire, for example before or after updating a table. INSTEAD OF is used to create a trigger on a view; BEFORE and AFTER cannot be used to create a trigger on a view.
•{INSERT [OR] | UPDATE [OR] | DELETE} - This clause determines the triggering event. More than one triggering event can be specified, separated by the OR keyword. The trigger fires for each of the specified triggering events.
•[OF col_name] - This clause is used with update triggers. This clause is used when you want to trigger an event only when a specific column is updated.
•[ON table_name] - This clause identifies the name of the table or view to which the trigger is associated.
•[REFERENCING OLD AS o NEW AS n] - This clause is used to reference the old and new values of the data being changed. By default, you reference the values as :old.column_name or :new.column_name. The reference names can also be changed from old (or new) to any other user-defined name. You cannot reference old values when inserting a record, or new values when deleting a record, because they do not exist.
•[FOR EACH ROW] - This clause determines whether the trigger fires once for each row affected (i.e. a row-level trigger) or just once when the entire SQL statement is executed (i.e. a statement-level trigger).
•WHEN (condition) - This clause is valid only for row level triggers. The trigger is fired only for rows that satisfy the condition specified.
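
Putting these clauses together, here is a minimal illustrative sketch (emp_tbl and its salary column are the sample table used in the procedure examples below; the trigger name and the business rule are assumed purely for illustration):

CREATE OR REPLACE TRIGGER trg_salary_check
BEFORE UPDATE OF salary
ON emp_tbl
REFERENCING OLD AS o NEW AS n
FOR EACH ROW
WHEN (n.salary < o.salary)
BEGIN
-- Reject any update that tries to lower an employee's salary
RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be decreased.');
END;
/

Because the trigger is declared FOR EACH ROW with a WHEN condition, the body runs only for rows whose new salary is lower than the old one.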

Exception Handling

Exception Handling

In this section we will discuss the following:
1) What is Exception Handling?
2) Structure of Exception Handling
3) Types of Exceptions

1) What is Exception Handling?
PL/SQL provides a feature to handle the exceptions that occur in a PL/SQL block, known as exception handling. Using exception handling we can test the code and prevent it from exiting abruptly. When an exception occurs, a message that explains its cause is received.
PL/SQL Exception message consists of three parts.
1) Type of Exception
2) An Error Code
3) A message
By handling the exceptions, we can ensure that a PL/SQL block does not exit abruptly.

2) Structure of Exception Handling.
The General Syntax for coding the exception section

DECLARE
Declaration section
BEGIN
Execution section
EXCEPTION
WHEN ex_name1 THEN
-Error handling statements
WHEN ex_name2 THEN
-Error handling statements
WHEN Others THEN
-Error handling statements
END;

General PL/SQL statements can be used in the exception block.

When an exception is raised, Oracle searches for an appropriate exception handler in the exception section. For example, in the above block, if the error raised is 'ex_name1', then the error is handled according to the statements under it. Since it is not possible to determine all the possible runtime errors during testing of the code, the 'WHEN OTHERS' exception is used to manage the exceptions that are not explicitly handled. Only one exception can be raised in a block, and control does not return to the execution section after the error is handled.

If there are nested PL/SQL blocks like this.

DECLARE
Declaration section
BEGIN
DECLARE
Declaration section
BEGIN
Execution section
EXCEPTION
Exception section
END;
EXCEPTION
Exception section
END;

In the above case, if the exception is raised in the inner block it should be handled in the exception block of the inner PL/SQL block else the control moves to the Exception block of the next upper PL/SQL Block. If none of the blocks handle the exception the program ends abruptly with an error.
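
As a small illustrative sketch of this behaviour (ZERO_DIVIDE is one of the named system exceptions discussed below; the values are arbitrary):

DECLARE
v_result NUMBER;
BEGIN
BEGIN
v_result := 10 / 0; -- raises ZERO_DIVIDE inside the inner block
EXCEPTION
WHEN ZERO_DIVIDE THEN
dbms_output.put_line('Handled in the inner block.');
END;
dbms_output.put_line('Execution continues in the outer block.');
EXCEPTION
WHEN OTHERS THEN
dbms_output.put_line('Reached only if the inner block had not handled the error.');
END;
/

Because the inner block handles ZERO_DIVIDE itself, the outer handler is never reached; remove the inner EXCEPTION section and the error would propagate to the outer block instead.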

3) Types of Exceptions.
There are 3 types of Exceptions.
a) Named System Exceptions
b) Unnamed System Exceptions
c) User-defined Exceptions

a) Named System Exceptions
System exceptions are raised automatically by Oracle when a program violates an RDBMS rule. Some system exceptions are raised frequently, so they are predefined and given names in Oracle; these are known as named system exceptions.

For example: NO_DATA_FOUND and ZERO_DIVIDE are called Named System exceptions.

Named system exceptions are:
1) not declared explicitly,
2) raised implicitly when a predefined Oracle error occurs,
3) caught by referencing the standard name within an exception-handling routine.



For Example: Suppose a NO_DATA_FOUND exception is raised in a procedure; we can write code to handle the exception as given below.

BEGIN
Execution section
EXCEPTION
WHEN NO_DATA_FOUND THEN
dbms_output.put_line ('A SELECT...INTO did not return any row.');
END;

b) Unnamed System Exceptions
System exceptions for which Oracle does not provide a name are known as unnamed system exceptions. These exceptions do not occur frequently. They have an error code and an associated message.

There are two ways to handle unnamed system exceptions:
1. By using the WHEN OTHERS exception handler, or
2. By associating the exception code to a name and using it as a named exception.

We can assign a name to unnamed system exceptions using a Pragma called EXCEPTION_INIT.
EXCEPTION_INIT will associate a predefined Oracle error number to a programmer_defined exception name.

Steps to be followed to use unnamed system exceptions are
• They are raised implicitly.
• If they are not handled in WHEN OTHERS, they must be handled explicitly.
• To handle the exception explicitly, it must be declared using the pragma EXCEPTION_INIT as shown below and handled by referencing the user-defined exception name in the exception section.

The general syntax to declare an unnamed system exception using EXCEPTION_INIT is:

DECLARE
exception_name EXCEPTION;
PRAGMA
EXCEPTION_INIT (exception_name, Err_code);
BEGIN
Execution section
EXCEPTION
WHEN exception_name THEN
handle the exception
END;

For Example: Let's consider the product table and order_items table from SQL joins.

Here product_id is a primary key in product table and a foreign key in order_items table.
If we try to delete a product_id from the product table when it has child records in the order_items table, an exception will be thrown with Oracle error code -2292.
We can provide a name to this exception and handle it in the exception section as given below.

DECLARE
Child_rec_exception EXCEPTION;
PRAGMA
EXCEPTION_INIT (Child_rec_exception, -2292);
BEGIN
Delete FROM product where product_id= 104;
EXCEPTION
WHEN Child_rec_exception
THEN Dbms_output.put_line('Child records are present for this product_id.');
END;
/

c) User-defined Exceptions
Apart from system exceptions we can explicitly define exceptions based on business rules. These are known as user-defined exceptions.

Steps to be followed to use user-defined exceptions:
• They should be explicitly declared in the declaration section.
• They should be explicitly raised in the Execution Section.
• They should be handled by referencing the user-defined exception name in the exception section.

For Example: Let's consider the product table and order_items table from SQL joins to explain user-defined exceptions.
Let's create a business rule: if the total number of units of any particular product sold is more than 20, then it is a huge quantity and a special discount should be provided.

DECLARE
huge_quantity EXCEPTION;
CURSOR product_quantity is
SELECT p.product_name as name, sum(o.total_units) as units
FROM order_items o, product p
WHERE o.product_id = p.product_id
GROUP BY p.product_name;
quantity order_items.total_units%type;
up_limit CONSTANT order_items.total_units%type := 20;
message VARCHAR2(200);
BEGIN
FOR product_rec in product_quantity LOOP
quantity := product_rec.units;
IF quantity > up_limit THEN
message := 'The number of units of product ' || product_rec.name || ' is more than 20. Special discounts should be provided. Rest of the records are skipped.';
RAISE huge_quantity;
ELSIF quantity < up_limit THEN
message := 'The number of units is below the discount limit.';
END IF;
dbms_output.put_line (message);
END LOOP;
EXCEPTION
WHEN huge_quantity THEN
dbms_output.put_line (message);
END;
/

RAISE_APPLICATION_ERROR ( )
RAISE_APPLICATION_ERROR is a built-in procedure in Oracle that is used to display user-defined error messages along with an error number in the range -20000 to -20999.

Whenever a message is displayed using RAISE_APPLICATION_ERROR, all previous transactions which are not committed within the PL/SQL Block are rolled back automatically (i.e. change due to INSERT, UPDATE, or DELETE statements).

RAISE_APPLICATION_ERROR raises an exception but does not handle it.

RAISE_APPLICATION_ERROR is used for the following reasons,
a) to create a unique id for a user-defined exception.
b) to make the user-defined exception look like an Oracle error.

The General Syntax to use this procedure is:

RAISE_APPLICATION_ERROR (error_number, error_message);
• The Error number must be between -20000 and -20999
• The Error_message is the message you want to display when the error occurs.

Steps to be followed to use the RAISE_APPLICATION_ERROR procedure:
1. Declare a user-defined exception in the declaration section.
2. Raise the user-defined exception based on a specific business rule in the execution section.
3. Finally, catch the exception and link the exception to a user-defined error number in RAISE_APPLICATION_ERROR.

Using the above example, we can display an error message using RAISE_APPLICATION_ERROR.

DECLARE
huge_quantity EXCEPTION;
CURSOR product_quantity is
SELECT p.product_name as name, sum(o.total_units) as units
FROM order_items o, product p
WHERE o.product_id = p.product_id
GROUP BY p.product_name;
quantity order_items.total_units%type;
up_limit CONSTANT order_items.total_units%type := 20;
message VARCHAR2(50);
BEGIN
FOR product_rec in product_quantity LOOP
quantity := product_rec.units;
IF quantity > up_limit THEN
RAISE huge_quantity;
ELSIF quantity < up_limit THEN
message := 'The number of units is below the discount limit.';
END IF;
dbms_output.put_line (message);
END LOOP;
EXCEPTION
WHEN huge_quantity THEN
raise_application_error(-20100, 'The number of units is above the discount limit.');
END;
/
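
One way to trap this error in a calling block (a sketch that is not part of the original example; it simply reuses the EXCEPTION_INIT technique described above) is to bind the -20100 code to a named exception:

DECLARE
discount_limit_error EXCEPTION;
PRAGMA EXCEPTION_INIT (discount_limit_error, -20100);
BEGIN
-- In practice, call the block or procedure that may raise -20100 here.
RAISE_APPLICATION_ERROR (-20100, 'The number of units is above the discount limit.');
EXCEPTION
WHEN discount_limit_error THEN
dbms_output.put_line ('Caught: ' || SQLERRM);
END;
/

Alternatively, a WHEN OTHERS handler can inspect SQLCODE and SQLERRM to identify the error.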

Parameters in Procedure and Functions

Parameters in Procedure and Functions

How to pass parameters to Procedures and Functions in PL/SQL ?
In PL/SQL, we can pass parameters to procedures and functions in three ways.

1) IN type parameter: These types of parameters are used to send values to stored procedures.
2) OUT type parameter: These types of parameters are used to get values from stored procedures. This is similar to a return type in functions.
3) IN OUT parameter: These types of parameters are used to send values and get values from stored procedures.

NOTE: If a parameter's type is not explicitly specified, it is an IN parameter by default.


1) IN parameter:
This is similar to passing parameters in programming languages. We can pass values to the stored procedure through these parameters or variables. This type of parameter is a read only parameter. We can assign the value of IN type parameter to a variable or use it in a query, but we cannot change its value inside the procedure.

The General syntax to pass an IN parameter is

CREATE [OR REPLACE] PROCEDURE procedure_name (
param_name1 IN datatype, param_name2 IN datatype ... )

•param_name1, param_name2... are unique parameter names.
•datatype - defines the datatype of the variable.
•IN - is optional; by default it is an IN type parameter.


2) OUT Parameter:
The OUT parameters are used to send the output from a procedure or a function. This is a write-only parameter, i.e., we cannot pass values to OUT parameters while executing the stored procedure, but we can assign values to an OUT parameter inside the stored procedure and the calling program can receive this output value.

The General syntax to create an OUT parameter is

CREATE [OR REPLACE] PROCEDURE proc2 (param_name OUT datatype)

The parameter should be explicitly declared as an OUT parameter.

3) IN OUT Parameter:

The IN OUT parameter allows us to pass values into a procedure and get output values from the procedure. This parameter is used if the value of the IN parameter can be changed in the calling program.

By using an IN OUT parameter we can pass a value into the parameter and return a value to the calling program using the same parameter. This is possible only if the value passed to the procedure and the output value have the same datatype. This parameter is used if the value of the parameter will be changed in the procedure.

The General syntax to create an IN OUT parameter is

CREATE [OR REPLACE] PROCEDURE proc3 (param_name IN OUT datatype)


The below examples show how to create stored procedures using the above three types of parameters.

Example1:

Using IN and OUT parameter:

Let’s create a procedure which gets the name of the employee when the employee id is passed.

1> CREATE OR REPLACE PROCEDURE emp_name (id IN NUMBER, emp_name OUT VARCHAR2)
2> IS
3> BEGIN
4> SELECT first_name INTO emp_name
5> FROM emp_tbl WHERE empID = id;
6> END;
7> /

We can call the procedure ‘emp_name’ in this way from a PL/SQL Block.

1> DECLARE
2> empName varchar(20);
3> CURSOR id_cur IS SELECT id FROM emp_ids;
4> BEGIN
5> FOR emp_rec in id_cur
6> LOOP
7> emp_name(emp_rec.id, empName);
8> dbms_output.put_line('The employee ' || empName || ' has id ' || emp_rec.id);
9> END LOOP;
10> END;
11> /

In the above PL/SQL Block
In line no 3, we are creating a cursor ‘id_cur’ which contains the employee ids.
In line no 7, we are calling the procedure ‘emp_name’, passing the ‘id’ as the IN parameter and ‘empName’ as the OUT parameter.
In line no 8, we are displaying the id and the employee name which we got from the procedure ‘emp_name’.

Example 2:

Using IN OUT parameter in procedures:

1> CREATE OR REPLACE PROCEDURE emp_salary_increase
2> (emp_id IN emp_tbl.empID%type, salary_inout IN OUT emp_tbl.salary%type)
3> IS
4> tmp_sal number;
5> BEGIN
6> SELECT salary
7> INTO tmp_sal
8> FROM emp_tbl
9> WHERE empID = emp_id;
10> IF tmp_sal between 10000 and 20000 THEN
11> salary_inout := tmp_sal * 1.2;
12> ELSIF tmp_sal between 20000 and 30000 THEN
13> salary_inout := tmp_sal * 1.3;
14> ELSIF tmp_sal > 30000 THEN
15> salary_inout := tmp_sal * 1.4;
16> END IF;
17> END;
18> /

The below PL/SQL block shows how to execute the above 'emp_salary_increase' procedure.

1> DECLARE
2> CURSOR updated_sal is
3> SELECT empID,salary
4> FROM emp_tbl;
5> pre_sal number;
6> BEGIN
7> FOR emp_rec IN updated_sal LOOP
8> pre_sal := emp_rec.salary;
9> emp_salary_increase(emp_rec.empID, emp_rec.salary);
10> dbms_output.put_line('The salary of ' || emp_rec.empID ||
11> ' increased from '|| pre_sal || ' to '||emp_rec.salary);
12> END LOOP;
13> END;
14> /

PL/SQL Functions

PL/SQL Functions

What is a Function in PL/SQL?
A function is a named PL/SQL block which is similar to a procedure. The major difference between a procedure and a function is that a function must always return a value, whereas a procedure may or may not return a value.

The General Syntax to create a function is:

CREATE [OR REPLACE] FUNCTION function_name [parameters]
RETURN return_datatype
IS
Declaration_section
BEGIN
Execution_section
Return return_variable;
EXCEPTION
exception section
Return return_variable;
END;

1) Return Type: The header section defines the return type of the function. The return datatype can be any of the Oracle datatypes such as VARCHAR2, NUMBER, etc.
2) The execution and exception section both should return a value which is of the datatype defined in the header section.

For example, let’s create a function called 'employer_details_func' similar to the one created in the stored procedure section.

1> CREATE OR REPLACE FUNCTION employer_details_func
2> RETURN VARCHAR2
3> IS
4> emp_name VARCHAR2(20);
5> BEGIN
6> SELECT first_name INTO emp_name
7> FROM emp_tbl WHERE empID = '100';
8> RETURN emp_name;
9> END;
10> /

In the example we are retrieving the ‘first_name’ of employee with empID 100 to variable ‘emp_name’.
The return type of the function is VARCHAR2, which is declared in line no 2.
The function returns 'emp_name', which is of type VARCHAR2, as the return value in line no 8.

How to execute a PL/SQL Function?
A function can be executed in the following ways.

1) Since a function returns a value we can assign it to a variable.

employee_name := employer_details_func;

If ‘employee_name’ is of datatype VARCHAR2, we can store the name of the employee by assigning the return value of the function to it.

2) As a part of a SELECT statement

SELECT employer_details_func FROM dual;

3) In PL/SQL statements like,

dbms_output.put_line(employer_details_func);

This line displays the value returned by the function.

Stored Procedures

Stored Procedures

What is a Stored Procedure?
A stored procedure, or simply a proc, is a named PL/SQL block which performs one or more specific tasks. This is similar to a procedure in other programming languages. A procedure has a header and a body. The header consists of the name of the procedure and the parameters or variables passed to the procedure. The body consists of a declaration section, an execution section and an exception section, similar to a general PL/SQL block. A procedure is similar to an anonymous PL/SQL block but it is named for repeated usage.

We can pass parameters to procedures in three ways.
1) IN-parameters
2) OUT-parameters
3) IN OUT-parameters

A procedure may or may not return any value.

General Syntax to create a procedure is:

CREATE [OR REPLACE] PROCEDURE proc_name [list of parameters]

IS
Declaration section
BEGIN
Execution section
EXCEPTION
Exception section
END;

IS - marks the beginning of the body of the procedure and is similar to DECLARE in anonymous PL/SQL Blocks. The code between IS and BEGIN forms the Declaration section.


The syntax within the brackets [ ] indicates that they are optional. By using CREATE OR REPLACE together, the procedure is created if no other procedure with the same name exists, or the existing procedure is replaced with the current code.

The below example creates a procedure ‘employer_details’ which gives the details of the employee.

1> CREATE OR REPLACE PROCEDURE employer_details
2> IS
3> CURSOR emp_cur IS
4> SELECT first_name, last_name, salary FROM emp_tbl;
5> emp_rec emp_cur%rowtype;
6> BEGIN
7> FOR emp_rec in emp_cur
8> LOOP
9> dbms_output.put_line(emp_rec.first_name || ' ' ||emp_rec.last_name
10> || ' ' ||emp_rec.salary);
11> END LOOP;
12> END;
13> /
How to execute a Stored Procedure?
There are two ways to execute a procedure.

1) From the SQL prompt.


EXECUTE [or EXEC] procedure_name;

2) Within another procedure – simply use the procedure name.


procedure_name;

NOTE: In the examples given above, we are using a forward slash ‘/’ at the end of the program. This tells the Oracle engine that the PL/SQL program has ended and it can begin processing the statements.
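
For example, assuming the ‘employer_details’ procedure above has compiled successfully, it can be run from SQL*Plus as shown below (SET SERVEROUTPUT ON is needed to see the dbms_output messages):

SET SERVEROUTPUT ON

EXEC employer_details;

-- or, from within an anonymous block or another procedure:
BEGIN
employer_details;
END;
/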