Saturday, December 19, 2009

Data Profiling with IBM InfoSphere Information Analyzer Version 8.1.1

When you start the Information Analyzer client, called Information Server Console, you’ll be shown its start-up screen and then its log-in window.

When your log-in is successful, the console main window will show up.

Assuming the Oracle table that we’d like to profile is new, we must first identify it to the Analyzer, which technically means importing its metadata.

Make sure you have connected the Oracle database to the Information Analyzer server before you import the metadata of its tables.

Expand Metadata Management from the HOME drop-down menu.

Then, click Import Metadata.

Our example Oracle data (table) is in the CLROPER database (hosted in DDOM02), so select CLROPER and then click Identify Next Level.

It might take a while, particularly for a database that has many tables and many columns; so just wait.

On the completion message screen, click OK to close the screen.

All tables in the CLROPER database will be identified (listed), including our example table named SPACE1. Next, we’ll identify the columns of our SPACE1 table, so select SPACE1 and then click Identify Next Level.

The result shows that Analyzer has correctly identified the two columns of the table.

Now, import metadata of all columns of the table by selecting the table and then clicking Import.

Click OK to continue.



Click OK on the successful completion screen.

We’re now done with the metadata of the data; we’re now ready to start our profiling task.

In Information Analyzer (as in most software these days), we group our profiling work into projects. Here, I just use an existing project (DJONI_TEST), so select Open Project from the drop-down arrow on the right of NO PROJECT SELECTED.

You’ll be shown the list of existing projects. Select your project, and click Open.

Our previous (existing) profiling work is shown.

Next, click Project Properties from the OVERVIEW drop-down menu.

Go to the Data Sources tab. Our SPACE1 table is not in the list yet, because we haven’t added it to our project (the previous steps identified it only at the server-wide level). To add it to the project, click Add.


When completed, click Save All, and then close the Project Properties window.

Expand the SPACE1 table to see its columns. Select all of the columns, as we want to profile all of them, and then click OK.

Now, we’re ready to profile our SPACE1 data, to analyze its columns. On the main toolbar select Investigate | Column analysis.

Select all columns of the SPACE1 table to analyze, and click Run Column Analysis.

Click Submit.

Check status by clicking Details.

When the job status shows Schedule Complete, click Close to close the Activity Status (job status) window.

Close the Column Analysis window as well.

Our profiling output shows the metadata characteristics of the two columns. Our focus is on their sizes; so if necessary scroll to the right to see the Length columns.

The Length group has three columns: Defined, Inferred, and Selected. The Defined length of the first column (INTEGER1) is the length declared in the table metadata we imported, which is 38. The Inferred length, which is 3, is computed by Information Analyzer from the actual data values in all rows of the column. Information Analyzer then suggests 3 as the Selected length for this column. It reports the Inferred and Selected lengths for the other column, LARGECHAR1, in the same way.
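To make the idea of an inferred length concrete, here is a minimal Python sketch. It is not Information Analyzer's actual algorithm, which this article does not describe; it simply assumes that the inferred length is the longest value found in the column. The function name inferred_length and the sample values are hypothetical and do not come from the SPACE1 table.

```python
from typing import Iterable, Optional


def inferred_length(values: Iterable[Optional[object]]) -> int:
    """Return the longest string length among the non-null values.

    Assumption: "inferred length" means the maximum observed length.
    Nulls (None) are skipped; an empty or all-null column yields 0.
    """
    return max((len(str(v)) for v in values if v is not None), default=0)


# Hypothetical sample data standing in for the two columns.
integer1_values = [7, 42, 123, None, 999]
largechar1_values = ["abc", "hello world", None, ""]

integer1_inferred = inferred_length(integer1_values)
largechar1_inferred = inferred_length(largechar1_values)

print(integer1_inferred, largechar1_inferred)
```

With this sample data, a column declared with length 38 whose longest value has only three digits would get an inferred length of 3, which is the kind of gap between Defined and Inferred that the Column Analysis output exposes.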

Based on this output from Information Analyzer, we can decide how much we’d like to reduce the lengths of the columns, which can reduce the disk space needed for the data.

Summary
Using a data profiling tool such as IBM Information Analyzer, we can analyze data, particularly large amounts of it, and gain knowledge that otherwise would not be apparent. Information Analyzer has many more functions; this article discussed only the basics of one of them (Column Analysis).
