Friday, December 4, 2009

Teradata RDBMS for IBM Infosphere Datastage

Teradata RDBMS
---------------------------------------------------
Runs on SMP and MPP systems
High Volume database
Parallel database procesessing
Utilities for reading and writing to the RDBMS
-Load utilities
# FastLoad
# MultiLoad
# TPump
-Read utility
# FastExport
---------------------------------------------------
Teradata Architecture:

# Disks contain database tables
- Rows of tables are distributed across disks
# AMPs( Access Module Processors) are VPROCs ( Virtual Processors )that manage the
database.
- Locking
- Joining
- Read and Write functions
# PEs (Parsing Engines)are VPROCs that parse SQL sent to the Teradata Server by
clients (e.g.,DataStage)
- Check syntax
- Create optimized Parse Tree
- Generate task Steps
- Dispatch Steps to AMPs
# VPROCs run under PDE ( Parallel Database Extensions)



The communication layer consists of BNET communications software supporting TCP/IP communication between the AMP processors. It supports broadcast, multicast, and point-to-poin communcation.

An AMP, or unit of parallelism, "owns" a portion of the database. Multiple AMPs reside on a single 2-CPU SMP. Therefore, the Teradata Database doesn't rely on the hardware platform for parallelism, scalability, reliability, or availabl.
These capabilities are inherent in the database architecture and are independent of the operating system and hardware configuration.

AMPs are one of two types of virtual processors ( VPROCs). The second type of VPROC is Parsing Engines (PE), which break up a request or query into manageable pieces and distribute the work to the AMP VPROCs for processing. Multiple PEs can also exist on a single node. It's important to note that each PE has access to each AMP, which allows for complete parallel processing of each request.

PEs are similar in concept to a database optimizer.
----------------------------------------------------------
Client Access to Teradata (ex: Datastage )

# Utilities access
- Teradata has a set number of slots (15 by default) in which the
Teradata utilites can run concurrently.
* A Datastage job that invokes a utility when no slot is available will abort.
- the MultiLoad stage supports tenacity, which queues the process for retry when slots aren't available.
- Teradata utilites for loading tables
Vary in whether they take up a utility slot
Vary in the number of target tables they can load in a single run
Vary in their update and load capabilities
Vary in whether the table in locked for use by other users
Vary in their performance
- Teradata utility for reading from tables
Only one, so no choice here
# Non-utilities access
- ODBC: Access through ODBC driver
- CLI ( Call Level Interface )
Programmed access.
-------------------------------------------------------
Teradata Utilities
----------------------------------------------------
# Utilities for loading tables
- FastLoad ( Takes a utility slot )
High performance loading to a single, empty table
No updates or upserts
Locks table
Supports checkpoint restart
- MultiLoad ( Takes a utility slot )
High performance inserts, updates and deletes
Can update multiple tables per run
Table need not be empty
Locks table
Supports checkpoint restart
- TPump ( Does not table a utility slot; runs in the background )
Supports inserts, updates, upserts, and deletes
Can update multiple tables per run
"Trickle feed"; Like a pump the data is fed to the target in a slow, steady stream that it can handle
- Flow of data can be throttled to a specifed number of updates per minute
Supports checkpoint restart
# Utilities for reading from tables
- FastExport ( Takes a utility slot )
Extracts large amounts of distributed data.
-----------------------------------------------------------------
Teradata Stage:
-----------------------------
# Four Teradata stages
- Teradata Enterprise
- Teradata MultiLoad
- Teradata API
- Teradata Load

# All support one input link and/or one output link
- Output link reads from database tables or load files
- Input writes to database tables.

# How they differ
- Teradata utilities used / not used
- Performance
- Whether they take up a utility slot
- Update (upsert) capabilities
- Target one table or more than one per run
- Whether table is locked.

No comments: