Sunday, January 10, 2010

Balanced Hardware Configuration for Data Warehouse

Balanced Hardware Configuration
A properly sized and balanced hardware configuration is required to maximize data
warehouse performance. The following sections discuss important considerations in
achieving this balance:
How Many CPUs and What Clock Speed Do I Need?
■ How Much Memory Do I Need?
■ How Many Disks Do I Need?
■ How Do I Determine Sufficient I/O Bandwidth?


How Many CPUs and What Clock Speed Do I Need?
Central processing units (CPUs) provide the calculation capabilities in a data Warehouse. You must have sufficient CPU power to perform the data warehouse Operations. Parallel operations are more CPU-intensive than the equivalent serial
operations.

Use the estimated highest throughput as a guideline for the number of CPUs you
need. As a rough estimate, use the following formula:

< No. of CPUs > = < maximum throughput in MB/secon> / 200 MB

This formula assumes that a CPU can sustain up to about 200 MB per second. For
example, if a system requires a maximum throughput of 1200 MB per second, then the
system needs = 1200/200 = 6 CPUs. A configuration with 1
server with 6 CPUs can service this system. A 2-node clustered system could be
configured with 3 CPUs in both nodes.
----------------------------------------------------------------------------------
How Much Memory Do I Need?

Memory in a data warehouse is particularly important for processing
memory-intensive operations such as large sorts. Access to the data cache is less
important in a data warehouse because most of the queries access vast amounts of
data. Data warehouses do not have the same memory requirements as mission-critical
OLTP applications.

The number of CPUs provides a good guideline for the amount of memory you need.
Use the following simplified formula to derive the amount of memory you need from
the CPUs that you select:

< amount of memory in GB > = 2 * < No. Of CPUs >

For example, a system with 6 CPUs needs 2 * 6 = 12 GB of memory. Most
standard servers fulfill this requirement.
----------------------------------------------------------------------------------
How Many Disks Do I Need?

A common mistake in data warehouse environments is to size the storage based on the
maximum capacity needed. Sizing that is based exclusively on storage requirements
will likely create a throughput bottleneck.

Use the maximum throughput you require to find out how many disk arrays you
need. Use the storage provider's specifications to find out how much throughput a
disk array can sustain. Note that storage providers measure in Gb per second, and
your initial throughput estimate is based on MB per second. An average disk
controller has a maximum throughput of 2 Gb per second, which translates to a
sustainable throughput of about (70% * 2 Gbit/s) /8 = 180 MB/s.
Use the following formula to determine the number of disk arrays you need:

< No. of disk controllers > = < throughput in MB/secon > / < individual controller throughput in MB/secon >

For example, our system with 1200 MB per second throughput requires at least 1200 /
180 = 7 disk arrays.
-------------------------------------------------------------------
How Do I Determine Sufficient I/O Bandwidth?

The end-to-end I/O system consists of more components than just the CPUs and disks.
A well-balanced I/O system must provide approximately the same bandwidth across
all components in the I/O system. These components include:
■ Host Bus Adapters (HBAs), the connectors between the server and the storage.
■ Switches, in between the servers and a Storage Area Network (SAN) or Network
Attached Storage (NAS).
■ Ethernet adapters for network connectivity (GigE NIC or Infiniband). In a Oracle
RAC environment, you need an additional private port for the interconnect
between the nodes that you should not include when sizing the system for I/O
throughput. The interconnect must be sized separately, taking into account factors
such as internode parallel execution.
■ Wires that connect the individual components.
Each of the components has to be able to provide sufficient I/O bandwidth to ensure a
well-balanced I/O system. The initial throughput you estimated and the hardware
specifications from the vendors are the basis to determine the quantities of the
individual components you need. Use the conversion in the following table to
translate the vendors' maximum throughput numbers in bits into sustainable
throughput numbers in bytes.

No comments: