Friday, January 1, 2010

IBM WebSphere Framework Architecture




What is Information Integration Solutions’ definition of Grid Computing?

▪ A group of like servers running the same operating system
▪ High-speed private network between nodes (1 Gb)
▪ More than 3 physical nodes
  – 3 or fewer can be managed using static configuration files
  – More than 3 is unmanageable (too many combinations)
▪ A Resource Manager is used to manage node allocation
▪ Dynamic configuration files are created at runtime
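
As a rough illustration of what "dynamic configuration files created at runtime" can mean, the sketch below builds a parallel-engine style configuration file from whatever hosts a resource manager hands back. The function name, the host names, and the disk and scratch paths are all hypothetical; only the general node/fastname/pools/resource layout follows the usual configuration file format.

```python
def build_apt_config(hosts, partitions_per_host=1,
                     disk="/tmp/ds/data", scratch="/tmp/ds/scratch"):
    """Return the text of a configuration file for the given hosts.

    One logical node is emitted per (host, partition) pair.
    The disk and scratch paths are placeholders (assumptions).
    """
    lines = ["{"]
    n = 0
    for host in hosts:
        for _ in range(partitions_per_host):
            n += 1
            lines += [
                f'  node "node{n}"',
                "  {",
                f'    fastname "{host}"',
                '    pools ""',
                f'    resource disk "{disk}" {{pools ""}}',
                f'    resource scratchdisk "{scratch}" {{pools ""}}',
                "  }",
            ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts returned by the resource manager for one job.
    print(build_apt_config(["compute1", "compute2"], partitions_per_host=2))
```

Because the file is generated per run, nobody has to maintain a static file for every possible combination of nodes.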

Why Information Integration Grid?

1. Commodity Hardware
  ▪ Low cost
  ▪ High availability
2. Software Scalability
  ▪ Larger data sources
  ▪ Faster run times
3. Utility Usage / Consolidation
  ▪ Away from dedicated servers by project
  ▪ Data analysis / data cleansing / data integration on a shared pool of resources



Resource Manager Responsibility
▪ Node allocation / holding of resources until the task completes
▪ Nodes that are in use can’t be used again until the previously assigned task completes
▪ No additional nodes available: jobs are queued
▪ Queued jobs are released using the FIFO method
▪ Must have enough nodes to run the maximum number of concurrent job streams
▪ Smart usage/allocation by type of job
▪ Define queues based on priority
▪ Down/off servers are not assigned
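
The hold-until-complete and FIFO behavior described above can be sketched as a small in-memory model. This is not how any particular resource manager is implemented; the class and method names are made up for illustration, and it assumes strict FIFO, where the job at the head of the queue blocks the jobs behind it.

```python
from collections import deque


class ResourceManager:
    """Toy model: nodes are held per job and queued jobs leave in FIFO order."""

    def __init__(self, nodes):
        self.free = list(nodes)   # nodes that are up and unassigned
        self.held = {}            # job name -> nodes held until it completes
        self.queue = deque()      # waiting (job, node_count) requests

    def submit(self, job, count):
        """Queue a request and start it immediately if nodes are free."""
        self.queue.append((job, count))
        self._dispatch()

    def complete(self, job):
        """Release the nodes a finished job was holding."""
        self.free.extend(self.held.pop(job))
        self._dispatch()

    def mark_down(self, node):
        """A down/off server is removed from the free pool."""
        if node in self.free:
            self.free.remove(node)

    def _dispatch(self):
        # Strict FIFO: only the head of the queue is considered.
        while self.queue and self.queue[0][1] <= len(self.free):
            job, count = self.queue.popleft()
            self.held[job] = [self.free.pop(0) for _ in range(count)]


if __name__ == "__main__":
    rm = ResourceManager(["n1", "n2", "n3"])
    rm.submit("job_a", 2)   # runs at once
    rm.submit("job_b", 2)   # queued, only one node is free
    rm.complete("job_a")    # job_b is released from the queue
    print(rm.held)
```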

Resource Manager Responsibility (continued)

▪ License restrictor (nodes allocated by type of task)
  – ProfileStage: 4 nodes
  – QualityStage: 6 nodes
  – DataStage: 10 nodes
    • Based on time of day
▪ From a shared pool of nodes
▪ Licensed for 8 nodes, but 48 nodes available (other apps)
▪ Assigned to logical, not physical, nodes
▪ Each Resource Manager manages these tasks differently
▪ Real-time during the day, batch nightly or on weekends
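
A minimal sketch of the license-restrictor idea, using the per-product node counts from the list above as caps. The function name and the `in_use` bookkeeping are assumptions for illustration; a real resource manager would express this through its own queue or resource limits, and could vary the caps by time of day.

```python
# Per-product node caps taken from the list above.
LICENSE_CAPS = {"ProfileStage": 4, "QualityStage": 6, "DataStage": 10}


def grantable(task_type, requested, in_use, caps=LICENSE_CAPS):
    """Return how many nodes can be granted without exceeding the cap.

    in_use maps a task type to the number of nodes it currently holds.
    """
    remaining = caps[task_type] - in_use.get(task_type, 0)
    return max(0, min(requested, remaining))


if __name__ == "__main__":
    # QualityStage already holds 4 of its 6 nodes, so only 2 more are granted.
    print(grantable("QualityStage", 4, {"QualityStage": 4}))
```

The cap applies to the count of logical nodes, so the same check works no matter which physical servers in the shared pool end up running the work.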

Correct Node Allocation
▪ Each Job Sequence or Job should maximize utilization on a single node before using multiple nodes
  – Maximize on a single node first using
    • $APT_GRID_PARTITIONS (per node or server)
  – Then add more nodes using
    • $APT_GRID_COMPUTENODES (servers)
▪ Most Jobs or Job Sequences (80%+) should use a single node and partition, determined by:
  – Run time (i.e., less than 5 minutes)
  – Data volume (fewer than a few thousand rows)
  – Concurrency of jobs from a single job sequence
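
The "fill one node first, then add nodes" rule can be written down as a small helper. This is only a sketch: the function name and the `max_partitions_per_node` limit (for example, tied to the CPU count of a compute node) are assumptions, and the two returned keys simply mirror the $APT_GRID_PARTITIONS and $APT_GRID_COMPUTENODES variables named above.

```python
import math


def grid_settings(parallelism, max_partitions_per_node=4):
    """Suggest compute-node and partition counts for a desired parallelism.

    Partitions on a single node are used up first; extra nodes are added
    only once one node is fully partitioned.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if parallelism <= max_partitions_per_node:
        return {"APT_GRID_COMPUTENODES": 1,
                "APT_GRID_PARTITIONS": parallelism}
    # Round up, so the total may slightly exceed the requested parallelism.
    nodes = math.ceil(parallelism / max_partitions_per_node)
    return {"APT_GRID_COMPUTENODES": nodes,
            "APT_GRID_PARTITIONS": max_partitions_per_node}


if __name__ == "__main__":
    print(grid_settings(1))    # small job: one node, one partition
    print(grid_settings(10))   # larger job: several fully partitioned nodes
```

Under this rule, the short, low-volume jobs that make up most of a workload stay at one node and one partition, which leaves the rest of the pool free for concurrent work.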

Information Integration High Availability Grid
▪ Enough compute nodes to provide service when node(s) fail
▪ Compute node failure
  – Just restart job(s), based on availability
  – Jobs must be designed for restart
▪ Front-end node failure
  – Heartbeat failover of services
  – Restart jobs
▪ SAN failure (I/O node for larger grids)
  – Similar to front-end node failure
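
For the compute-node case, "just restart the job" amounts to a retry loop around a restartable job. The sketch below assumes the job is safe to rerun from the start and that a fresh node allocation is requested for each attempt; `NodeFailure`, `run_with_restart`, and the `allocate` callback are hypothetical names, not part of any product API.

```python
class NodeFailure(Exception):
    """Raised (in this sketch) when a compute node dies mid-job."""


def run_with_restart(job, allocate, max_attempts=3):
    """Run job(nodes), restarting on a new allocation after a node failure.

    job must be designed for restart: rerunning it must be safe.
    allocate() returns the nodes to use for the next attempt.
    """
    last_error = None
    for _ in range(max_attempts):
        nodes = allocate()          # based on current availability
        try:
            return job(nodes)
        except NodeFailure as err:
            last_error = err        # try again on a fresh allocation
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    attempts = []

    def flaky_job(nodes):
        attempts.append(nodes)
        if len(attempts) == 1:
            raise NodeFailure(nodes[0])
        return "done"

    pools = iter([["compute1"], ["compute2"]])
    print(run_with_restart(flaky_job, lambda: next(pools)))
```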




