What are the main components of a Spark cluster and how do they interact?
- Driver: Runs your main program, builds the logical plan, coordinates tasks, holds application metadata, and sometimes collects results.
- Executors: JVM processes on worker nodes that run tasks, store cached data, and write shuffle files.
- Cluster manager (YARN / Kubernetes / Databricks / Standalone): Allocates resources (containers/pods/VMs) for the driver and executors.
- Flow: Driver requests resources from the cluster manager → cluster manager starts executors → driver sends tasks to executors and tracks progress (see the sketch after this list).
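
Here is a minimal Scala sketch showing where each piece of that flow lives in code. The object name ClusterDemo and the executor resource settings are illustrative, and the configs assume the job is submitted to a cluster manager (YARN/Kubernetes) rather than run in local mode:

```scala
import org.apache.spark.sql.SparkSession

// Everything in main() runs inside the driver process.
object ClusterDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClusterDemo")
      // Resource requests the driver passes to the cluster manager,
      // which then launches executor JVMs of this size on worker nodes.
      .config("spark.executor.instances", "4")
      .config("spark.executor.memory", "4g")
      .config("spark.executor.cores", "2")
      .getOrCreate()

    // The driver only builds a plan here; the tasks that compute it
    // run later on the executors.
    val counts = spark.range(0L, 1000000L)
      .selectExpr("id % 10 AS bucket")
      .groupBy("bucket")
      .count()

    // collect() is an action: the driver schedules tasks on executors,
    // tracks their progress, and pulls the (small) result back to itself.
    counts.collect().foreach(println)

    spark.stop()
  }
}
```

Packaged into a JAR and launched with spark-submit (e.g. --master yarn), the same program leaves resource allocation to the cluster manager while the driver keeps its coordinating role.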
