Saturday, January 31, 2026


What are the main components of a Spark cluster and how do they interact?

  • Driver: Runs your main program (the main() method or notebook code), builds the logical and physical plans, schedules and coordinates tasks across executors, holds job and cluster metadata, and collects results back when an action such as collect() returns data.
  • Executors: JVM processes on worker nodes that execute tasks, cache partitions in memory or on disk, and write shuffle files for data exchange between stages.
  • Cluster manager (YARN / Kubernetes / Databricks / Standalone): Allocates resources (containers/pods/VMs) for the driver and executors.
  • Flow: Driver requests resources from the cluster manager → cluster manager launches executors → driver sends tasks to the executors and tracks their progress (see the PySpark sketch below).
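To make the flow concrete, here is a minimal PySpark sketch showing where each component appears in a real program. The master URL, resource sizes, and app name are illustrative assumptions, not values from this post; adjust them for your own cluster manager.

```python
# Minimal sketch of how driver, executors, and cluster manager are wired
# together. All specific values below (master URL, memory sizes, instance
# counts, app name) are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cluster-components-demo")       # name shown in the cluster manager UI
    .master("yarn")                           # cluster manager: yarn / k8s://... / spark://host:7077 / local[*]
    .config("spark.executor.instances", "4")  # how many executor JVMs to request
    .config("spark.executor.cores", "2")      # task slots per executor
    .config("spark.executor.memory", "4g")    # executor heap (cached data, shuffle buffers)
    .config("spark.driver.memory", "2g")      # driver heap (plans, metadata, collected results)
    .getOrCreate()
)

# The driver builds the plan for this job; the cluster manager has already
# launched the executors, which run the generated tasks in parallel.
df = (
    spark.range(0, 1_000_000)
    .selectExpr("id % 10 AS key")
    .groupBy("key")
    .count()
)
df.show()  # an action: the driver schedules tasks and gathers a small result

spark.stop()  # releases the executors back to the cluster manager
```

Note that only the master URL ties the program to a particular cluster manager: swapping "yarn" for a k8s://... or spark://... URL targets Kubernetes or Standalone without changing the rest of the code.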


