Deep Dive into the Spark Scheduler
In this phase, the RDD graph is translated into stages. The DAGScheduler will:
Implement stage-oriented scheduling
Compute a DAG of stages for submitted jobs
Keep track of materialized RDD/Stage outputs
Find a minimal schedule to run the job
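The split into stages happens at shuffle boundaries. Below is a minimal toy model of that cut, not Spark's actual classes: narrow dependencies stay inside a stage, while each shuffle dependency starts a new parent stage.

```scala
object StageCut {
  sealed trait Dep { def parent: ToyRdd }
  final case class NarrowDep(parent: ToyRdd)  extends Dep
  final case class ShuffleDep(parent: ToyRdd) extends Dep
  final case class ToyRdd(id: Int, deps: List[Dep])

  // Walk the lineage backwards; every parent reached through a shuffle
  // dependency becomes the root of a separate (parent) stage.
  def parentStageRoots(rdd: ToyRdd): List[ToyRdd] = {
    val visited = scala.collection.mutable.Set.empty[Int]
    val roots   = scala.collection.mutable.ListBuffer.empty[ToyRdd]
    def visit(r: ToyRdd): Unit =
      if (visited.add(r.id)) r.deps.foreach {
        case ShuffleDep(p) => roots += p   // stage boundary
        case NarrowDep(p)  => visit(p)     // same stage, keep walking
      }
    visit(rdd)
    roots.toList
  }

  def main(args: Array[String]): Unit = {
    // Lineage: textFile -> map (narrow) -> reduceByKey (shuffle) -> map
    val textFile = ToyRdd(0, Nil)
    val mapped   = ToyRdd(1, List(NarrowDep(textFile)))
    val reduced  = ToyRdd(2, List(ShuffleDep(mapped)))
    val finalRdd = ToyRdd(3, List(NarrowDep(reduced)))
    println(parentStageRoots(finalRdd).map(_.id)) // List(1): one parent stage
  }
}
```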
Stage -> TaskSet
A TaskSet is a set of tasks submitted to compute the missing partitions of a particular stage
A stage can correspond to multiple TaskSets (one per stage attempt)
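A hedged sketch of this relationship, with made-up names rather than Spark's internals: a TaskSet carries one task per missing partition plus an attempt id, so a stage resubmitted after a failure produces a new TaskSet covering only what is still missing.

```scala
final case class ToyTask(stageId: Int, partition: Int)
final case class ToyTaskSet(stageId: Int, attempt: Int, tasks: Seq[ToyTask])

object TaskSets {
  def taskSetFor(stageId: Int, attempt: Int, missing: Seq[Int]): ToyTaskSet =
    ToyTaskSet(stageId, attempt, missing.map(ToyTask(stageId, _)))

  def main(args: Array[String]): Unit = {
    val first  = taskSetFor(stageId = 5, attempt = 0, missing = 0 to 3) // all partitions
    val second = taskSetFor(stageId = 5, attempt = 1, missing = Seq(2)) // only the lost one
    println(s"${first.tasks.size} tasks first, then ${second.tasks.size} task on retry")
  }
}
```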
The DAGScheduler submits sets of tasks to the TaskScheduler, which will:
Schedule and monitor tasks with the SchedulerBackend
Return events to the DAGScheduler, e.g.:
JobSubmitted/JobCancelled
MapStageSubmitted/StageCancelled
CompletionEvent
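A simplified sketch of this event flow, assuming a single blocking queue in place of Spark's internal event loop; the event names mirror the list above, the handler bodies are illustrative.

```scala
import java.util.concurrent.LinkedBlockingQueue

sealed trait SchedulerEvent
final case class JobSubmitted(jobId: Int)        extends SchedulerEvent
final case class JobCancelled(jobId: Int)        extends SchedulerEvent
final case class MapStageSubmitted(stageId: Int) extends SchedulerEvent
final case class StageCancelled(stageId: Int)    extends SchedulerEvent
final case class CompletionEvent(taskId: Long, succeeded: Boolean) extends SchedulerEvent

object ToyDagEventLoop {
  private val queue = new LinkedBlockingQueue[SchedulerEvent]()

  def post(e: SchedulerEvent): Unit = queue.put(e)

  // One iteration of the loop: take the next event and react to it.
  def runOnce(): Unit = queue.take() match {
    case JobSubmitted(id)       => println(s"build the stage DAG for job $id")
    case JobCancelled(id)       => println(s"cancel the stages of job $id")
    case MapStageSubmitted(id)  => println(s"run map stage $id")
    case StageCancelled(id)     => println(s"cancel stage $id")
    case CompletionEvent(t, ok) => println(s"task $t finished, succeeded=$ok")
  }

  def main(args: Array[String]): Unit = {
    post(JobSubmitted(1)); post(CompletionEvent(42L, succeeded = true))
    runOnce(); runOnce()
  }
}
```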
How to schedule tasks (TaskSets)?
Batch scheduling approach
Get all available slots
Schedule tasks with locality preference
Wait until all tasks in the same TaskSet can be scheduled at the same time
Retry all tasks in the TaskSet if any task fails
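A small sketch of the batch approach under these assumptions: free slots are modeled as offers, and tasks are matched against all offers at once, trying executors on a preferred host first (Offer and PendingTask are made-up names for the sketch).

```scala
final case class Offer(executorId: String, host: String, freeCores: Int)
final case class PendingTask(id: Int, preferredHosts: Set[String])

object BatchAssign {
  // Assign each task to a free slot, preferring executors on a host from the
  // task's locality preference list; returns taskId -> executorId.
  def assign(offers: Seq[Offer], tasks: Seq[PendingTask]): Map[Int, String] = {
    val freeCores = scala.collection.mutable.Map(offers.map(o => o.executorId -> o.freeCores): _*)
    val hostOf    = offers.map(o => o.executorId -> o.host).toMap
    tasks.flatMap { t =>
      // Preferred hosts sort first (false < true), then anything with a core.
      val ordered = freeCores.keys.toSeq.sortBy(e => !t.preferredHosts(hostOf(e)))
      ordered.find(freeCores(_) > 0).map { e => freeCores(e) -= 1; t.id -> e }
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val offers = Seq(Offer("e1", "hostA", 1), Offer("e2", "hostB", 2))
    val tasks  = Seq(PendingTask(0, Set("hostB")), PendingTask(1, Set("hostA")))
    println(assign(offers, tasks)) // Map(0 -> e2, 1 -> e1)
  }
}
```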
How to schedule tasks in a TaskSet?
Try to achieve better locality for each task (this locality-aware scheduling is implemented via delay scheduling; see the sketch after the locality levels below)
Less data transfer over network
Higher performance
Locality can have different levels
Process Locality (Cache/Memory)
Node Locality (Local Disk)
Rack Locality (Same Rack)
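The delay-scheduling idea, sketched with simplified logic (in Spark the per-level wait is governed by spark.locality.wait, 3s by default; everything else here is illustrative): stay at the best locality level and relax one level only after waiting without launching a task.

```scala
object Locality extends Enumeration {
  val ProcessLocal, NodeLocal, RackLocal, Any = Value // best to worst
}

final class DelayScheduler(localityWaitMs: Long) {
  private var level      = Locality.ProcessLocal
  private var lastLaunch = System.currentTimeMillis()

  // Call once per scheduling round; `launched` says whether a task could be
  // placed at the current level this round.
  def currentLevel(launched: Boolean): Locality.Value = {
    val now = System.currentTimeMillis()
    if (launched) lastLaunch = now
    else if (now - lastLaunch >= localityWaitMs && level < Locality.Any) {
      level = Locality(level.id + 1) // waited long enough: relax one level
      lastLaunch = now
    }
    level
  }
}

object DelayDemo {
  def main(args: Array[String]): Unit = {
    val s = new DelayScheduler(localityWaitMs = 10)
    println(s.currentLevel(launched = true))  // ProcessLocal
    Thread.sleep(20)
    println(s.currentLevel(launched = false)) // NodeLocal after the wait
  }
}
```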
The SchedulerBackend is responsible for resource management
Task Failure
Increment the failure count of the task
Retry the task if failure count < maxTaskFailures
Abort the stage and corresponding jobs if count >= maxTaskFailures
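A minimal sketch of this retry rule; maxTaskFailures corresponds to Spark's spark.task.maxFailures setting (4 by default), while the tracker itself is illustrative.

```scala
final class TaskFailureTracker(maxTaskFailures: Int) {
  private val failures = scala.collection.mutable.Map.empty[Int, Int].withDefaultValue(0)

  // Returns true if the task should be retried, false if the stage and its
  // corresponding jobs must be aborted.
  def onTaskFailed(taskId: Int): Boolean = {
    failures(taskId) += 1
    failures(taskId) < maxTaskFailures
  }
}

object TaskFailureDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new TaskFailureTracker(maxTaskFailures = 4)
    println((1 to 4).map(_ => tracker.onTaskFailed(taskId = 7)))
    // Vector(true, true, true, false): abort on the 4th failure
  }
}
```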
Fetch Failure
Don't count a fetch failure toward the task failure count
Retry the stage if the stage failure count < maxStageFailures
Abort the stage and corresponding jobs if the stage failure count >= maxStageFailures
Mark the executor/host as lost (optional)
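A sketch of the distinction between the two failure kinds; the stage-level cap is modeled here as maxStageFailures to match the notes above, and all class names are made up for illustration.

```scala
sealed trait TaskEnd
case object Succeeded                              extends TaskEnd
final case class Failed(taskId: Int)               extends TaskEnd
final case class FetchFailed(lostExecutor: String) extends TaskEnd

final class StageRetryPolicy(maxStageFailures: Int) {
  private var stageFailures = 0

  def onTaskEnd(end: TaskEnd): String = end match {
    case Succeeded  => "record the task's output"
    case Failed(id) => s"count the failure of task $id (task-level retry)"
    case FetchFailed(exec) =>
      stageFailures += 1 // counted against the stage, not the task
      if (stageFailures >= maxStageFailures) "abort the stage and its jobs"
      else s"resubmit the stage; optionally mark executor $exec as lost"
  }
}

object FetchFailureDemo {
  def main(args: Array[String]): Unit = {
    val policy = new StageRetryPolicy(maxStageFailures = 4)
    println(policy.onTaskEnd(FetchFailed("exec-3"))) // stage retry, not task retry
    println(policy.onTaskEnd(Failed(9)))             // ordinary task failure
  }
}
```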
Worker: a container for executors
Executors provide
Threads to run Tasks
BlockManager to store/serve blocks
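A toy executor matching these two bullets, a fixed thread pool to run tasks and an in-memory map standing in for the BlockManager; real executors do far more (shuffle, metrics, heartbeats).

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors}

final class ToyExecutor(cores: Int) {
  private val taskPool = Executors.newFixedThreadPool(cores)          // task threads
  private val blocks   = new ConcurrentHashMap[String, Array[Byte]]() // stand-in BlockManager

  def launchTask(taskId: Long)(body: => Unit): Unit =
    taskPool.submit(new Runnable {
      def run(): Unit = { println(s"running task $taskId"); body }
    })

  def putBlock(id: String, data: Array[Byte]): Unit = blocks.put(id, data)
  def getBlock(id: String): Option[Array[Byte]]     = Option(blocks.get(id))

  def shutdown(): Unit = taskPool.shutdown()
}

object ExecutorDemo {
  def main(args: Array[String]): Unit = {
    val exec = new ToyExecutor(cores = 2)
    exec.launchTask(1L) { exec.putBlock("rdd_0_0", Array[Byte](1, 2, 3)) }
    exec.shutdown() // previously submitted tasks still run to completion
  }
}
```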