Introduction to HPC and Batch Computing on OpenShift

This workshop introduces high-performance computing (HPC) and batch computing in the context of Kubernetes and Red Hat OpenShift. You will learn how these workloads differ from traditional containerized applications and how the platform scheduler and ecosystem components support scale-out batch jobs.

How HPC and Batch Jobs Differ from Traditional Workloads

On Kubernetes and OpenShift, most day-to-day workloads are traditional containerized applications: long-running services (e.g., APIs, web servers) or short-lived jobs that run a single Pod to completion. They are designed for elasticity, rolling updates, and handling requests with relatively small, independent units of work.

HPC and batch computing represent a different paradigm:

  • Scale-out parallelism — Work is split across many Pods (often hundreds or thousands) that must run together as one logical job. Losing or delaying a subset of Pods can make the whole job useless or incorrect.

  • All-or-nothing execution — A distributed training run or a batch simulation typically needs all required replicas to be running before useful work begins. Partial execution is not acceptable.

  • Resource intensity — Jobs often need GPUs, large amounts of CPU or memory per Pod, or specialized hardware. Schedulers must account for these resources and place Pods accordingly.

  • Queue-based admission — Cluster capacity is finite. Batch jobs are usually submitted to a queue and start only when enough capacity is available, rather than competing immediately with every other workload.

  • Job lifecycle — Jobs have a clear start and end; they are not long-running services. The system must track job completion, retries, and cleanup.

Because of these requirements, running HPC and batch workloads effectively on OpenShift takes more than the default deployment patterns: you need scheduling policies that understand gang scheduling, plus queuing layers, frameworks, and operators tailored to batch and HPC.
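The job lifecycle and scale-out characteristics above are already partly expressed by the core Kubernetes Job API. A minimal sketch of a scale-out batch Job follows; the name, image, and resource figures are illustrative placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-sim                # illustrative name
spec:
  completions: 64                # the job needs 64 successful Pods...
  parallelism: 64                # ...all running at the same time (scale-out)
  backoffLimit: 3                # retry budget before the Job is marked failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.com/sim:latest   # placeholder image
        resources:
          requests:
            cpu: "4"             # resource-intensive per-Pod requests
            memory: 8Gi
```

Note that the default scheduler still places these 64 Pods one at a time, which is exactly the gap the next section examines.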

The Kubernetes Scheduler and Gang Scheduling

The Kubernetes scheduler is responsible for placing Pods onto nodes. By default, it makes independent placement decisions per Pod: each Pod is scheduled as soon as there are enough resources on some node. For a single web server or a small Deployment, this is ideal.

For scale-out batch computing, this default behavior is insufficient. Consider a distributed training job that needs 64 Pods. If the scheduler places 60 Pods and cannot place the remaining 4 (e.g., due to resource fragmentation or quota), you end up with:

  • 60 Pods consuming resources and possibly waiting or failing

  • No useful progress until all 64 are running

  • Wasted capacity and poor cluster utilization

Gang scheduling addresses this by treating a set of Pods as a gang: either all of them are scheduled together, or none of them are. The scheduler (or a component that works with it) holds back the entire gang until there is capacity to run the full set. This avoids resource waste and ensures that batch jobs only start when they can run to completion as a unit.
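One concrete way to express a gang on Kubernetes is the coscheduling plugin from the upstream sig-scheduling scheduler-plugins project. A sketch, assuming that plugin is installed (the API version, scheduler name, and label key may differ across releases):

```yaml
# PodGroup: the unit the coscheduling plugin treats as a gang.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: train-64
spec:
  minMember: 64                  # schedule no Pod until all 64 can be placed
---
# Each worker Pod joins the gang via a label and uses the gang-aware scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: train-worker-0           # illustrative; one of 64 such Pods
  labels:
    scheduling.x-k8s.io/pod-group: train-64
spec:
  schedulerName: scheduler-plugins-scheduler   # name depends on your install
  containers:
  - name: worker
    image: registry.example.com/train:latest   # placeholder image
```

With `minMember: 64`, the 60-of-64 situation described above cannot occur: the scheduler holds all members back until the full gang fits.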

On Kubernetes and OpenShift, gang scheduling is not yet built into the default scheduler. It is provided by:

  • Custom schedulers or scheduler plugins that understand gang semantics, or

  • Higher-level operators and frameworks (e.g., Kubeflow Training Operator, Spark Operator, Kueue) that create Pods and integrate with scheduling layers to achieve gang-like behavior.

Understanding the role of the Kubernetes scheduler—and its limitations for batch—sets the stage for the queue systems and operators we use to run HPC and batch workloads at scale.

Concepts Covered in This Showroom

The rest of this showroom explores the tools and patterns that make HPC and batch computing practical on OpenShift. The following concepts and components are introduced here and detailed in later sections.

Kueue

Kueue is a Kubernetes-native job queue that manages how batch jobs are admitted to the cluster. It holds jobs in queues, respects resource quotas and cluster capacity, and releases them so that the Kubernetes scheduler can place their Pods. Kueue integrates with the scheduler and with job frameworks to support fair sharing and gang-style scheduling.
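A minimal Kueue setup ties these pieces together: a ClusterQueue defines quota, a LocalQueue exposes it in a namespace, and a Job opts in with a label. Names, namespaces, and quota values below are illustrative:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  namespaceSelector: {}          # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 256
      - name: memory
        nominalQuota: 1Ti
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a
  namespace: team-a-ns           # illustrative namespace
spec:
  clusterQueue: team-a-cq
---
# The Job is created suspended; Kueue unsuspends it once quota allows,
# and only then does the Kubernetes scheduler place its Pods.
apiVersion: batch/v1
kind: Job
metadata:
  name: queued-job
  namespace: team-a-ns
  labels:
    kueue.x-k8s.io/queue-name: team-a
spec:
  suspend: true
  parallelism: 8
  completions: 8
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.example.com/job:latest   # placeholder image
        resources:
          requests:
            cpu: "2"
```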

MultiKueue

MultiKueue extends Kueue to multi-cluster scenarios. When you have several OpenShift or Kubernetes clusters (e.g., on-premises and in the cloud), MultiKueue allows a single logical queue to distribute work across clusters, improving utilization and enabling hybrid batch workloads.
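On the manager cluster, MultiKueue is wired in through Kueue's admission-check mechanism. A rough sketch only; MultiKueue object kinds and API versions have changed across Kueue releases, so treat the field names below as assumptions and check the documentation for your installed version:

```yaml
# An AdmissionCheck delegates admission of matching workloads to MultiKueue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: multikueue-check
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-config
---
# The config lists the worker clusters (each backed by a MultiKueueCluster
# object holding kubeconfig credentials) that can receive dispatched jobs.
apiVersion: kueue.x-k8s.io/v1beta1
kind: MultiKueueConfig
metadata:
  name: multikueue-config
spec:
  clusters: ["onprem-cluster", "cloud-cluster"]   # illustrative names
```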

Kubeflow Training Operator

The Kubeflow Training Operator runs distributed machine learning training jobs (e.g., PyTorch, TensorFlow, MXNet) on Kubernetes. It defines custom resources for training jobs and handles gang scheduling and lifecycle so that all worker Pods start together and the job completes or fails as a unit.
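For example, a distributed PyTorch run is declared as a single PyTorchJob custom resource rather than as individual Pods. The name, image, and replica counts here are illustrative:

```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: mnist-train              # illustrative name
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch        # the operator expects this container name
            image: registry.example.com/train:latest   # placeholder image
            resources:
              limits:
                nvidia.com/gpu: 1
    Worker:
      replicas: 7                # 1 master + 7 workers start as one unit
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: pytorch
            image: registry.example.com/train:latest   # placeholder image
            resources:
              limits:
                nvidia.com/gpu: 1
```

The operator injects the rendezvous environment (master address, world size, ranks) into each replica and tracks the job's completion or failure as a whole.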

Spark and Ray

Apache Spark and Ray are frameworks for distributed data processing and ML. On OpenShift they run via operators (e.g., Spark Operator, Ray Operator) that submit and manage Pods for drivers and workers. These operators work with the scheduler and, when used with Kueue, with the queue layer to schedule jobs in a coordinated way.
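As an illustration of the driver/worker split, a SparkApplication resource declares both roles in one object. This sketch assumes the kubeflow/spark-operator (v1beta2 API) is installed; the image and sizing values are placeholders:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi                 # illustrative name
spec:
  type: Scala
  mode: cluster
  image: registry.example.com/spark:latest     # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.5.0"
  driver:                        # one driver Pod coordinates the job
    cores: 1
    memory: 1g
  executor:                      # executor Pods do the parallel work
    instances: 4
    cores: 2
    memory: 4g
```

The operator translates this into driver and executor Pods and manages their lifecycle; with Kueue in front, the whole application waits in the queue until capacity is available.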

Slurm Operator

Slurm is a widely used HPC workload manager. The Slurm Operator brings Slurm’s scheduling and job model onto Kubernetes/OpenShift, allowing HPC users and applications that expect Slurm to run batch jobs on the same cluster as other Kubernetes workloads.


In the following sections you will see how to use these components on OpenShift to run HPC and batch workloads with proper queuing, gang scheduling, and resource management.