Deploying Ballista with Kubernetes
Ballista can be deployed to any Kubernetes cluster using the following instructions. These instructions assume that you are already comfortable with managing Kubernetes deployments.
The k8s deployment consists of:
- k8s stateful set for one or more scheduler processes
- k8s stateful set for one or more executor processes
- k8s service to route traffic to the schedulers
- k8s persistent volume and persistent volume claims to make local data accessible to Ballista
Limitations
Ballista is at an early stage of development and therefore has some significant limitations:
- There is no support for shared object stores such as S3. All data must exist locally on each node in the cluster, including where any client process runs (until #473 is resolved).
- Only a single scheduler instance is currently supported unless the scheduler is configured to use
etcd
as a backing store.
Create Persistent Volume and Persistent Volume Claim
Copy the following yaml to a pv.yaml
file and apply to the cluster to create a persistent volume and a persistent
volume claim so that the specified host directory is available to the containers. This is where any data should be
located so that Ballista can execute queries against it.
apiVersion: v1
kind: PersistentVolume
metadata:
name: data-pv
labels:
type: local
spec:
storageClassName: manual
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: data-pv-claim
spec:
storageClassName: manual
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 3Gi
To apply this yaml:
kubectl apply -f pv.yaml
You should see the following output:
persistentvolume/data-pv created
persistentvolumeclaim/data-pv-claim created
Deploying Ballista Scheduler and Executors
Copy the following yaml to a cluster.yaml
file.
apiVersion: v1
kind: Service
metadata:
name: ballista-scheduler
labels:
app: ballista-scheduler
spec:
ports:
- port: 50050
name: scheduler
clusterIP: None
selector:
app: ballista-scheduler
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: ballista-scheduler
spec:
serviceName: "ballista-scheduler"
replicas: 1
selector:
matchLabels:
app: ballista-scheduler
template:
metadata:
labels:
app: ballista-scheduler
ballista-cluster: ballista
spec:
containers:
- name: ballista-scheduler
image: ballistacompute/ballista-rust:0.4.1
command: ["/scheduler"]
args: ["--port=50050"]
ports:
- containerPort: 50050
name: flight
volumeMounts:
- mountPath: /mnt
name: data
volumes:
- name: data
persistentVolumeClaim:
claimName: data-pv-claim
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: ballista-executor
spec:
serviceName: "ballista-scheduler"
replicas: 2
selector:
matchLabels:
app: ballista-executor
template:
metadata:
labels:
app: ballista-executor
ballista-cluster: ballista
spec:
containers:
- name: ballista-executor
image: ballistacompute/ballista-rust:0.4.1
command: ["/executor"]
args: ["--port=50051", "--scheduler-host=ballista-scheduler", "--scheduler-port=50050", "--external-host=$(MY_POD_IP)"]
env:
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
ports:
- containerPort: 50051
name: flight
volumeMounts:
- mountPath: /mnt
name: data
volumes:
- name: data
persistentVolumeClaim:
claimName: data-pv-claim
$ kubectl apply -f cluster.yaml
This should show the following output:
service/ballista-scheduler created
statefulset.apps/ballista-scheduler created
statefulset.apps/ballista-executor created
You can also check status by running kubectl get pods
:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 16m
ballista-scheduler-0 1/1 Running 0 42s
ballista-executor-0 1/1 Running 2 42s
ballista-executor-1 1/1 Running 0 26s
You can view the scheduler logs with kubectl logs ballista-scheduler-0
:
$ kubectl logs ballista-scheduler-0
[2021-02-19T00:24:01Z INFO scheduler] Ballista v0.4.1 Scheduler listening on 0.0.0.0:50050
[2021-02-19T00:24:16Z INFO ballista::scheduler] Received register_executor request for ExecutorMetadata { id: "b5e81711-1c5c-46ec-8522-d8b359793188", host: "10.1.23.149", port: 50051 }
[2021-02-19T00:24:17Z INFO ballista::scheduler] Received register_executor request for ExecutorMetadata { id: "816e4502-a876-4ed8-b33f-86d243dcf63f", host: "10.1.23.150", port: 50051 }
Deleting the Ballista cluster
Run the following kubectl command to delete the cluster.
kubectl delete -f cluster.yaml