Ballista: A modern distributed compute platform

This Week in Ballista #3

Welcome to the third edition of “This Week in Ballista”, a weekly newsletter that summarizes activity in the Ballista Distributed Compute project.

Ballista is a modern distributed compute platform powered by Apache Arrow and primarily implemented in Rust, but designed to provide first-class support for other programming languages, including Python, C++, and Java.

Current Status

There have been some notable developments this week:

There are now separate scheduler and executor binaries, with support for the following deployment models:

  • Local Mode: A single process contains both the scheduler and the executor. This is the simplest way to run Ballista and is primarily intended for local development and testing.
  • Standalone Mode: Executors connect to the scheduler process.
  • Etcd Mode: Executors connect to the scheduler process, and the scheduler uses etcd for state management.

Thanks to @edrevo for taking the lead on this work.
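
As a rough illustration of how the three deployment models listed above differ, here is a minimal Rust sketch. It is not taken from the Ballista codebase: the `DeploymentMode` enum, its fields, and its helper methods are hypothetical names chosen for this example only.

```rust
/// Hypothetical model of the three deployment modes.
/// None of these names come from the Ballista codebase.
#[derive(Debug, Clone, PartialEq)]
enum DeploymentMode {
    /// Scheduler and executor share a single process.
    Local,
    /// Executors connect to a scheduler at the given address.
    Standalone { scheduler_addr: String },
    /// Executors connect to a scheduler, and the scheduler keeps its state in etcd.
    Etcd {
        scheduler_addr: String,
        etcd_urls: Vec<String>,
    },
}

impl DeploymentMode {
    /// True when the scheduler and executor run in the same process.
    fn is_single_process(&self) -> bool {
        matches!(self, DeploymentMode::Local)
    }

    /// True when the scheduler uses etcd for state management.
    fn uses_etcd(&self) -> bool {
        matches!(self, DeploymentMode::Etcd { .. })
    }

    /// Address that executors connect to, if there is a separate scheduler process.
    fn scheduler_addr(&self) -> Option<&str> {
        match self {
            DeploymentMode::Local => None,
            DeploymentMode::Standalone { scheduler_addr }
            | DeploymentMode::Etcd { scheduler_addr, .. } => Some(scheduler_addr.as_str()),
        }
    }
}
```

With this sketch, `DeploymentMode::Local.scheduler_addr()` yields `None` because there is no separate scheduler to connect to, while the other two modes return the address the executors would use.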

Preliminary work on distributed query execution has been checked in, although it is not working yet. End-to-end testing has reached the point where fragments of the physical plan are sent to executors for execution. This work will likely continue over the next 2-3 weekends, and the hope is that distributed execution will be fully working again sometime in March 2021, with support for several of the TPC-H queries.

Current Focus

Serde (serialization and deserialization) for the physical plan still needs further implementation work so that a wide range of queries can be supported in distributed mode once the distributed planner is complete.
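
To make the serde requirement concrete, the sketch below round-trips a toy physical plan through bytes, which is roughly what has to happen before a plan fragment can be shipped to an executor. Everything here is an assumption made for illustration: the `PhysicalPlan` enum, its three operators, and the hand-rolled byte layout are hypothetical and are not Ballista's plan types or wire format.

```rust
/// Toy physical plan; a hypothetical stand-in, not Ballista's plan types.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalPlan {
    Scan { table: String },
    Filter { predicate: String, input: Box<PhysicalPlan> },
    Limit { n: u32, input: Box<PhysicalPlan> },
}

/// Write a string as a little-endian u32 length followed by its UTF-8 bytes.
/// (Strings longer than u32::MAX bytes are not handled in this sketch.)
fn write_str(buf: &mut Vec<u8>, s: &str) {
    buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
    buf.extend_from_slice(s.as_bytes());
}

/// Serialize a plan: one tag byte per operator, then its fields, then its input.
fn encode(plan: &PhysicalPlan, buf: &mut Vec<u8>) {
    match plan {
        PhysicalPlan::Scan { table } => {
            buf.push(0);
            write_str(buf, table);
        }
        PhysicalPlan::Filter { predicate, input } => {
            buf.push(1);
            write_str(buf, predicate);
            encode(input, buf);
        }
        PhysicalPlan::Limit { n, input } => {
            buf.push(2);
            buf.extend_from_slice(&n.to_le_bytes());
            encode(input, buf);
        }
    }
}

/// Read a little-endian u32 at `pos`, advancing `pos`; None if input is too short.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = (*pos).checked_add(4)?;
    let chunk = bytes.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Read a length-prefixed UTF-8 string at `pos`, advancing `pos`.
fn read_str(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len = read_u32(bytes, pos)? as usize;
    let end = (*pos).checked_add(len)?;
    let chunk = bytes.get(*pos..end)?;
    *pos = end;
    String::from_utf8(chunk.to_vec()).ok()
}

/// Deserialize a plan; returns None on truncated input or an unknown tag.
fn decode(bytes: &[u8], pos: &mut usize) -> Option<PhysicalPlan> {
    let tag = *bytes.get(*pos)?;
    *pos += 1;
    match tag {
        0 => Some(PhysicalPlan::Scan { table: read_str(bytes, pos)? }),
        1 => {
            let predicate = read_str(bytes, pos)?;
            let input = Box::new(decode(bytes, pos)?);
            Some(PhysicalPlan::Filter { predicate, input })
        }
        2 => {
            let n = read_u32(bytes, pos)?;
            let input = Box::new(decode(bytes, pos)?);
            Some(PhysicalPlan::Limit { n, input })
        }
        _ => None,
    }
}
```

The point of the sketch is that every operator needs both an encode arm and a decode arm. A query that uses an operator missing either one cannot be sent to an executor, which is why serde coverage limits the range of queries that can run in distributed mode.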

Join the Community

There is a growing community in the Ballista Discord Channel. This is a great place to ask questions and learn more about the project.