This Week in Ballista #9
14 Mar 2021
Welcome to “This Week in Ballista”, a weekly newsletter that summarizes activity in the Ballista Distributed Compute project.
Ballista is a modern distributed compute platform powered by Apache Arrow and primarily implemented in Rust, but designed to provide first-class support for other programming languages, including Python, C++, and Java.
Proposal: Donate Ballista to Apache Arrow project
This week’s update is a little different. Andy Grove has started a discussion about donating Ballista to Apache Arrow. His proposal is reproduced below and has also been filed as a GitHub issue. There is a related discussion on the Apache Arrow mailing list as well.
The Ballista project has recently reached a point where I believe that the basic architecture has been proven to work. The project has also suddenly become very popular and generated a lot of interest (more than 2k stars on GitHub).
For these reasons, I think that the project has grown too large for me to continue maintaining as a personal project, and that it is now time to move the code to a foundation to ensure its continued success.
Given the deep dependencies on Apache Arrow (the core Arrow, DataFusion, and Parquet crates) and the fact that there is already some overlap between Arrow and Ballista committers, I believe that the obvious choice would be to donate the project to Apache Arrow.
Some of the benefits of donating the project to Arrow are:
- It will be easier to add new features to Ballista and DataFusion if they are in the same repository. Ballista essentially extends DataFusion and it is often necessary to touch both code bases when implementing new functionality.
- DataFusion would benefit from having a scheduler (rather than trying to eagerly evaluate the entire query plan) and it would probably make sense to push some parts of the Ballista scheduler down a level in the stack so that the same approach is used to scale across cores in DataFusion and to scale across nodes in Ballista.
- Apache Arrow has a strong community, and there is a team of committers who understand Arrow and DataFusion and can help with PR reviews, so I will no longer be a bottleneck. Companies are also more likely to commit resources to contributing to an Apache project than to a personal project.
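The scheduler point in the second bullet can be illustrated with a short Rust sketch. This is a minimal illustration under stated assumptions, not Ballista's or DataFusion's actual code: a query is assumed to be already split into stages of independent partition tasks, and the names `Task`, `Stage`, `run_stages`, and `partial_sum` are hypothetical. Worker threads stand in for CPU cores. In principle, the same loop could hand tasks to executors on other nodes instead, which is the "same approach at two levels" idea in the proposal.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical unit of work: one partition of one stage. It reads the
/// previous stage's output and produces a single partial result.
type Task = Box<dyn Fn(&[i64]) -> i64 + Send + Sync>;

/// Hypothetical stage: independent tasks that may run in parallel.
struct Stage {
    tasks: Vec<Task>,
}

/// Runs stages in order. Tasks within a stage are claimed from a shared
/// counter by `workers` threads (standing in for cores or executor nodes);
/// joining all threads acts as a barrier before the next stage starts.
fn run_stages(stages: Vec<Stage>, input: Vec<i64>, workers: usize) -> Vec<i64> {
    let mut data = input;
    for stage in stages {
        let stage_input = Arc::new(data);
        let tasks = Arc::new(stage.tasks);
        let next_task = Arc::new(AtomicUsize::new(0));
        let results = Arc::new(Mutex::new(vec![0i64; tasks.len()]));

        let handles: Vec<_> = (0..workers)
            .map(|_| {
                let (stage_input, tasks) = (Arc::clone(&stage_input), Arc::clone(&tasks));
                let (next_task, results) = (Arc::clone(&next_task), Arc::clone(&results));
                thread::spawn(move || loop {
                    // Claim the next unscheduled task, or stop if none remain.
                    let i = next_task.fetch_add(1, Ordering::SeqCst);
                    if i >= tasks.len() {
                        break;
                    }
                    let out = tasks[i](stage_input.as_slice());
                    // Store by task index so output order is deterministic.
                    results.lock().unwrap()[i] = out;
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        // This stage's partial results become the next stage's input.
        data = results.lock().unwrap().clone();
    }
    data
}

/// Hypothetical stage-1 task: sums partition `p` of `parts` equal-sized chunks.
fn partial_sum(p: usize, parts: usize) -> Task {
    Box::new(move |xs: &[i64]| {
        let chunk = (xs.len() + parts - 1) / parts;
        xs.iter().skip(p * chunk).take(chunk).sum::<i64>()
    })
}

fn main() {
    let parts = 4;
    // Stage 1: four partition tasks, each summing a quarter of the input.
    let stage1 = Stage { tasks: (0..parts).map(|p| partial_sum(p, parts)).collect() };
    // Stage 2: a single task that combines the partial sums.
    let combine: Task = Box::new(|xs: &[i64]| xs.iter().sum::<i64>());
    let stage2 = Stage { tasks: vec![combine] };

    let input: Vec<i64> = (1..=100).collect();
    let result = run_stages(vec![stage1, stage2], input, 2);
    println!("{:?}", result); // [5050]
}
```

Because results are written by task index, the output does not depend on which worker ran which task. A real distributed scheduler also has to deal with shuffles between stages, task failures, and data locality, none of which this sketch attempts.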
If you have an opinion for or against this donation, please comment on the GitHub issue.
Please try out Ballista with your own queries and data sets and file issues for any bugs or missing features that you discover. We would really like some help improving the user guide as well.
We are also looking for help with these issues.
Follow the @BallistaCompute Twitter account to receive notifications when new editions of “This Week in Ballista” are published.
Join the Ballista Discord Channel to chat with the core contributors.