Ballista A modern distributed compute platform

Ballista 0.3.0

The goal of the 0.3.0 release is to provide a minimum viable product of distributed compute in Rust. It is now possible to run a query that is very close to TPC-H query 1 on a distributed cluster with reasonable performance. Performance and scalability is comparable to Apache Spark (within the range of 2x slower to 2x faster based on initial benchmarks).

Performance tuning will be one of the main areas of focus for the 0.4.0 release.

Please refer to the user guide for installation instructions.

This release supports the following operators:

  • Projection
  • Selection
  • Hash Aggregate
  • CSV Table Scan
  • Parquet Table Scan
  • In-memory Table Scan
  • Shuffle

This release supports the following expressions:

  • Column references
  • Literal values
  • Aggregate expressions (MIN, MAX, SUM, AVG, COUNT)
  • Basic math expressions (+, -, *, /)
  • Comparison expressions (<. <=, =, !=, >=, >)
  • Aliased expressions

This release contains the following improvements to the Rust project compared to 0.3.0-alpha-2:

  • Query execution no longer uses async and this has allowed us to remove the dedicated thread in ParquetScanExec.

Known issues:

  • The scheduler is still extremely simple and inefficient.
  • Distributed query performance is not optimized yet.