Ballista 0.3.0
09 Aug 2020
The goal of the 0.3.0 release is to provide a minimum viable product of distributed compute in Rust. It is now possible to run a query that is very close to TPC-H query 1 on a distributed cluster with reasonable performance. Performance and scalability are comparable to Apache Spark (between 2x slower and 2x faster in initial benchmarks).
Performance tuning will be one of the main areas of focus for the 0.4.0 release.
Please refer to the user guide for installation instructions.
This release supports the following operators:
- Hash Aggregate
- CSV Table Scan
- Parquet Table Scan
- In-memory Table Scan
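To make the Hash Aggregate operator more concrete, here is a minimal sketch of hash-based grouping in plain Rust, using only the standard library. It is not Ballista's implementation: the `Row`, `Accumulator`, and `hash_aggregate` names are hypothetical and exist only for this illustration, and a real operator would work on columnar batches rather than one row at a time.

```rust
use std::collections::HashMap;

/// One input row: a grouping key plus a numeric value to aggregate.
/// Hypothetical type for illustration; not part of Ballista.
#[derive(Debug, Clone)]
pub struct Row {
    pub key: String,
    pub value: f64,
}

/// Running aggregate state for a single group.
#[derive(Debug, Clone, PartialEq)]
pub struct Accumulator {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

impl Accumulator {
    /// Start a group from its first value.
    pub fn new(first: f64) -> Self {
        Accumulator { min: first, max: first, sum: first, count: 1 }
    }

    /// Fold one more value into the running state.
    pub fn update(&mut self, v: f64) {
        self.min = self.min.min(v);
        self.max = self.max.max(v);
        self.sum += v;
        self.count += 1;
    }

    /// AVG is derived from SUM and COUNT rather than stored.
    pub fn avg(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Group rows by key, keeping one accumulator per distinct key.
pub fn hash_aggregate(rows: &[Row]) -> HashMap<String, Accumulator> {
    let mut groups: HashMap<String, Accumulator> = HashMap::new();
    for row in rows {
        groups
            .entry(row.key.clone())
            .and_modify(|acc| acc.update(row.value))
            .or_insert_with(|| Accumulator::new(row.value));
    }
    groups
}
```

Calling `hash_aggregate` on a slice of rows returns one `Accumulator` per key, from which MIN, MAX, SUM, COUNT, and AVG can all be read. Keeping SUM and COUNT as separate state is what would let partial results from different partitions be merged before AVG is computed.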
This release supports the following expressions:
- Column references
- Literal values
- Aggregate expressions (MIN, MAX, SUM, AVG, COUNT)
- Basic math expressions (+, -, *, /)
- Comparison expressions (<, <=, =, !=, >=, >)
- Aliased expressions
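The non-aggregate expression kinds listed above can be modeled as a small expression tree. The following is a minimal sketch under stated assumptions, not Ballista's actual types: `Expr`, `Operator`, `Value`, `evaluate`, and `output_name` are hypothetical names chosen for this illustration, rows are simplified to a map from column name to `f64`, and aggregate expressions are left out because they need state across rows.

```rust
use std::collections::HashMap;

/// Result of evaluating an expression against one row.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    Boolean(bool),
}

/// Math and comparison operators (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy)]
pub enum Operator {
    Plus, Minus, Multiply, Divide,
    Lt, LtEq, Eq, NotEq, GtEq, Gt,
}

/// Expression tree covering column references, literals,
/// binary math/comparison expressions, and aliases.
#[derive(Debug, Clone)]
pub enum Expr {
    Column(String),
    Literal(f64),
    Binary { left: Box<Expr>, op: Operator, right: Box<Expr> },
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Output column name, if the expression has an obvious one.
    pub fn output_name(&self) -> Option<&str> {
        match self {
            Expr::Column(name) => Some(name.as_str()),
            Expr::Alias(_, alias) => Some(alias.as_str()),
            _ => None,
        }
    }

    /// Evaluate against one row. Returns None for a missing column
    /// or when an operand is not numeric.
    pub fn evaluate(&self, row: &HashMap<String, f64>) -> Option<Value> {
        match self {
            Expr::Column(name) => row.get(name).map(|v| Value::Number(*v)),
            Expr::Literal(v) => Some(Value::Number(*v)),
            // An alias only renames the output; the value is unchanged.
            Expr::Alias(inner, _) => inner.evaluate(row),
            Expr::Binary { left, op, right } => {
                let (l, r) = match (left.evaluate(row)?, right.evaluate(row)?) {
                    (Value::Number(l), Value::Number(r)) => (l, r),
                    _ => return None,
                };
                Some(match op {
                    Operator::Plus => Value::Number(l + r),
                    Operator::Minus => Value::Number(l - r),
                    Operator::Multiply => Value::Number(l * r),
                    // f64 division by zero yields infinity or NaN, not a panic.
                    Operator::Divide => Value::Number(l / r),
                    Operator::Lt => Value::Boolean(l < r),
                    Operator::LtEq => Value::Boolean(l <= r),
                    Operator::Eq => Value::Boolean(l == r),
                    Operator::NotEq => Value::Boolean(l != r),
                    Operator::GtEq => Value::Boolean(l >= r),
                    Operator::Gt => Value::Boolean(l > r),
                })
            }
        }
    }
}
```

With this sketch, an expression in the style of TPC-H query 1 such as `price * (1 - discount)` aliased to `disc_price` is a `Binary` node wrapped in an `Alias`. Evaluating it recurses through the tree and returns a `Value::Number` for each row.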
This release contains the following improvements to the Rust project compared to 0.3.0-alpha-2:
- Query execution no longer uses async, and this has allowed us to remove the dedicated thread in
This release has the following known limitations:
- The scheduler is still extremely simple and inefficient.
- Distributed query performance is not optimized yet.