The benchmarks use the NYC Taxi data set, specifically the Yellow Taxi data for 2019. The data set is 7.3 GB in CSV format.
Query 1 is a simple aggregate query, executed against the NYC Taxi 2019 data set in CSV format. Here is the query expressed in SQL.
SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), SUM(fare_amount) FROM tripdata GROUP BY passenger_count
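
To make the semantics of the aggregate concrete, the following is a minimal sketch that runs the same query against a handful of made-up rows using Python's built-in sqlite3 module. The sample rows, the two-column table definition, and the run_query helper are assumptions for illustration only; they are not taken from the real data set or from the benchmark harness. An ORDER BY clause is added so that the output order is deterministic.

```python
import sqlite3

# Hypothetical sample rows (passenger_count, fare_amount).
# These values are invented for illustration, not drawn from the NYC Taxi data.
SAMPLE_ROWS = [
    (1, 10.0),
    (1, 25.5),
    (2, 7.0),
    (2, 12.0),
    (2, 30.0),
    (3, 8.5),
]

# Same aggregate as Query 1, plus ORDER BY for a stable result order.
QUERY = """
SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), SUM(fare_amount)
FROM tripdata
GROUP BY passenger_count
ORDER BY passenger_count
"""


def run_query(rows):
    """Load rows into an in-memory table and return one result row per group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tripdata (passenger_count INTEGER, fare_amount REAL)"
        )
        conn.executemany("INSERT INTO tripdata VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(SAMPLE_ROWS):
        print(row)
```

Each result row holds the passenger count followed by the minimum, maximum, and total fare for that group, so the number of output rows equals the number of distinct passenger_count values rather than the number of input rows.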