Warp Solutions: Parquet <-> WarpStream

1 month ago

Series: Warp Solutions
Subject: Use Bento to write WarpStream topic data to Parquet and query it with DuckDB

Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It forms the backbone of many data lake and table format systems. In this Solution, Shawn builds a small pipeline with the popular open-source tool Bento that reads from a topic in a WarpStream cluster and writes batches of records as Parquet files, which are then queried with DuckDB.
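For readers who want a feel for the final query step before watching, here is a minimal Python sketch of reading a directory of Parquet batch files with DuckDB. It is not the pipeline from the video: the column names (record_key, payload), the file names, and the helper topic_summary_sql are hypothetical, and the sample files are generated by DuckDB itself as a stand-in for whatever the Bento pipeline writes. It assumes the duckdb Python package is installed.

```python
import tempfile
from pathlib import Path


def topic_summary_sql(parquet_glob: str) -> str:
    """Build a query that counts records per key across all matching Parquet files.

    The column name record_key is an assumption made for this sketch.
    """
    # Double any single quotes so the glob stays a valid SQL string literal.
    safe_glob = parquet_glob.replace("'", "''")
    return (
        "SELECT record_key, count(*) AS records "
        f"FROM read_parquet('{safe_glob}') "
        "GROUP BY record_key ORDER BY record_key"
    )


def main() -> None:
    # Imported here so the SQL helper above can be used without DuckDB installed.
    import duckdb  # third-party: pip install duckdb

    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        con = duckdb.connect()  # in-memory database

        # Stand-in for the pipeline output: write two small Parquet "batches".
        for batch in range(2):
            path = (out_dir / f"batch_{batch}.parquet").as_posix()
            con.execute(
                "COPY (SELECT 'user-' || CAST(i % 2 AS VARCHAR) AS record_key, "
                "i AS payload FROM range(4) t(i)) "
                f"TO '{path}' (FORMAT PARQUET)"
            )

        # One glob pattern lets DuckDB scan every batch file as a single table.
        glob = (out_dir / "*.parquet").as_posix()
        rows = con.execute(topic_summary_sql(glob)).fetchall()
        con.close()
        print(rows)


if __name__ == "__main__":
    main()
```

The glob passed to read_parquet is what makes batch output convenient to query: new files written by the pipeline are picked up on the next query without any load step.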

WarpStream - www.warpstream.com
Parquet - parquet.apache.org
Bento docs - https://warpstreamlabs.github.io/bento/docs/guides/getting_started/

#apachekafka #apacheiceberg #parquet #datastreaming #dataengineering #duckdb #bento
