What is Apache XTable?
Unsure whether to choose Apache Hudi, Apache Iceberg, or Delta Lake? Don't worry, Apache XTable is here!
If you have explored lakehouse architectures, you are probably aware of the open table formats you can use to implement lakehouse storage: Apache Iceberg, Apache Hudi, and Delta Lake. These three are the most widely adopted open table formats, and each adds transactional capabilities to a data lake.
All of these open table formats add a transactional metadata layer on top of data files. While all three store the data itself in underlying files such as Parquet, their metadata formats differ. Each takes a different approach to storing and handling metadata and transaction logs.
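To make the "same data files, different metadata" point concrete, here is a minimal Python sketch that guesses which format a table uses from the metadata directory found at the table root. It relies on the directory names each format conventionally uses (`_delta_log` for Delta Lake, `metadata` for Iceberg, `.hoodie` for Hudi). The function name `detect_formats` and the sample listing are made up for illustration, and real tables, Iceberg ones in particular, may be configured with a different layout.

```python
from typing import Iterable, Set

# Conventional metadata directory of each format, relative to the table root.
# These are common defaults, not guarantees for every deployment.
METADATA_DIRS = {
    "_delta_log": "Delta Lake",
    "metadata": "Apache Iceberg",
    ".hoodie": "Apache Hudi",
}


def detect_formats(relative_paths: Iterable[str]) -> Set[str]:
    """Return the formats whose metadata directory appears at the table root.

    Hypothetical helper for illustration; it only inspects path names.
    """
    found = set()
    for path in relative_paths:
        # Take the first path segment, e.g. "_delta_log" from "_delta_log/0.json".
        top_level = path.strip("/").split("/", 1)[0]
        if top_level in METADATA_DIRS:
            found.add(METADATA_DIRS[top_level])
    return found


# A made-up listing of a table directory: Parquet data plus a Delta log.
listing = [
    "part-0001.parquet",
    "part-0002.parquet",
    "_delta_log/00000000000000000000.json",
]
print(detect_formats(listing))
```

Note that the Parquet files in the listing play no part in the detection. Only the metadata directory reveals the format, which is why a table can, in principle, carry more than one set of metadata over the same data files.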
Choosing one of these formats for your lakehouse is difficult, because the decision depends on multiple factors. To understand these factors in detail, refer to Chapter 3 of my book Practical Lakehouse Architecture.
Using a single format can restrict your choice of tools and services for implementing your lakehouse. This has been one of the top challenges when designing a lakehouse platform.
Enter Apache XTable
What is Apache XTable?
Apache XTable (formerly known as OneTable, from Onehouse.ai) translates table metadata from one format to another. For example, a table written in Delta Lake format can be exposed as an Iceberg table by translating its Delta Lake metadata into Iceberg metadata. The underlying data files are not rewritten or copied.
It enables interoperability between the open table formats. It is not a new table format, just an abstraction over the metadata of the existing ones. It is an open-source initiative supported by Onehouse, Microsoft, and others.
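The idea of translating metadata while leaving data files untouched can be illustrated with a toy Python sketch. This is not how Apache XTable is implemented, and it does not use XTable's API. It assumes a heavily simplified Delta-style log (JSON lines with `add` and `remove` actions) and produces an Iceberg-flavoured dictionary rather than real Iceberg metadata. The function names and the output shape are hypothetical.

```python
import json


def active_files_from_delta_log(commits):
    """Replay simplified Delta-style commits; return {path: size} of live files."""
    live = {}
    for commit in commits:  # commits are applied in order
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]["size"]
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
    return live


def to_iceberg_like_snapshot(live_files, snapshot_id):
    """Describe the same files in a simplified, Iceberg-flavoured structure.

    Hypothetical output shape: real Iceberg metadata uses JSON table
    metadata plus Avro manifest files, which this toy model skips.
    """
    return {
        "snapshot-id": snapshot_id,
        "data-files": [
            {"file-path": path, "file-size-in-bytes": size}
            for path, size in sorted(live_files.items())
        ],
    }


# Two made-up commits: the second replaces part-0001 with part-0003.
commits = [
    '{"add": {"path": "part-0001.parquet", "size": 1024}}\n'
    '{"add": {"path": "part-0002.parquet", "size": 2048}}',
    '{"remove": {"path": "part-0001.parquet"}}\n'
    '{"add": {"path": "part-0003.parquet", "size": 4096}}',
]

live = active_files_from_delta_log(commits)
snapshot = to_iceberg_like_snapshot(live, snapshot_id=1)
print(json.dumps(snapshot, indent=2))
```

The point of the sketch is that only the description of the table changes. The Parquet paths in the output are exactly the ones referenced by the source log, so no data is moved or duplicated.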
Benefits
Apache XTable can help you implement an open data lakehouse that is not restricted to any specific tool or platform. Because the same table can be exposed in several formats, you are no longer limited to the services and compute engines that support one particular format, which reduces vendor lock-in.
Microsoft Fabric recently announced support for Apache Iceberg and bidirectional access between Snowflake (with Iceberg tables) and Fabric. Fabric uses Apache XTable for the Delta-to-Iceberg conversion. Because of the interoperability offered by Apache XTable, we might see more such integrations across different ecosystems in the future.
Apache XTable and Delta UniForm, a Delta Lake feature that similarly lets Delta tables be read by clients of other formats, can open up many new opportunities. Exciting times if you are building a lakehouse!