  
Apache Parquet is an open source data file format designed to improve performance when handling [[Column-oriented Database|column-oriented data]] in bulk. Because values from the same column are stored together, Parquet can apply efficient compression and encoding schemes to each column. This makes it a common interchange format for both batch and interactive workloads, alongside other columnar file formats in the Hadoop ecosystem such as RCFile and ORC.
**Extension:** `.parquet`
## Apache Parquet Official Documentation
https://parquet.apache.org/docs/
## Apache Parquet Advantages
- Reduces I/O operations, because a query reads only the columns it needs.
- Column-based format is more efficient in terms of storage space and also speeds up analytics queries.
- Highly efficient data compression and decompression.
- Supports type-specific encoding.
- Supports several data types and nested data structures.
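
The storage and encoding advantages above can be illustrated with a minimal sketch in plain Python. It is not Parquet's actual on-disk format or API: the sample `rows` and the helper names `to_columns`, `run_length_encode`, and `dictionary_encode` are hypothetical, chosen only to show how grouping values by column enables column pruning and the kind of dictionary and run-length encodings Parquet applies.

```python
from itertools import groupby

# Hypothetical sample data in a row-oriented layout (one dict per record).
rows = [
    {"country": "DE", "year": 2023, "amount": 10.0},
    {"country": "DE", "year": 2023, "amount": 12.5},
    {"country": "DE", "year": 2024, "amount": 7.0},
    {"country": "US", "year": 2024, "amount": 3.0},
    {"country": "US", "year": 2024, "amount": 8.5},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def dictionary_encode(values):
    """Replace each value with a small integer index into a sorted dictionary."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


columns = to_columns(rows)

# Column pruning: an aggregate over "amount" touches only that column.
total_amount = sum(columns["amount"])

# Repetitive columns shrink well once their values sit next to each other.
country_runs = run_length_encode(columns["country"])
year_dictionary, year_indices = dictionary_encode(columns["year"])

print(total_amount)
print(country_runs)
print(year_dictionary, year_indices)
```

In a row-based layout the same aggregate would have to scan every field of every record, and the repeated `country` and `year` values would be interleaved with unrelated data, which makes them harder to encode compactly.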
## Apache Parquet Disadvantages
- Not human readable (binary).
- Reading data can require more memory than with a row-based format.
- Can be slower to write than row-based file formats because of the metadata overhead.
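
Although a Parquet file is binary, its outer framing is simple: per the format specification, a file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the footer metadata length as a little-endian integer. The sketch below, assuming an unencrypted file and using a hypothetical helper name `read_footer_length`, checks that framing without any Parquet library. The in-memory sample is fabricated for illustration and is not a valid Parquet file.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"


def read_footer_length(stream):
    """Return the footer metadata length in bytes for a seekable binary
    stream, or None if the stream lacks the Parquet magic framing."""
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    # Smallest possible framing: leading magic + length field + trailing magic.
    if size < 12:
        return None
    stream.seek(0)
    head = stream.read(4)
    stream.seek(-8, io.SEEK_END)
    tail = stream.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        return None
    (footer_length,) = struct.unpack("<I", tail[:4])
    return footer_length


# Fabricated bytes that only mimic the framing (assumption, not real metadata).
fake_footer = b"\x00" * 10
sample = io.BytesIO(
    PARQUET_MAGIC + fake_footer + struct.pack("<I", len(fake_footer)) + PARQUET_MAGIC
)
footer_length = read_footer_length(sample)

# A text format such as CSV fails the magic check.
not_parquet = read_footer_length(io.BytesIO(b"id,name\n1,Ada\n"))

print(footer_length, not_parquet)
```

For a file on disk, open it in binary mode (`open(path, "rb")`) and pass the file object to the helper. Decoding the footer itself requires a Parquet reader, since the metadata is serialized with Thrift.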