Data Warehouse vs. Data Lake

Both data warehouses and data lakes collect data from various sources, internal and external.

Data Warehouse

A data warehouse stores only structured data and requires a schema to be defined beforehand.

Its structure is optimized for business reporting and decision-making.

For example, a job can extract orders from a regular operational database, clean and transform them (for instance, grouping them by user and purchase date), and load the result into the data warehouse. This gives operational users well-structured data that is easy to use and understand.
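The extract, clean, transform, and load steps described above can be sketched as follows. This is only an illustration under stated assumptions: the sample orders, the `daily_orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw orders, as they might come out of an operational database.
raw_orders = [
    {"user": "alice", "date": "2024-01-05", "amount": "20.00"},
    {"user": "alice", "date": "2024-01-05", "amount": "5.50"},
    {"user": "bob", "date": "2024-01-06", "amount": "12.00"},
    {"user": None, "date": "2024-01-06", "amount": "7.00"},   # missing user
    {"user": "bob", "date": "2024-01-07", "amount": "oops"},  # invalid amount
]


def clean(orders):
    """Keep only rows that have a user, a date, and a numeric amount."""
    cleaned = []
    for row in orders:
        if not row["user"] or not row["date"]:
            continue
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue
        cleaned.append((row["user"], row["date"], amount))
    return cleaned


def transform(rows):
    """Group orders by user and purchase date, counting and summing them."""
    totals = defaultdict(lambda: [0, 0.0])
    for user, date, amount in rows:
        totals[(user, date)][0] += 1
        totals[(user, date)][1] += amount
    return sorted((u, d, n, round(t, 2)) for (u, d), (n, t) in totals.items())


def load(conn, rows):
    """Load into a table whose schema is fixed before any data arrives."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_orders (
               user_id TEXT NOT NULL,
               purchase_date TEXT NOT NULL,
               order_count INTEGER NOT NULL,
               total_amount REAL NOT NULL,
               PRIMARY KEY (user_id, purchase_date)
           )"""
    )
    conn.executemany("INSERT INTO daily_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(warehouse, transform(clean(raw_orders)))
loaded = warehouse.execute(
    "SELECT * FROM daily_orders ORDER BY user_id, purchase_date"
).fetchall()
print(loaded)
```

The two invalid rows are dropped during cleaning, so only the aggregated rows for valid orders reach the table.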

Data warehouses also provide insights into pre-defined questions about pre-defined data types. This makes it easy for operational users to build reports and track key performance metrics.
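A pre-defined question often takes the form of a fixed SQL query that is run again and again as new data is loaded. The sketch below assumes a hypothetical `daily_orders` table with made-up rows, held in an in-memory SQLite database, and computes orders and revenue per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table with already cleaned and aggregated data.
conn.execute(
    """CREATE TABLE daily_orders (
           user_id TEXT, purchase_date TEXT,
           order_count INTEGER, total_amount REAL
       )"""
)
conn.executemany(
    "INSERT INTO daily_orders VALUES (?, ?, ?, ?)",
    [
        ("alice", "2024-01-05", 2, 25.5),
        ("bob", "2024-01-05", 1, 12.0),
        ("bob", "2024-01-06", 3, 30.0),
    ],
)

# The pre-defined question: how many orders and how much revenue per day?
REVENUE_PER_DAY = """
    SELECT purchase_date,
           SUM(order_count)  AS orders,
           SUM(total_amount) AS revenue
    FROM daily_orders
    GROUP BY purchase_date
    ORDER BY purchase_date
"""

report = conn.execute(REVENUE_PER_DAY).fetchall()
for purchase_date, orders, revenue in report:
    print(purchase_date, orders, revenue)
```

Because the table structure is known in advance, the same query keeps working unchanged as new rows arrive.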

Data Lake

A data lake stores structured, semi-structured, and unstructured data.

This data can be text content from social media, logs, readings from IoT sensors, and so on, and it is stored in its native format.

It allows data scientists, engineers and analysts to access the data in its raw form for deep analysis.
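To illustrate storing data in its native format and interpreting it only at analysis time, here is a minimal sketch. It assumes a temporary local folder as a stand-in for real lake storage; the `ingest` helper, file names, and sample contents are all hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path

# A temporary folder stands in for real lake storage (an assumption).
lake = Path(tempfile.mkdtemp(prefix="lake_"))


def ingest(source, filename, payload):
    """Store the payload unchanged, under a folder named after its source."""
    target = lake / source / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(payload, encoding="utf-8")
    return target


# Three sources, three native formats, no schema imposed when writing.
ingest("social", "posts.jsonl",
       '{"user": "alice", "text": "Love the new app!"}\n'
       '{"user": "bob", "text": "Checkout is slow"}\n')
ingest("logs", "app.log",
       "2024-01-05 12:00:01 ERROR payment timeout\n"
       "2024-01-05 12:00:02 INFO retry ok\n")
ingest("iot", "sensor.csv", "sensor_id,temperature\ns1,21.5\ns2,19.0\n")

# Structure is applied only when reading, separately for each analysis.
posts = [
    json.loads(line)
    for line in (lake / "social" / "posts.jsonl")
    .read_text(encoding="utf-8")
    .splitlines()
]
errors = [
    line
    for line in (lake / "logs" / "app.log")
    .read_text(encoding="utf-8")
    .splitlines()
    if " ERROR " in line
]
with (lake / "iot" / "sensor.csv").open(newline="", encoding="utf-8") as f:
    temperatures = [float(row["temperature"]) for row in csv.DictReader(f)]

average_temperature = sum(temperatures) / len(temperatures)
print(len(posts), len(errors), average_temperature)
```

Each reader decides how to parse the raw files, so the same stored data can serve analyses that were not planned when it was ingested.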

Storage is generally cheaper than in a data warehouse, and the data is quicker to access because it is kept as-is, with no upfront transformation.

Cloud platforms tend to use both data warehouses and data lakes, combining the benefits of each.
