Data Warehouse: Classic Use Cases for Hadoop in DW

Enterprise Data Warehousing (EDW) has been a mainstay of many major corporations for the last 20 years. However, with the tremendous growth of data (doubling every two years), enterprise data warehouses are running out of capacity too quickly. Load processing windows are similarly being maxed out, adversely affecting service and threatening the delivery of critical business insights. As a result, it becomes very expensive for organizations to process and maintain large datasets.

So how can organizations tackle this challenge? The answer lies in Hadoop. Deploying Hadoop as part of the data warehouse infrastructure can help organizations dramatically reduce costs; improve quality of service for query, analytics, and reporting; and provide flexibility for tapping more data for business intelligence.

Offload “ETL process” from DW to Hadoop

One of the major opportunities to breathe new life into data warehouses is migrating heavy ETL (Extract-Transform-Load) processing to Hadoop, which removes restrictions on the kinds of data that can be handled, shortens processing times and lowers costs. First, raw data from source systems is loaded as-is (schema-less write) into Hadoop. This raw data can be XML, JSON, images, movies, text files, spreadsheets, log files and much more, all of which are limited in traditional DW systems, which only manage and process structured data. Because the data is accepted raw, Hadoop can also integrate with any kind of source system, which is likewise limited in traditional DW systems. Organizations can then leverage the cluster processing in Hadoop to transform the data into the required data models. The transformed data is then loaded from Hadoop into the existing data warehouse(s). In this use case, Hadoop acts as a front end to the traditional data warehouse.
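To make the "load as-is, apply the schema later" idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code: on a real cluster the same transform would run as a distributed job. The record layout and the names RAW_LINES, parse_record and transform are hypothetical, chosen only for illustration.

```python
import json
from decimal import Decimal

# Hypothetical raw records, landed as-is with no schema enforced on write.
RAW_LINES = [
    '{"order_id": 1, "customer": "acme", "amount": "19.90"}',
    '{"order_id": 2, "customer": "acme", "amount": "5.10"}',
    'not valid json',
    '{"order_id": 3, "customer": "globex", "amount": "42.00"}',
]


def parse_record(line):
    """Apply the schema at read time; return None for unusable records."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(rec, dict) or "order_id" not in rec or "amount" not in rec:
        return None
    return {
        "order_id": int(rec["order_id"]),
        "customer": rec.get("customer", "unknown"),
        # Decimal avoids float rounding when converting to whole cents.
        "amount_cents": int(Decimal(str(rec["amount"])) * 100),
    }


def transform(lines):
    """Turn raw lines into structured rows plus a per-customer total."""
    rows = [r for r in map(parse_record, lines) if r is not None]
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount_cents"]
    return rows, totals


rows, totals = transform(RAW_LINES)
```

The malformed line is kept in the raw store and simply skipped at transform time. Only the structured rows and aggregates would then be loaded into the warehouse.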

Offload “Cold Data” from DW to Hadoop

Another opportunity is in moving “cold data” (infrequently used, inactive or dormant data) from the data warehouse into Hadoop. This frees up capacity and improves performance in the current data warehouse while still providing access to cold data for queries as needed. In fact, cold data in this approach can even be mined for additional insights or combined with other data. Since Hadoop storage costs are much lower than typical DW costs, this also saves on storage costs (versus adding capacity to the existing DW infrastructure). In this use case, Hadoop acts as a back end to the traditional data warehouse.
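How "cold" is defined is a policy decision. As a rough sketch, assume each warehouse partition records when it was last accessed, and treat anything idle for more than a year as a candidate for offloading. The 365-day threshold, the partition names and the function split_hot_cold are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical partition catalog: name plus last access date.
PARTITIONS = [
    {"name": "sales_2012", "last_accessed": date(2013, 2, 10)},
    {"name": "sales_2014", "last_accessed": date(2015, 6, 30)},
    {"name": "sales_2015", "last_accessed": date(2015, 12, 20)},
]


def split_hot_cold(partitions, today, max_idle_days=365):
    """Split partition names into (hot, cold) by last access date."""
    cutoff = today - timedelta(days=max_idle_days)
    hot, cold = [], []
    for p in partitions:
        # Anything not touched since the cutoff is a candidate for Hadoop.
        (cold if p["last_accessed"] < cutoff else hot).append(p["name"])
    return hot, cold


hot, cold = split_hot_cold(PARTITIONS, today=date(2016, 1, 1))
```

Here only sales_2012 falls before the cutoff, so it would be moved to Hadoop while the two recent partitions stay in the warehouse.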
