Enterprise Data Warehousing (EDW) has been a mainstay of major corporations for the last 20 years. However, with data volumes roughly doubling every two years, enterprise data warehouses are outgrowing their capacity faster than ever. Load-processing windows are likewise being maxed out, degrading service and threatening the delivery of critical business insights. As a result, processing and maintaining large datasets has become very expensive for organisations.
CAP Theorem: Can a distributed system provide C + A without P?
The CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency – all nodes always return the same result for a read.
- Availability – a guarantee that nodes always answer queries and accept updates.
- Partition tolerance – the system continues working even if one or more nodes become unreachable or unresponsive.
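The trade-off behind these guarantees can be illustrated with a toy sketch, assuming a hypothetical two-replica setup (not a real database): once a partition separates the replicas, a write must either be rejected (preserving consistency at the cost of availability) or accepted on one side only (preserving availability at the cost of consistency).

```python
# Toy sketch (hypothetical classes, not a real database) of why a
# partitioned system must give up either consistency or availability.

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = 0
        self.reachable = True  # can this node reach its peer?

def write(primary, peer, value, mode):
    """Attempt a replicated write under the given strategy ("CP" or "AP")."""
    if primary.reachable and peer.reachable:
        primary.value = peer.value = value  # normal case: replicate everywhere
        return "ok"
    if mode == "CP":
        # Consistent but unavailable: refuse writes during the partition.
        return "rejected: cannot replicate"
    else:  # "AP"
        # Available but inconsistent: accept locally, replicas diverge.
        primary.value = value
        return "accepted locally (stale peer)"

a, b = Replica("A"), Replica("B")
write(a, b, 1, "CP")          # both nodes reachable, both now hold 1
b.reachable = False            # network partition between A and B
print(write(a, b, 2, "CP"))    # CP system sacrifices availability
print(write(a, b, 2, "AP"))    # AP system sacrifices consistency
print(a.value, b.value)        # replicas have diverged: 2 vs 1
```

Real systems refine this picture (quorums, eventual consistency), but the forced choice under partition is the same.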
Hadoop: MapReduce Vs Spark
I sometimes come across the question, "Is Apache Spark going to replace Hadoop MapReduce?" The answer depends on your use cases. Here I have tried to explain the data-processing features of Apache Spark and Hadoop MapReduce. I hope this post helps answer some of the questions that may have been on your mind lately.
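To make the comparison concrete, here is a plain-Python sketch of the classic word-count example, mimicking the map → shuffle → reduce stages that both MapReduce and Spark implement (no Hadoop or Spark installation is assumed; the function names are illustrative, not framework APIs):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate pairs by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hadoop mapreduce", "spark and hadoop"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'mapreduce': 1, 'spark': 1, 'and': 1}
```

The key difference between the two frameworks lies not in this model but in execution: MapReduce writes intermediate results to disk between stages, while Spark can keep them in memory across a chain of operations.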
Data Warehouse: Teradata Vs Hadoop
Teradata is a fully horizontally scalable relational database management system (RDBMS). More specifically, it is a Massively Parallel Processing (MPP) database system built on a cluster of commodity computers called "shared-nothing" nodes (each node has its own CPU, memory, and disks to process data locally), connected through a high-speed interconnect. It combines horizontal partitioning of relational tables with parallel execution of SQL queries.
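The shared-nothing idea can be sketched in a few lines, assuming a hypothetical four-node layout (the node count and routing function are illustrative, not Teradata's actual hashing scheme): each row is hashed on a distribution key and owned by exactly one node, so a full-table query becomes the union of independent, parallel local scans.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size for illustration

def node_for(key):
    """Pick the owning node by hashing the distribution key."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_NODES

# A small "table" of (customer_id, name) rows, horizontally partitioned.
rows = [(cid, f"customer-{cid}") for cid in range(10)]
nodes = {n: [] for n in range(NUM_NODES)}
for row in rows:
    nodes[node_for(row[0])].append(row)  # route each row to its node

# Each node scans only its local slice; the counts sum to the table size.
local_counts = {n: len(node_rows) for n, node_rows in nodes.items()}
print(local_counts)
```

Because routing is deterministic, a query on a single key can also be sent straight to the one node that owns it, avoiding any cross-node traffic.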
Google Gives Us A Big Data Map
The progress of Hadoop (and the Hadoop ecosystem) was greatly influenced by Google. The challenge of taming this eruption of big data was recognized and accepted by the data engineers at Google as early as the late 1990s and early 2000s. As a company on the path to becoming synonymous with global search, Google was grappling with this tsunami of big data firsthand.