Enterprise Data Warehousing (EDW) has been a mainstay of major corporations for the last 20 years. However, with data volumes roughly doubling every two years, enterprise data warehouses are outgrowing their capacity faster than ever. Load-processing windows are likewise being maxed out, degrading service and threatening the delivery of critical business insights. As a result, processing and maintaining large datasets has become very expensive for organisations.
CAP Theorem: Can a distributed system provide C + A without P?
The CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency – all nodes always return the same result for a read.
- Availability – a guarantee that nodes always answer queries and accept updates.
- Partition tolerance – the system continues working even if one or more nodes become unreachable or unresponsive.
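The trade-off behind these guarantees can be illustrated with a toy sketch, assuming a hypothetical two-replica setup (not a real database): once a partition separates the replicas, a write must either be rejected (preserving consistency at the cost of availability) or accepted on one side only (preserving availability at the cost of consistency).

```python
# Toy sketch (hypothetical classes, not a real database) of why a
# partitioned system must give up either consistency or availability.

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = 0
        self.reachable = True  # can this node reach its peer?

def write(primary, peer, value, mode):
    """Attempt a replicated write under the given strategy ("CP" or "AP")."""
    if primary.reachable and peer.reachable:
        primary.value = peer.value = value  # normal case: replicate everywhere
        return "ok"
    if mode == "CP":
        # Consistent but unavailable: refuse writes during the partition.
        return "rejected: cannot replicate"
    else:  # "AP"
        # Available but inconsistent: accept locally, replicas diverge.
        primary.value = value
        return "accepted locally (stale peer)"

a, b = Replica("A"), Replica("B")
write(a, b, 1, "CP")          # both nodes reachable, both now hold 1
b.reachable = False            # network partition between A and B
print(write(a, b, 2, "CP"))    # CP system sacrifices availability
print(write(a, b, 2, "AP"))    # AP system sacrifices consistency
print(a.value, b.value)        # replicas have diverged: 2 vs 1
```

Real systems refine this picture (quorums, eventual consistency), but the forced choice under partition is the same.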
Hadoop: MapReduce Vs Spark
I sometimes come across the question, "Is Apache Spark going to replace Hadoop MapReduce?" The answer depends on your use cases. Here I have tried to explain the data-processing features of Apache Spark and Hadoop MapReduce. I hope this post helps answer some of the questions that may have been on your mind lately.
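To make the comparison concrete, here is a plain-Python sketch of the classic word-count example, mimicking the map → shuffle → reduce stages that both MapReduce and Spark implement (no Hadoop or Spark installation is assumed; the function names are illustrative, not framework APIs):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate pairs by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hadoop mapreduce", "spark and hadoop"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'mapreduce': 1, 'spark': 1, 'and': 1}
```

The key difference between the two frameworks lies not in this model but in execution: MapReduce writes intermediate results to disk between stages, while Spark can keep them in memory across a chain of operations.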
Data Warehouse: Teradata Vs Hadoop
Teradata is a fully horizontally scalable relational database management system (RDBMS). More specifically, it is a Massively Parallel Processing (MPP) database system built on a cluster of commodity computers called "shared-nothing" nodes (each node has its own CPU, memory, and disks to process data locally), connected through a high-speed interconnect. It combines horizontal partitioning of relational tables with parallel execution of SQL queries.
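The shared-nothing idea can be sketched in a few lines, assuming a hypothetical four-node layout (the node count and routing function are illustrative, not Teradata's actual hashing scheme): each row is hashed on a distribution key and owned by exactly one node, so a full-table query becomes the union of independent, parallel local scans.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size for illustration

def node_for(key):
    """Pick the owning node by hashing the distribution key."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_NODES

# A small "table" of (customer_id, name) rows, horizontally partitioned.
rows = [(cid, f"customer-{cid}") for cid in range(10)]
nodes = {n: [] for n in range(NUM_NODES)}
for row in rows:
    nodes[node_for(row[0])].append(row)  # route each row to its node

# Each node scans only its local slice; the counts sum to the table size.
local_counts = {n: len(node_rows) for n, node_rows in nodes.items()}
print(local_counts)
```

Because routing is deterministic, a query on a single key can also be sent straight to the one node that owns it, avoiding any cross-node traffic.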
Google Gives Us A Big Data Map
The progress of Hadoop (and the Hadoop ecosystem) was greatly influenced by Google. The challenge of taming this eruption of big data was recognized and accepted by the data engineers at Google as early as the late 1990s and early 2000s. As a company on the path to becoming synonymous with global search, Google was grappling with this tsunami of big data firsthand.