Kafka: Building a Real-Time Data Pipeline

Apache Kafka (latest version 0.8.2.1) is an open-source, distributed publish-subscribe messaging system for data integration. It was originally developed at LinkedIn and is written in Scala. The project aims to collect and deliver high volumes of log data with low latency, handling real-time data feeds through a data pipeline (the movement of data from one point to another). Its design is heavily influenced by log processing.
