Wednesday 12 August 2015

Apache Kafka

Introduction:

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.  

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.  

Fast
A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
Scalable
Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be expanded elastically and transparently, without downtime. Data streams are partitioned and spread over a cluster of machines, allowing streams larger than any single machine can handle and letting clusters of coordinated consumers share the load.
Durable
Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
Distributed by Design
Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.

Key characteristics of Apache Kafka:
  • Constant-time (O(1)) performance even with increased data loads
  • Real-time focus – immediate consumption of produced messages
  • Distribution of message consumption over a cluster of machines
  • Consumers track their own position (offset) in the log, allowing them to “rewind” and re-read old messages
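The offset model behind that last characteristic can be sketched in plain Python, with no Kafka client. The Log and Consumer classes below are illustrative stand-ins for a partition and a consumer, not Kafka's actual API: the broker keeps an append-only log, each consumer tracks its own read position, and "rolling back" is just resetting that offset.

```python
class Log:
    """An append-only, offset-addressed message log (stand-in for a partition)."""
    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)

    def read(self, offset):
        return self.messages[offset]


class Consumer:
    """Holds its own position; the log itself is never modified by reads."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        message = self.log.read(self.offset)
        self.offset += 1
        return message

    def seek(self, offset):
        # "Rollback": re-read old messages by simply resetting the offset.
        self.offset = offset


log = Log()
for m in ["a", "b", "c"]:
    log.append(m)

consumer = Consumer(log)
first_pass = [consumer.poll() for _ in range(3)]
consumer.seek(0)            # rewind to the beginning
replayed = consumer.poll()  # re-reads the first message
```

Because the consumer, not the broker, owns the position, many consumers can read the same log independently and cheaply.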
Kafka is designed to support the following:
  • Designed to support millions of messages per second.
  • Persistent messaging with O(1) disk structures that provide constant time performance even with many TB of stored messages.
  • High-throughput: even with very modest hardware Kafka can support hundreds of thousands of messages per second.
  • Explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
  • Support for parallel data load into Hadoop.
Apache Kafka provides a publish-subscribe solution that can handle all the activity stream data and processing of a consumer-scale website. This kind of activity (page views, searches, and other user actions) is a key ingredient in many of the social features of the modern web.

Because of the throughput requirements, this data is typically handled by logging and ad hoc log-aggregation solutions. That approach is viable for feeding an offline analysis system like Hadoop, but is very limiting for building real-time processing. Kafka aims to unify offline and online processing by providing a mechanism for parallel load into Hadoop as well as the ability to partition real-time consumption over a cluster of machines.
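The partitioning idea mentioned above can be sketched in a few lines of Python. This is an illustration of the technique, not Kafka's implementation: messages carrying the same key always hash to the same partition, which is what preserves per-partition ordering while spreading load across machines. (Kafka's default partitioner uses murmur2; CRC32 is a stand-in here, and the keys and partition count are made up.)

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(key: bytes) -> int:
    # Deterministic hash: identical keys always map to the same partition.
    return zlib.crc32(key) % NUM_PARTITIONS

# Route keyed messages into per-partition lists.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
events = [(b"user-1", "login"), (b"user-2", "search"), (b"user-1", "logout")]
for key, value in events:
    partitions[partition_for(key)].append((key, value))

# Every message for a given key lands in one partition, in send order --
# the per-partition ordering guarantee described above.
p = partition_for(b"user-1")
user1_events = [v for k, v in partitions[p] if k == b"user-1"]
```

A consumer assigned partition p therefore sees all of user-1's events in the order they were produced, even though other keys may be handled by entirely different machines.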

 

Cluster configuration
Steps to set up a clustered (multi-node, multi-broker) Kafka configuration:
  1. Download Apache Kafka to every node in your cluster; assume it is installed at $KAFKA_HOME.
  2. Add one configuration file under $KAFKA_HOME/config for each broker. Copy the contents of $KAFKA_HOME/config/server.properties into something like $KAFKA_HOME/config/server-X.properties.
  3. For each configuration file:
    1. Set the broker.id property to a unique integer identifying that broker.
    2. Set the zookeeper.connect property to the ensemble of nodes your ZooKeeper instance is running on. NOTE: it is good practice to append a directory name (a chroot path) to the ZooKeeper host and port so that ZooKeeper can be shared with other applications, e.g. host:port/kafka.
    3. For very data-heavy real-time applications, consider setting log.retention.hours=1 and log.cleaner.enable=true to prevent too much log data from accumulating on disk.
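Putting the steps above together, a per-broker file might look like the fragment below. The broker id, port, paths, and ZooKeeper hostnames are all placeholders to adapt to your cluster, not values from this post.

```properties
# $KAFKA_HOME/config/server-1.properties -- illustrative values only

# Unique integer per broker (step 3.1)
broker.id=1

# Brokers sharing a host each need their own port and log directory
port=9092
log.dirs=/tmp/kafka-logs-1

# ZooKeeper ensemble plus a /kafka chroot (step 3.2); hostnames are placeholders
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka

# Aggressive retention for data-heavy real-time use (step 3.3)
log.retention.hours=1
log.cleaner.enable=true
```

Each broker is then started with its own file, e.g. bin/kafka-server-start.sh config/server-1.properties.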

References:
http://kafka.apache.org/
http://hortonworks.com/hadoop/kafka/
http://tech.lalitbhatt.net/2014/07/apache-kafka-tutorial.html
http://www.quora.com/What-are-the-differences-between-Apache-Kafka-and-RabbitMQ
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
 

 
