Introduction:
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log: a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system, but with a unique design.
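As a minimal sketch of that messaging functionality, the Java producer client that ships with Kafka (org.apache.kafka.clients.producer.KafkaProducer) can publish a message to a topic. The broker address and the "page-views" topic below are placeholder assumptions, not anything prescribed by Kafka itself:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class MinimalProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder address of one broker in the cluster (assumption).
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // "page-views" is a hypothetical topic; records with the same key
                // always land on the same partition, preserving per-key ordering.
                producer.send(new ProducerRecord<>("page-views", "user-42", "home"));
            }
        }
    }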
Fast
A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
Scalable
Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines, allowing streams larger than any single machine can handle and supporting clusters of coordinated consumers.
Durable
Messages are persisted on disk and replicated within the cluster to
prevent data loss. Each broker can handle terabytes of messages without
performance impact.
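Partitioning and replication come together when a topic is created. As a sketch, the kafka-topics.sh tool that ships with Kafka can create a topic spread over 8 partitions with 3 replicas each; the hostnames and counts below are placeholder assumptions, and newer Kafka releases use --bootstrap-server in place of --zookeeper:

    $KAFKA_HOME/bin/kafka-topics.sh --create \
        --zookeeper zk1:2181/kafka \
        --replication-factor 3 \
        --partitions 8 \
        --topic page-views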
Distributed by Design
Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.

Key characteristics of Apache Kafka:
- Real-time focus: messages are available to consumers immediately after they are produced
- Distribution of message consumption over a cluster of machines
- Consumers track their own position (offset) in the log, allowing them to "roll back" in time and re-read old messages (see the consumer sketch after this list)
- Designed to support millions of messages per second
- Persistent messaging with O(1) disk structures that provide constant-time performance even with many terabytes of stored messages
- High throughput: even with very modest hardware, Kafka can support hundreds of thousands of messages per second
- Explicit support for partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines, while maintaining per-partition ordering semantics
- Support for parallel data load into Hadoop
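To illustrate the offset tracking and "roll back" behavior noted in the list above, here is a minimal sketch using the Java consumer client (org.apache.kafka.clients.consumer.KafkaConsumer, with the poll(Duration) signature from newer client versions). The broker address, topic, and group id are placeholder assumptions carried over from the producer sketch:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class ReplayConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // placeholder broker address
            props.put("group.id", "replay-demo");             // hypothetical consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Manually assign partition 0 of the hypothetical "page-views" topic.
                TopicPartition partition = new TopicPartition("page-views", 0);
                consumer.assign(Collections.singletonList(partition));

                // Because the consumer owns its offset, it can "roll back" and
                // re-read the partition from the beginning (or any stored offset).
                consumer.seekToBeginning(Collections.singletonList(partition));

                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }

Note that rewinding is just a seek on the consumer side: the broker does not track per-consumer positions, and the log on disk is unchanged.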
Cluster configuration
Steps to set up a clustered (multi-node, multi-broker) Kafka configuration:
- Download Apache Kafka to every node in your cluster; assume it is installed at $KAFKA_HOME
- Add one configuration file per broker under $KAFKA_HOME/config. Copy the contents of $KAFKA_HOME/config/server.properties into something like $KAFKA_HOME/config/server-X.properties, where X identifies the broker (a sample file is sketched after this list)
- For each configuration file:
- Change the broker.id property to a unique integer identifying that broker
- Set the zookeeper.connect property to the ensemble of nodes that your ZooKeeper instances are running on. NOTE: it is good practice to append a directory name (a chroot path) to the ZooKeeper host and port so that ZooKeeper can be shared with other applications, e.g. host:port/kafka
- For very data-heavy real-time applications, consider setting log.retention.hours=1 and log.cleaner.enable=true to keep the amount of log data written to disk under control
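Putting these steps together, the per-broker file for broker 1 might look like the sketch below. The broker id, port, log directory, and ZooKeeper hosts are placeholder assumptions; on newer Kafka releases, set listeners=PLAINTEXT://:9093 instead of port:

    # $KAFKA_HOME/config/server-1.properties
    broker.id=1
    port=9093
    log.dirs=/var/lib/kafka-logs-1
    zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
    # Optional, for very data-heavy real-time applications:
    log.retention.hours=1
    log.cleaner.enable=true

Each broker is then started with its own configuration file:

    $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server-1.properties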