Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a “massively scalable pub/sub message queue architected as a distributed transaction log,” [3] making it highly valuable for enterprise infrastructures to process streaming data. Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library.

The design is heavily influenced by transaction logs. [4]

History

Apache Kafka was originally developed by LinkedIn and was subsequently open sourced in early 2011. It graduated from the Apache Incubator on 23 October 2012. In November 2014, several engineers who worked on Kafka at LinkedIn created a new company named Confluent [5] with a focus on Kafka. According to a Quora post from 2014, Jay Kreps appears to have named the system after the author Franz Kafka. Kreps chose to name it after an author because it is a system optimized for writing, and he liked Kafka’s work. [6]

Description

Overview of Kafka

Kafka stores messages that come from arbitrarily many processes called “producers”. The data can be divided into different “partitions” within different “topics”. Within a partition, the messages are indexed and stored together with a timestamp. Other processes called “consumers” can read messages from partitions. Kafka runs on a cluster of one or more servers, and the partitions can be distributed across cluster nodes.
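The relationship between producers, topics, partitions and consumers can be illustrated with a small in-memory model. This is only a toy sketch, not Kafka's client API: the class names (`Record`, `Partition`, `Topic`), the method names, and the CRC32-based partitioner are all hypothetical choices made for illustration, and nothing here is distributed or persistent.

```python
import time
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Record:
    offset: int       # index of the message within its partition
    timestamp: float  # time at which the message was appended
    key: bytes
    value: bytes


@dataclass
class Partition:
    records: list = field(default_factory=list)

    def append(self, key, value):
        # Messages are only ever appended; the offset is the list index.
        record = Record(len(self.records), time.time(), key, value)
        self.records.append(record)
        return record.offset

    def read(self, offset, max_records=10):
        # Consumers read a slice of the log starting at a chosen offset.
        return self.records[offset:offset + max_records]


class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key):
        # Hypothetical partitioner: equal keys always map to the same partition.
        return crc32(key) % len(self.partitions)

    def produce(self, key, value):
        index = self.partition_for(key)
        return index, self.partitions[index].append(key, value)

    def consume(self, partition, offset, max_records=10):
        return self.partitions[partition].read(offset, max_records)


# Usage: two messages with the same key land in the same partition, in order.
topic = Topic("page-views", num_partitions=3)
partition, first_offset = topic.produce(b"user-1", b"/home")
_, second_offset = topic.produce(b"user-1", b"/cart")
records = topic.consume(partition, offset=first_offset)
```

In this sketch, reading does not remove anything from the partition, so several consumers can read the same messages independently by keeping track of their own offsets.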

Kafka performance

Due to its widespread integration into enterprise-level infrastructures, monitoring Kafka performance at scale has become an increasingly important issue. Monitoring end-to-end performance requires tracking metrics from brokers, consumers, and producers, in addition to monitoring ZooKeeper, which is used by Kafka for coordination among consumers. [7] [8] There are currently several monitoring platforms to track Kafka performance, either open-source, like LinkedIn’s Burrow, or paid, like Datadog. In addition to these platforms, Kafka data collection can also be performed using commonly bundled Java tools, including JConsole. [9]
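The text above does not name specific metrics. One commonly tracked consumer-side metric is consumer lag, the number of messages written to a partition that a consumer has not yet processed. A minimal sketch of the arithmetic follows, assuming the offsets have already been collected by some monitoring tool. The function name, the dictionary layout, and the sample numbers are hypothetical, and treating a partition with no committed offset as offset 0 is an assumption of this sketch.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: messages written but not yet consumed.

    Both arguments map a partition number to an offset. A partition with
    no committed offset is treated as if nothing has been consumed yet.
    """
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero in case the two readings were taken at different times.
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Usage with made-up sample offsets for a two-partition topic.
sample_lag = consumer_lag({0: 120, 1: 75}, {0: 100, 1: 75})
```

A lag that keeps growing over time suggests the consumer is not keeping up with the producers for that partition.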

Enterprises that use Kafka

The following is a list of companies that have used or are using Kafka:

  • Betfair [10]
  • Cisco Systems [11]
  • CloudFlare [12]
  • Conviva [13]
  • Daumkakao [14]
  • eBay [15]
  • Hyperledger Fabric [16]
  • HubSpot [17]
  • Netflix [18]
  • PayPal [19]
  • Shopify [20]
  • Sift Science [21]
  • Spotify [22]
  • Ticketmaster [23]
  • Uber [24]
  • Walmart [25]
  • Yelp [26]

See also

  • Apache ActiveMQ
  • Apache Flink
  • Apache Qpid
  • Apache Samza
  • Apache Spark Streaming
  • Data Distribution Service
  • Enterprise Integration Patterns
  • Enterprise messaging system
  • Streaming analytics
  • Event-driven SOA
  • Message-oriented middleware
  • Service-oriented architecture
  • StormMQ

References

  1. “Mirror of Apache Kafka at GitHub”. Github.com. Retrieved 6 March 2017.
  2. “Open-sourcing Kafka, LinkedIn’s distributed message queue”. Retrieved 27 October 2016.
  3. Monitoring Kafka performance metrics, Datadog Engineering Blog, accessed 23 May 2016
  4. The Log: What every software engineer should know about real-time data’s unifying abstraction, LinkedIn Engineering Blog, accessed 5 May 2014
  5. Primack, Dan. “LinkedIn engineers spin out to launch ‘Kafka’ startup Confluent”. Fortune.com. Retrieved 10 February 2015.
  6. “What is the relationship between Kafka, the writer, and Apache Kafka, the distributed messaging system?”. Quora. Retrieved 2017-06-12.
  7. “Monitoring Kafka performance metrics”. 2016-04-06. Retrieved 2016-10-05.
  8. Mouzakitis, Evan (2016-04-06). “Monitoring Kafka performance metrics”. Datadoghq.com. Retrieved 2016-10-05.
  9. “Collecting Kafka performance metrics – Datadog”. 2016-04-06. Retrieved 2016-10-05.
  10. “Exchange Market Data Streaming with Kafka”.
  11. “OpenSOC: An Open Commitment to Security”. Cisco blog. Retrieved 2016-02-03.
  12. “More data, more data”.
  13. “Conviva home page”. Conviva. 2017-02-28. Retrieved 2017-05-16.
  14. Doyung Yoon. “S2Graph: A Large-Scale Graph Database with HBase”.
  15. “Kafka Usage in Ebay Communications Delivery Pipeline”.
  16. “Cryptography and Protocols in Hyperledger Fabric” (PDF). January 2017. Retrieved 2017-05-05.
  17. “Kafka at HubSpot: Critical Consumer Metrics”.
  18. Cheolsoo Park and Ashwin Shankar. “Netflix: Integrating Spark at Petabyte Scale”.
  19. Shibi Sudhakaran of PayPal. “PayPal: Creating a Central Data Backbone: Couchbase Server to Kafka to Hadoop and Back (talk at Couchbase Connect 2015)”. Couchbase. Retrieved 2016-02-03.
  20. “Shopify – Sarama is a Go library for Apache Kafka”.
  21. “Concurrent and At Least Once Semantics with the New Kafka Consumer”.
  22. Josh Baer. “How Apache Drives Spotify’s Music Recommendations”.
  23. Patrick Hechinger. “CTOs to Know: Meet Ticketmaster’s Jody Mulkey”.
  24. “Stream Processing in Uber”. InfoQ. Retrieved 2015-12-06.
  25. “Apache Kafka for Item Setup”. Medium.com. Retrieved 2017-06-12.
  26. “Streaming Messages from Kafka into Redshift in near Real-Time”. Yelp. Retrieved 2017-07-19.
