Streaming Data Using Kafka Streams to Count Words

1 hour
  • 4 Learning Objectives

About this Hands-on Lab

Kafka Streams is a library for per-event processing of records. You can use it to process data as soon as it arrives rather than waiting for a batch to accumulate. In this hands-on lab, we use Kafka Streams to read plain-text input and process it in real time, counting how many times each word appears in the input stream.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create the Input and Output Topics

Create the input topic named streams-plaintext-input.

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic streams-plaintext-input

Create the output topic named streams-wordcount-output.

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic streams-wordcount-output
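
Before moving on, it can help to confirm that both topics exist. The following is a minimal sketch, assuming it is run from the ~/kafka directory and that ZooKeeper is reachable at localhost:2181 as in the create commands above. The variable names topics_tool and topics_checked are illustrative only and are not part of the lab.

```shell
#!/bin/sh
# Sketch: confirm the lab topics exist. Run from the ~/kafka directory.
# Assumption: ZooKeeper listens on localhost:2181, as in the create commands.
topics_tool=bin/kafka-topics.sh

if [ -x "$topics_tool" ]; then
  # --list prints one topic name per line.
  "$topics_tool" --list --zookeeper localhost:2181
  # --describe shows the partition count and replication factor of one topic.
  "$topics_tool" --describe --zookeeper localhost:2181 --topic streams-plaintext-input
  topics_checked=yes
else
  # Fall back to a hint instead of failing when run from the wrong directory.
  echo "Skipping: $topics_tool not found (run this from ~/kafka)."
  topics_checked=no
fi
```

Both streams-plaintext-input and streams-wordcount-output should appear in the list if the create commands succeeded.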

Open a Kafka Console Producer

Open a Kafka console producer.

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-plaintext-input

Type the three messages from the instructions.

>kafka streams is great
>kafka processes messages in real time
>kafka helps real information streams
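
The console producer reads records from standard input, one per line, so the same three messages can also be piped in rather than typed at the prompt. This is a sketch under the same assumptions as the command above (broker at localhost:9092, run from ~/kafka). The helper name lab_messages is hypothetical.

```shell
#!/bin/sh
# Sketch: send the three lab messages without typing them interactively.

# Hypothetical helper that prints the three lab messages, one per line.
lab_messages() {
  printf '%s\n' \
    'kafka streams is great' \
    'kafka processes messages in real time' \
    'kafka helps real information streams'
}

producer=bin/kafka-console-producer.sh
if [ -x "$producer" ]; then
  # Each input line becomes one record on the input topic.
  lab_messages | "$producer" --broker-list localhost:9092 --topic streams-plaintext-input
else
  echo "Skipping: $producer not found (run this from ~/kafka)."
fi
```

Typing the messages at the > prompt, as the lab describes, produces the same records.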

Open a Kafka Console Consumer

Open a Kafka console consumer using the default message formatter and the three properties given in the instructions.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic streams-wordcount-output \
  --from-beginning \
  --formatter kafka.tools.DefaultMessageFormatter \
  --property print.key=true \
  --property print.value=true \
  --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
  --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

Run the Kafka Streams Application

Use the kafka-run-class.sh command to run the WordCountDemo application.

bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo

Additional Resources

In this hands-on lab, we use the WordCount demo application that ships with the Kafka binaries. The application is already built, so we won't write it from scratch. We need to create an input topic and an output topic, write some messages to the input topic, and then run the WordCount demo, which counts the occurrences of each word in the input stream and writes the counts to the output topic. We view the results with the Kafka console consumer, passing it the appropriate properties to print each key and value and to deserialize them into readable output. When the console shows a count for each word, we've successfully completed this lab.
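
To know what final counts to look for, the same tally can be computed locally with standard Unix tools. This is only a sketch of the expected end result, not how the demo works internally. It assumes the demo lowercases each line and splits it into words on whitespace, and the function names lab_messages and word_counts are hypothetical.

```shell
#!/bin/sh
# Sketch: compute the expected final word counts for the three lab messages.

# Hypothetical helper that prints the three lab messages, one per line.
lab_messages() {
  printf '%s\n' \
    'kafka streams is great' \
    'kafka processes messages in real time' \
    'kafka helps real information streams'
}

# Reads text on stdin and prints "word<TAB>count", sorted by word.
word_counts() {
  tr -s ' ' '\n' |               # one word per line
    tr '[:upper:]' '[:lower:]' | # assumption: counting is case-insensitive
    LC_ALL=C sort |              # group identical words together
    uniq -c |                    # prefix each distinct word with its count
    awk '{print $2 "\t" $1}'     # reorder to word, tab, count
}

lab_messages | word_counts
```

For these messages the tally is kafka 3, real 2, streams 2, and 1 for each of the other eight words. The console consumer may also show intermediate counts for a word as updates arrive, so compare against the last value printed for each key.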

Before we can create our first topic, we need a running Kafka cluster.

Here are the instructions for starting the Kafka cluster:

  1. Use Docker Compose to build the Kafka Cluster.

    cd content-kafka-deep-dive
    
    docker-compose up -d --build
  2. Install Java.
    sudo apt install default-jdk
  3. Unzip the Kafka binaries tar file located in /home/cloud_user
    tar -xvf kafka_2.12-2.2.0.tgz
  4. Rename the 'kafka_2.12-2.2.0' directory to 'kafka'
    mv kafka_2.12-2.2.0 kafka
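
After these steps, a quick sanity check can catch a missing install before any Kafka commands are run. The following is a minimal sketch. The check function is a hypothetical helper, and the paths assume the layout produced by the steps above (Kafka extracted to ~/kafka).

```shell
#!/bin/sh
# Sketch: sanity checks after the setup steps.

# Hypothetical helper: $1 is a description, the remaining arguments are a
# command that is run quietly. Prints "ok" or "missing" for that item.
check() {
  desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok:      $desc"
  else
    echo "missing: $desc"
  fi
}

check "java on PATH"            command -v java
check "docker-compose on PATH"  command -v docker-compose
check "kafka directory"         test -d "$HOME/kafka"
check "kafka-topics.sh script"  test -x "$HOME/kafka/bin/kafka-topics.sh"
```

Any line reported as missing points back to the corresponding setup step.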

Perform all commands in this hands-on lab from within the ~/kafka directory.

Meet the following requirements to complete this hands-on lab:

  • Create a topic named streams-plaintext-input.
  • Create a topic named streams-wordcount-output.
  • Open the Kafka console producer and write the following messages:
    • kafka streams is great
    • kafka processes messages in real time
    • kafka helps real information streams
  • Open a Kafka console consumer to the output topic to read messages from the beginning using the default message formatter with the following properties:
    • print.key=true
    • print.value=true
    • key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
    • value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
  • Run the org.apache.kafka.streams.examples.wordcount.WordCountDemo application using the kafka-run-class.sh command.
