Working with KSQL Streams

30 minutes
2 Learning Objectives

About this Hands-on Lab

KSQL provides a powerful and flexible interface for Kafka’s stream processing features. With KSQL, you can even build data processing pipelines without needing to write your own Kafka Streams applications. In this lab, we will solve a simple data processing use case using KSQL. We will create a stream from an existing topic, and we will output the data in a processed form to an output topic using a persistent streaming query.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create a Stream to Pull Data in from the Topic
  1. Start a KSQL session:

    sudo ksql
  2. Set auto.offset.reset to earliest:

    SET 'auto.offset.reset' = 'earliest';
  3. Look at the data in the member_signups topic:

    PRINT 'member_signups' FROM BEGINNING;
  4. Create a stream from the topic:

    CREATE STREAM member_signups
      (lastname VARCHAR,
        firstname VARCHAR,
        email_notifications BOOLEAN)
      WITH (KAFKA_TOPIC='member_signups',
        VALUE_FORMAT='DELIMITED');
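
As an optional check before moving on, you can inspect the new stream's metadata to confirm that the columns mapped onto the delimited value in the expected order (last name, then first name, then the boolean). This is a verification sketch rather than a graded step, and depending on your KSQL version the output may also include implicit columns such as ROWTIME and ROWKEY:

    DESCRIBE member_signups;
    SHOW STREAMS;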
Create a Persistent Streaming Query to Write Data to the Output Topic in Real Time
  1. Create the persistent streaming query:

    CREATE STREAM member_signups_email AS
      SELECT * FROM member_signups WHERE email_notifications=true;
  2. View the data in the output topic to verify that everything is working:

    PRINT 'MEMBER_SIGNUPS_EMAIL' FROM BEGINNING;
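
For an extra sanity check, KSQL can also list the running persistent queries and the topics it can see, which should now include the backing topic created for the new stream. This is optional, and the exact output varies by KSQL version:

    SHOW QUERIES;
    SHOW TOPICS;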

Additional Resources

Your supermarket company has a customer membership program, and they are using Kafka to manage some of the back-end data related to this program. A topic called member_signups contains records that are published when a new customer signs up for the program. Each record contains some data indicating whether or not the customer has agreed to receive email notifications.

The email notification system reads from a Kafka topic, so a topic called member_signups_email needs to be created that contains the new member data, but only for members who have agreed to receive notifications. The company would like this data to be processed automatically in real time so that consumer applications can respond appropriately when a customer signs up. Luckily, this use case can be handled with a KSQL persistent streaming query, so you do not need to write a Kafka Streams application.

The data in member_signups is formatted with the key as the member ID. The value is a comma-delimited list of fields in the form <last name>,<first name>,<email notifications true/false>.
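
In other words, each record has the shape sketched below. The second line is a made-up illustration of a single record, not actual lab data, and the exact layout printed by KSQL varies by version:

    key: <member ID>    value: <last name>,<first name>,<true/false>
    key: 42             value: Smith,Jane,true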

Create a stream that pulls the data from member_signups, and then create a persistent streaming query to filter out records where the email notification value is false and output the result to the member_signups_email topic.

If you get stuck, feel free to check out the solution video or the detailed instructions under each objective. Good luck!

