Joining Datasets with KSQL

30 minutes
  • 2 Learning Objectives

About this Hands-on Lab

KSQL provides a SQL-like interface for most of the operations you can perform with Kafka Streams. Like Kafka Streams, KSQL can join multiple streams into a single dataset. In this lab, you will write a persistent streaming query that joins two streams, which gives you hands-on experience with joins in KSQL.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create Streams for Both Input Topics
  1. Start a KSQL session:

    sudo ksql
  2. Set auto.offset.reset to earliest so that queries read the test data already in the topics, not only records that arrive later:

    SET 'auto.offset.reset' = 'earliest';
  3. View the data in the member_signups topic:

    PRINT 'member_signups' FROM BEGINNING;
  4. Create a stream for the member_signups topic:

    CREATE STREAM member_signups
      (lastname VARCHAR,
        firstname VARCHAR)
      WITH (KAFKA_TOPIC='member_signups',
        VALUE_FORMAT='DELIMITED');
  5. View the data in the member_contact topic:

    PRINT 'member_contact' FROM BEGINNING;
  6. Create a stream for the member_contact topic:

    CREATE STREAM member_contact
      (email VARCHAR)
      WITH (KAFKA_TOPIC='member_contact',
        VALUE_FORMAT='DELIMITED');
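
Optionally, before writing the join, you can confirm that both streams were registered with the columns you expect. This is a minimal check that assumes the KSQL CLI session from step 1 and the KSQL 5.x behavior this lab relies on, where every stream also carries implicit ROWTIME and ROWKEY columns:

```sql
-- List all registered streams; MEMBER_SIGNUPS and MEMBER_CONTACT should appear.
SHOW STREAMS;

-- Show each stream's schema, including the implicit ROWKEY column the join uses.
DESCRIBE member_signups;
DESCRIBE member_contact;
```

If ROWKEY is missing from the DESCRIBE output, you are likely on a newer ksqlDB release, where the key column must be declared explicitly in CREATE STREAM and the join condition changes accordingly.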

Create a Persistent Streaming Query to Join the Two Streams and Output the Result
  1. Create a persistent streaming query to join the two streams:

    CREATE STREAM member_email_list AS
      SELECT member_signups.firstname, member_signups.lastname, member_contact.email
      FROM member_signups
      INNER JOIN member_contact WITHIN 365 DAYS ON member_signups.rowkey = member_contact.rowkey;
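  2. Check the output topic to verify that it contains the joined records (first name, last name, and email). KSQL uppercases unquoted identifiers, so the output topic is named MEMBER_EMAIL_LIST: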

    PRINT 'MEMBER_EMAIL_LIST' FROM BEGINNING;
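
PRINT shows the raw topic contents. As an optional extra check, you can also query the new stream and confirm that the persistent query is running. The sketch below assumes the KSQL 5.x syntax used throughout this lab; newer ksqlDB releases require EMIT CHANGES on push queries such as the SELECT.

```sql
-- Return a few joined rows, then stop (LIMIT ends the otherwise continuous query).
-- If fewer than three joined rows exist, the query keeps waiting; press Ctrl+C to stop it.
SELECT * FROM member_email_list LIMIT 3;

-- Inspect the output stream's schema and the persistent query writing to it.
DESCRIBE EXTENDED member_email_list;

-- List running persistent queries; the one created by CREATE STREAM ... AS SELECT should be listed.
SHOW QUERIES;
```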

Additional Resources

Your supermarket company has a customer membership program. Some of the data for this program is managed using Kafka. There are currently two relevant topics:

  • member_signups: key is the member ID; value is the customer's name (last name, then first name).
  • member_contact: key is the member ID; value is the customer's email address.

The company would like to send an email to new members when they join. This email needs to contain the customer's name, and it needs to be sent to the customer's email address, but these pieces of data are currently in two different topics. Using KSQL, create a persistent streaming query to join the customer names and email addresses, and stream the result to an output topic called MEMBER_EMAIL_LIST.
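
Because the join matches records by key, a signup and a contact record pair up only when both carry the same member ID and, since this is a stream-stream join, arrive within the join window (365 days in this lab). To watch a match happen end to end, you could add one record to each stream from the KSQL CLI. This is a hypothetical sketch: the member ID '9001', the name, and the address are made-up test values, and INSERT INTO ... VALUES assumes KSQL 5.3 or later.

```sql
-- Hypothetical test records; both share ROWKEY '9001' so the join can match them.
INSERT INTO member_signups (ROWKEY, lastname, firstname) VALUES ('9001', 'Doe', 'Jane');
INSERT INTO member_contact (ROWKEY, email) VALUES ('9001', 'jane.doe@example.com');
```

If the persistent query that builds MEMBER_EMAIL_LIST is running, a row containing Jane, Doe, and jane.doe@example.com should then appear in that topic.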

If you get stuck, feel free to check out the solution video, or the detailed instructions under each objective. Good luck!

