KSQL provides a SQL-like interface for most of the operations you can perform with Kafka Streams. Like Kafka Streams, KSQL can join multiple streams into a single dataset. In this lab, you will get hands-on experience with KSQL joins by writing a persistent streaming query that joins two streams.
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Create Streams for Both Input Topics
Start a KSQL session:
sudo ksql
Set auto.offset.reset to earliest so that all streams will process the existing test data:
SET 'auto.offset.reset' = 'earliest';
View the data in the member_signups topic (PRINT keeps running until you press Ctrl+C):
PRINT 'member_signups' FROM BEGINNING;
Create a stream for the member_signups topic:
CREATE STREAM member_signups (lastname VARCHAR, firstname VARCHAR) WITH (KAFKA_TOPIC='member_signups', VALUE_FORMAT='DELIMITED');
View the data in the member_contact topic:
PRINT 'member_contact' FROM BEGINNING;
Create a stream for the member_contact topic:
CREATE STREAM member_contact (email VARCHAR) WITH (KAFKA_TOPIC='member_contact', VALUE_FORMAT='DELIMITED');
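Before moving on, it can help to confirm that both streams were registered with the expected columns. The following is a minimal check, assuming both CREATE STREAM statements succeeded. KSQL stores unquoted identifiers in upper case, so the streams should be listed as MEMBER_SIGNUPS and MEMBER_CONTACT.

```sql
-- List every stream registered on this KSQL server.
SHOW STREAMS;

-- Inspect the columns of each new stream.
DESCRIBE member_signups;
DESCRIBE member_contact;
```

On KSQL versions that expose ROWKEY, as this lab assumes, each DESCRIBE should list the declared columns alongside the system columns ROWTIME and ROWKEY. ROWKEY holds the Kafka record key, which is the column the join in the next objective matches on.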
- Create a Persistent Streaming Query to Join the Two Streams and Output the Result
Create a persistent streaming query to join the two streams:
CREATE STREAM member_email_list AS SELECT member_signups.firstname, member_signups.lastname, member_contact.email FROM member_signups INNER JOIN member_contact WITHIN 365 DAYS ON member_signups.rowkey = member_contact.rowkey;
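The WITHIN 365 DAYS clause means that a record from member_signups is joined with a record from member_contact only when both have the same key and their timestamps are no more than 365 days apart. To confirm that the statement started a persistent query, a quick check along these lines may help. It is a sketch that assumes the stream was created under the name used above.

```sql
-- List the running persistent queries; the CREATE STREAM ... AS SELECT
-- statement should appear here with MEMBER_EMAIL_LIST as its sink.
SHOW QUERIES;

-- Show the output stream's schema, its backing Kafka topic,
-- and the queries that write to it.
DESCRIBE EXTENDED member_email_list;
```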
Check the output topic to verify the correct data is present:
PRINT 'MEMBER_EMAIL_LIST' FROM BEGINNING;
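As an alternative to PRINT, the joined stream can also be queried directly. This is a minimal sketch that assumes the older KSQL syntax used throughout this lab; on newer ksqlDB releases the same query needs EMIT CHANGES before LIMIT.

```sql
-- Read up to five joined records from the output stream, then stop.
-- With 'auto.offset.reset' set to 'earliest', reading starts from
-- the beginning of the topic.
SELECT * FROM member_email_list LIMIT 5;
```

If the stream holds fewer than five records, the query keeps waiting for more; press Ctrl+C to end it.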