Evolving an Avro Schema in a Kafka Application

30 minutes
  • 3 Learning Objectives

About this Hands-on Lab

Confluent Schema Registry coordinates data contracts between producers and consumers and simplifies serializing and deserializing complex data objects. It also helps you manage changes to your data schemas over time. In this lab, you will change an existing schema by adding a new field, which gives you hands-on experience with evolving a schema using Confluent Schema Registry.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Clone the Starter Project and Run It to Make Sure Everything Is Working
  1. Clone the starter project into your home directory:

    cd ~/
    git clone https://github.com/linuxacademy/content-ccdak-schema-evolve-lab.git
  2. Run the code to ensure it works before modifying it:

    cd content-ccdak-schema-evolve-lab
    ./gradlew runProducer
    ./gradlew runConsumer

    Note: The consumer should output some records that were created by the producer.

Update the Purchase Schema to Add the `member_id` Field
  1. Edit the schema definition file:

    vi src/main/avro/com/linuxacademy/ccdak/schemaregistry/Purchase.avsc
  2. Add the member_id field with a default of 0:

    {
    "namespace": "com.linuxacademy.ccdak.schemaregistry",
    "compatibility": "FORWARD",
    "type": "record",
    "name": "Purchase",
    "fields": [
      {"name": "id", "type": "int"},
      {"name": "product", "type": "string"},
      {"name": "quantity", "type": "int"},
      {"name": "member_id", "type": "int", "default": 0}
    ]
    }
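
The default matters because records written before the change carry no member_id. The sketch below is a simplified model of that idea in plain Java: a reader that expects member_id falls back to the declared default when the written record lacks the field. It does not use the Avro library, and the class and method names (DefaultFillSketch, resolve) are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of reader-side default filling. This is NOT the Avro library;
// it only illustrates why a new field needs a default to read older records.
final class DefaultFillSketch {

    // Builds the record as the reader sees it: each reader field is taken from the
    // written record if present, otherwise from the reader's declared defaults.
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults,
                                       String... readerFields) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        for (String field : readerFields) {
            if (written.containsKey(field)) {
                resolved.put(field, written.get(field));
            } else if (readerDefaults.containsKey(field)) {
                resolved.put(field, readerDefaults.get(field));
            } else {
                // Without a default, the old record cannot be read with the new schema.
                throw new IllegalStateException("No value or default for field: " + field);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // A record written with the old three-field schema.
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("id", 1);
        oldRecord.put("product", "apples");
        oldRecord.put("quantity", 17);

        Map<String, Object> defaults = Map.of("member_id", 0);

        // prints {id=1, product=apples, quantity=17, member_id=0}
        System.out.println(resolve(oldRecord, defaults, "id", "product", "quantity", "member_id"));
    }
}
```

If the defaults map were empty, the same call would throw, which mirrors why adding a field without a default breaks readers of older data.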

Update the Producer to Set the `member_id` for the Records It Publishes

  1. Edit the producer Main class:

    vi src/main/java/com/linuxacademy/ccdak/schemaregistry/ProducerMain.java
  2. Set the new member_id field on the records the producer publishes:

    package com.linuxacademy.ccdak.schemaregistry;
    
    import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
    import io.confluent.kafka.serializers.KafkaAvroSerializer;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    
    public class ProducerMain {
    
      public static void main(String[] args) {
          final Properties props = new Properties();
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          props.put(ProducerConfig.ACKS_CONFIG, "all");
          props.put(ProducerConfig.RETRIES_CONFIG, 0);
          props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
          props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
          props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
    
          KafkaProducer<String, Purchase> producer = new KafkaProducer<>(props);
    
          // The generated Purchase constructor now takes member_id as its fourth argument.
          Purchase apples = new Purchase(1, "apples", 17, 77543);
          producer.send(new ProducerRecord<>("inventory_purchases", apples.getId().toString(), apples));
    
          Purchase oranges = new Purchase(2, "oranges", 5, 56878);
          producer.send(new ProducerRecord<>("inventory_purchases", oranges.getId().toString(), oranges));
    
          // close() blocks until previously sent records have completed.
          producer.close();
      }
    
    }
  3. Run the producer:

    ./gradlew runProducer
  4. Run the consumer:

    ./gradlew runConsumer
  5. Verify the data in the output file. You should see the new member_id values in the last lines of the file:

    cat /home/cloud_user/output/output.txt
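
As an alternative to reading the file by eye, the check can be scripted. The following is a minimal sketch that counts how many lines of the output file mention member_id. It assumes the consumer writes the field name into each line, which depends on how the starter project formats its output; the class OutputCheck and its method are hypothetical and not part of the starter project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical verification helper; not part of the starter project.
final class OutputCheck {

    // Counts the lines that contain the given field name as a substring.
    static long countLinesMentioning(List<String> lines, String fieldName) {
        return lines.stream().filter(line -> line.contains(fieldName)).count();
    }

    public static void main(String[] args) throws IOException {
        // Default path comes from the lab text; pass another path as the first argument.
        Path output = Paths.get(args.length > 0 ? args[0] : "/home/cloud_user/output/output.txt");
        List<String> lines = Files.readAllLines(output);
        long hits = countLinesMentioning(lines, "member_id");
        System.out.println(hits + " of " + lines.size() + " lines mention member_id");
    }
}
```

Lines written before the schema change may not mention the field, so a count lower than the total line count is not necessarily a failure.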

Additional Resources

Your supermarket company is using Kafka to track changes in inventory as purchases occur. There is a topic for this data called inventory_purchases, plus a producer and consumer that interact with that topic. The producer and consumer are using an Avro schema to serialize and deserialize the data.

The company has a membership program for customers, and members of the program each have a member ID. The company would like to start tracking this member ID with the data in inventory_purchases. This field should be optional, however, since not all customers participate in the membership program.

Add a new field called member_id to the Purchase schema. Make this field optional with a 0 default. Then, update the producer to set this new field on the records it is producing. Run the producer and consumer to verify that everything works.
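
Because the field is a plain int with a 0 default, a purchase by a non-member still has to carry some integer. One way to keep that explicit in producer code is a small helper that maps an absent member ID to the default. This is only a sketch: treating 0 as "not a member" is an assumption drawn from the wording above, and the MemberIds class is hypothetical.

```java
import java.util.Optional;

// Hypothetical helper; not part of the starter project.
final class MemberIds {

    // Matches the schema default; assumed here to mean "customer is not a member".
    static final int NO_MEMBER = 0;

    // Converts an optional member ID into the int value stored in the record.
    static int toSchemaValue(Optional<Integer> memberId) {
        return memberId.orElse(NO_MEMBER);
    }

    public static void main(String[] args) {
        System.out.println(toSchemaValue(Optional.of(77543))); // prints 77543
        System.out.println(toSchemaValue(Optional.empty()));   // prints 0
    }
}
```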

The consumer writes its output to a data file located at /home/cloud_user/output/output.txt. Once all changes are made, and everything is working, you should see the member_id field reflected in the data written to that file.

There is a starter project on GitHub at https://github.com/linuxacademy/content-ccdak-schema-evolve-lab.git. Clone this project to the broker and edit its files to implement the solution.

If you get stuck, feel free to check out the solution video, or the detailed instructions under each objective. Good luck!
