
How to track, analyze, and visualize user group data using AWS

Banjo Obayomi

In this post, we discuss how to geocode addresses with Amazon Location Service, use Amazon EventBridge and AWS Lambda to transform and load data to Amazon S3 daily, visualize and share the stored data with Amazon QuickSight, and automate and orchestrate the entire solution with AWS SAM.

The COVID-19 pandemic has shifted many in-person community gatherings to virtual events. New research published in the Journal of Applied Psychology indicates that “Zoom fatigue” is real in the age of virtual meetings, meetups, and social activities. 

Community members often ask about approaches to supporting user groups. To that end, I was looking for a fast, effective, and reliable way to analyze user group data and visualize it in a dashboard. This solution provided insights in several ways:

  • Tracking active meetups: Visibility into which user groups have been active in the past 12, 6, or 3 months.
  • Visualizing where members are: Using a map to see where our groups are located and how big their footprint is.
  • Keeping up with events: A table ordered by last event timestamp shows which groups have had an event recently.
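The activity buckets above can be sketched as a small classification function. This is an illustrative sketch, not the code the post uses; the 30.44-day month is an approximation.

```python
# Sketch: classify a group's activity window from its last event date.
# The thresholds mirror the 3/6/12-month buckets described above.
from datetime import date


def activity_bucket(last_event: date, as_of: date) -> str:
    """Return the smallest activity window containing the last event."""
    months = (as_of - last_event).days / 30.44  # approximate month length
    if months <= 3:
        return "3 months"
    if months <= 6:
        return "6 months"
    if months <= 12:
        return "12 months"
    return "inactive"


print(activity_bucket(date(2021, 11, 1), date(2021, 12, 1)))  # -> 3 months
```

A group whose last event was five months ago would land in the "6 months" bucket, and one with no event in over a year is flagged inactive.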


Overview of the solution

This post describes how I built the solution, from geocoding address data to the final visualization in Amazon QuickSight.

In this solution, I started by writing Python code to scrape user group data from Meetup.

To plot each meetup on a map, geolocation data was needed. I used Amazon Location Service to geocode each address into longitude and latitude coordinates.

The transformed data is then published to an Amazon Simple Storage Service (Amazon S3) bucket. 

I used Amazon EventBridge to set up a daily job that triggers a Lambda function to collect the user group data. The reporting and visualization layer is built with QuickSight. Finally, the entire pipeline is deployed using AWS SAM.

The following diagram illustrates this architecture.

Collecting user group data

AWS user groups are communities that meet regularly to share ideas, answer questions, and learn about new services and best practices. The user groups use meetup.com to organize their events. I am curious about the groups in Canada and the U.S. listed on the User Groups in the Americas page.

I used BeautifulSoup and the requests library to scrape the content from the AWS User Group website. 

The script first gets the meetup URL for each user group through the get_user_group_data function. Based on the presence of certain div attributes, it stores each relevant meetup URL and name in a list to be scraped.
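A sketch of what get_user_group_data might look like. The div class name and sample markup here are assumptions for illustration, not the real markup of the AWS user group page.

```python
# Sketch: parse a listing page for each group's name and meetup.com URL.
# "usergroup-card" is an assumed class name, not the page's real markup.
from bs4 import BeautifulSoup


def get_user_group_data(listing_html: str) -> list:
    """Collect (name, url) pairs for every card that links to meetup.com."""
    soup = BeautifulSoup(listing_html, "html.parser")
    groups = []
    for card in soup.find_all("div", {"class": "usergroup-card"}):
        link = card.find("a", href=True)
        if link and "meetup.com" in link["href"]:
            groups.append({"name": link.text.strip(), "url": link["href"]})
    return groups


sample = """
<div class="usergroup-card"><a href="https://www.meetup.com/aws-denver/">AWS Denver</a></div>
<div class="usergroup-card"><a href="https://example.com/other">Not a meetup</a></div>
"""
groups = get_user_group_data(sample)
```

Filtering on the href keeps only entries that actually point at a meetup page, which guards against unrelated links inside the same card markup.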

Next, the get_meetup_info function iterates through the list and parses the information on each individual meetup page, such as the number of members and the meetup location. The raw data is saved as a CSV for further processing.

The solution in this post is for demonstration purposes only. We recommend running similar scripts only on your own websites after consulting with the team who manages them, or be sure to follow the terms of service for the website that you’re trying to scrape.

The following shows a sample of the script.

import requests
from bs4 import BeautifulSoup

def get_meetup_info(meetup_url: str) -> dict:
	"""Parse a meetup page for its name, location, members, and past events."""
	page = requests.get(meetup_url)
	soup = BeautifulSoup(page.text, "html.parser")

	# Get meetup name
	meetup_name = soup.findAll("a", {"class": "groupHomeHeader-groupNameLink"})[0].text

	# Meetup location
	meetup_location = soup.findAll("a", {"class": "groupHomeHeaderInfo-cityLink"})[0].text

	# Number of members, e.g. "1,234 members" -> "1234"
	meetup_members = (
		soup.findAll("a", {"class": "groupHomeHeaderInfo-memberLink"})[0]
		.text.split(" ")[0]
		.replace(",", "")
	)

	# Number of past events, e.g. "Past events (45)" -> "45"
	past_events = (
		soup.findAll("h3", {"class": "text--sectionTitle text--bold padding--bottom"})[0]
		.text.split("Past events ")[1]
		.replace("(", "")
		.replace(")", "")
	)

	return {
		"meetup_name": meetup_name,
		"meetup_location": meetup_location,
		"meetup_members": meetup_members,
		"past_events": past_events,
	}
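Once each page's fields are collected, the rows can be written out as CSV with the standard library. The column names below mirror the fields scraped above; the sample row is illustrative.

```python
# Sketch: save scraped meetup fields to a CSV for further processing.
# Field names mirror the scraping sample; the example row is made up.
import csv

FIELDS = ["meetup_name", "meetup_location", "meetup_members", "past_events"]


def save_raw_data(rows: list, path: str) -> None:
    """Write one CSV row per meetup, with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


save_raw_data(
    [{"meetup_name": "AWS Denver", "meetup_location": "Denver, CO",
      "meetup_members": "1200", "past_events": "45"}],
    "meetups.csv",
)
```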

Geocoding user groups

To plot each meetup group on a map, we need the longitude and latitude of each city where a meetup group is based. I used Amazon Location Service to geocode each city name into longitude and latitude coordinates using a place index. For more information about creating a place index, see the Amazon Location Service Developer Guide.

Here is an example Python code of using a place index for geocoding.

import boto3

def get_location_data(location: str):
    """
    Purpose:
        Get location data from a place name
    Args:
        location - name of the location
    Returns:
        lat, lng - latitude and longitude of the location
    """
    client = boto3.client("location")
    response = client.search_place_index_for_text(
        IndexName="my_place_index", Text=location
    )

    # Example output for Arlington, VA:
    # 'Results': [{'Place': {'Country': 'USA', 'Geometry': {'Point':
    # [-77.08628999999996, 38.89050000000003]}, 'Label': 'Arlington, VA, USA',
    # 'Municipality': 'Arlington', 'Region': 'Virginia', 'SubRegion': 'Arlington County'}}]
    geo_data = response["Results"][0]["Place"]["Geometry"]["Point"]

    # Points come back as [longitude, latitude]
    lat = geo_data[1]
    lng = geo_data[0]

    return lat, lng
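One detail worth checking offline: Amazon Location returns a place's Geometry.Point as [longitude, latitude], so index 0 is the longitude. The dict below mirrors the sample Arlington, VA response shown in the code's comments.

```python
# Offline sketch of the coordinate extraction: the response shape mirrors
# the documented sample output for Arlington, VA.
sample_response = {
    "Results": [
        {
            "Place": {
                "Country": "USA",
                "Geometry": {"Point": [-77.08628999999996, 38.89050000000003]},
                "Label": "Arlington, VA, USA",
            }
        }
    ]
}

# Point is [longitude, latitude], so latitude is index 1
point = sample_response["Results"][0]["Place"]["Geometry"]["Point"]
lat, lng = point[1], point[0]
```

Mixing up the order would put Arlington in the Indian Ocean, so it is worth a sanity check before wiring the coordinates into a map visual.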

Using SAM to orchestrate deployment

After testing the script locally, the next step was to create a mechanism to run the script daily and store the results in S3. I used the AWS Serverless Application Model (SAM) to create a serverless application that does the following:

  1. Creates an S3 bucket
  2. Creates an Amazon EventBridge (CloudWatch Events) schedule that triggers every 24 hours
  3. Deploys a Python Lambda function to run the data-scraping code

Here is an outline of the steps used to deploy the serverless application, with the sample code I used.

1. From a terminal window, initialize a new application:
sam init

2. Change directory:
cd ./sam-meetup

3. Update dependencies in my_app/requirements.txt:

requests
pandas
bs4

4. Update the code
Add your code to my_app/app.py:

import json
import logging

import get_meetup_data


def lambda_handler(event, context):
    logging.info("Getting meetup data")

    try:
        get_meetup_data.main()
    except Exception as error:
        logging.error(error)
        raise error

    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "message": "meetup data collected",
            }
        ),
    }

5. Update template.yml

Globals:
  Function:
    Timeout: 600
Resources:
  S3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: MY_BUCKET_NAME
  GetMeetupDataFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: my_app/
      Handler: app.lambda_handler
      Policies:
        - S3WritePolicy:
            BucketName: MY_BUCKET_NAME
      Runtime: python3.9
      Architectures:
        - x86_64
      Events:
        GetMeetupData:
          Type: Schedule
          Properties:
            Schedule: 'rate(24 hours)'
            Name: MeetupData
            Description: getMeetupData
            Enabled: True

6. Build the application:
sam build

7. Deploy the application to AWS
sam deploy --guided

For more detailed information on developing SAM applications, check out Getting started with AWS SAM.



Visualizing data with QuickSight

To share the user group data, I chose QuickSight, with Amazon S3 as the data source.

QuickSight is a native AWS service that integrates seamlessly with other AWS services such as Amazon Redshift, Athena, and Amazon S3, as well as many other data sources.

As a fully managed service, QuickSight made it easy to create and publish interactive dashboards. In addition to building powerful visualizations, QuickSight provides data preparation tools that make it easy to filter and transform the data into exactly the dataset needed. For more information about creating a dataset, see Creating a Dataset Using Amazon S3 Files.
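Creating a QuickSight dataset from S3 requires a manifest file that tells QuickSight which objects to load and how they are formatted. A minimal sketch is below; the bucket name and object key are placeholders, not values from this project.

```json
{
  "fileLocations": [
    {
      "URIs": ["https://MY_BUCKET_NAME.s3.amazonaws.com/meetup_data.csv"]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "containsHeader": "true"
  }
}
```

You upload the manifest (or point QuickSight at its S3 URI) when creating the dataset, and QuickSight imports the referenced CSV into SPICE for visualization.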

The following are example screenshots from the dashboard.




Conclusion

In this post, we discussed how to successfully achieve the following:

  • Geocode addresses using Amazon Location Service
  • Use Amazon EventBridge and AWS Lambda to transform and load the data daily to S3
  • Visualize and share the data stored using Amazon QuickSight
  • Automate and orchestrate the entire solution using SAM

This solution can be used to gain insights into engaging with technical communities. If you're interested in participating in your local community, check out the AWS user group page.

About the Author

Banjo is a Senior Developer Advocate at AWS, where he helps builders get excited about using AWS. Banjo is passionate about operationalizing data and has started a podcast, a meetup, and open-source projects around utilizing data. When not building the next big thing, Banjo likes to relax by playing video games, especially JRPGs, and exploring events happening around him.
