More Regular Expressions: Using Groups to Process Text

30 minutes
  • 2 Learning Objectives

About this Hands-on Lab

The use of classes and grouping within regular expressions allows us to fine-tune how we manipulate and reference our text. Capturing groups, in particular, let us take parts of a match and use them not only in the expression itself but also within our greater code or command. In this learning activity, you will use groups to retrieve the needed parts of each log entry, then process that output so humans can understand and make use of the data.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create `referrers.txt`.

Generate a file that contains the page and referrer, pulled from Apache access logs.

Ensure `referrers.txt` contains page and referrer content.

Ensure the `referrers.txt` file is formatted as dictated by the instructions.

Additional Resources

Marketing needs more information about where our website's hits are coming from. Luckily, we have access to our Apache access logs, which contain the referrer URL. However, we need to clean up these logs, so they are human-readable for our non-technical co-workers. Using regular expressions, with a focus on capturing groups, craft an expression that outputs both the name of the page being accessed and the overall website from which it was referred. Marketing would like it to be in the following format:

Page: PAGE NAME
Referrer: REFERRER

So, given the following log entry, we would want to capture /linux and linuxacademy.com/blog. Be sure to strip the http:// and www. from the URL where relevant.

198.51.100.8 - - [30/Oct/2018:00:46:55 ] "GET /linux HTTP/1.0" 200 2733 "http://linuxacademy.com/blog" "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25"

Similarly, /linux, /security, scaleyourcode.com and linuxacademy.com would be selected in the below examples:

192.0.2.12 - - [30/Oct/2018:00:03:17 ] "GET /linux HTTP/1.0" 200 2653 "http://scaleyourcode.com" "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25"

203.0.113.532 - - [30/Oct/2018:00:14:48 ] "GET /security HTTP/1.0" 200 4654 "http://www.linuxacademy.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36"

The appropriate output can be generated with a Perl script or one-liner. Use the following as a baseline:

perl -lne '/REGULAR EXPRESSION HERE/ && print "OUTPUT INFORMATION"' access-logs

The output information can include text as well as any references to capturing groups (such as $1). Use the \n shorthand to create a new line.

Save the file as referrers.txt.
