In this post, we share some tips on how to troubleshoot your troubleshooting efforts for everything from IT errors to Kubernetes issues when you’re not even sure what to Google.
Here at A Cloud Guru, we’re big fans of learning new things, whether it’s just getting started with something new, or getting as in depth on a specialized skill as possible. Unfortunately, despite our best efforts, we will not have a lesson or lab for every single error or issue someone somewhere may encounter.
*Ducks under my desk to avoid my boss’s glare.*
Anyway, until our highly anticipated “All the Cloud Things You Could Ever Need to Know” course drops — it keeps getting delayed as we find new things to include — I wanted to go over some strategies for when you come up against an error that is so poorly worded or context dependent that you’re not even sure what to Google.
As a coding bootcamp grad with more pluck than formal IT education, these are the techniques I’ve learned and honed to help me when I’ve been completely out of my depth. They are very flexible; I’ve yet to find a problem I can’t apply them to, and it’s my hope in sharing them here that you will find them useful when your only other idea is to throw up your hands and walk away.
1. Five Whys
The Five Whys is a fairly common technique to identify causes of specific conditions.
The idea is that the person seeking a solution makes a statement — what the problem is. The respondent/mentor/chorus responds “why?” and the seeker needs to answer.
Whatever the seeker responds with, the respondent again asks “why?”
The idea is that within five repetitions of this cycle, the seeker will have identified the underlying problem. In practice, it can look something like this:
Seeker: “My Pod won’t create in the cluster.”
Respondent: “Why?”
Seeker: *Checks error message* “There isn’t sufficient capacity in the cluster to schedule it.”
Respondent: “Why?”
Seeker: *Checks cluster utilization* “There are too many other pods with high CPU requests running.”
*Note that you can stop here and delete unneeded pods, or continue.*
Respondent: “Why?”
Seeker: *Checks pod activity* “The pods have requested more CPU than they are actually using.”
*You can stop here and update the pods to have more reasonable CPU requests.*
Respondent: “Why?”
Seeker: *Checks with owners of the pods* “We don’t have a good way to identify how much CPU a pod realistically needs.”
As you can see, after the second “why” we come away with an actionable insight, but if we keep going, we can get to more systemic issues at play. We can delete unneeded pods, but they’ll be recreated.
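If you want to put numbers behind the “pods have requested more CPU than they use” step, here is a minimal Python sketch. It assumes you have already collected each pod's CPU request and current usage as Kubernetes-style quantity strings (for example "500m" or "2"). The pod names, the sample values, the 50% threshold, and the function names are all hypothetical and only there for illustration.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole or fractional cores: "0.5" means 500 millicores
    return round(float(quantity) * 1000)


def overprovisioned(pods, threshold=0.5):
    """Return names of pods using less than `threshold` of their requested CPU."""
    flagged = []
    for pod in pods:
        requested = cpu_to_millicores(pod["request"])
        used = cpu_to_millicores(pod["usage"])
        if requested > 0 and used / requested < threshold:
            flagged.append(pod["name"])
    return flagged


# Hypothetical sample data; real numbers would come from your cluster's metrics.
pods = [
    {"name": "api", "request": "2", "usage": "150m"},
    {"name": "worker", "request": "500m", "usage": "450m"},
    {"name": "cache", "request": "1", "usage": "0.2"},
]

print(overprovisioned(pods))  # ['api', 'cache']
```

A list like this won't answer the final “why” for you, but it gives you something concrete to bring to the pod owners.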
With a little practice, you can have this conversation as a monologue. (I sometimes let the BB-8 on my desk act as the respondent.)
This is a really powerful technique, especially when there isn’t an error message or anything demonstrably wrong. Your first statement could be, “The system is running slower today than it was last week.”
2. Dear Alan
Alright, story time: my first role in IT — with my shiny new AWS associate certs in hand — was a challenge. Fortunately, I worked with a really talented team who were happy to help me fill in the gaps in my knowledge.
One engineer in particular (let’s call him “John”) was what cringy recruiters would call a “rockstar” or “10x” developer. He quickly became my go-to for questions.
However, as an insecure junior developer, I didn’t want to come off as useless. Whenever I would message him to ask for help, I made it my goal to include at least three things I had already tried to fix the problem.
What I found over time was that, in the act of actually writing out what I had attempted, I almost always thought of something I hadn’t.
Soon the “Dear John” message started living in my drafts, and then in a text file where I would describe the problem, what I was trying to accomplish, and what I had tried — to no avail.
Much like journaling, where giving myself space to process my day and my thoughts gives me clarity, documenting my attempts would help me see what I had missed.
Eventually “John” moved to a new team, and I moved to a new role. In my new role, I went to create my “Dear John.txt” file, but Dear Johns are a whole other thing, so I switched to “Dear Alan” since, if you’re going to ask for help from anyone, why not the guy who fought Nazis with math?
So, if you have an error or a problem and you’re not sure what more you can do, write out your request for help, outlining what you’ve done, and hey — if writing it out doesn’t spark something, then you have a nice summary of your problem to actually send to someone.
3. The Art of Zen and Googling
I know the title of this article includes “when you’re not even sure what to Google,” but bear with me. Have you ever had an error message that exceeded Google Search’s 32-word limit? Been trying to figure out a Python error and you keep getting Stack Overflow questions about Django, even though you aren’t using Django?
Well, let’s look at some ways to take that useless error message and turn it into something you can Google:
- Remove any references to your code: Some errors will output line and character position, maybe even the method or function name. These values are specific to your code, and keeping them in the search query will only confuse the search engine.
- Include the specific technology you are using, along with the error: This is especially important if you are using a library — you don’t want to understand the generic error handler, you want information about the specific context of the library you are using.
- Practice good Google-fu: There are all sorts of tips to get better Google results. I find double quotes (“ ”) helpful for terms that have to be in the result: if you need results for how to format code snippets in Ansible plays, then “Ansible” will save you from more generic YAML formatting results. Likewise, a dash (-) placed in front of a term omits results that contain it. Remember that example about wanting non-Django results for a Python error? You can add -Django to filter those out.
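The three tips above can be rolled into a small helper. This is only a rough sketch: it assumes a Python-style traceback line (`File "...", line N, in name`), a single-word technology name, and the 32-word limit mentioned earlier. The function name `build_query` and the sample error are made up for the example.

```python
import re

MAX_WORDS = 32  # the query word limit mentioned earlier in this post


def build_query(error, technology, exclude=()):
    """Turn a raw error message into a search-friendly query string."""
    # Tip 1: drop references to your own code (file path, line number, function name)
    cleaned = re.sub(r'File "[^"]*",?\s*', "", error)
    cleaned = re.sub(r"\bline \d+(, in \w+)?", "", cleaned)
    # Leave room for the quoted technology and each excluded term
    budget = MAX_WORDS - 1 - len(exclude)
    words = cleaned.split()[:budget]
    # Tip 2: name the technology; Tip 3: quote required terms, dash out unwanted ones
    return " ".join([f'"{technology}"', *words, *(f"-{term}" for term in exclude)])


raw = (
    'File "/home/me/project/app.py", line 42, in load_user\n'
    "TypeError: 'NoneType' object is not subscriptable"
)
print(build_query(raw, "Python", exclude=["Django"]))
# "Python" TypeError: 'NoneType' object is not subscriptable -Django
```

Error formats vary a lot between languages and tools, so treat the regular expressions as a starting point and adjust them to whatever your stack prints.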
Troubleshooting isn’t glamorous — even when it’s not on TV. But little tweaks to an error message can take you from “I can’t do anything with this” to “Oh, so that’s what that means!”
Troubleshooting errors is a roller coaster of emotions: the excitement of a new problem, the thrill of the chase, the frustration of beating your head against a wall, the spark of an idea, and the hit of dopamine when you stop seeing that same error (and sometimes the mix of excitement and bewilderment when you see a brand new error in its place). I hope these strategies will help you attack bigger, less defined problems with, if not more confidence, then at least more zeal.
For more hands-on approaches to troubleshooting, check out the new Hands-on Kubernetes Troubleshooting course.
What about you? Do you have some tried and true approaches to handling problems when you’re completely lost? (To save time, my boss told me curling up in a ball under my desk and hoping the problem goes away doesn’t count as a strategy.) Share your tips on the ACG Discord server, or let us know on Twitter or Facebook.