Before moving into teaching full time, I worked as a cloud architect for a professional services consultancy. There I helped clients build exciting new cloud-native solutions and — more often — migrate existing workloads into the cloud.
What did I learn? Wouldn’t you like to know . . . Wait! That was supposed to be a question: Wouldn’t you like to know? If so, read on for some tales from the cloud migration trenches. I hope these lessons help anyone moving to the cloud (be it Azure, AWS, GCP, or any other) avoid some of the common migration mistakes I saw over the years.
Cloud Migration Lessons Learned
Some of these clients had legacy datacenter estates and didn’t want to front the additional capital expenditure needed to keep the hardware running, or at least under warranty. Others had already deployed multiple applications to the cloud but had been incentivized to move from Big Provider A to Big Provider B. What they all had in common was a heterogeneous portfolio of systems built with a variety of methodologies and technologies, which meant that no planned migration could be considered trivial, and each one required the skills of some paid consultants. (Thankfully, Big Provider B normally paid the bill.)
Patterns of Cloud Migration
At a high level, there are three recommended approaches to cloud migration.
1. Lift and Shift
First, we have “Lift & Shift”. With this approach, you make the minimum changes necessary to swap the hosting environment of your workload over to your new cloud provider. If your existing infrastructure comprises a bunch of virtual machines, this is quite straightforward. There are even products on the market that promise to do this migration work for you “hands-free”, although in my experience there is always some hand-holding involved. The downside of this approach is that you gain none of the benefits of cloud-native technologies after the migration. You’re probably just moving to a different type of bill.
2. Rip and Replace
At the other end of the spectrum is “Rip & Replace”, where you rebuild your workload from scratch to be “cloud-native”. If you can make the investment in time and skills development, you get the full benefit of the cloud’s scalability and resilience, and you have a chance to shed all of your existing technical debt.
3. Move and Improve
Somewhere in the middle is the “Move & Improve” technique, where you make some changes to your application (for example, introducing scaling or automation) without throwing the whole thing out. At first glance, this happy medium can seem like the best option, but done badly, it can leave you with all of your technical debt and none of the cloud-native benefits.
When Simple Migrations Go Well
In some cases, a Lift & Shift isn’t such a bad thing. Maybe Big Provider B has promised you a smaller bill than Big Provider A, and maybe you have a small, non-mission-critical workload.
I assisted in one such migration of 35 Windows servers, using one of the aforementioned “hands-free” migration tools recommended by the provider. The servers had simply been running in Big Provider A for some years, not using any other cloud-specific functionality, so the move should have been relatively straightforward. We were prepared for things to not go exactly as planned — after all, this was Windows — and sure enough, we soon found different levels of provider support among the different versions of Windows we were migrating.
In addition, we discovered that a rather overzealous sysadmin who had since left the client company had decided to manually overwrite all of the default networking configurations on each server, which completely broke the automated migration. However, these problems were not insurmountable, and within a few hours, we had lifted and shifted the estate to a new, cheaper cloud bill with slightly more built-in resilience.
When Complex Migrations Go Surprisingly Well
One particular Lift & Shift surprised me because we had overestimated its complexity. We were tasked with moving around 40 different workloads that had been deployed to several physical Kubernetes clusters running in a datacenter.
As a younger engineer, I approached the infamous container orchestrator with some trepidation, but I soon learned that all of its supposed YAML complexity is exactly what made the workloads portable, which was one of its original design goals. We had to manually reconfigure ingresses for the new cloud provider’s load balancer, but the majority of the workloads migrated seamlessly to the new managed Kubernetes service. We even had leftover consulting time, which we used to add a CI/CD system, just for fun!
“We’ve Always Done It This Way”
“Move & Improve” migrations didn’t always have the outcome we had hoped for.
Tools, processes, and technologies continually evolve, but teams and organizations sometimes don’t evolve along with them. As a consultancy, we had a preferred way to “improve” certain types of applications during a migration by applying configuration management and attempting to reshape them into horizontally scalable services.
But beyond a certain level of application complexity, our chosen config management tool just wasn’t up to the job, no matter how many hours of work we poured into it. We had committed to the approach, and it became a long, arduous task. In retrospect, we should have swallowed a bit of our pride and changed tack.
This Is Not The Database Service You’re Looking For
Assumptions, it turns out, are a ticking time bomb in any cloud migration plan.
During the same migration, we simply had to move a MySQL database from one cloud provider to another. MySQL is a mature, trusted database with rock-solid import and export capabilities, but we were thwarted by the sheer size of this particular customer’s database. It kept growing at a rate that outpaced any reliable replication to our new provider, no matter which method we tried. We even suspected that the original host was throttling us, as a deterrent to migrating away in the first place.
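Why a database that grows faster than you can copy it is a dead end comes down to simple arithmetic. The sketch below is a simplified back-of-the-envelope model, not the tooling we actually used: it assumes a constant copy throughput and a constant growth rate, and the function name and figures are hypothetical. The replica only closes the gap at the difference between the two rates, so if growth matches or exceeds throughput, it never catches up.

```python
import math


def catch_up_hours(backlog_gb: float, transfer_gb_per_hour: float,
                   growth_gb_per_hour: float) -> float:
    """Hypothetical model: hours for a replica to catch up with a growing source.

    Assumes constant rates. Remaining data after t hours is
    backlog + growth * t - transfer * t, which reaches zero only
    if transfer is strictly greater than growth.
    """
    net_progress = transfer_gb_per_hour - growth_gb_per_hour
    if net_progress <= 0:
        # The source grows at least as fast as we can copy: never converges.
        return math.inf
    return backlog_gb / net_progress


# Illustrative numbers only.
print(catch_up_hours(2000, 50, 10))  # 50.0
print(catch_up_hours(2000, 50, 60))  # inf
```

Under this model, throwing bandwidth at the problem only helps if it pushes throughput meaningfully above the growth rate; otherwise the data itself has to change shape before it can move.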
This experience shocked us, as we had so many successful MySQL migrations under our collective belts. But, as we soon discovered, part of the problem was that we weren’t just migrating a database; we were migrating a piece of technical debt. The customer was using MySQL to store time-series data, a workload that a general-purpose relational database is poorly suited to. It had grown in place into a bloated monster and needed to be almost completely deconstructed before it could be moved. We could no longer assume our customers were doing things “properly” in the first place.
Parting Words of Migration Wisdom
What I learned from these experiences is that, in the majority of cases, any attempt to save time, training, or investment by short-changing a complete cloud-native migration just buys you more technical debt down the line.
Of course, every organization and workload is different, but if the purpose of migrating to the cloud is to take advantage of its unique capabilities — scalability, elasticity, agility, resilience — you are going to find yourself disappointed unless you fully embrace cloud-native design principles and do some much-needed refactoring.
You don’t need to “Rip & Replace” everything on day one, and there are gradual approaches that can help you to break down your monoliths over time. Transformations often need to occur in teams, not just in technology. With careful planning, you can reskill your people, refactor your code, and migrate both to a place of cloud maturity.