My story about cloud adoption, talent transformation, and helping an enterprise win
“Any improvements made anywhere besides the bottleneck are an illusion.”— Gene Kim, The Phoenix Project
While at Capital One, I worked across the flow of a large-scale enterprise adoption of cloud computing in partnership with AWS. The engineering effort was challenging, but the greatest barrier to success was the short supply of cloud-fluent talent needed to sustain the transition at scale.
TL;DR— Talent transformation is really the hardest part of cloud adoption.
The Early Days of Cloud Adoption
Months 1–6; 3 applications
The energy in the room was driven by the challenge, the risk of failure, and lots of caffeine. Each of our agile bays consisted of full-stack teams of engineers with varying backgrounds in development, security, and operations.
The rooms were set up to co-locate dev and ops teams, and were specifically designed to accelerate the builds and deployments of our first set of production applications onto Amazon Web Services.
We knew that bringing together the cross-functional teams was going to be essential. Infrastructure was now code, and operations had shifted left into the realm of development.
Within a few sprints, the infrastructure resources were being refreshed and rebuilt as often as the application code. A cloud production readiness checklist anchored the whiteboard in each room to guide the teams toward the expected controls, clearly defining the “must have” outcomes to ensure well-managed operations.
Each build offered the teams an opportunity to “fail forward fast”, and iterate upon the evolving checklist. As the builds matured and confidence grew, the teams developed a healthy competition to be the first application deployed into the cloud.
At that point, my role became clear — get the hell out of their way.
To give the teams some breathing space, an unofficial policy was established: if you didn’t have your hands on a keyboard connected to a command line, the war room was off-limits, unless you were removing an impediment from their board or delivering more caffeine.
The success of the early adopters wasn’t serendipitous. An integrated programmatic approach of the Cloud Center of Excellence ensured the architecture, controls, and support were all aligned — with advocacy and direct support from the senior leadership to quickly address roadblocks.
The teams’ feedback on the cloud production readiness checklist was essential to building a more robust and meaningful set of automated controls. The lessons learned from the early adopters were fed into baseline architectural patterns, which were shared across the enterprise to accelerate the journey for fast followers.
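To make the idea of turning a whiteboard checklist into automated controls concrete, here is a minimal sketch in Python. It is not the checklist or tooling described above: the three controls, the `Control` type, and the shape of the deployment dictionary are all hypothetical, chosen only to show the pattern of expressing each "must have" as a small, testable check.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Control:
    """One checklist item expressed as a named, automatable check."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical "must have" items; the article does not list the real ones.
CONTROLS = [
    Control("encryption at rest enabled", lambda d: bool(d.get("encrypted"))),
    Control("deployed via autoscaling group", lambda d: bool(d.get("autoscaling"))),
    Control("owner tag present", lambda d: bool(d.get("tags", {}).get("owner"))),
]


def failed_controls(deployment: dict, controls: list[Control] = CONTROLS) -> list[str]:
    """Return the names of the controls this deployment description fails."""
    return [c.name for c in controls if not c.check(deployment)]


if __name__ == "__main__":
    # An assumed deployment description, for illustration only.
    app = {"encrypted": True, "autoscaling": False, "tags": {"owner": "team-a"}}
    for name in failed_controls(app):
        print(f"FAILED: {name}")
```

Keeping each control as data plus a function means feedback from a build can become a new entry in the list rather than another line on the whiteboard.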
Cloud Operations and the Fast Followers
Months 7–12; 50+ applications
With a set of battle-tested patterns and an established path into the production regions of the cloud, my focus shifted from the early adopters toward solidifying cloud operations across the enterprise.
The visibility into the number of application teams queued in the pipeline preparing to migrate provided a sense of urgency. The Cloud Center of Excellence provided the focal point once again— this time for leveraging the central engineering expertise to implement automated controls at radical scale.
While establishing the cloud operations team, I was fortunate to have opportunities to meet with Adrian Cockcroft, who previously helped lead Netflix’s migration to a large-scale, highly available, AWS-based architecture. Netflix’s efforts to create the Simian Army provided solid insights into how to take an engineering-first approach to cloud governance.
More importantly, Adrian’s philosophical guidance on building for failure was very influential — and laid the foundation for my approach to scaling cloud operations.
The cloud operations team raised our Jolly Roger under the slogan “terminate ’em all and let autoscaling sort ’em out”. Driving the behavior changes toward ephemeral and immutable workloads that treated servers less like pets and more like cattle would take time, and raising the pirate flag was a great way to begin constructively disrupting the status quo.
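The slogan can be read as a simple rule: anything that has lived too long gets terminated, and autoscaling replaces it with a fresh copy. The sketch below illustrates that rule in Python under stated assumptions. It is not the Simian Army's implementation or the team's actual tooling; the `Instance` type, the seven-day threshold, and the function name are hypothetical, and the code only selects candidates rather than calling any cloud API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Instance:
    """Hypothetical, simplified view of a running server."""
    instance_id: str
    launched_at: datetime
    in_autoscaling_group: bool


def select_for_termination(
    instances: list[Instance],
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed threshold, not from the article
) -> list[str]:
    """Return IDs of instances older than max_age that autoscaling can replace.

    Instances outside an autoscaling group are skipped, because nothing
    would rebuild them after termination.
    """
    return [
        i.instance_id
        for i in instances
        if i.in_autoscaling_group and now - i.launched_at > max_age
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    fleet = [
        Instance("i-old", now - timedelta(days=9), True),
        Instance("i-new", now - timedelta(days=1), True),
        Instance("i-pet", now - timedelta(days=90), False),
    ]
    print(select_for_termination(fleet, now))
```

The long-lived instance outside an autoscaling group is exactly the "pet" the slogan is aimed at: the rule cannot safely touch it until the workload has been rebuilt to be ephemeral.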
As the flow of fast-follower migrations increased at a steady rate, so did the late-night incident calls. The incidents accumulated while sleep hours deteriorated. The silver lining was the operational data, which provided a rich source of metrics that amplified the right-to-left feedback.
In a short amount of time, the primary impediment to scaling flow became very clear — cloud fluency skills.
There is a reason why the #1 challenge regarding cloud computing is the lack of appropriately skilled engineers — it requires a new set of skills which haven’t been needed in the past. Training and certification would be essential for preparing the enterprise to transition to this new operating model.
We already had the people needed to succeed, but lacked the skills at scale to sustain the transition. The talent transformation would be the hardest part of cloud adoption.
A Pivot to Dean of Cloud Engineering
Year 2; 100+ applications
The talent transformation program needed to achieve critical mass, the point where it becomes a self-sustaining transition to a new operating model. We needed to accelerate the transformation through the ‘trough of despair’ so we could unleash the pioneers on emerging patterns and services without compromising our gains.
To drive the talent transformation across the enterprise, the Cloud Center of Excellence once again provided the tip of the spear for accelerating cloud adoption. A key challenge to cloud fluency is teaching the company-specific usage patterns, or “isms”, that govern migrations, controls, and compliance.
ism /’izəm/ (noun, informal; plural: isms): a distinctive doctrine, theory, system, or practice
As company-specific practices evolve over the course of the journey, it’s vital that the education efforts are directly connected to the cloud engineering efforts. It’s one thing to learn about native AWS services; it’s another to learn how to effectively apply those services within your environment.
Achieving Critical Mass of Cloud Fluency
Year 3; steady state
The talent transformation efforts evolved from a side-of-the-desk self-serving effort designed to reclaim sleep, to a full-time role as Dean of Cloud Engineering.
After creating an internal cloud college and pivoting the curriculum to focus on outcomes, the program attracted executive support and advocacy. The college now has thousands of engineers enrolled and has enabled several hundred AWS Certifications.
A key takeaway was how invaluable the rigor of the AWS Certifications is to establishing cloud fluency. The common knowledge base ensures that engineers share a consistent understanding of cloud computing.
As more individuals became cloud fluent, teams began leveraging standard approaches and patterns, which resulted in more efficient and higher-quality implementations. For the associates, the AWS Certifications also offer a tremendous amount of motivation given their value in the marketplace.
During a recent visit, Werner Vogels offered another clear value proposition for organizations that achieve critical mass of cloud fluency — “you have to be a digital native company if you want to attract digital natives”.
The large number of AWS Certifications enabled by the cloud college reflects an understanding that talent plays a central role in the transition to cloud — but the number of certifications can be considered somewhat of a vanity metric.
Ultimately, training and education are a means to the outcomes that define the business case for cloud migrations.
“Remember, outcomes are what matter — not the process, not controls, or, for that matter, what work you complete.”— Gene Kim, The Phoenix Project
Enterprises should be birddogging their outcome metrics — such as the speed at which an organization is migrating to the cloud, and tracking how well-managed, controlled, compliant, and cost-efficient applications are in the cloud.
The success of cloud adoption and migrations comes down to your people — and the investments you make in a talent transformation program. Until you focus on the #1 bottleneck to the flow of cloud adoption, improvements made anywhere else are an illusion.