Cloud Provider Comparisons

Cloud Provider Comparisons: AWS vs Azure vs GCP – Databases

Episode description

In Cloud Provider Comparisons, we take a look at the same cloud services across the three major public cloud providers – Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). In this video we focus on databases. Join Rob Moore as he dives into comparing Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and cloud-native PaaS SQL databases, and NoSQL databases, with a focus on key-value and document databases. If you’re curious about how the database services of AWS, Azure and GCP match up, watch on to find out!


0:00 Introduction
0:43 Databases fundamentals
3:58 SQL (Relational) databases
7:40 PaaS SQL databases
10:36 Cloud-native PaaS SQL databases
14:34 NoSQL Databases



Series description

In Cloud Provider Comparisons, we explore and compare the same cloud service across the three major public cloud providers - Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).

Hello, Cloud Gurus. I'm Rob Moore. And I welcome you to Cloud Provider Comparisons. In this series, we take a look at the same services across different cloud providers. We'll look at the similarities, differences, and anything else that might be interesting. In this episode, we're going to take a look at databases.

These wonderfully useful services are a core enabler for creating and deploying applications and systems, and the cloud has allowed for a lot of innovation in this space. So if you're curious about how the database services of AWS, Azure, and Google Cloud Platform match up, stick around. Before we dive in, there are a few fundamental concepts worth knowing when it comes to databases. The first is scaling, which can be vertical or horizontal.

Vertical scaling is where you have a single database instance that grows by adding more compute, memory, and/or storage so that it can handle more traffic. Horizontal scaling is where you add multiple database instances, so the traffic that hits your database gets distributed across those instances. This can scale a lot further than vertical scaling, but it's hard to do, because it requires data to either be deterministically split between the instances or kept in sync between those instances, both of which create complexity around things like data structure, consistency, and atomicity. Some database services allow you to scale horizontally for read-only operations via read replicas, where data is synchronized for read-only purposes between multiple instances, but you still have a single primary instance for write operations.
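The "deterministic split" described above is essentially hash-based sharding: a partition key is hashed to decide which instance owns a given row. A minimal sketch in Python (the key format and instance count here are hypothetical, purely for illustration):

```python
import hashlib

def shard_for_key(key: str, num_instances: int) -> int:
    # Hash the partition key so the same key always routes to the same instance.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_instances

# Every read and write for "user-42" lands on the same database instance,
# so no cross-instance synchronization is needed for that key.
shard = shard_for_key("user-42", num_instances=4)
print(f"user-42 -> instance {shard}")
```

Note that naive modulo hashing reshuffles most keys when the instance count changes, which is one reason managed services handle partitioning for you with more sophisticated schemes.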

The next concept is availability. This is where the cloud providers offer a service level agreement, or SLA, for how long the database will be up and running to receive requests within a given time period. For instance, if a database has an availability SLA of 99.9%, it can have around 44 minutes of downtime in a month, whereas if a database has a 99.999% SLA, also known as five nines, it can have only 26 seconds of downtime a month. Cloud providers apply high availability architectures to allow for fault tolerance across a subset of the various failure modes. Depending on your specific needs, you'll want to balance availability, cost, and features to decide which service is best for you. The final concept to understand is the different types of databases that are available. I'm sure many of you have heard of SQL databases and NoSQL databases, but when you dig under the covers, you find out that there are actually many more types of databases.
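Those availability numbers follow directly from the SLA percentage. A quick sketch of the arithmetic, assuming a 30-day month:

```python
def allowed_downtime_seconds(sla_percent: float, days: int = 30) -> float:
    # The SLA guarantees (sla_percent)% uptime over the period;
    # the remainder is the allowed downtime.
    period_seconds = days * 24 * 60 * 60
    return period_seconds * (1 - sla_percent / 100)

print(round(allowed_downtime_seconds(99.9) / 60))  # ~43 minutes for three nines
print(round(allowed_downtime_seconds(99.999)))     # ~26 seconds for five nines
```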

In this video, I'll compare the most common types of databases: relational databases, often referred to as SQL databases, and NoSQL databases, specifically key-value databases and document databases. For relational databases, I'll cover three different subtypes: Infrastructure as a Service, or IaaS; Platform as a Service, or PaaS; and cloud-native PaaS. So that I can give you a valuable in-depth comparison, I'm going to focus on the most prevalent general purpose database services, and I'm not going to cover SQL data warehouses or the other types of NoSQL databases like graph, in-memory, time series, ledger, and columnar databases. Relational databases are characterized by rows of data within tables that have columns and relationships, and are queryable using Structured Query Language, or SQL. Being some of the oldest general purpose databases,

they are still the most commonly used today. Let's start with IaaS SQL databases. This is where you are responsible for deploying a database server across one or more IaaS virtual machines. There are two main proprietary databases, Microsoft SQL Server and Oracle, and three main open source databases, MySQL, PostgreSQL, and MariaDB. When it comes to database services, IaaS is by far the most work for you to build and maintain, especially if you want to implement a highly reliable or performant design.

So why would you want to use this option? Largely, it comes down to flexibility and control. It can be a good option when you need to lift and shift or have explicit control over specific details, but in return, it's your responsibility to worry about configuration, patching, backups, et cetera. The major differences between cloud providers relate to whether you're allowed to license the database engine on the cloud's hypervisor, and whether any extra advice, services, or capabilities are provided, such as automatic patching. Let's start by looking at Microsoft SQL Server. Support is pretty good across AWS, Azure, and GCP, with all providing comprehensive deployment guidance.

The story is a bit different when it comes to licensing. If you have existing Software Assurance licensing, you can use that to reduce your costs on Azure and AWS, but not GCP. Azure also provides the Azure Hybrid Benefit to further reduce your costs, and has auto-patching and auto-backup capabilities. When it comes to Oracle, there is comprehensive guidance in AWS and Azure, but due to licensing constraints, it's not possible to run Oracle on GCP unless you deploy onto expensive, complex bare-metal servers. Azure has an extra feature available in some regions, where a partnership with Oracle provides low-latency connectivity into Oracle's cloud environment, a first-class environment for running Oracle Database.

When it comes to the open source databases, the support is a little less comprehensive. The official guidance is minimal or non-existent on all cloud platforms. There's plenty of guidance online for running these database engines on servers, but you may suffer a lack of specific guidance for how to enable sophisticated, highly available, or performant deployments for a given cloud provider and database combination. In terms of availability, the cloud providers all provide a 99.99% SLA

if you deploy across availability zones, and lower SLAs for a single-server deployment. It's worth pointing out that all three cloud providers have a marketplace with a bunch of ready-to-go database-on-virtual-machine deployments, with varying degrees of sophistication and automation, that you can deploy at the click of a button and have charged to your cloud bill. Okay, let's take a look at the PaaS SQL options. This is where the cloud provider takes care of provisioning the virtual machines and patching and configuring the operating system and database engine for you. In AWS, the long-standing PaaS database service is called Amazon Relational Database Service, or RDS. In Azure,

it's called Azure SQL Database Managed Instance for SQL Server, and Azure Database for MySQL, PostgreSQL, or MariaDB. And in GCP, it's called Cloud SQL. Amazon RDS is the only service with direct support to host a PaaS Oracle database, and GCP is the only cloud provider not to have a PaaS MariaDB service. It's worth pointing out that Azure SQL Database Managed Instance has a slightly different model to the other services. Rather than choosing the database version that you want to host, it provides an evergreen version of SQL Server that is always patched.

All three providers offer integration into native cloud monitoring solutions, plus performance insights and recommendations. Azure comes out slightly ahead in terms of availability, with a higher SLA and automatic cross-region failover. All providers have automatic backup solutions, but Azure's is the most comprehensive. Default backup retention in GCP can be configured for a much longer period of time, but Azure has an optional long-term retention capability for SQL Server. All three offer a point-in-time restore option, which is handy when you need to recover from accidental data loss, but it's much more clunky to use in GCP, with a smaller retention period as well.

Vertical scaling is where GCP shines. You can scale to a monster database, and maintenance is reduced since it will automatically grow your storage for you across all database engines. Azure stands out in terms of horizontal scaling, with cross-region read replicas for all database engines and read-write replicas in the business critical tier of SQL Managed Instance. All providers offer strong security, with firewall and VNet support, encryption in transit and at rest, and audit logging. The key difference is that RDS doesn't have built-in client encryption, and Azure doesn't have native role-based access control, although it does have some more advanced advisory capabilities built in.

Let's take a look at the cloud-native PaaS SQL options. By architecting database solutions natively for the cloud, the providers have SQL services with more advanced features, such as active geo-replication, faster and broader vertical and horizontal scaling, higher availability, and serverless pricing. AWS has Aurora, a MySQL- and PostgreSQL-compatible engine that boasts higher throughput on the same hardware. Azure has Azure SQL Database, an evergreen SQL Server engine with various operating modes, including general purpose, business critical, elastic pools, hyperscale, and serverless. Google has Cloud Spanner, which is a proprietary database with some impressive properties.

From a monitoring perspective, all providers offer good coverage, particularly Azure, with a comprehensive array of performance insights tooling. All offerings have similar base SLAs; Azure lifts that higher with its business critical tier, and it's also the only one to have an SLA on recovery point objective and recovery time objective. But the most impressive is GCP's Cloud Spanner, with 99.999% availability for a multi-region configuration.

To be clear, that's 26 seconds of allowed downtime per month. All providers have automatic in-region failover and fast cross-region failover, which is automatic in Azure and GCP. From a backups perspective, all three providers have point-in-time restore, with AWS and Azure having a longer restore window and automated backups, and Azure having a long-term retention option outside of point-in-time restore. Cloud Spanner only has manual backups, but it does have an ability to perform a stale read to surgically query accidentally deleted data, which is really cool.

Looking at scaling, GCP's Cloud Spanner doesn't really have a concept of vertical scaling since it's fully horizontally scalable; all nodes are the same size. Comparing Azure and AWS, Azure's business critical tier has an absolutely monster maximum vertical scale for storage. GCP is practically infinite, since it ties storage to its horizontally scaled nodes, with a max storage per node. There are default node quotas, but you can request a quota increase.

So in reality, the limit is based on your credit card. Amazon Aurora is impressive, with huge max storage and auto-growth, whereas Azure is relatively lacking, with a much smaller max storage and no auto-growth, although the hyperscale tier has an impressive max storage. From a horizontal scale perspective, all three providers have an impressive set of capabilities.

They all provide cross-region read replicas, sharding capabilities, and read-write replica options. The clear standout, though, is Cloud Spanner, with automatic sharding and horizontally scalable read-write. Both Azure and AWS offer serverless pricing, which includes auto-scale, auto-pause, and auto-resume capabilities with pay-per-use pricing, which is great for intermittent and non-production workloads. From a security perspective, it's a very similar story to what we covered for PaaS SQL databases, except Cloud Spanner doesn't have client-side encryption. Okay, let's switch gears and dive into the world of NoSQL,

and more specifically key-value and document databases. Key-value databases are simple. You store data, the value, against a key lookup. You can generally store a huge amount of data while maintaining a fast lookup time. The value that you store usually consists of a varying set of properties with different data types.
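As a mental model, a key-value database behaves like a giant persistent hash map. A minimal Python sketch (the key scheme and properties here are made up for illustration):

```python
# A key-value store maps a key to a value: a bag of properties of varying types.
store: dict[str, dict] = {}

# Write: the value is stored against the key.
store["user#42"] = {"name": "Ada", "visits": 7, "premium": True}

# Read: a single fast lookup by key -- no scans, no joins.
user = store["user#42"]
print(user["name"], user["visits"])
```

Real services add persistence, replication, and partitioning on top, but the core access pattern is this simple.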

Document databases are similar to key-value databases in that you store a set of values, a document, against a key, the document ID, but these databases have sophisticated support for allowing more complex values to be stored, including arrays and child objects, as well as transactions, indexes, and queries based on those document structures. There is a common theme among the cloud-native NoSQL key-value and document database services, in that they have a long history, with legacy services that have subsequently been superseded. For instance, AWS started with SimpleDB, a simple key-value store, in December 2007, and then introduced DynamoDB, with a superior set of features including document support, in December 2012. It also has a newer service that is MongoDB-compatible, called DocumentDB. Azure introduced Table Storage, a simple key-value store, in October 2008. They then followed that up with DocumentDB,

a document database, in August 2014, and subsequently released Cosmos DB, a multi-paradigm database supporting key-value, document, graph, and columnar stores, in May 2017. GCP introduced their key-value database, Cloud Bigtable, in August 2014, after having used it internally since 2005. They had earlier introduced Datastore, their first document database, in October 2008, which, in combination with Firebase Realtime Database, has subsequently been improved upon by Cloud Firestore, released in October 2017. In this section, I'll compare these flagship key-value and document databases: Amazon DynamoDB, Azure Cosmos DB, and Google Cloud Bigtable and Firestore. Let's start by looking at consumption, pricing models, and unique features.

Because these databases are generally proprietary, the mechanisms that the cloud providers give you to consume them are important. All services provide a command line interface, and all but Bigtable have at least one user interface. All providers have a local emulator, so that you can run a local instance of the database for development and testing activities without needing an internet connection. As you would expect, all services have a broad range of SDKs to choose from, with some slight differences in which languages are supported. Looking at the pricing models, Amazon DynamoDB and Azure Cosmos DB give you optionality, with support for both provisioned and serverless pricing models.

GCP's Bigtable only supports provisioned, and Firestore only supports serverless. Looking at unique features, DynamoDB has an integrated in-memory cache to speed up reads. Cosmos DB has an in-built JavaScript engine to provide transactionally consistent execution of stored procedures, triggers, user-defined functions, and merge procedures. It also supports a range of additional APIs on top of the core key-value and document APIs, including MongoDB, Cassandra, and Gremlin. Bigtable has an in-built Apache HBase API, and Firestore has some unique properties from its Firebase roots that make it ideally suited for building web and mobile apps directly against it, including offline sync and CDN data bundles.

NoSQL databases have a key set of capabilities around how they handle the data being stored within them. If we start by looking at the data model each service uses, we can see that Cosmos DB provides an extra parent level called the database account. Cosmos DB, Bigtable, and Firestore all support a database-level construct. All services have a table or collection concept, which contains rows or documents, which have property values. DynamoDB and Cosmos DB have explicit partition keys to allow for sharding and grouping rows.

Bigtable has a unique property where individual cell values are versioned, so you can get the history of a value, and Firestore has a powerful feature where you can define nested collections within a document. The types of values each database stores are reasonably similar across the services, apart from Bigtable, which treats all values as binary values. All services provide a level of support for transactions, with some slight differences in the isolation level and scope the transactions can be executed against. And all services provide optimistic concurrency, but Firestore allows for pessimistic concurrency too. The consistency model is where there are some interesting differences.
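The optimistic concurrency mentioned above is typically implemented as a version (or ETag) check at write time: a write only succeeds if the document hasn't changed since it was read. A minimal sketch, not tied to any particular service's API:

```python
class ConflictError(Exception):
    """Raised when a write loses an optimistic-concurrency race."""

# A toy in-memory "document store" where each document carries a version.
db = {"doc1": {"value": "a", "version": 1}}

def update(doc_id: str, new_value: str, expected_version: int) -> None:
    doc = db[doc_id]
    # Reject the write if someone else bumped the version since our read.
    if doc["version"] != expected_version:
        raise ConflictError(f"{doc_id} was modified concurrently")
    db[doc_id] = {"value": new_value, "version": expected_version + 1}

update("doc1", "b", expected_version=1)       # succeeds; version is now 2
try:
    update("doc1", "c", expected_version=1)   # stale version -> rejected
except ConflictError as e:
    print("write rejected:", e)
```

Pessimistic concurrency, which Firestore also offers, instead takes a lock up front, so conflicting writers wait rather than fail and retry.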

DynamoDB allows you to specify at read time whether you want eventual or strong consistency. Bigtable will give strong consistency if using a single cluster, or eventual consistency if using multiple clusters. Firestore always gives serializable isolation, and Cosmos DB gives you ultimate flexibility, allowing you to choose from five different consistency levels for each read request, with well-defined availability, latency, and throughput trade-offs. All services apart from Bigtable allow for data streaming. All NoSQL databases have limits that apply when you are storing and retrieving data.

Each service has different underlying designs, which create various limit characteristics. For instance, the attributes on an item are limited in Bigtable and Firestore, but not in DynamoDB or Cosmos DB. All databases have a limit for partition keys and IDs in the low kilobytes. Looking at the next level up, an individual item, there's a wild amount of variance, particularly with a relatively low maximum item size in DynamoDB, and a relatively huge item size in Bigtable, although that size includes all versions of that item.

The other limit related to items is the number or size of items allowed when performing a query and/or transaction. DynamoDB has relatively small limits here, while Firestore, and in particular Bigtable, have much larger limits. Finally, looking at the table or collection concept, while Cosmos DB and Firestore have no limits, DynamoDB and Bigtable do by default. From a scalability perspective, all services support full horizontal scalability, and therefore provide a massive amount of max storage. So this is largely constrained by your credit card.

From an auto-scale perspective, the serverless pricing models in DynamoDB, Cosmos DB, and Firestore all automatically scale. Cosmos DB and DynamoDB also support auto-scaling in their provisioned pricing models through manually specified rules. Bigtable doesn't support auto-scaling out of the box. All services support impressive throughput numbers, assuming you take advantage of their various partition mechanisms. The write performance on Firestore is relatively lacking compared to the other services, though. Firestore is similarly lacking when it comes to latency,

with no specified latency target, whilst the other services boast single-digit millisecond response times. DynamoDB is notable here, with its optional DAX capability providing microsecond response times. From a monitoring perspective, all databases have integration with cloud monitoring services, and they all have detailed performance insights, except for Firestore. All services have impressive cross-region availability of five nines, with lower availability SLAs for single-region instances. Cosmos DB is the only service to have SLAs for throughput, consistency, and latency.

All services have cross-region automatic fail-over, with Cosmos DB optionally allowing for manual failover. AWS and Azure are far and away in front when it comes to backups. Both DynamoDB and Cosmos DB provide point-in-time restore capabilities and automated backups out of the box. Surprisingly, Cosmos DB doesn't provide the ability to take manual backups, whereas DynamoDB allows for full backups in seconds without affecting performance. GCP Bigtable and Firestore both allow for manually triggered backups.

The security story is pretty consistent with what we saw for the SQL databases, with good coverage across the services, and one surprising omission: no firewall or VNet support in Firestore. Although, Firestore has some powerful authentication capabilities for web and mobile apps via Firebase Authentication and Cloud Firestore security rules. Cosmos DB provides a range of different mechanisms to authenticate, which gives it a lot of flexibility, and it also has comprehensive security advisory via Azure Defender.

As we've seen, database options across AWS, Azure, and GCP are extensive and impressive. There are largely equivalent services across the cloud platforms in each category, but each has its own unique characteristics. While we haven't covered every single database service each cloud provider has to offer, hopefully the information in this episode has provided a useful starting point to help you compare these different types of services across providers. Hopefully you share my excitement about the amazing possibilities and productivity that these incredible services give us. If you'd like to learn more about any of these cloud services, head on over to the A Cloud Guru platform for videos and hands-on labs covering all of these services. This has been Cloud Provider Comparisons,

databases edition. Thanks for sticking around. I hope you enjoyed the video as much as I enjoyed making it. Keep being awesome, Cloud Gurus.
