In this post, we take a look at the databases across different cloud providers. We’ll look at the similarities, differences, and items of interest worth calling out across AWS, Azure, and GCP.
Ah, databases. These amazing and utterly useful services are a core enabler to creating and deploying applications and systems. The cloud has allowed for loads of innovation in this space.
AWS vs Azure vs GCP: Cloud provider comparisons
It’s not all apples to apples in the world of cloud. Here are some other cloud comparison guides.
- NoSQL databases: Cosmos DB vs DynamoDB vs Cloud Datastore and Bigtable
- Serverless: AWS Lambda vs Azure Functions vs Google Cloud Functions
- Virtual Machines: AWS EC2 vs Azure Virtual Machines vs Google Compute Engine
- IAM: Comparing AWS, Azure, and Google Cloud IAM services
- AKS vs EKS vs GKE: Managed Kubernetes services compared
- Comparing SQL (Relational) databases
Fundamental concepts of cloud databases
Before we dive in, there are a few fundamental concepts worth knowing when it comes to databases.
What is database scaling?
Scaling can be vertical or horizontal.
- Vertical scaling: A single database instance grows by adding more compute, memory, and/or storage so that it can handle more traffic.
- Horizontal scaling: You add multiple database instances so that the traffic hitting your database gets distributed across them. This can scale much further than vertical scaling, but it’s hard to do because the data must either be deterministically split between the instances or kept in sync between them. Both approaches create complexity around things like data structure, consistency, and atomicity.
- Read replicas: Some database services allow you to scale horizontally for read-only operations via read replicas. Here, data is synchronized between multiple instances for read-only purposes, but you still have a single primary instance for write operations.
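To make the two horizontal approaches above more concrete, here is a minimal Python sketch. It assumes a hash-based split for sharding and simple round-robin routing for read replicas; the names `shard_for` and `Router` are hypothetical and are not part of any cloud provider’s API.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Deterministically map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class Router:
    """Sends writes to the single primary and spreads reads over replicas."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self.replicas = replicas
        self._next = 0  # index of the replica that serves the next read

    def for_write(self) -> str:
        # All writes go to the one primary instance.
        return self.primary

    def for_read(self) -> str:
        # Fall back to the primary when no replicas are configured.
        if not self.replicas:
            return self.primary
        target = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return target
```

The same key always lands on the same shard, which is the "deterministically split" property. Real services layer rebalancing, replication lag handling, and failover on top of this, which is where the complexity comes from.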
What is database availability?
The next concept to understand is availability. Cloud providers express availability as a service level agreement (SLA): the percentage of a given time period during which the database will be up and running to receive requests.
- If a database has an availability SLA of 99.9%, it can be down for about 44 minutes a month, whereas a database with a 99.999% SLA (also known as five nines) can be down for only about 26 seconds a month.
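The downtime figures above follow from simple arithmetic. The sketch below assumes an average month of 30.44 days; the function name `allowed_downtime_seconds` is made up for illustration.

```python
def allowed_downtime_seconds(sla_percent: float, period_days: float = 30.44) -> float:
    """Return the downtime an availability SLA permits over the period."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - sla_percent / 100)


# Three nines allows roughly 44 minutes a month.
print(round(allowed_downtime_seconds(99.9) / 60))
# Five nines allows roughly 26 seconds a month.
print(round(allowed_downtime_seconds(99.999)))
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the jump from 99.9% to 99.999% is so large.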
Cloud providers apply high-availability architectures to allow for fault tolerance across a subset of the various failure modes. Depending on your specific needs, you’ll want to balance availability, cost, and features to decide which service is best for you.
What are the different types of databases?
Finally, it’s important to understand that there are different types of databases available. Maybe you’ve heard of SQL and NoSQL databases. But when you dig under the covers, you find out that there are actually many types of databases.
In this post, we’ll compare the most common types of databases: relational databases (often referred to as SQL databases) and NoSQL databases, specifically key-value databases and document databases. For relational databases, we’ll cover three different subtypes: Infrastructure as a Service (or IaaS), Platform as a Service (or PaaS), and cloud-native PaaS.
To give you a valuable in-depth comparison, we’ll focus on the most prevalent general-purpose database services — and we won’t cover SQL data warehouses or the other types of NoSQL databases, like graph, in-memory, time series, ledger, and columnar databases.
SQL (Relational) databases
Relational databases are characterized by rows of data within tables that have columns and relationships and are queryable using Structured Query Language (or SQL).
These are some of the oldest general-purpose databases and are still the most commonly used today.
IaaS SQL databases
We’ll start with IaaS SQL databases. This is where you’re responsible for deploying a database server across one or more IaaS virtual machines.
There are two main proprietary databases (Microsoft SQL Server and Oracle Database) and three main open-source databases (MySQL, PostgreSQL, and MariaDB).
Why use IaaS Databases?
When it comes to database services, IaaS is by far the most work to build and maintain, especially if you want to implement a highly reliable or performant design. So why would you want to use this option?
Largely, it comes down to flexibility and control. It can be a good option when you need to lift and shift or have explicit control over specific details. But in return, it’s your responsibility to worry about configuration, patching, backups, and so on.
IaaS Database Service comparison: AWS vs Azure vs GCP
The major differences between cloud providers relate to whether you’re allowed to license the database engine on the cloud’s hypervisor, and whether any extra advice, services, or capabilities are provided, such as automatic patching.
Microsoft SQL Server
Let’s start by looking at Microsoft SQL Server. Support is pretty good across AWS, Azure, and GCP, with all providing comprehensive deployment guidance.
The story is a bit different when it comes to licensing. If you have existing software assurance licensing, you can use that to reduce your costs on Azure and AWS, but not GCP.
Azure also provides hybrid benefits to further reduce your costs, and it has auto-patching and auto-backup capabilities.
Oracle Database
When it comes to Oracle, there’s comprehensive guidance in AWS and Azure. But due to the licensing constraints, it’s not possible to run Oracle on GCP unless you deploy onto expensive, complex bare metal servers.
Azure has an extra feature available in some regions where they have a partnership with Oracle to provide low-latency connectivity into Oracle’s cloud environment. This is a first-class environment for running Oracle Database.
Open-source databases
When it comes to the open-source databases, the support is a little less comprehensive. The official guidance is minimal or non-existent on all cloud platforms.
Now there’s plenty of guidance online for running these database engines on servers, but you may suffer a lack of specific guidance for how to enable sophisticated, highly available, or performant deployments for a given cloud provider and database combination.
Cloud Availability: AWS vs Azure vs GCP
In terms of availability, the cloud providers all provide a 99.99% SLA if you deploy across availability zones, and lower SLAs for a single server deployment.
It’s worth pointing out that all three cloud providers have a marketplace with a bunch of ready-to-go database-on-virtual-machine deployments of varying sophistication and automation. You can deploy these at the click of a button, and the charges are added to your cloud bill.
PaaS SQL databases: AWS vs Azure vs GCP
Next, let’s take a look at the PaaS SQL options. This is where the cloud provider takes care of automating the virtual machines, and of patching and configuring the operating system and database engines for you.
PaaS SQL database comparison: AWS vs Azure vs GCP
- In AWS, the long-standing PaaS database service is called Amazon Relational Database Service (or RDS).
- In Azure, it’s called Azure SQL Database Managed Instance for SQL Server and Azure Database for MySQL, PostgreSQL, or MariaDB.
- And in GCP, it’s called Cloud SQL.
- Amazon RDS is the only provider with direct support to host a PaaS Oracle database. And GCP is the only cloud provider to not have a PaaS MariaDB service.
- It’s worth pointing out that Azure SQL Database Managed Instance has a slightly different model to the other services. Rather than choose the database version that you want to host, it provides an evergreen version of SQL Server that’s always patched.
PaaS Microsoft SQL Server — Monitoring and Availability
- All three providers offer integration into native cloud monitoring solutions, along with performance insights and recommendations.
- Azure comes out slightly ahead in terms of availability, with a higher SLA and automatic cross-region fail-over.
- All providers have automatic backup solutions, but Azure’s is the most comprehensive.
- Default backup retention in GCP can be configured for a much longer period of time, but Azure has an optional long-term retention capability for SQL Server.
- All three offer a point-in-time restore option, which is handy when you need to recover from accidental data loss, but it’s clunkier to use in GCP and has a smaller retention period as well.
PaaS Microsoft SQL Server — Scalability
- Vertical scaling is where GCP shines. You can scale to a monster database and maintenance is reduced since it will automatically grow your storage for you across all database engines.
- Azure stands out in terms of horizontal scaling, with cross-region read replicas for all database engines and read-write replicas in the business-critical tier of SQL Managed Instance.
PaaS Microsoft SQL Server — Security
All providers offer strong security with firewall and VNet, encryption in transit and at rest, and audit logging.
The key difference is that RDS doesn’t have built-in client encryption and Azure doesn’t have native role-based access control — although it does have some more advanced advisory capabilities built in.
Cloud-Native Platform as a Service: AWS vs Azure vs GCP
Next, let’s take a look at the cloud-native PaaS SQL options.
By architecting database solutions natively for the cloud, the providers have SQL services with more advanced features such as active geo-replication, faster and broader vertical and horizontal scaling, higher availability, and serverless pricing.
Cloud-Native PaaS SQL databases compared: AWS vs Azure vs GCP
- AWS has Aurora, a MySQL and PostgreSQL compatible engine that boasts higher throughput on the same hardware.
- Azure has Azure SQL Database, an evergreen SQL Server engine with various operating modes, including general purpose, business critical, elastic pools, Hyperscale, and serverless.
- Google has Cloud Spanner, which is a proprietary database with some impressive properties.
Cloud-Native PaaS SQL databases — Monitoring and Availability
From a monitoring perspective, all providers offer good coverage, particularly Azure, with a comprehensive array of performance insights tooling.
All offerings have similar base SLAs. Azure lifts that higher with its business-critical tier, and it’s also the only one to have an SLA on recovery point objective and recovery time objective.
But the most impressive is GCP’s Cloud Spanner with 99.999% availability for multi-region configuration. To be clear, that’s 26 seconds of allowed downtime per month.
All providers have automatic in-region fail-over and fast cross-region fail-over, which is automatic in Azure and GCP.
Cloud-Native PaaS SQL databases — Backup
From a backup perspective, all three providers have point-in-time restore, with AWS and Azure having a longer restore window and automated backups. Azure also has a long-term retention option outside of point-in-time restore.
Cloud Spanner only has manual backups, but it does have the ability to perform a stale read to surgically query accidentally deleted data, which is really cool.
Cloud-Native PaaS SQL databases — Scalability
Looking at scaling, GCP Cloud Spanner doesn’t really have a concept of vertical scaling since it’s fully horizontally scalable. All nodes are the same size.
Comparing Azure and AWS: Azure’s business-critical tier has an absolutely monster maximum vertical scale.
For storage, GCP is practically infinite since it ties storage to its horizontally scaled nodes, with a max storage per node. There are default node quotas, but you can request that those quotas be increased. So in reality, the limit is based on your credit card.
Amazon Aurora is impressive, with huge max storage and auto-growth, whereas Azure is relatively lacking, with a much smaller max storage and no auto-growth, although its Hyperscale tier has an impressive max storage.
From a horizontal scale perspective, all three providers have an impressive set of capabilities. They all provide cross-region read replicas, sharding capabilities, and read-write replica options. The clear standout though is Cloud Spanner with automatic sharding and horizontally scalable read-write.
Both Azure and AWS offer serverless pricing, which includes auto-scale, auto-pause, and auto-resume capabilities with pay-for-use pricing, which is great for intermittent and non-production workloads.
Cloud-Native PaaS SQL databases — Security
From a security perspective, it’s a very similar story to what we covered for PaaS SQL databases, except Cloud Spanner doesn’t have client-side encryption.
NoSQL Databases (Key-Value and Document Databases)
Okay. Let’s switch gears and dive into the world of NoSQL, and more specifically key-value and document databases.
What are key-value databases?
Key-value databases are simple. You store data, the value, against a key lookup. You can generally store a huge amount of data while maintaining a fast lookup time. The value that you store usually consists of a varying set of properties with different data types.
What are document databases?
Document databases are similar to key-value databases in that you store a set of values (a document) against a key (the document ID). But these databases have sophisticated support for storing more complex values, including arrays and child objects, as well as for transactions, indexes, and queries based on those document structures.
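The difference between the two models can be sketched with plain Python dictionaries. This is an in-memory illustration only, with hypothetical names and sample data; it does not use any provider SDK.

```python
# Key-value: the store only knows how to look a value up by its key.
kv_store = {}
kv_store["user:1"] = {"name": "Ada", "plan": "pro"}

# Document: values are documents with child objects and arrays,
# and the database can query inside that structure.
documents = {
    "order-1": {"customer": {"name": "Ada"}, "items": [{"sku": "A1", "qty": 2}]},
    "order-2": {"customer": {"name": "Lin"}, "items": [{"sku": "B7", "qty": 1}]},
}


def find_orders_with_sku(docs: dict, sku: str) -> list:
    """Return the IDs of documents whose nested items array contains the SKU."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if any(item["sku"] == sku for item in doc["items"])
    ]
```

A real document database would answer a query like `find_orders_with_sku` from an index rather than by scanning every document, but the shape of the data and the query is the same.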
History of cloud-native NoSQL databases
There is a common theme among the cloud-native NoSQL key-value and document database services in that they have a long history with legacy services that have subsequently been deprecated.
- For instance, AWS started with SimpleDB, a simple key-value store, in December 2007, and then introduced DynamoDB with a superior set of features, including document support, in December 2012. It also has a newer MongoDB-compatible service called DocumentDB.
- Azure introduced Table Storage, a simple key-value store, in October 2008. They followed that up with DocumentDB, a document database, in August 2014, and subsequently released Cosmos DB, a multi-paradigm database supporting key-value, document, graph, and columnar stores, in May 2017.
- GCP introduced their key-value database Cloud Bigtable in August 2014 after having used it internally since 2005. Datastore, their first document database, dates back to October 2008. In combination with Firebase Realtime Database, it has since been improved upon by Cloud Firestore, which was released in October 2017.
NoSQL databases: AWS vs Azure vs GCP
In this section, we’ll compare the following flagship key-value and document databases: Amazon DynamoDB, Azure Cosmos DB, and Google Cloud Bigtable and Firestore.
Let’s start by looking at consumption pricing models and unique features.
NoSQL databases consumption
Because these databases are generally proprietary, the mechanisms that the cloud providers give you to consume them are important.
- All services provide a command line interface (CLI) and all but Bigtable have at least one user interface.
- All providers have a local emulator so that you can run a local instance of the database for development and testing activities without needing an internet connection.
- As you would expect, all services have a broad range of SDKs to choose from with some slight differences for which languages are supported.
NoSQL database pricing models and unique features
- Looking at the pricing models, Amazon DynamoDB and Azure Cosmos DB give you optionality, with support for both provisioned and serverless pricing models.
- GCP’s Bigtable only supports provisioned, and Firestore only supports serverless.
- Looking at unique features, DynamoDB has an in-memory cache integrated to speed up reads.
- Bigtable has a built-in Apache HBase API, and Firestore has some unique properties from its Firebase roots that make it ideally suited for building web and mobile apps directly against it, including offline sync and CDN data bundles.
NoSQL database key capabilities
NoSQL databases have a key set of capabilities around how they handle the data being stored within them.
- If we start by looking at the data model each service uses, we can see that Cosmos DB provides an extra parent level called Database Account.
- Cosmos DB, Bigtable, and Firestore all support a database level construct.
- All services have a table or collection concept, which contains rows or documents, which have property values.
- DynamoDB and Cosmos DB have explicit partition keys to allow for sharding and grouping rows.
- Bigtable has a unique property where individual cell values are versioned so you can get the history of a value.
- Firestore has a powerful feature where you can define nested collections within a document.
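The versioned-cell idea described for Bigtable can be illustrated with a toy in-memory table. This is only a sketch of the concept under simplifying assumptions (logical timestamps, no garbage collection); `VersionedTable` is a hypothetical class, not the Bigtable client API.

```python
import itertools


class VersionedTable:
    """Toy table where each (row, column) cell keeps every written version."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value)
        self._clock = itertools.count(1)  # logical timestamps: 1, 2, 3, ...

    def write(self, row: str, column: str, value: bytes) -> int:
        """Append a new version instead of overwriting the old one."""
        timestamp = next(self._clock)
        self._cells.setdefault((row, column), []).append((timestamp, value))
        return timestamp

    def read_latest(self, row: str, column: str):
        """Return the most recent value, or None if the cell is empty."""
        versions = self._cells.get((row, column), [])
        return versions[-1][1] if versions else None

    def history(self, row: str, column: str) -> list:
        """Return every version of the cell, newest first."""
        return list(reversed(self._cells.get((row, column), [])))
```

Values are stored as bytes here to mirror the point that Bigtable treats all values as binary.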
The types of values each database stores are reasonably similar across the services, apart from Bigtable, which treats all values as binary values.
- All services provide a level of support for transactions with some slight differences in the isolation level and scope the transactions can be executed against.
- All services provide optimistic concurrency, but Firestore allows for pessimistic concurrency too.
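Optimistic concurrency generally means a write succeeds only if the item has not changed since it was read. Here is a minimal in-memory sketch of that pattern using a version number; the `Store` class and `increment` helper are hypothetical, and each service exposes the idea through its own mechanism.

```python
class VersionConflict(Exception):
    """Raised when an item changed between the read and the write."""


class Store:
    def __init__(self):
        self._items = {}  # key -> (version, value)

    def get(self, key):
        # Missing items are reported as version 0 with no value.
        return self._items.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Write only if the stored version still matches what was read."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._items[key] = (current_version + 1, value)
        return current_version + 1


def increment(store, key, retries=3):
    """Read-modify-write loop that retries when another writer got in first."""
    for _ in range(retries):
        version, value = store.get(key)
        try:
            return store.put_if_version(key, version, (value or 0) + 1)
        except VersionConflict:
            continue
    raise VersionConflict(f"{key}: gave up after {retries} attempts")
```

No lock is held between the read and the write, so readers never block. Pessimistic concurrency takes the opposite approach and locks the item up front.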
The consistency model is where there are some interesting differences.
- DynamoDB allows you to specify at read time whether you want eventual or strong consistency.
- Bigtable will give strong consistency if using a single cluster, or eventual consistency if using multiple clusters.
- Firestore always gives serializable isolation.
- Cosmos DB gives you ultimate flexibility, allowing you to choose from five different consistency levels for each read request with well-defined availability, latency, and throughput trade-offs.
- All services apart from Bigtable allow for data streaming.
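The practical difference between strong and eventual consistency can be shown with a small simulation. It assumes one primary and one replica that lags until replication runs; the `consistent` flag is a hypothetical stand-in for a per-read consistency choice and is not any provider’s actual parameter.

```python
class ReplicatedStore:
    """Toy store with a primary copy and one asynchronously updated replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes not yet copied to the replica

    def write(self, key, value):
        # Writes are acknowledged as soon as the primary has them.
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Apply all pending writes to the replica, in order."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read(self, key, consistent=False):
        # Strong reads go to the primary; eventual reads may see stale data.
        source = self.primary if consistent else self.replica
        return source.get(key)
```

Immediately after a write, a strongly consistent read returns the new value, while an eventually consistent read may still return the old one until `replicate` has run. That is the trade-off for lower latency and higher availability on the eventual path.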
NoSQL database limits
All NoSQL databases have limits that apply when you are storing and retrieving data. Each service has different underlying designs, which create various limit characteristics.
- For instance, the attributes on an item are limited in Bigtable and Firestore, but not in DynamoDB or Cosmos DB.
- All databases have a limit for partition keys and IDs in the low kilobytes.
- Looking at the next level up, an individual item, there’s a wild amount of variance, particularly with a relatively low maximum item size in DynamoDB and a relatively huge item size in Bigtable, although that size includes all versions of the item.
- The other limit related to items is the number or size of items allowed when performing a query and/or transaction. DynamoDB has relatively small limits here, while Firestore and, in particular, Bigtable have much larger limits.
- Finally, looking at the table or collection concept, while Cosmos DB and Firestore have no limits, by default DynamoDB and Bigtable do.
NoSQL database scalability
From a scalability perspective, all services support full horizontal scalability and therefore provide a massive amount of max storage. So this is largely constrained by your credit card.
- From an auto-scale perspective, the serverless pricing models in DynamoDB, Cosmos DB, and Firestore all automatically scale.
- Cosmos DB and DynamoDB also support auto-scaling in their provisioned pricing models through manually specified rules.
- Bigtable doesn’t support auto-scaling out of the box.
- All services support impressive throughput numbers, assuming you take advantage of their various partition mechanisms.
- The write performance on Firestore is relatively lacking compared to the other services though.
- Firestore is similarly lacking when it comes to latency, with no specified latency target, whilst the other services boast single digit millisecond response times.
- DynamoDB is notable here with its optional DAX capability, providing microsecond response times.
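As an illustration of the kind of rule behind auto-scaling a provisioned pricing model, the sketch below computes a capacity target from observed consumption and a target utilization, clamped to a configured minimum and maximum. The function and parameter names are hypothetical and the formula is an assumption about how such a rule might work, not a description of either provider’s algorithm.

```python
import math


def desired_capacity(consumed_units: int, target_percent: int,
                     min_units: int, max_units: int) -> int:
    """Capacity needed so that consumption sits at the target utilization."""
    needed = math.ceil(consumed_units * 100 / target_percent)
    # Never scale below the floor or above the ceiling the operator set.
    return max(min_units, min(max_units, needed))
```

For example, 700 consumed units at a 70% target calls for 1,000 provisioned units. The minimum and maximum bounds are what keep a rule like this from scaling to zero or running up an unbounded bill.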
NoSQL database monitoring and availability
- From a monitoring perspective, all databases integrate with cloud monitoring services, and all except Firestore have detailed performance insights.
- All services have impressive cross-region availability of five nines with lower availability SLAs for single-region instances.
- Cosmos DB is the only service to have SLAs for throughput, consistency, and latency.
- All services have cross-region automatic fail-over, with Cosmos DB optionally allowing for manual failover.
NoSQL database backups
AWS and Azure are far and away in front when it comes to backups.
- Both DynamoDB and Cosmos DB provide point-in-time restore capabilities and automated backups out of the box.
- Surprisingly, Cosmos DB doesn’t provide the ability to take manual backups, whereas DynamoDB allows for full backups in seconds without affecting performance.
- GCP Bigtable and Firestore both allow for manually triggered backups.
NoSQL database security
The security story is pretty consistent with what we saw for the SQL databases, with good coverage across the services. The one surprising omission is that Firestore has no firewall or VNet support. (Although Firestore has some powerful authentication capabilities for web and mobile apps via Firebase authentication and Cloud Firestore security rules.)
Cosmos DB provides a range of different mechanisms to authenticate, which gives it a lot of flexibility. And it also has a comprehensive security advisory via Azure Defender.
As we’ve seen, database options across AWS, Azure, and GCP are extensive and impressive. There are largely equivalent services across the cloud platforms in each category, but each has its own unique characteristics.
While we haven’t covered every single database service each cloud provider has to offer, hopefully the information here has provided a useful starting point to help you compare these different types of services across providers.
If you’d like to learn more about any of these cloud services, start a free trial with A Cloud Guru for videos and hands-on labs covering all of these services. And keep being awesome, cloud gurus!
About the author
Rob is the Delivery Innovation Lead for Telstra Purple globally where he cultivates a culture of empowered and driven innovation, experimentation, continuous improvement and knowledge sharing.
Rob has a proven track record of successfully delivering software and organisational change projects with a particular focus on accelerated business value delivery, measurement and realisation and he has also been a proponent for and leader towards positive cultural change and gradual continuous improvement across a number of organisations, teams and projects. Rob specialises in leading teams and helping businesses successfully innovate and change by focusing on the delivery of business value using Agile and Lean values and principles and by adopting DevOps and Continuous Delivery practices. He has a keen interest in mobile, web and cloud software development and Intellectual Property (IP) strategy.