Scaling out databases in Azure with sharding

Scaling out (or sharding) by adding more databases usually requires careful planning and provisioning to ensure even distribution of data. It also adds administrative overhead and increases the number of points of failure. Azure SQL databases are well suited to sharding on all of these counts because they can be created or deleted on demand, require near-zero administration, and have built-in fault tolerance.


The challenges of scaling out relational database management systems are well known, and the patterns for sharding are well developed. For on-premises solutions, there are limited opportunities for scaling up (i.e. increasing hardware resources), so most of the focus is on the design and maintenance of distributed databases. This brings a host of challenges, and the data elasticity capabilities of the platform can end up dictating the design of the data model. The more flexibility you have to place and move your shards as data changes, the more freedom you have to design a database that meets your application’s requirements but can still be scaled. Azure Elastic Scale provides this flexibility.


So why did Microsoft choose to retire Azure SQL Database Federations, and how does Elastic Scale address the lessons learnt?


Azure Federations was a bold attempt to implement an elastic scale-out model in the database tier with built-in support for sharding. I was an early (and enthusiastic) adopter but I soon realised that it had a number of serious limitations. For me, the key issues were that sharded databases (federation members) could only be accessed via a logical container database (the federation root), restrictions were imposed on the design of database tables, and there was no built-in support for fan-out queries. Azure Federations is being retired along with the Web and Business editions of Azure SQL Database. The ‘nail in the coffin’ for Azure Federations was simply that this custom implementation of Azure SQL Database could not continue to be supported in the rapidly evolving Azure world.


With Azure SQL Database Elastic Scale (to give it its proper title), Microsoft have gone back to basics and provided the functionality needed to develop and manage sharded databases without compromising the structure or integrity of the relational database. Data-dependent routing and multi-shard querying have been achieved with nothing more than mapping data, stored in a management database and cached locally, together with a set of .NET APIs. In fact, the Azure Elastic Scale shard map management and client APIs will work equally well with on-premises SQL Servers.
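To make the idea of data-dependent routing and fan-out queries concrete, here is a minimal sketch in Python. It is not the Elastic Scale .NET API: the `RangeShardMap` class, the `fan_out` helper, the shard names and the sample rows are all hypothetical, and the "databases" are plain in-memory lists. It only illustrates how a locally cached map of key ranges can route a single key to one shard, and how the same query can be run against every shard and the results combined.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class RangeShardMap:
    """Hypothetical local cache of a range shard map (not the real API)."""
    # Sorted lower bounds; range i covers [lows[i], lows[i + 1]).
    # The last range is treated as unbounded above in this sketch.
    lows: list = field(default_factory=list)
    shards: list = field(default_factory=list)

    def add_range(self, low, shard):
        # Keep both lists ordered by lower bound.
        i = bisect.bisect_left(self.lows, low)
        self.lows.insert(i, low)
        self.shards.insert(i, shard)

    def shard_for(self, key):
        # Data-dependent routing: find the range whose lower bound is <= key.
        i = bisect.bisect_right(self.lows, key) - 1
        if i < 0:
            raise KeyError(f"no shard mapped for key {key}")
        return self.shards[i]


def fan_out(shard_map, databases, predicate):
    """Run the same filter on every distinct shard and union the rows."""
    rows = []
    for shard in dict.fromkeys(shard_map.shards):  # distinct, order kept
        rows.extend(r for r in databases[shard] if predicate(r))
    return rows


# Illustrative data: two shards split on a tenant_id sharding key.
shard_map = RangeShardMap()
shard_map.add_range(0, "shard_a")
shard_map.add_range(1000, "shard_b")

databases = {
    "shard_a": [{"tenant_id": 7, "total": 20}],
    "shard_b": [{"tenant_id": 1500, "total": 75}],
}

single = shard_map.shard_for(1500)  # routed to one shard only
big_orders = fan_out(shard_map, databases, lambda r: r["total"] > 10)
```

In this sketch a lookup for a single tenant touches only the shard that owns its key range, while the fan-out query visits both shards. The real client library adds connection handling and cache refresh, which are deliberately left out here.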


The other key decision that Microsoft made was to take advantage of existing Azure services to provide the scaling functionality rather than build something proprietary. Horizontal scaling is provided by Azure Cloud Services running worker and web roles that will split or merge databases on a range of sharding keys, while controlling client connections to the data being moved. Vertical scaling is provided by Azure Automation Accounts with scheduled PowerShell runbooks monitoring the sharded databases and scaling up or down based on telemetry data and custom rules and actions.
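The two kinds of scaling described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the split/merge service or the Automation runbooks are implemented: `split_range`, `next_tier`, the tier labels and the utilisation thresholds are all hypothetical names and values chosen for the example.

```python
def split_range(ranges, split_key, new_shard):
    """Split the range containing split_key; the upper half moves to new_shard.

    ranges is a list of (low, high, shard) tuples with high exclusive.
    Returns a new list and leaves the input untouched.
    """
    result = []
    for low, high, shard in ranges:
        if low < split_key < high:
            result.append((low, split_key, shard))
            result.append((split_key, high, new_shard))
        else:
            result.append((low, high, shard))
    return result


# Illustrative tier labels, ordered from smallest to largest (assumption).
TIERS = ["small", "medium", "large"]


def next_tier(current, avg_utilisation, tiers=TIERS, high=0.8, low=0.2):
    """Pick a tier from a simple threshold rule on average utilisation."""
    i = tiers.index(current)
    if avg_utilisation > high and i < len(tiers) - 1:
        return tiers[i + 1]  # scale up
    if avg_utilisation < low and i > 0:
        return tiers[i - 1]  # scale down
    return current  # within thresholds, or already at the limit


# Horizontal: split one busy range in two.
ranges = split_range([(0, 1000, "shard_a")], 500, "shard_b")

# Vertical: a scheduled job would call this with recent telemetry.
tier = next_tier("medium", 0.9)
```

The split only rewrites the mapping; in practice the rows for the upper half of the range also have to be moved, and client connections to that data controlled while the move is in progress.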


Elastic Scale


One final consideration: Elastic Scale gives us the tools to scale databases, but your application will still need to meet the usual criteria for sharding databases and be designed accordingly, particularly with regard to transaction processing. The good news is that Elastic Scale will support all the familiar multi-tenant data architectures, simplify development with the client APIs, and provide dynamic elasticity with the scaling options.

Written by Nicholas Revell at 00:00
