Designing a governance-first, multi-subscription Azure foundation for 50+ product teams — with automated subscription vending, 200+ Policy definitions, hub-and-spoke networking, and 100% infrastructure as code from day one.
A large telecommunications organisation had been expanding its Azure footprint rapidly for three years. What started as a handful of pilot subscriptions had grown to over 140 active subscriptions across dozens of business units, with no centralised governance model to speak of. Teams provisioned resources ad hoc through the Azure portal, RBAC assignments were inconsistent and undocumented, and cost management was a quarterly fire drill involving three teams and a spreadsheet.

There was no Management Group hierarchy below the Tenant Root Group, so every subscription was a peer of every other. Azure Policy was used in a handful of subscriptions to enforce tagging, and even there inconsistently, with different tag schemas in different parts of the business. Security baselines were aspirational documents rather than enforced controls.

The business had decided to move its core network platform and two flagship digital products to Azure over the next 18 months. The scale of what was coming made the existing approach untenable. A proper Azure Landing Zone, built to the Microsoft Cloud Adoption Framework, was the prerequisite for everything else.
The first failure was a flat subscription topology. With no Management Group hierarchy, applying policy at scale was impossible. Every governance change had to be made subscription by subscription, a process that took weeks and produced inconsistent results.

The second was ungoverned RBAC. A spot audit found 340 Owner-level role assignments across the estate, most of them granted during project kickoffs and never reviewed. Several former contractors still held active role assignments.

The third was the lack of a network baseline. Teams had provisioned VNets with overlapping address spaces, which made future peering or interconnection impossible without re-IPing. Some workloads were exposed through public endpoints that had no justification.

The fourth was cost opacity. No organisation-wide tagging standard existed, so Finance could not attribute cloud spend to cost centres with any accuracy. Showback reports took three days to produce and were immediately disputed.

The fifth was compliance drift. Security baselines existed only as Word documents. There was no mechanism to detect when a resource deviated from the baseline, and no remediation process when deviations were found.
Before any technical work began, the architecture team agreed on five non-negotiable design principles that would govern every decision in the project.

Policy as the enforcement mechanism, not documentation. Every security and governance requirement would be expressed as an Azure Policy definition. If it could not be expressed as a policy, it would not be a requirement.

Infrastructure as code from day one, with no manual exceptions. The platform team would use Bicep for all platform-layer resources. Product teams would be required to use IaC for their workloads as a condition of subscription vending.

Subscription as the unit of workload isolation. One product, one subscription for production. Shared subscriptions would not be permitted for new workloads, only for shared platform services.

Network topology decided upfront, locked by policy. Address space allocation would be centralised. Teams could not create VNets with unregistered address ranges. The hub-and-spoke topology would be deployed before any spoke subscriptions were onboarded.

Automation over process. Every repeatable operation (subscription vending, RBAC assignment, policy exemption) would have an automated workflow. Manual steps in recurring processes were treated as technical debt from the moment they were identified.
The Management Group hierarchy was the first deliverable, because everything else (policy assignment, RBAC delegation, subscription placement) depends on getting it right. We designed a four-layer hierarchy beneath the tenant root.

At the top sat the Tenant Root Group, to which only emergency break-glass policies were assigned. Directly beneath it, the first layer was a single top-level Management Group for the organisation, separating the company's subscriptions from any Microsoft or partner subscriptions in the tenant.

The second layer divided the estate into Platform and Landing Zones. Platform contained the connectivity, identity, and management subscriptions. Landing Zones contained all workload subscriptions.

The third layer divided Landing Zones into Corp (internally connected workloads requiring hub connectivity) and Online (internet-facing workloads with no corporate network dependency). This distinction drove fundamentally different network and security policy sets.

The fourth layer added environment rings (Prod, NonProd, and Sandbox) under each of Corp and Online. Sandbox subscriptions had relaxed policies to allow experimentation. NonProd carried most production controls, minus a few cost-optimisation exceptions. Prod was fully locked down.

With this structure, a policy assigned at the Landing Zones level applied automatically to all 50+ product subscriptions, while a policy assigned at Corp/Prod applied only to production workloads with corporate network connectivity. Inheritance eliminated the per-subscription configuration problem entirely.
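To make the inheritance rule concrete, here is a minimal Python sketch of the hierarchy as a tree, in which a subscription's effective policy set is everything assigned on its own Management Group plus every ancestor. The group names follow the text; the policy assignment names (`root-guardrail`, `require-tags`, `mcsb`, `deny-public-ip` and so on) are hypothetical placeholders, and the model deliberately ignores exemptions and the `notScopes` exclusions that real Azure Policy assignments support.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ManagementGroup:
    name: str
    parent: Optional[ManagementGroup] = None
    # Policy assignments made directly at this group (names are hypothetical).
    assignments: list[str] = field(default_factory=list)

    def path(self) -> list[ManagementGroup]:
        """Return the chain from the tenant root down to this group."""
        chain: list[ManagementGroup] = []
        node: Optional[ManagementGroup] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]

    def effective_assignments(self) -> list[str]:
        """Inheritance: every assignment on this group or any ancestor applies."""
        return [a for group in self.path() for a in group.assignments]


def build_hierarchy() -> dict[str, ManagementGroup]:
    """Tenant root plus the four layers described in the text."""
    root = ManagementGroup("Tenant Root Group", assignments=["root-guardrail"])
    org = ManagementGroup("Organisation", root)                      # layer 1
    platform = ManagementGroup("Platform", org)                      # layer 2
    landing_zones = ManagementGroup("Landing Zones", org, ["require-tags", "mcsb"])
    groups = {"Platform": platform, "Landing Zones": landing_zones}

    archetypes = {"Corp": ["deny-public-ip"], "Online": ["require-waf"]}      # layer 3
    rings = {"Prod": ["deny-unapproved-skus"], "NonProd": [], "Sandbox": []}  # layer 4
    for archetype, archetype_policies in archetypes.items():
        archetype_mg = ManagementGroup(archetype, landing_zones, list(archetype_policies))
        groups[archetype] = archetype_mg
        for ring, ring_policies in rings.items():
            groups[f"{archetype}/{ring}"] = ManagementGroup(
                f"{archetype}-{ring}", archetype_mg, list(ring_policies)
            )
    return groups


if __name__ == "__main__":
    groups = build_hierarchy()
    print(groups["Corp/Prod"].effective_assignments())
    # ['root-guardrail', 'require-tags', 'mcsb', 'deny-public-ip', 'deny-unapproved-skus']
```

In this model a child group cannot opt out of what its ancestors assign, so a control that Sandbox should not receive, such as the hypothetical `deny-unapproved-skus`, is assigned at the Prod ring rather than at Corp.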
Deployed the four-layer Management Group structure. Assigned 47 Azure Policy initiatives at the appropriate hierarchy levels: built-in initiatives including the Microsoft Cloud Security Benchmark and regulatory compliance initiatives for the organisation's applicable standards, plus a custom initiative covering internal requirements not addressed by the built-ins. Every assignment that performs remediation (deployIfNotExists or modify effects) used a managed identity for its remediation tasks.
Deployed the connectivity subscription with a hub VNet in each of two Azure regions. Azure Firewall Premium with TLS inspection handles all east-west and north-south traffic. Azure Route Server enables BGP connectivity to the on-premises network via ExpressRoute. A centralised private DNS resolver handles all private endpoint DNS resolution across the estate. Each spoke subscription receives a pre-sized VNet from the centralised address space registry; no team can self-provision an address range.
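The centralised address space registry can be pictured as a small allocator: reserve the hub ranges, then carve fixed-size spoke blocks out of a supernet and refuse anything that overlaps an existing allocation. The sketch below uses Python's standard `ipaddress` module; the supernet, the /24 spoke size, and the class and method names are illustrative assumptions rather than the organisation's actual registry, which would also need persistent, concurrency-safe storage.

```python
from __future__ import annotations

import ipaddress
from typing import Iterator, Optional


class AddressSpaceRegistry:
    """Hypothetical central registry that hands out non-overlapping spoke ranges."""

    def __init__(self, supernet: str, spoke_prefix: int) -> None:
        # Lazily enumerate every spoke-sized block inside the supernet, in order.
        self._candidates: Iterator[ipaddress.IPv4Network] = ipaddress.ip_network(
            supernet
        ).subnets(new_prefix=spoke_prefix)
        self._allocated: dict[str, ipaddress.IPv4Network] = {}

    def owner_of_overlap(self, network: ipaddress.IPv4Network) -> Optional[str]:
        """Return the name of any existing allocation that overlaps `network`."""
        for owner, existing in self._allocated.items():
            if network.overlaps(existing):
                return owner
        return None

    def reserve(self, name: str, cidr: str) -> ipaddress.IPv4Network:
        """Register a fixed range (e.g. a hub VNet), rejecting overlaps."""
        network = ipaddress.ip_network(cidr)
        clash = self.owner_of_overlap(network)
        if clash is not None:
            raise ValueError(f"{cidr} overlaps the range already held by {clash}")
        self._allocated[name] = network
        return network

    def allocate(self, name: str) -> ipaddress.IPv4Network:
        """Give a spoke the next free block; repeat calls return the same block."""
        if name in self._allocated:
            return self._allocated[name]
        for candidate in self._candidates:
            if self.owner_of_overlap(candidate) is None:
                self._allocated[name] = candidate
                return candidate
        raise RuntimeError("supernet exhausted: no free spoke-sized block left")
```

For example, with a `10.100.0.0/16` supernet, a hub reserved at `10.100.0.0/23`, and /24 spokes, the first two spokes receive `10.100.2.0/24` and `10.100.3.0/24`, and an attempt to register a legacy `10.100.3.128/25` VNet is rejected as overlapping.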
Built a subscription vending pipeline using Azure DevOps and Bicep. A product team submits a YAML configuration file via pull request specifying workload name, environment, cost centre, expected resource types, and network requirements. The pipeline validates the request, provisions the subscription, places it in the correct Management Group, assigns RBAC to the team's Entra ID group, deploys the spoke VNet and peers it to the hub, and creates the budget alert. End-to-end time: under 20 minutes.
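The validation step in the vending pipeline might look something like the following Python sketch. It assumes the team's YAML file has already been parsed into a dictionary (for example with a YAML library, not shown), and the field names (`workloadName`, `costCentre`, `network.connectivity` and so on), the naming and cost-centre formats, and the `Corp-Prod` style Management Group names are all hypothetical. It covers only the check-and-route logic, not the subscription, RBAC, VNet, or budget deployments that follow.

```python
from __future__ import annotations

import re

REQUIRED_FIELDS = ("workloadName", "environment", "costCentre", "resourceTypes", "network")
ALLOWED_ENVIRONMENTS = {"Prod", "NonProd", "Sandbox"}
ARCHETYPES = {"corp": "Corp", "online": "Online"}  # network requirement -> MG layer 3

# Hypothetical formats for illustration only.
WORKLOAD_NAME = re.compile(r"[a-z][a-z0-9-]{2,30}")
COST_CENTRE = re.compile(r"CC-\d{4}")
RESOURCE_TYPE = re.compile(r"Microsoft\.[A-Za-z]+/[A-Za-z/]+")


def validate_request(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request can proceed."""
    missing = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in request]
    if missing:
        return missing

    errors: list[str] = []
    if not WORKLOAD_NAME.fullmatch(str(request["workloadName"])):
        errors.append("workloadName must be 3-31 lowercase letters, digits or hyphens")
    if str(request["environment"]) not in ALLOWED_ENVIRONMENTS:
        errors.append(f"environment must be one of {sorted(ALLOWED_ENVIRONMENTS)}")
    if not COST_CENTRE.fullmatch(str(request["costCentre"])):
        errors.append("costCentre must look like CC-1234")

    resource_types = request["resourceTypes"]
    if not isinstance(resource_types, list) or not resource_types:
        errors.append("resourceTypes must be a non-empty list")
    else:
        errors += [f"unrecognised resource type: {rt}" for rt in resource_types
                   if not RESOURCE_TYPE.fullmatch(str(rt))]

    network = request["network"]
    connectivity = network.get("connectivity") if isinstance(network, dict) else None
    if not isinstance(connectivity, str) or connectivity not in ARCHETYPES:
        errors.append("network.connectivity must be 'corp' or 'online'")
    return errors


def target_management_group(request: dict) -> str:
    """Map a validated request to its Management Group, e.g. 'Corp-Prod'."""
    archetype = ARCHETYPES[request["network"]["connectivity"]]
    return f"{archetype}-{request['environment']}"
```

Returning every problem at once, rather than failing on the first, lets the pull request check report all fixes a team needs in a single review cycle.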
Defined a mandatory tagging schema: CostCentre, Environment, WorkloadName, Owner, DataClassification, and SupportTier. Policies using the deny effect block the creation of any resource that is missing a required tag. A custom policy initiative validates tag values against allowed enumerations, not just their presence. Cost Management budgets are created automatically by the subscription vending pipeline, with 80% and 100% spend alerts routed to the team's Slack channel and the FinOps team.
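The difference between checking that a tag exists and checking its value is easiest to see in code. This Python sketch mirrors the six-tag schema above; the allowed enumerations, the `CC-1234` cost-centre format, and the owner e-mail check are hypothetical stand-ins for whatever the custom initiative actually encodes, and the real enforcement happens in Azure Policy rather than in a script like this. The second function expresses the 80% and 100% budget thresholds as integer-friendly arithmetic, assuming spend and budget are in the same currency unit.

```python
from __future__ import annotations

import re

REQUIRED_TAGS = ("CostCentre", "Environment", "WorkloadName",
                 "Owner", "DataClassification", "SupportTier")

# Hypothetical enumerations and formats; the real initiative defines its own.
ALLOWED_VALUES = {
    "Environment": {"Prod", "NonProd", "Sandbox"},
    "DataClassification": {"Public", "Internal", "Confidential", "Restricted"},
    "SupportTier": {"Tier1", "Tier2", "Tier3"},
}
VALUE_PATTERNS = {
    "CostCentre": re.compile(r"CC-\d{4}"),
    "Owner": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}

BUDGET_THRESHOLDS_PERCENT = (80, 100)


def tag_violations(tags: dict[str, str]) -> list[str]:
    """Check presence first, then the value against its enumeration or format."""
    problems: list[str] = []
    for name in REQUIRED_TAGS:
        value = str(tags.get(name, "")).strip()
        if not value:
            problems.append(f"{name}: missing")
        elif name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            problems.append(f"{name}: '{value}' not in allowed values")
        elif name in VALUE_PATTERNS and not VALUE_PATTERNS[name].fullmatch(value):
            problems.append(f"{name}: '{value}' has an invalid format")
    return problems


def crossed_thresholds(spend: float, budget: float,
                       already_notified: tuple[int, ...] = ()) -> list[int]:
    """Percent thresholds that spend has reached and that have not yet alerted."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare spend*100 with threshold*budget to avoid a floating-point division.
    return [t for t in BUDGET_THRESHOLDS_PERCENT
            if spend * 100 >= t * budget and t not in already_notified]
```

With a budget of 1,000 and spend of 850, only the 80% alert fires; once spend reaches 1,000 and the 80% alert has already been sent, only the 100% alert remains.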
Enabled Defender for Cloud at the Management Group level, covering all subscriptions automatically, including newly vended ones. Defender CSPM provides continuous security posture assessment across the estate. Defender plans for Servers, Containers, SQL, Key Vault, and Storage are enabled by policy, and product teams cannot disable them. Security alerts route to the central Sentinel workspace. A custom workbook provides the platform team with a single-pane view of posture score, active recommendations, and compliance status across all subscriptions.
Built a shared Bicep module library covering the most common platform patterns: AKS cluster with security hardening, SQL MI with private endpoint and CMK, App Service Environment, API Management, and Azure Container Apps. Each module encodes the organisation's security baseline (private endpoints, diagnostic settings, CMK where applicable, tagging), so product teams get compliant infrastructure by default without needing to know every policy requirement. Modules are published to an internal Bicep registry in the platform subscription.
Every policy definition went through a mandatory audit phase before being switched to an enforcing effect such as deny or deployIfNotExists. This was non-negotiable, even when the business wanted faster progress.

The reason was simple: enforcing a misconfigured policy definition can break deployments in every subscription beneath its assignment scope at once. On three occasions during the audit phase, a policy definition flagged legitimate workloads that it would have blocked outright under enforcement. Catching each of those in audit mode cost us a week. Catching them in enforce mode would have cost us that week plus an incident, a war room, and a loss of trust with product teams that would have taken months to rebuild.

The audit-first principle also gave product teams time to reach compliance before a control was enforced. During the audit phase, teams were notified of non-compliant resources and given a remediation deadline. By the time enforcement was switched on, the vast majority of existing resources were already compliant.
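One way to make the audit-first rule mechanical is a promotion gate that a pipeline checks before changing a policy's effect from audit to an enforcing effect. The Python sketch below is an illustration under stated assumptions, not the team's actual process: the `AuditSummary` fields, the 5% non-compliance ceiling, and the notion of an "open false positive" (a legitimate workload the definition flagged, which signals a definition bug) are all hypothetical, and in practice the counts would come from Azure Policy compliance data.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class AuditSummary:
    policy_name: str
    remediation_deadline: date
    evaluated_resources: int
    non_compliant_resources: int
    # Legitimate workloads the definition flagged: a sign the definition is wrong.
    open_false_positives: int


def promotion_blockers(
    summary: AuditSummary, today: date, max_non_compliant_ratio: float = 0.05
) -> list[str]:
    """Reasons the policy must stay in audit; an empty list means it may be enforced."""
    blockers: list[str] = []
    if today < summary.remediation_deadline:
        blockers.append(
            f"remediation deadline {summary.remediation_deadline.isoformat()} has not passed"
        )
    if summary.open_false_positives > 0:
        blockers.append(
            f"{summary.open_false_positives} unresolved false positive(s); fix the definition first"
        )
    if summary.evaluated_resources == 0:
        blockers.append("no resources evaluated yet, so the definition is untested")
    else:
        ratio = summary.non_compliant_resources / summary.evaluated_resources
        if ratio > max_non_compliant_ratio:
            blockers.append(
                f"{ratio:.1%} of resources still non-compliant (limit {max_non_compliant_ratio:.0%})"
            )
    return blockers
```

A policy whose deadline has passed, with no open false positives and 30 non-compliant resources out of 1,200 (2.5%), would have no blockers; the same policy checked before its deadline would stay in audit.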
Plan your Management Group hierarchy before creating a single subscription. Restructuring a hierarchy after workloads are deployed means moving subscriptions between Management Groups, which changes the policies they inherit, requires RBAC reassignment, and creates temporary compliance gaps. The hierarchy decision is the most consequential architectural choice in the entire project, and it costs nothing to get right upfront.

Policy-as-code from day one is not just a best practice; it is the only way to govern at scale. Manual policy assignment does not scale beyond a handful of subscriptions. By the time you have 20 subscriptions, you have already lost control of manual governance. The pipeline is the governance.

The subscription vending pipeline was the single highest-leverage investment. Every hour spent automating subscription provisioning paid for itself within weeks, because it made the right path the easy path. Product teams who had previously waited three weeks for a subscription now had one in 20 minutes, and it arrived already compliant. The automation created alignment that no policy document ever could.

Address space planning deserves a dedicated session with every stakeholder before deployment. We spent three days resolving conflicts between teams who had already provisioned overlapping VNet address spaces in the old estate. Migrating those subscriptions from overlapping to non-overlapping address spaces took six weeks and delayed two product launches.

Defender for Cloud at Management Group level is a genuine force multiplier. Enabling it once and having it automatically cover every new subscription, including posture assessment, regulatory compliance mapping, and workload protection plans, is worth the investment regardless of the maturity of the rest of your security programme.

The organisational change was harder than the technical design.
The platform team had to say no to feature requests from product teams, hold the line on IaC requirements when teams pushed back, and maintain consistent standards when business pressure mounted. Technical governance without organisational backing is just documentation.