Self-Service Infrastructure
Let Teams Deploy Without Filing Tickets
The defining characteristic of a mature platform is that developers can go from "I need infrastructure" to "my infrastructure is running" without any human intervention from the platform team. This does not mean giving developers admin access to AWS. It means exposing curated, policy-compliant infrastructure primitives through simple interfaces that handle the complexity behind the scenes.
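As a rough illustration of a "simple interface that handles the complexity behind the scenes," the sketch below expands a three-field developer request into a fuller provisioning spec. Everything in it is hypothetical: the `DatabaseRequest` type, the `PLATFORM_DEFAULTS` table, and the field names are invented for this example and do not come from any particular tool.

```python
from dataclasses import dataclass

# Hypothetical platform defaults; a real platform would load these from config.
PLATFORM_DEFAULTS = {
    "staging": {"instance_class": "small", "multi_az": False, "backup_days": 1},
    "production": {"instance_class": "large", "multi_az": True, "backup_days": 14},
}


@dataclass(frozen=True)
class DatabaseRequest:
    """What the developer writes: a few fields, no networking details."""
    team: str
    environment: str
    engine: str = "postgresql"


def expand_request(req: DatabaseRequest) -> dict:
    """Turn a short request into the full spec the platform would provision."""
    if req.environment not in PLATFORM_DEFAULTS:
        raise ValueError(f"unknown environment: {req.environment}")
    spec = dict(PLATFORM_DEFAULTS[req.environment])
    spec.update(
        name=f"{req.team}-{req.environment}-{req.engine}",
        engine=req.engine,
        encrypted=True,             # security baseline set by the platform
        publicly_accessible=False,  # no public endpoints by default
        tags={"team": req.team, "environment": req.environment},
    )
    return spec


spec = expand_request(DatabaseRequest(team="payments", environment="staging"))
print(spec["name"])  # payments-staging-postgresql
```

The developer never sees the encryption or endpoint settings, which is the point: the platform owns those decisions and applies them the same way for every request.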
The Abstraction Spectrum
There is a spectrum from "raw cloud console" to "fully abstracted PaaS," and you need to figure out where your organization should land:
Level 0: Ticket-Based. Developer files a Jira ticket, infra team provisions manually. Lead time: days to weeks. This is where most organizations start, and where too many stay far longer than they should.
Level 1: Terraform Modules. Platform team publishes versioned Terraform modules with sensible defaults. Developers add a module reference to their repo, the PR gets reviewed, and CI/CD applies the plan. Lead time: hours.
Level 2: Internal APIs. Platform team exposes infrastructure as Kubernetes custom resources (via Crossplane) or internal REST APIs. Developers declare what they need in a YAML file or through a portal form. Provisioning is fully automated. Lead time: minutes.
Level 3: Intent-Based. Developers express intent ("I need a data store for 10K reads/second with 99.9% availability") and the platform selects the appropriate infrastructure. This is the level usually associated with companies such as Google and Netflix. Lead time: seconds.
Most organizations should target Level 2. Resist the temptation to jump to Level 3 prematurely. It requires a level of operational maturity that takes years to build.
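To make Level 3 more concrete, here is a minimal sketch of intent resolution: the developer states requirements, and the platform picks the cheapest offering that satisfies them. The catalog entries, capacity figures, and prices are invented for illustration; a real platform would derive them from load tests and SLO history, which is part of why this level takes years to reach.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offering:
    name: str
    max_reads_per_second: int
    availability: float      # as a fraction, e.g. 0.999
    monthly_cost_usd: int


# Hypothetical catalog with made-up numbers.
CATALOG = [
    Offering("single-node-postgres", 2_000, 0.995, 100),
    Offering("postgres-with-replicas", 20_000, 0.9995, 600),
    Offering("distributed-kv", 200_000, 0.9999, 2_500),
]


def select_offering(reads_per_second: int, availability: float) -> Optional[Offering]:
    """Return the cheapest offering that meets the stated intent, or None."""
    candidates = [
        o for o in CATALOG
        if o.max_reads_per_second >= reads_per_second
        and o.availability >= availability
    ]
    return min(candidates, key=lambda o: o.monthly_cost_usd, default=None)


choice = select_offering(reads_per_second=10_000, availability=0.999)
print(choice.name)  # postgres-with-replicas
```

Returning `None` when nothing fits matters as much as the happy path: an intent the platform cannot satisfy should fail loudly rather than silently provision something undersized.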
Crossplane vs Terraform Modules
Crossplane shines when your platform is Kubernetes-native. It extends the Kubernetes API with custom resources for cloud infrastructure (RDS databases, S3 buckets, CloudFront distributions), so developers use the kubectl apply or GitOps workflows they already know. Its reconciliation loop continuously compares the declared state with what actually exists, detecting and correcting drift automatically. The downside is a steep learning curve: compositions are hard to author and debug, and the error messages can be cryptic.
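The reconciliation idea can be shown with a toy model. This is not Crossplane's implementation, only a sketch of the general pattern under simplifying assumptions: state is a flat dictionary, and "correcting" drift just means overwriting the drifted fields. The field names are hypothetical.

```python
def diff(desired: dict, actual: dict) -> dict:
    """Return the desired values for every field that has drifted."""
    return {k: v for k, v in desired.items() if actual.get(k) != v}


def reconcile(desired: dict, actual: dict) -> dict:
    """One pass of the loop: detect drift and return the corrected state."""
    drift = diff(desired, actual)
    if not drift:
        return actual  # nothing to do
    corrected = dict(actual)
    corrected.update(drift)  # a real controller would call the cloud API here
    return corrected


# Declared state (what is in Git) versus observed state (what the cloud reports).
desired = {"storage_gb": 100, "encrypted": True, "publicly_accessible": False}
observed = {"storage_gb": 100, "encrypted": True, "publicly_accessible": True}

drift = diff(desired, observed)
print(drift)  # {'publicly_accessible': False}

observed = reconcile(desired, observed)
print(diff(desired, observed))  # {}
```

A real controller runs this pass repeatedly, triggered by a timer or by change events, so a manual edit in the cloud console is reverted without anyone noticing it first.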
Terraform modules work well for organizations that are not all-in on Kubernetes or that need to support multiple provisioning workflows. Publish modules to a private registry with strict versioning, and use Atlantis or Spacelift to run plan/apply workflows inside pull requests. The downsides are that Terraform has no continuous reconciliation, so drift goes unnoticed until someone runs a plan, and that state management grows more complex at scale.
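One common way to compensate for the missing drift detection is to run a plan on a schedule and inspect its exit code. The sketch below assumes the `terraform` CLI is on the PATH and relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. The function names and the `workdir` layout are hypothetical.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in_sync"
    if code == 2:
        # Changes present: either real-world drift or unapplied config changes.
        return "changes_detected"
    return "error"


def check_drift(workdir: str) -> str:
    """Run a read-only plan in `workdir` and report whether anything differs."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

A scheduled job could call `check_drift` for each workspace and alert on anything other than `in_sync`. Note that exit code 2 does not distinguish drift from a merged-but-unapplied change, so this works best when applies happen promptly after merge.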
Guardrails for Self-Service
Self-service without guardrails is just a cloud bill explosion waiting to happen. You need several layers of protection:
- Cost policies: Maximum instance sizes, required auto-scaling configurations, tagging enforcement for cost allocation
- Security baselines: No public endpoints by default, encryption at rest required, network policies for inter-service communication
- Compliance: Audit logging enabled, data residency constraints, retention policies for regulated data
- Lifecycle management: TTL on non-production environments, automated cleanup of orphaned resources, cost anomaly alerts
Implement all of these as policy-as-code (OPA, Kyverno, Sentinel) and enforce them in the provisioning pipeline. Manual review steps do not scale, and they defeat the purpose of self-service.
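The shape of such a policy check can be sketched in plain Python. OPA, Kyverno, and Sentinel each use their own policy language, so treat this only as a stand-in for the logic: the limits, tag names, and request fields below are assumptions made up for the example.

```python
from typing import List

MAX_VCPUS = 16                           # hypothetical cost limit
REQUIRED_TAGS = {"team", "cost-center"}  # hypothetical tagging policy


def evaluate(request: dict) -> List[str]:
    """Return a list of policy violations; an empty list means the request passes."""
    violations = []
    if request.get("vcpus", 0) > MAX_VCPUS:
        violations.append(f"instance exceeds {MAX_VCPUS} vCPUs")
    missing = REQUIRED_TAGS - set(request.get("tags", {}))
    if missing:
        violations.append(f"missing required tags: {', '.join(sorted(missing))}")
    if request.get("publicly_accessible", False):
        violations.append("public endpoints are not allowed by default")
    if not request.get("encrypted_at_rest", False):
        violations.append("encryption at rest is required")
    if request.get("environment") != "production" and "ttl_hours" not in request:
        violations.append("non-production environments need a TTL")
    return violations


good = {
    "environment": "staging", "vcpus": 4, "encrypted_at_rest": True,
    "ttl_hours": 72, "tags": {"team": "payments", "cost-center": "cc-123"},
}
bad = {"environment": "staging", "vcpus": 64, "publicly_accessible": True, "tags": {"team": "payments"}}

print(evaluate(good))      # []
print(len(evaluate(bad)))  # 5
```

Because the pipeline returns every violation at once with a readable message, developers can fix a rejected request themselves instead of asking the platform team what went wrong.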
Measuring Self-Service Maturity
Track these numbers: provisioning lead time (ticket to running infrastructure), percentage of infrastructure requests that require zero human intervention, developer satisfaction with infrastructure workflows, cloud cost efficiency (cost per transaction/request), and infrastructure drift incidents per month. If you are not measuring, you are guessing.
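The first two metrics are easy to compute once each request is logged with timestamps. Below is a minimal sketch using invented sample data; the record fields (`requested`, `running`, `human_touches`) are assumptions about what a provisioning log might contain.

```python
from datetime import datetime
from statistics import median

# Invented sample data: when each request was made, when it was running,
# and how many times a platform engineer had to step in.
requests = [
    {"requested": datetime(2024, 1, 8, 9, 0), "running": datetime(2024, 1, 8, 9, 6), "human_touches": 0},
    {"requested": datetime(2024, 1, 8, 9, 30), "running": datetime(2024, 1, 8, 9, 42), "human_touches": 0},
    {"requested": datetime(2024, 1, 8, 10, 0), "running": datetime(2024, 1, 8, 13, 0), "human_touches": 2},
    {"requested": datetime(2024, 1, 8, 10, 15), "running": datetime(2024, 1, 8, 10, 23), "human_touches": 0},
]

# Provisioning lead time in minutes for each request.
lead_times = [
    (r["running"] - r["requested"]).total_seconds() / 60 for r in requests
]
median_lead_time = median(lead_times)

# Share of requests completed with zero human intervention.
zero_touch_pct = 100 * sum(r["human_touches"] == 0 for r in requests) / len(requests)

print(median_lead_time)  # 10.0
print(zero_touch_pct)    # 75.0
```

The median is used rather than the mean so that a single three-hour outlier does not hide the fact that most requests finish in minutes; tracking a high percentile alongside it would surface those outliers.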
Key Points
- Self-service means developers can provision, configure, and manage infrastructure without filing tickets or waiting for a platform team response
- Crossplane and Terraform modules are the two dominant approaches: Crossplane for Kubernetes-native declarative APIs, Terraform modules for broader cloud coverage
- Expose infrastructure as versioned APIs with sensible defaults. Developers request 'a PostgreSQL database for my staging environment,' not 'an RDS instance with specific VPC, subnet, and security group configurations.'
- Guardrails are essential. Use OPA/Kyverno policies to enforce cost limits, security baselines, and compliance requirements without manual approval gates.
- Track provisioning time as a key metric. If it takes more than 15 minutes to get a new environment, you are not self-service yet.
Common Mistakes
- ✗ Giving developers raw Terraform with no abstraction. They should not need to understand VPC peering to get a database.
- ✗ Building self-service without cost visibility. Teams will over-provision if they cannot see the dollar impact of their infrastructure choices.
- ✗ Skipping the cleanup problem. Self-service provisioning without automated deprovisioning creates cloud waste that compounds monthly.
- ✗ Making the self-service interface a web form that generates a ticket behind the scenes. This is self-service theater, not actual self-service.
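The cleanup problem in particular lends itself to automation. As a minimal sketch, assuming each environment records its creation time and a TTL (both field names are hypothetical), a scheduled job could select expired non-production environments like this:

```python
from datetime import datetime, timedelta, timezone
from typing import List


def expired_environments(environments: List[dict], now: datetime) -> List[str]:
    """Return names of non-production environments that have outlived their TTL."""
    return [
        env["name"]
        for env in environments
        if env["environment"] != "production"
        and now - env["created_at"] > timedelta(hours=env["ttl_hours"])
    ]


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
environments = [
    {"name": "pr-481-preview", "environment": "preview", "ttl_hours": 24,
     "created_at": datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc)},
    {"name": "payments-staging", "environment": "staging", "ttl_hours": 168,
     "created_at": datetime(2024, 1, 7, 9, 0, tzinfo=timezone.utc)},
    {"name": "payments-prod", "environment": "production", "ttl_hours": 0,
     "created_at": datetime(2023, 6, 1, 9, 0, tzinfo=timezone.utc)},
]

print(expired_environments(environments, now))  # ['pr-481-preview']
```

Production is excluded explicitly rather than relying on a large TTL, so a misconfigured value can never put a production environment on the deletion list.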