Container Runtime Platform
Managed Kubernetes Decisions
The self-managed Kubernetes debate is over for most organizations. Unless you have specific compliance requirements or are running at massive scale (5,000+ nodes), use a managed offering. The real question is which one.
GKE is the most polished. Autopilot mode abstracts away node management entirely. You define pods and GKE handles compute. Release channels keep clusters updated. The downside is vendor lock-in to GCP.
EKS gives you more control and works well in AWS-heavy environments. You manage node groups, choose your CNI plugin (VPC CNI is default), and handle more operational details. EKS Anywhere extends this to on-prem. Cost is $0.10/hr per cluster ($73/month) plus EC2 compute.
AKS is the strongest option on Azure. It's free for the control plane (you pay only for nodes), has tight integration with Azure AD for RBAC, and supports virtual nodes via Azure Container Instances for burst workloads.
Node Pool Design
Design node pools around workload characteristics. A typical production cluster has three to four node pools. General-purpose nodes (m5.xlarge or equivalent) handle most workloads. Memory-optimized nodes (r5.xlarge) run databases, caches, and JVM applications. Spot/preemptible nodes run batch jobs, CI runners, and stateless services that handle interruption gracefully. GPU nodes (p3 or g4dn instances) serve ML inference workloads.
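As a sketch, here is how that pool layout might be declared with eksctl on EKS; the cluster name, region, sizes, and instance lists are illustrative assumptions, and the GPU pool is omitted for brevity:

```yaml
# Illustrative eksctl ClusterConfig; names, counts, and region are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod            # hypothetical cluster name
  region: us-east-1
managedNodeGroups:
  - name: general       # general-purpose pool for most workloads
    instanceType: m5.xlarge
    minSize: 3
    maxSize: 10
  - name: memory        # databases, caches, JVM applications
    instanceType: r5.xlarge
    minSize: 2
    maxSize: 6
    labels: { workload: memory-bound }
  - name: batch-spot    # interruption-tolerant batch and CI workloads
    instanceTypes: [m5.xlarge, m5a.xlarge, m4.xlarge]  # diversify to improve spot capacity
    spot: true
    minSize: 0
    maxSize: 20
    labels: { lifecycle: spot }
    taints:
      - key: lifecycle
        value: spot
        effect: NoSchedule
```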
Use taints and tolerations to control pod placement. Spot nodes get a taint that only workloads explicitly configured for spot instances will tolerate. GPU nodes get a taint so only ML workloads land there. Without taints, the scheduler will pack expensive GPU nodes with web servers.
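The pod side is the mirror image: a toleration that matches the taint, plus a node selector so the pod opts in explicitly. This fragment assumes the lifecycle=spot taint and label from the sketch above:

```yaml
# Pod spec fragment for an interruption-tolerant workload; assumes nodes
# carry the taint lifecycle=spot:NoSchedule and the label lifecycle=spot.
spec:
  tolerations:
    - key: lifecycle
      operator: Equal
      value: spot
      effect: NoSchedule
  nodeSelector:
    lifecycle: spot   # ensures the pod lands only on the spot pool
```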
ARM-based nodes (Graviton on AWS, Tau T2A on GCP) offer 20-40% better price-performance for compatible workloads. Most Go, Java, and Node.js applications run on ARM without code changes. Multi-arch container builds using Docker buildx make this transparent.
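A minimal multi-arch build, sketched here as a GitHub Actions job (registry, image name, and tag are placeholders; the equivalent local command is docker buildx build --platform linux/amd64,linux/arm64):

```yaml
# Hypothetical CI job; assumes a Dockerfile at the repo root and registry
# push credentials already configured (login step omitted for brevity).
name: multi-arch-build
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3    # CPU emulation for cross-arch builds
      - uses: docker/setup-buildx-action@v3  # enables BuildKit multi-platform builds
      - uses: docker/build-push-action@v5
        with:
          platforms: linux/amd64,linux/arm64     # one manifest list, both architectures
          push: true
          tags: registry.example.com/app:v1.4.2  # placeholder image reference
```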
Multi-Cluster Strategy
One cluster is a single point of failure. Two clusters give you redundancy and a migration path for upgrades. The common pattern is active-active across two regions or active-passive with one production and one DR cluster.
Multi-cluster management tools matter at scale. Cluster API provides declarative cluster lifecycle management. Fleet (Rancher) and ArgoCD's ApplicationSet handle multi-cluster deployments. The platform team provisions and manages clusters. Application teams deploy to clusters through the platform's abstractions without knowing cluster details.
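As a sketch of that abstraction, an ArgoCD ApplicationSet with a cluster generator stamps out one Application per cluster registered in ArgoCD; the repository URL, path, and names below are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: web-frontend           # hypothetical application name
  namespace: argocd
spec:
  generators:
    - clusters: {}             # yields one Application per registered cluster
  template:
    metadata:
      name: 'web-frontend-{{name}}'   # {{name}} = cluster name from the generator
    spec:
      project: default
      source:
        repoURL: https://example.com/platform/deployments.git  # placeholder repo
        targetRevision: main
        path: web-frontend
      destination:
        server: '{{server}}'   # cluster API endpoint from the generator
        namespace: web
      syncPolicy:
        automated:
          prune: true          # keep every cluster converged on the Git state
```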
Runtime Security
Container security operates at multiple layers. Image scanning (Trivy, Grype) catches known vulnerabilities before deployment. Admission controllers (OPA Gatekeeper, Kyverno) enforce policies at deploy time: no running as root, no privileged containers, no images from untrusted registries.
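As one example, a Kyverno ClusterPolicy rejecting privileged containers at admission looks roughly like this (simplified from Kyverno's sample policy library; a production version also covers initContainers and ephemeralContainers):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: Enforce   # reject non-compliant pods instead of only auditing
  rules:
    - name: no-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            containers:
              - =(securityContext):        # if securityContext is present...
                  =(privileged): "false"   # ...privileged must be false
```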
Runtime security is the layer most teams skip. Falco monitors system calls inside containers and alerts on suspicious behavior: a shell spawned in a production container, unexpected network connections, file access outside normal patterns. This catches attacks that image scanning misses, because the vulnerability is exploited only after deployment.
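A custom Falco rule in that spirit might look like the following; spawned_process and container are built-in Falco macros, and the prod namespace filter is an illustrative assumption:

```yaml
# Sketch of a Falco rules-file entry, not a drop-in production rule.
- rule: Shell Spawned in Production Container
  desc: Detect an interactive shell starting inside a container in the prod namespace
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh)
    and k8s.ns.name = "prod"
  output: >
    Shell in container (user=%user.name container=%container.name
    namespace=%k8s.ns.name command=%proc.cmdline)
  priority: WARNING
```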
Seccomp profiles and AppArmor restrict what system calls containers can make. The RuntimeDefault seccomp profile blocks 44 dangerous syscalls and should be applied to all workloads. Most applications work fine with it.
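Applying it is a one-stanza change in the pod spec; the pod name and image here are arbitrary:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-default-demo     # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault       # use the container runtime's default seccomp profile
  containers:
    - name: app
      image: nginx:1.27
```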
Image Management
Run a private container registry (ECR, Artifact Registry, Harbor) with automated vulnerability scanning on push. Enforce image signing with Cosign or Notary so only verified images run in production. Implement garbage collection policies that keep the last 10 tags per repository and delete untagged images after 7 days. Without garbage collection, registries grow to terabytes within a year.
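On ECR, for instance, those two retention rules translate into a lifecycle policy along these lines (ECR takes JSON; the v tag prefix is an assumption about your tagging scheme):

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Delete untagged images after 7 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the last 10 version tags",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["v"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }
  ]
}
```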
Use semantic versioning for image tags. Never use latest in production manifests. Pin to exact digests or version tags so every deployment is reproducible and rollbacks are a simple tag change.
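In a manifest that looks like the fragment below; the image name is a placeholder and the digest is deliberately elided:

```yaml
# Deployment fragment: prefer an immutable reference over a floating tag.
spec:
  template:
    spec:
      containers:
        - name: app
          # Good: exact version tag, reproducible and easy to roll back
          #   image: registry.example.com/app:v1.4.2
          # Better: content-addressed digest, immune to tag mutation
          image: registry.example.com/app@sha256:<immutable-digest>  # placeholder
```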
Key Points
- EKS and GKE handle control plane operations, but you still own node management, networking, security policies, and cluster upgrades
- Spot instances for stateless workloads cut compute costs by 60-90% but require proper pod disruption budgets and graceful shutdown handling
- Namespace isolation is sufficient for trusted teams within a company; cluster isolation is necessary for untrusted tenants or strict compliance requirements
- Cluster autoscaler and Karpenter take fundamentally different approaches: Karpenter provisions nodes based on pod requirements, while cluster autoscaler scales existing node groups
- Container image management needs a registry with vulnerability scanning, image signing, and garbage collection to prevent storage costs from spiraling
Common Mistakes
- ✗ Running a single large cluster for everything instead of separating production, staging, and platform infrastructure into dedicated clusters
- ✗ Not setting resource requests and limits on pods, which leads to noisy neighbor problems and nodes running out of memory
- ✗ Using latest tags for container images in production, making deployments non-reproducible and rollbacks impossible
- ✗ Skipping pod disruption budgets, so cluster upgrades and node drains take down services unexpectedly (see the sketch below)
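A minimal PodDisruptionBudget for that last point, assuming a stateless service with three replicas and an app=web pod label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb          # hypothetical name
spec:
  minAvailable: 2        # with 3 replicas, a drain can evict at most one pod at a time
  selector:
    matchLabels:
      app: web           # assumed pod label
```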