mTLS — Mutual Authentication

mTLS is TLS where both sides show their ID — the server proves it is the real server, and the client proves it is an authorized caller.

The Problem

In a microservices architecture with hundreds of services, how does the system ensure that only authorized services talk to each other? Network-level controls (firewalls, security groups) are too coarse — any service in the same network segment can reach any other. mTLS solves this by requiring every service to cryptographically prove its identity on every connection.

Mental Model

Like two people showing each other their ID cards before having a conversation. In regular TLS, only the server shows ID (like a store showing its license). In mTLS, both sides show ID — neither will talk until they have verified who the other is.

How It Works

In standard TLS, only the server authenticates itself — the server presents a certificate, the client verifies it, and the encrypted connection is established. The server has no cryptographic proof of who the client is. It might check an API key or JWT in the HTTP layer, but at the transport layer, any client can connect.

Mutual TLS (mTLS) adds a second authentication step: the server also requests a certificate from the client during the TLS handshake. Both sides must present valid certificates signed by a trusted CA. If either side fails verification, the connection is rejected before any application data flows.
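As a minimal sketch of that difference, Python's standard-library `ssl` module expresses mutual authentication as a server context whose `verify_mode` is `CERT_REQUIRED`. The helper names and file-path parameters here are hypothetical, and the paths are optional only so the sketch runs without real certificates.

```python
import ssl
from typing import Optional


def make_server_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side: present our certificate and require one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # This setting is what turns TLS into mTLS on the server: the handshake
    # fails unless the client presents a certificate that chains to a CA
    # loaded via load_verify_locations().
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server and offer our own certificate."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # Without this call the client has nothing to send when asked.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

With a default server context (`verify_mode` left at `CERT_NONE`) the same code would accept any client, which is the standard TLS behavior described above.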

The mTLS Handshake

The mTLS handshake extends the standard TLS handshake with additional messages (the exact wire order differs slightly between TLS 1.2 and TLS 1.3):

ClientHello: Same as standard TLS. The client sends its supported versions and cipher suites.
ServerHello + Server Certificate: The server sends its certificate chain, same as standard TLS.
CertificateRequest: This is the mTLS-specific addition. The server asks the client for a certificate and may include a list of trusted CA distinguished names as a hint about which client certificates it will accept. In TLS 1.3 this list is an optional extension, so it can be absent even when a client certificate is required.
Client Certificate: The client sends its certificate chain. If it has no suitable certificate, it sends an empty Certificate message, which the server rejects or tolerates depending on whether client certificates are required or merely optional.
CertificateVerify: The client proves it holds the private key for its certificate by signing a hash of the handshake transcript.
Finished: Both sides verify the handshake integrity, same as standard TLS.
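The message sequence can be sketched as data. The function below is a hypothetical illustration that lists the TLS 1.3 full-handshake messages in wire order (per RFC 8446) and shows which ones appear only when the server requests a client certificate; it models ordering only, not the cryptography.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (sender, message name)


def tls13_handshake(mutual: bool) -> List[Message]:
    """Return the TLS 1.3 handshake messages in wire order."""
    flow: List[Message] = [
        ("client", "ClientHello"),
        ("server", "ServerHello"),
        ("server", "EncryptedExtensions"),
    ]
    if mutual:
        # The server asks the client to authenticate.
        flow.append(("server", "CertificateRequest"))
    flow += [
        ("server", "Certificate"),
        ("server", "CertificateVerify"),
        ("server", "Finished"),
    ]
    if mutual:
        # The client answers with its chain and a proof of key possession.
        flow += [("client", "Certificate"), ("client", "CertificateVerify")]
    flow.append(("client", "Finished"))
    return flow


if __name__ == "__main__":
    for sender, name in tls13_handshake(mutual=True):
        print(f"{sender:>6}: {name}")
```

Comparing the two variants shows that mutual authentication adds three messages and changes nothing else about the flow.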

# Test mTLS connection with openssl
openssl s_client -connect service.internal:443 \
  -cert client.crt \
  -key client.key \
  -CAfile ca-bundle.crt

# Generate a client certificate signed by the internal CA
openssl req -new -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
  -nodes -keyout client.key -out client.csr \
  -subj "/CN=payment-service/O=mycompany"

openssl x509 -req -in client.csr -CA intermediate-ca.crt -CAkey intermediate-ca.key \
  -CAcreateserial -out client.crt -days 1 -sha256 \
  -extfile <(echo "extendedKeyUsage=clientAuth")

Why Not Just Use API Keys?

API keys, JWTs, and OAuth tokens all authenticate at Layer 7 (application layer). They work, but they have fundamental limitations that mTLS addresses:

Property	API Keys / JWTs	mTLS
Authentication layer	Application (L7)	Transport (L4/L5)
Encrypted in transit	Only if TLS is configured	Inherently encrypted
Replay protection	Must be implemented separately	Built into TLS
Key rotation	Application must handle	Sidecar/mesh handles transparently
Performance	Parse on every request	Authenticated once per connection
Identity granularity	Per-token	Per-workload (certificate)

mTLS does not replace application-layer auth — it complements it. mTLS answers "is this a legitimate service?" while JWTs answer "does this service have permission for this specific action?"
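A minimal sketch of how the two layers combine, assuming the peer identity has already been taken from a verified client certificate and the scopes from a validated token. The `Request` type, the allow-list, and the scope names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical allow-list of workload identities that may call this service.
ALLOWED_CALLERS = frozenset({"spiffe://production.example.com/payment-service"})


@dataclass(frozen=True)
class Request:
    peer_identity: str            # from the verified client certificate (mTLS)
    token_scopes: FrozenSet[str]  # from the validated JWT (application layer)


def authorize(req: Request, required_scope: str,
              allowed_callers: FrozenSet[str] = ALLOWED_CALLERS) -> bool:
    """Allow the call only if both layers agree."""
    # Layer 1 (mTLS): is this a legitimate, expected service?
    if req.peer_identity not in allowed_callers:
        return False
    # Layer 2 (token): may it perform this specific action?
    return required_scope in req.token_scopes
```

A request from a known service with the wrong scope is refused, and so is a request with the right scope from an unknown service.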

SPIFFE and Workload Identity

Managing certificates manually works for 5 services. It does not work for 500. SPIFFE (Secure Production Identity Framework for Everyone) provides a standardized way to assign cryptographic identities to workloads.

SPIFFE IDs

Every workload gets a SPIFFE ID — a URI that identifies it:

spiffe://production.example.com/payment-service
spiffe://production.example.com/order-service
spiffe://staging.example.com/payment-service

The format is spiffe://<trust-domain>/<workload-path>. The trust domain identifies the trust root, typically an organization or an environment within it (production and staging in the examples above), and the workload path identifies the specific service.
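Because a SPIFFE ID is an ordinary URI, it can be split with the standard library. The parser below is a simplified sketch, not a full implementation of the SPIFFE ID validation rules; `SpiffeId` and `parse_spiffe_id` are hypothetical names, and requiring a non-empty path is an assumption that fits workload IDs.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SpiffeId(NamedTuple):
    trust_domain: str
    path: str


def parse_spiffe_id(uri: str) -> SpiffeId:
    """Split a workload SPIFFE ID into trust domain and workload path."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    if not parts.path.strip("/"):
        raise ValueError("missing workload path")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    return SpiffeId(trust_domain=parts.netloc, path=parts.path)


if __name__ == "__main__":
    print(parse_spiffe_id("spiffe://production.example.com/payment-service"))
```

Comparing the parsed trust domain separately from the path makes it harder to confuse the staging and production identities of the same service.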

SPIFFE Verifiable Identity Documents (SVIDs)

An SVID is a cryptographic document that proves a workload's SPIFFE ID. The most common form is an X.509 certificate (an X.509-SVID) in which the SPIFFE ID is encoded as a URI in the Subject Alternative Name (SAN) extension:

X509v3 Subject Alternative Name:
    URI:spiffe://production.example.com/payment-service
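On the verifying side, the SPIFFE ID is read back out of that SAN entry. The sketch below works on the dictionary shape that Python's `ssl.SSLSocket.getpeercert()` returns for a verified peer, where SAN entries appear as `(type, value)` pairs; the function name and the sample dictionary are hypothetical.

```python
from typing import Any, Mapping, Optional


def spiffe_id_from_peercert(cert: Mapping[str, Any]) -> Optional[str]:
    """Return the SPIFFE ID from a getpeercert()-style dict, or None."""
    sans = cert.get("subjectAltName", ())
    uris = [value for kind, value in sans if kind == "URI"]
    # An X.509-SVID carries exactly one URI SAN, so anything else is
    # treated as "no usable identity" rather than guessed at.
    if len(uris) != 1 or not uris[0].startswith("spiffe://"):
        return None
    return uris[0]


if __name__ == "__main__":
    sample = {
        "subjectAltName": (
            ("URI", "spiffe://production.example.com/payment-service"),
        ),
    }
    print(spiffe_id_from_peercert(sample))
```

The value is only trustworthy when it comes from a certificate the TLS stack has already verified against the trust bundle.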

SPIRE: The SPIFFE Runtime Environment

SPIRE is a production-ready implementation of the SPIFFE specification. It consists of:

SPIRE Server — The central authority that defines which workloads get which identities, based on attestation policies.
SPIRE Agent — Runs on each node, performs workload attestation (verifying the workload is what it claims to be), and delivers SVIDs to authorized workloads.

SPIRE can attest workloads using multiple signals: Kubernetes service account, AWS IAM role, Docker labels, binary path, or any combination. This means there is no need to pre-provision certificates — SPIRE dynamically issues them based on the workload's actual runtime identity.

# SPIRE registration entry: defines who gets what identity
spire-server entry create \
  -spiffeID spiffe://example.com/payment-service \
  -parentID spiffe://example.com/node-agent \
  -selector k8s:ns:production \
  -selector k8s:sa:payment-service \
  -ttl 3600  # 1-hour SVID lifetime in seconds (newer SPIRE releases use -x509SVIDTTL instead; check spire-server entry create -h)
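The selector logic behind a registration entry can be modeled in a few lines: a workload receives an identity when the selectors discovered during attestation include every selector on the entry. This is a simplified, hypothetical model; it ignores the parent ID and node attestation, and `identities_for` is not a SPIRE API.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical registration entries: SPIFFE ID -> selectors that must all match.
ENTRIES: Dict[str, FrozenSet[str]] = {
    "spiffe://example.com/payment-service": frozenset(
        {"k8s:ns:production", "k8s:sa:payment-service"}
    ),
    "spiffe://example.com/order-service": frozenset(
        {"k8s:ns:production", "k8s:sa:order-service"}
    ),
}


def identities_for(workload_selectors: Iterable[str],
                   entries: Dict[str, FrozenSet[str]] = ENTRIES) -> List[str]:
    """Return the SPIFFE IDs whose selectors are all present on the workload."""
    attested = frozenset(workload_selectors)
    return sorted(
        spiffe_id
        for spiffe_id, required in entries.items()
        if required <= attested  # every required selector was attested
    )
```

Extra attested selectors do not prevent a match, but a missing one does, which is why a pod in the right namespace with the wrong service account gets no identity.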

Service Mesh mTLS

The most common way to deploy mTLS in production is through a service mesh. The mesh handles certificate issuance, rotation, and enforcement transparently — application code is completely unaware.

Istio mTLS

Istio injects an Envoy sidecar proxy into every meshed pod. All traffic in and out of the pod flows through Envoy, which handles mTLS automatically:

istiod (the control plane) acts as the CA, issuing short-lived certificates to each Envoy sidecar.
When Pod A calls Pod B, Pod A's Envoy initiates an mTLS connection to Pod B's Envoy.
Both Envoys verify each other's certificates. The SPIFFE IDs in the certificates identify the workloads.
Authorization policies can then reference these identities:

# Istio PeerAuthentication — require mTLS for all services in the namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT  # Reject any non-mTLS connections

---
# Istio AuthorizationPolicy — only payment-service can call billing-service
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: billing-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: billing-service
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/payment-service"
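The `principals` value in the policy above is the workload's SPIFFE ID without the `spiffe://` scheme, built from the trust domain, namespace, and service account. The helpers below sketch that mapping; the function names are hypothetical and the check is a plain string comparison, not Istio's actual matching engine (which also supports wildcards).

```python
def istio_principal(trust_domain: str, namespace: str, service_account: str) -> str:
    """Build the principal string used in an AuthorizationPolicy."""
    return f"{trust_domain}/ns/{namespace}/sa/{service_account}"


def principal_from_spiffe_id(spiffe_id: str) -> str:
    """Convert a certificate's SPIFFE ID into the principal form."""
    prefix = "spiffe://"
    if not spiffe_id.startswith(prefix):
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return spiffe_id[len(prefix):]


if __name__ == "__main__":
    allowed = {istio_principal("cluster.local", "production", "payment-service")}
    caller = "spiffe://cluster.local/ns/production/sa/payment-service"
    print(principal_from_spiffe_id(caller) in allowed)
```

Because the principal includes the namespace, a service account with the same name in another namespace does not match.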

Linkerd mTLS

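Linkerd takes a simpler approach. It uses its own identity system, which derives each workload's identity from its Kubernetes ServiceAccount, and automatically enables mTLS between all meshed pods with zero configuration. There is no mode to switch on: if both sides are meshed, mTLS happens automatically. Traffic from unmeshed clients is still accepted in plaintext by default, so rejecting it requires an explicit authorization policy.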

# Check traffic stats for a deployment in Linkerd
linkerd viz stat deploy/payment-service -o wide
# Older Linkerd releases show a "SECURED" column here with the mTLS percentage

# View the identity of connected peers
linkerd viz edges deploy
# The SECURED column shows whether each connection uses mTLS

Certificate Rotation at Scale

In a microservice environment with hundreds of services, certificate rotation is the hardest operational challenge. If certificates expire and are not rotated, services cannot communicate. If rotation requires restarts, the result is cascading failures.

The Short-Lived Certificate Pattern

The modern approach is to use very short-lived certificates, typically valid for 1 to 24 hours. This eliminates or shrinks several problems:

Revocation is rarely needed: A compromised certificate expires within hours, often sooner than a revocation could be detected, published, and propagated.
Reduced blast radius: A stolen certificate is only useful for hours, not years.
Forced automation: Manually managing certificates that expire in hours is not viable. Automation is required from day one.

SPIRE issues certificates with configurable TTLs (often 1 hour) and automatically rotates them before expiry. The workload receives the new certificate through the SPIFFE Workload API and can reload it without restarting.
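The "rotate before expiry" decision reduces to a comparison against the certificate's validity window. The sketch below renews once a chosen fraction of the lifetime has elapsed; the 0.5 default is an assumption for illustration rather than a statement of SPIRE's exact behavior, and `should_rotate` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def should_rotate(not_before: datetime, not_after: datetime,
                  now: datetime, fraction: float = 0.5) -> bool:
    """Return True once `fraction` of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    rotate_at = not_before + lifetime * fraction
    return now >= rotate_at


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    expires = issued + timedelta(hours=1)  # 1-hour certificate
    for minutes in (20, 30, 45):
        now = issued + timedelta(minutes=minutes)
        print(minutes, should_rotate(issued, expires, now))
```

Renewing well before expiry leaves room for retries if the issuer is briefly unavailable, which matters more as lifetimes get shorter.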

Graceful Rotation

Certificate rotation must handle the transition period in which some connections use the old certificate and others use the new one. The standard approach:

Issue the new certificate before the old one expires (overlap period).
If the issuing CA is also changing, update the trust bundle on every verifier to accept both the old and the new CA certificates during rotation.
Start presenting the new certificate.
After all old connections drain and no peer still presents a certificate from the old CA, remove the old CA certificate from the trust bundle.
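The ordering constraint in these steps can be checked mechanically: at every phase, each CA whose certificates are still being presented must be in the trust bundle. The sketch below models a CA rotation with hypothetical phase names and a simple safety check; it is an illustration of the invariant, not a rotation tool.

```python
from typing import FrozenSet, List, NamedTuple


class Phase(NamedTuple):
    name: str
    trust_bundle: FrozenSet[str]    # CAs that verifiers accept
    issuers_in_use: FrozenSet[str]  # CAs whose certificates peers still present


def rotation_plan(old_ca: str, new_ca: str) -> List[Phase]:
    """Phases of a CA rotation with an overlap period."""
    old, new = frozenset({old_ca}), frozenset({new_ca})
    return [
        Phase("steady state", old, old),
        Phase("trust both CAs", old | new, old),
        Phase("present new certificates", old | new, old | new),
        Phase("old connections drained", old | new, new),
        Phase("retire old CA", new, new),
    ]


def is_safe(plan: List[Phase]) -> bool:
    """True if no phase presents a certificate that verifiers would reject."""
    return all(phase.issuers_in_use <= phase.trust_bundle for phase in plan)
```

Dropping the old CA from the bundle while its certificates are still in use violates the invariant, which is the failure described as a trust bundle mismatch.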

# Vault PKI — issue a short-lived certificate
vault write pki/issue/my-role \
  common_name="payment-service.internal" \
  ttl="1h" \
  alt_names="payment-service.production.svc.cluster.local"

Debugging mTLS Issues

mTLS failures are notoriously hard to debug because the error messages are vague. Here is a systematic approach:

# Step 1: Verify the server requires client certificates
openssl s_client -connect service:443 </dev/null 2>&1 | grep "Acceptable client certificate"
# If this is empty, the server is not requesting client certs, or it requested one without sending a list of CA names

# Step 2: Verify the client certificate is valid
openssl x509 -in client.crt -noout -text | grep -E "Issuer|Subject|Not After|Extended Key Usage"
# Ensure: Not expired, correct issuer, has clientAuth EKU

# Step 3: Test the full mTLS connection
openssl s_client -connect service:443 \
  -cert client.crt -key client.key -CAfile ca.crt \
  -verify 4 -verify_return_error

# Step 4: In Kubernetes with Istio
istioctl proxy-config secret <pod-name> -n <namespace>
# Shows the active certificate, its expiry, and the trust bundle

# Step 5: Check for common error patterns
# "certificate required" — Client did not send a certificate
# "bad certificate" — Client cert is invalid or not trusted
# "certificate expired" — Self-explanatory
# "unknown ca" — Server does not trust the CA that signed the client cert

The most common mTLS failure in practice is a trust bundle mismatch — the server's list of trusted CAs does not include the CA that signed the client's certificate. This often happens during CA rotation when the new CA certificate has not been distributed to all servers yet.
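The error patterns listed above can be turned into a small lookup that maps a TLS alert in a log line to its likely cause. The hint wording and the `diagnose` helper are hypothetical, and the sample messages in the usage lines only approximate what a TLS library might log.

```python
from typing import Optional

# TLS alert text (lowercase) -> likely cause, following the list above.
HINTS = {
    "certificate required": "Client did not send a certificate",
    "bad certificate": "Client certificate is invalid or not trusted",
    "certificate expired": "Certificate is past its Not After date",
    "unknown ca": "Verifier does not trust the CA that signed the peer certificate",
}


def diagnose(error_text: str) -> Optional[str]:
    """Return a likely cause for a TLS error message, or None if unrecognized."""
    lowered = error_text.lower()
    for pattern, hint in HINTS.items():
        if pattern in lowered:
            return hint
    return None


if __name__ == "__main__":
    print(diagnose("tlsv13 alert certificate required"))
    print(diagnose("tlsv1 alert unknown ca"))
```

An "unknown ca" hit is the cue to compare the trust bundle on the rejecting side with the issuer of the certificate being presented.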

Key Points

•Standard TLS only authenticates the server. mTLS adds client authentication, creating a two-way identity verification.
•Service meshes like Istio and Linkerd automate mTLS transparently — application code never touches certificates.
•SPIFFE provides a standardized workload identity framework, and SPIRE is its production-grade implementation.
•Short-lived certificates (hours, not years) reduce the blast radius of key compromise and often eliminate the need for revocation.
•mTLS is the foundation of zero trust networking — every connection must prove identity, regardless of network location.

Key Components

Component	Role
Client Certificate	Proves the identity of the calling service or user, presented during the TLS handshake
Server Certificate	Proves the identity of the server, same as standard TLS — but now both sides authenticate
CertificateRequest Message	Server's TLS handshake message asking the client to present a certificate and specifying acceptable CAs
SPIFFE ID	A standardized identity format (spiffe://trust-domain/workload) embedded in X.509 SVIDs for workload identity
Certificate Rotation	Automated replacement of expiring certificates without service restarts, critical at microservice scale

When to Use

Use mTLS for all service-to-service communication in production microservice architectures. It is especially important in zero trust environments where network location does not imply trust. Use a service mesh to automate it — manual mTLS management does not scale beyond a handful of services.

Tool Comparison

Tool	Type	Best For	Scale
Istio	Open Source	Full service mesh with automatic mTLS, traffic management, and observability in Kubernetes	Enterprise
Linkerd	Open Source	Lightweight service mesh focused on simplicity, automatic mTLS with minimal resource overhead	Small-Enterprise
SPIRE	Open Source	Standalone workload identity and certificate issuance without requiring a full service mesh	Small-Enterprise
HashiCorp Vault PKI	Open Source	Private CA with dynamic certificate issuance, fine-grained policies, and multi-cloud support	Enterprise

Debug Checklist

Verify the client certificate is being sent: openssl s_client -connect host:443 -cert client.crt -key client.key.
Check the server's CertificateRequest to see which CAs it trusts: openssl s_client -connect host:443 -state 2>&1 | grep -A5 'Acceptable client certificate CA names'.
Confirm certificate validity and expiry on both sides: openssl x509 -in cert.pem -noout -dates -subject.
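In Istio, check mTLS status: istioctl x describe pod <pod> reports whether the workload's mTLS mode is STRICT or PERMISSIVE (older releases used istioctl authn tls-check <pod>, which has since been removed).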
Look for 'certificate required' or 'bad certificate' errors in server logs — these indicate the client is not presenting a valid cert.
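The expiry check in this list is easy to automate. The sketch below uses `ssl.cert_time_to_seconds` from the Python standard library, which parses the `notAfter` string format found in `getpeercert()` output; the helper name and the optional `now` parameter are hypothetical and exist to make the function easy to test.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Seconds remaining before a certificate's notAfter time.

    `not_after` uses the format "Jan  5 09:34:43 2018 GMT".
    A negative result means the certificate has already expired.
    """
    current = time.time() if now is None else now
    return ssl.cert_time_to_seconds(not_after) - current


if __name__ == "__main__":
    remaining = seconds_until_expiry("Jan  5 09:34:43 2018 GMT")
    print("expired" if remaining < 0 else f"{remaining:.0f}s left")
```

Alerting when the remaining time drops below the normal rotation margin catches a stalled rotation before connections start failing.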

Common Mistakes

Implementing mTLS at the application level instead of using a sidecar proxy or service mesh, which creates a massive maintenance burden.
Using long-lived client certificates (years) that become impossible to rotate without coordinated downtime.
Not validating the full certificate chain on both sides — just checking that a certificate exists is not enough.
Forgetting to handle certificate rotation gracefully, causing connection drops when certs are renewed.
Hardcoding trust anchors instead of loading them from a dynamic trust bundle that can be updated without redeployment.

Real World Usage

•Google's BeyondProd model authenticates every internal service-to-service call with mutually authenticated transport security (ALTS); network location grants zero implicit trust.
•Netflix uses mTLS across all microservices, with certificates issued by their internal CA and rotated every few hours.
•Uber's microservice platform uses SPIFFE-based identity with mTLS for all inter-service communication.
•Stripe uses mTLS for all internal APIs, with automatic certificate provisioning and rotation via their platform team.
•Kubernetes API server supports mTLS for kubelet authentication, ensuring only authorized nodes join the cluster.

RFCs & Specs

RFC 8446: TLS 1.3 (Client Authentication)
RFC 5246: TLS 1.2 (Client Certificate)
SPIFFE Specification: Secure Production Identity Framework for Everyone
RFC 5280: X.509 Certificate Profile

