Certificates & PKI
PKI is a chain of cryptographic trust — root CAs sign intermediates, intermediates sign leaf certificates, and clients verify the chain to confirm a server is who it claims to be.
The Problem
On the open internet, anyone can claim to be any server. Without a mechanism to verify identity, encrypted connections are meaningless — the client could be encrypting traffic directly to an attacker. PKI solves this by creating a hierarchy of trusted entities that can vouch for a server's identity through cryptographic signatures.
Mental Model
Like a chain of trust in notarization — the government trusts a notary, who stamps the document, and everyone trusts the stamp because they trust the chain. If the notary goes rogue, the government revokes their license without replacing the entire system.
How It Works
Public Key Infrastructure (PKI) is the trust system that makes HTTPS possible. When a browser shows a padlock icon, it has done more than encrypt traffic: it has verified the server's identity through a chain of cryptographic signatures that traces back to a root Certificate Authority in the trust store shipped with the browser or operating system.
The Certificate Chain
Trust in PKI flows in a hierarchy with three levels:
Root CAs are the trust anchors. There are roughly 150 root CA certificates embedded in browser and OS trust stores. These are self-signed — they vouch for themselves. Root CA private keys are stored in Hardware Security Modules (HSMs) that are air-gapped and physically secured. A compromised root CA would undermine trust for millions of websites, so these keys are used extremely rarely — typically only to sign intermediate CA certificates.
Intermediate CAs are signed by root CAs and do the actual work of issuing certificates. They exist for a critical security reason: if an intermediate CA is compromised, it can be revoked without revoking the root. The root stays safely offline. Most certificates in the wild are issued by intermediates — Let's Encrypt's intermediates (R3, R10, R11) are signed by ISRG Root X1.
Leaf certificates are what servers present to clients. A leaf certificate binds a domain name (like example.com) to a public key. It is signed by an intermediate CA, and includes metadata like validity dates, the subject (domain), and Subject Alternative Names (SANs) for additional domains.
# Inspect the leaf certificate a live server presents
# (openssl x509 parses only the first certificate in the -showcerts output)
openssl s_client -connect google.com:443 -showcerts </dev/null 2>/dev/null | \
openssl x509 -noout -subject -issuer -dates
# Output:
# subject=CN = *.google.com
# issuer=C = US, O = Google Trust Services, CN = WR2
# notBefore=Mar 10 08:36:18 2025 GMT
# notAfter=Jun 2 08:36:17 2025 GMT
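To see the three levels interact without involving a real CA, the hierarchy can be reproduced locally. The following is a minimal sketch rather than a production CA setup: the file names, the "Demo" subject names, the example.test domain, and the 30-day lifetimes are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: build a throwaway root -> intermediate -> leaf chain and verify it.
# All names (demo.cnf, "Demo Root CA", example.test) are illustrative assumptions.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Minimal config so the example does not depend on the system openssl.cnf.
cat > demo.cnf <<'EOF'
[req]
distinguished_name = dn
prompt = no
[dn]
CN = placeholder
[v3_ca]
basicConstraints = critical,CA:TRUE
keyUsage = critical,keyCertSign,cRLSign
[v3_leaf]
basicConstraints = CA:FALSE
subjectAltName = DNS:example.test
EOF

# 1. Root CA: self-signed, so subject and issuer are identical.
openssl req -x509 -newkey rsa:2048 -nodes -config demo.cnf -extensions v3_ca \
  -subj "/CN=Demo Root CA" -days 30 -keyout root.key -out root.crt 2>/dev/null

# 2. Intermediate CA: its CSR is signed with the root key.
openssl req -new -newkey rsa:2048 -nodes -config demo.cnf \
  -subj "/CN=Demo Intermediate CA" -keyout int.key -out int.csr 2>/dev/null
openssl x509 -req -in int.csr -CA root.crt -CAkey root.key -CAcreateserial \
  -extfile demo.cnf -extensions v3_ca -days 30 -out int.crt 2>/dev/null

# 3. Leaf certificate: signed with the intermediate key, not the root key.
openssl req -new -newkey rsa:2048 -nodes -config demo.cnf \
  -subj "/CN=example.test" -keyout leaf.key -out leaf.csr 2>/dev/null
openssl x509 -req -in leaf.csr -CA int.crt -CAkey int.key -CAcreateserial \
  -extfile demo.cnf -extensions v3_leaf -days 30 -out leaf.crt 2>/dev/null

# Verify the leaf, supplying the intermediate as an untrusted helper certificate.
chain_status=$(openssl verify -CAfile root.crt -untrusted int.crt leaf.crt)
echo "$chain_status"

# Repeat without the intermediate: the path to the root cannot be built.
if openssl verify -CAfile root.crt leaf.crt >/dev/null 2>&1; then
  missing_intermediate="verified"
else
  missing_intermediate="failed"
fi
echo "without intermediate: $missing_intermediate"
```

The second check fails because only the root is trusted and nothing links the leaf to it. A server that omits its intermediates puts strict clients in the same position.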
X.509 Certificate Format
An X.509 certificate is a structured document containing:
| Field | Purpose |
|---|---|
| Version | Almost always v3 |
| Serial Number | Unique identifier from the CA |
| Signature Algorithm | e.g., SHA-256 with RSA or ECDSA |
| Issuer | The CA that signed this certificate |
| Validity (Not Before / Not After) | When the certificate is valid |
| Subject | The entity the certificate identifies (CN, O, etc.) |
| Subject Public Key | The public key bound to this identity |
| Subject Alternative Names (SAN) | Additional DNS names and IPs the cert covers |
| Basic Constraints | Whether this is a CA cert or end-entity |
| Key Usage / Extended Key Usage | What the key can be used for (server auth, client auth, etc.) |
# Generate a private key and CSR (Certificate Signing Request)
openssl req -new -newkey rsa:2048 -nodes \
-keyout server.key -out server.csr \
-subj "/CN=example.com"
# View the full certificate details
openssl x509 -in cert.pem -noout -text
# Check SANs specifically
openssl x509 -in cert.pem -noout -ext subjectAltName
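The CSR command above sets only the CN. Modern browsers match hostnames against the SAN extension rather than the CN, so a CSR usually needs to request SANs as well. One way to do that is sketched below; the config file name csr.cnf and the example.test names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: create a CSR that requests SANs, then read them back.
# csr.cnf, example.test, and www.example.test are illustrative assumptions.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# The req_extensions section lists the extensions to request in the CSR.
cat > csr.cnf <<'EOF'
[req]
distinguished_name = dn
req_extensions = v3_req
prompt = no
[dn]
CN = example.test
[v3_req]
subjectAltName = DNS:example.test, DNS:www.example.test
EOF

# Generate the key pair and the CSR in one step.
openssl req -new -newkey rsa:2048 -nodes -config csr.cnf \
  -keyout server.key -out server.csr 2>/dev/null

# Print the SANs exactly as they appear in the request.
csr_sans=$(openssl req -in server.csr -noout -text | grep "DNS:")
echo "$csr_sans"
```

The CSR only asks for these names. The CA decides which of them to include in the issued certificate after validating control of each one.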
The ACME Protocol and Let's Encrypt
Before Let's Encrypt, getting a certificate meant paying a CA, generating a CSR, waiting for manual validation, and manually installing the certificate. Renewal was a calendar reminder that everyone forgot.
The ACME (Automatic Certificate Management Environment) protocol, defined in RFC 8555, automated this entirely. Here is the flow:
- Account creation — Client registers with the ACME server using a public key.
- Order placement — Client requests a certificate for specific domain(s).
- Challenge issuance — Server provides challenges to prove domain control:
  - HTTP-01: Place a file at http://domain/.well-known/acme-challenge/<token> (simplest, most common)
  - DNS-01: Create a _acme-challenge.domain TXT record (required for wildcards)
  - TLS-ALPN-01: Respond on port 443 with a special self-signed certificate
- Validation — ACME server verifies the challenge from multiple vantage points.
- Certificate issuance — Server signs and returns the certificate.
- Renewal — Repeat before expiry (Let's Encrypt certs expire in 90 days).
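The challenge step is easier to follow with the values written out. RFC 8555 defines a key authorization (the token, a dot, and the base64url thumbprint of the account key) that the client serves over HTTP or hashes into a DNS record. The sketch below derives both responses. The token and thumbprint are made-up placeholders: a real client receives the token from the ACME server and computes the thumbprint from its own account key.

```shell
#!/bin/sh
# Sketch: derive the HTTP-01 and DNS-01 challenge responses from RFC 8555.
# The token and thumbprint are hypothetical placeholder values, not real ones.
set -eu
domain="example.com"
token="hypothetical-token-from-acme-server"
thumbprint="hypothetical-account-key-thumbprint"

# Key authorization = token || "." || base64url(thumbprint of the account key)
key_authorization="${token}.${thumbprint}"

# HTTP-01: the key authorization is the body served at this URL.
http01_url="http://${domain}/.well-known/acme-challenge/${token}"

# DNS-01: the TXT value is base64url(SHA-256(key authorization)), unpadded.
dns01_name="_acme-challenge.${domain}"
dns01_value=$(printf '%s' "$key_authorization" \
  | openssl dgst -sha256 -binary \
  | openssl base64 -A \
  | tr '+/' '-_' | tr -d '=')

echo "HTTP-01: GET $http01_url -> $key_authorization"
echo "DNS-01:  $dns01_name TXT $dns01_value"
```

Clients such as certbot perform this derivation internally, so it rarely needs to be done by hand. It is mainly useful when debugging a challenge that fails validation.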
# Issue a certificate with certbot (HTTP-01 challenge)
sudo certbot certonly --standalone -d example.com
# Issue a wildcard certificate (DNS-01 challenge)
sudo certbot certonly --manual --preferred-challenges dns -d "*.example.com"
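# Test the renewal process without replacing any certificates
sudo certbot renew --dry-run
# Renew every certificate that is close to expiry (normally run from a timer or cron job)
sudo certbot renew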
Certificate Revocation
When a private key is compromised, the certificate must be revoked before it expires. Clients can learn about a revocation in three ways:
CRL (Certificate Revocation List): The CA publishes a signed list of revoked certificate serial numbers. Clients download the CRL and check whether the certificate is on it. The drawback is size: a CRL can hold millions of entries, and the client must download the whole list. Browsers largely stopped fetching full CRLs years ago; Chrome relies on its CRLSets mechanism and Firefox on CRLite, both of which distribute compact, aggregated revocation data.
OCSP (Online Certificate Status Protocol): The client asks the CA's OCSP responder "Is certificate serial number X revoked?" and gets back a signed status of good, revoked, or unknown. This avoids downloading a full list, but it adds latency to the connection and creates a privacy issue, because the CA learns which sites each client visits.
OCSP Stapling: The server periodically fetches its own OCSP response and "staples" it to the TLS handshake, so the client gets the revocation status without contacting the CA. Where the CA runs an OCSP responder, this is the recommended way to use it. Some CAs, including Let's Encrypt, have since shut down their OCSP services in favor of CRLs, so confirm what your CA supports.
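# Find the OCSP responder URL advertised in the certificate
openssl x509 -in server.pem -noout -ocsp_uri
# Check OCSP status of a certificate (replace the URL with the one printed above)
openssl ocsp -issuer intermediate.pem -cert server.pem \
-url http://ocsp.example.com -resp_text
# Verify OCSP stapling is working: a stapled reply shows "OCSP Response Status: successful",
# otherwise s_client prints "OCSP response: no response sent"
openssl s_client -connect example.com:443 -status </dev/null 2>&1 | grep -i "OCSP response"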
Certificate Transparency
Certificate Transparency (CT) is a system of publicly auditable, append-only logs that record every certificate issued by participating CAs. It was created after several high-profile CA compromises where rogue certificates were issued for domains like google.com.
How it works: When a CA issues a certificate, it submits the certificate to multiple CT logs and receives Signed Certificate Timestamps (SCTs). These SCTs are embedded in the certificate or delivered during the TLS handshake. Chrome and Safari require valid SCTs and reject publicly trusted certificates that lack them.
CT logs are searchable, so the logged certificates for any domain are visible to anyone:
# Search CT logs for certificates issued for a domain
curl -s "https://crt.sh/?q=example.com&output=json" | jq '.[0:5] | .[] | {id, issuer_name, not_before, not_after}'
This is invaluable for security teams — if someone issues a certificate for a domain without authorization, CT logs will reveal it.
Certificate Lifecycle in Production
Managing certificates at scale requires automation. Here are the patterns used by mature organizations:
Public-facing services: Use AWS ACM or Let's Encrypt with cert-manager. ACM is zero-effort for AWS resources — it auto-provisions and auto-renews certificates for ALBs, CloudFront distributions, and API Gateways. For Kubernetes, cert-manager watches Ingress resources and handles the entire lifecycle.
Internal services: Use a private CA. HashiCorp Vault's PKI secrets engine or AWS Private CA can issue short-lived certificates (hours to days) for internal service-to-service communication. Short-lived certificates reduce the blast radius of compromise and often eliminate the need for revocation entirely — by the time an attacker could use a stolen certificate, it has already expired.
Certificate monitoring: Set up alerts for certificates expiring within 30, 14, and 7 days. Prometheus with the blackbox_exporter can probe TLS endpoints and expose the probe_ssl_earliest_cert_expiry metric (a Unix timestamp) for alert rules and Grafana dashboards.
# cert-manager Certificate resource for Kubernetes
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com-tls
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - example.com
    - "*.example.com"
  renewBefore: 360h # Renew 15 days before expiry
The biggest operational risk with certificates is not the cryptography — it is expiry. More production outages are caused by forgotten certificate renewals than by any cryptographic attack. Automate everything.
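The 30, 14, and 7 day alert thresholds can be checked with nothing more than openssl. The sketch below is a minimal illustration, not a replacement for metrics-based alerting: it creates a throwaway certificate valid for 10 days (an assumption made so the example is self-contained) and reports the tightest threshold that the certificate falls inside.

```shell
#!/bin/sh
# Sketch: classify a certificate against 7/14/30-day expiry thresholds.
# The throwaway 10-day certificate stands in for a real one; demo.cnf and
# expiry-demo.test are illustrative assumptions.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

cat > demo.cnf <<'EOF'
[req]
distinguished_name = dn
prompt = no
[dn]
CN = expiry-demo.test
EOF

openssl req -x509 -newkey rsa:2048 -nodes -config demo.cnf -days 10 \
  -keyout demo.key -out demo.crt 2>/dev/null

# -checkend N exits non-zero if the certificate expires within N seconds.
alert="ok"
for days in 7 14 30; do
  if ! openssl x509 -in demo.crt -noout -checkend $((days * 86400)) >/dev/null; then
    alert="expires within ${days} days"
    break
  fi
done
echo "$alert"   # prints: expires within 14 days
```

Pointing the loop at a deployed certificate file instead of demo.crt turns this into a basic scheduled check. At larger scale, the exporter-based alerts described earlier remain the better fit.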
Key Points
- Browsers and operating systems ship with ~150 root CA certificates that form the foundation of internet trust.
- Intermediate CAs exist so root keys can stay offline in HSMs; if an intermediate is compromised, only it is revoked, not the root.
- Let's Encrypt issues over 3 million certificates per day using the automated ACME protocol, making HTTPS free and ubiquitous.
- Certificate Transparency (CT) logs have exposed multiple CA misissuance incidents, including the Symantec misissuance that led browsers to distrust Symantec's roots.
- HTTP Public Key Pinning (HPKP) was deprecated and later removed by Chrome because a misconfigured pin could make a site unreachable; monitor CT logs instead.
Key Components
| Component | Role |
|---|---|
| Root Certificate Authority | Self-signed trust anchor embedded in operating systems and browsers, signs intermediate CA certificates |
| Intermediate CA | Issues leaf certificates on behalf of the root CA, allowing the root key to stay offline and protected |
| Leaf Certificate | The end-entity certificate presented by a server, containing its public key and domain name(s) |
| Certificate Chain | Ordered sequence from leaf to root that a client traverses to verify trust, each cert signed by the one above |
| Certificate Transparency Logs | Publicly auditable append-only logs of all issued certificates, preventing CAs from issuing rogue certs undetected |
When to Use
Every public-facing service needs a certificate from a publicly trusted CA. Internal services should use a private CA (Vault, AWS Private CA) with shorter-lived certificates. Automate issuance and renewal — manual certificate management is a ticking time bomb.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Let's Encrypt | Nonprofit CA | Free, automated DV certificates for public-facing domains | Small-Enterprise |
| DigiCert | Commercial | EV and OV certificates with SLA-backed issuance and support | Enterprise |
| AWS ACM | Managed | Auto-provisioned and auto-renewed certificates for AWS resources (ALB, CloudFront, API Gateway) | Enterprise |
| cert-manager | Open Source | Automated certificate lifecycle management in Kubernetes clusters | Small-Enterprise |
Debug Checklist
- Run openssl s_client -connect host:443 -showcerts to see the full certificate chain the server presents.
- Check certificate expiry: openssl x509 -in cert.pem -noout -dates.
- Verify the chain: openssl verify -CAfile ca-bundle.crt -untrusted intermediate.crt leaf.crt.
- Test with SSL Labs (ssllabs.com/ssltest) for a full certificate and chain analysis.
- Check Certificate Transparency logs at crt.sh to see all certificates issued for a given domain.
Common Mistakes
- Forgetting to include intermediate certificates in the server config, causing failures in non-browser clients like curl and mobile apps.
- Letting certificates expire in production because nobody set up automated renewal. This causes full outages with no graceful degradation.
- Using self-signed certificates in production without proper trust distribution — every client must explicitly trust the CA.
- Generating RSA keys smaller than 2048 bits. Anything below this is considered insecure and rejected by modern browsers.
- Storing private keys in plaintext on disk or in version control. Use HSMs, Vault, or at minimum encrypted file systems.
Real World Usage
- Let's Encrypt automates certificate issuance for over 300 million websites using the ACME protocol with DNS or HTTP challenges.
- AWS ACM manages certificate renewal automatically: engineers provision an ALB and never touch a certificate manually.
- Chrome's Certificate Transparency policy, which grew out of Google's CT initiative, requires publicly trusted certificates to be logged, which helps catch misissued certs.
- Kubernetes cert-manager watches Ingress resources and automatically provisions and renews TLS certificates from ACME issuers.
- Large enterprises like banks use private CAs (via Vault or AWS Private CA) for internal service-to-service encryption.