Certificates from First Principles
Before certificates make any sense, you need to see the internet for what it actually is: a postcard system. Every packet you send bounces through dozens of routers, switches, and cables owned by people you don’t know and have zero reason to trust. Your ISP can read your traffic. The guy running the coffee-shop WiFi can read your traffic. A government with a fiber tap can read your traffic. This isn’t paranoia — it’s just how networking works. Data moves through shared infrastructure in plaintext unless you go out of your way to prevent it.
So you have two problems:
Confidentiality. How do you stop eavesdroppers from reading your data? Encryption — scramble the message so only the intended recipient can unscramble it.
Authentication. Even if you encrypt, how do you know you’re encrypting to the right person? An attacker could intercept your connection, pretend to be your bank, and you’d happily encrypt your password and hand it right to them. This is a man-in-the-middle attack, and encryption alone does nothing to stop it. The server needs a way to prove it is who it claims to be.
Certificates don’t do the encryption themselves. They solve the authentication half — they let your browser verify that the public key it just received actually belongs to google.com and not to some guy in a hoodie. But to understand certificates, you first need to understand the crypto primitives they’re built on.
In 2010, a Firefox extension called Firesheep made the lack of HTTPS viscerally real. It let anyone on a coffee-shop WiFi click a button and hijack other people’s Facebook and Twitter sessions — no hacking skills required. Over a million people downloaded it in its first week. The resulting panic was one of the catalysts that pushed major sites to adopt HTTPS by default.
The simplest kind of encryption: both parties share the same secret key. Sender encrypts with it, receiver decrypts with it. A lockbox with two identical keys.
Modern symmetric ciphers like AES-256-GCM are fast — billions of operations per second on any CPU with hardware AES-NI instructions (which is basically all of them now). And they’re effectively unbreakable when used correctly. AES-256 has a keyspace of 2^256 possible keys — roughly 1.2 × 10^77. Even hardware testing a billion billion (10^18) keys per second would need on the order of 10^51 years to exhaust it, dwarfing the ~1.4 × 10^10-year age of the universe. The math is just comically in your favor.
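The arithmetic is easy to check. A back-of-the-envelope estimate, assuming hardware that tests a generous billion billion (10^18) keys per second:

```python
# Back-of-the-envelope brute-force estimate for AES-256.
keys = 2 ** 256                    # size of the keyspace
rate = 10 ** 18                    # assumed: a billion billion keys per second
seconds = keys / rate
years = seconds / (3600 * 24 * 365)
print(f"~{years:.1e} years to exhaust the keyspace")
```

Even with absurdly optimistic hardware assumptions, the answer stays north of 10^50 years.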
```mermaid
flowchart LR
A["ALICE<br/><i>plaintext 'hello bob'</i>"] --> E["AES-256-GCM<br/>ENCRYPT 🔑"]
E --> C["a3f8c1...9e2b<br/><i>ciphertext</i>"]
C --> D["AES-256-GCM<br/>DECRYPT 🔑"]
D --> B["BOB<br/><i>plaintext 'hello bob'</i>"]
E -.-|"same shared secret key"| D
```
But there’s a catch, and it’s a brutal one: how do you get the shared key to both parties in the first place? If Alice wants to talk securely to Bob, she needs to get the key to him somehow. Send it over the network? An eavesdropper grabs it. If she could already communicate securely with Bob, she wouldn’t need the key in the first place. You’re going in circles.
For centuries, this meant encryption required physical key exchange — diplomatic couriers, codebooks handed over in person, sealed envelopes. Fine for embassies. Useless when you want to buy something from a website you’ve never visited before.
AES wasn’t designed in a back room. In 1997, NIST held a public competition to replace the aging DES cipher. Fifteen algorithms from teams worldwide were submitted. After three years of public cryptanalysis, Rijndael (by Belgian cryptographers Joan Daemen and Vincent Rijmen) won. The open process was deliberate — a cipher hiding a backdoor would be caught by the global community.
In 1976, Whitfield Diffie and Martin Hellman published “New Directions in Cryptography.” The paper proposed something that sounded flat-out impossible: two strangers could agree on a shared secret over a public channel, even if an eavesdropper heard every single word of their conversation.
```mermaid
sequenceDiagram
participant Alice
participant Bob
Note over Alice,Bob: Public parameters: g, p (shared openly)
Note left of Alice: secret a (private)
Note right of Bob: secret b (private)
Note left of Alice: Compute A = g^a mod p
Note right of Bob: Compute B = g^b mod p
Alice->>Bob: sends A (public value)
Bob->>Alice: sends B (public value)
Note over Alice,Bob: 👁 Eve sees A and B — but cannot compute the secret
Note left of Alice: s = B^a mod p
Note right of Bob: s = A^b mod p
Note over Alice,Bob: ✅ SAME SHARED SECRET
```
This is the Diffie-Hellman key exchange. Modern TLS uses ECDHE — Elliptic Curve Diffie-Hellman Ephemeral — which does the same thing with smaller numbers and faster math.
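The exchange is short enough to run with toy numbers — real TLS uses elliptic curves and much larger secrets, but the algebra is identical:

```python
import secrets

# Toy Diffie-Hellman. p and g here are illustrative stand-ins;
# real deployments use standardized groups or elliptic curves.
p = 2**32 - 5        # a small prime modulus
g = 5                # generator

a = secrets.randbelow(p - 2) + 2   # Alice's private value, never sent
b = secrets.randbelow(p - 2) + 2   # Bob's private value, never sent

A = pow(g, a, p)     # Alice -> Bob, public
B = pow(g, b, p)     # Bob -> Alice, public

alice_secret = pow(B, a, p)        # Alice combines B with her secret
bob_secret = pow(A, b, p)          # Bob combines A with his secret
assert alice_secret == bob_secret  # both equal g^(a*b) mod p
```

Eve sees p, g, A, and B — but recovering a or b from them is the discrete logarithm problem, which is what makes the scheme hold up at real key sizes.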
Pay attention to the “ephemeral” part. Both sides generate new, throwaway key pairs for every single session. Even if a server’s long-term private key is compromised years later, an attacker who recorded past traffic still can’t decrypt it — the ephemeral keys are long gone, never written to disk, never reused. This property is called forward secrecy, and it’s why TLS 1.3 mandates ECDHE and dropped support for static RSA key exchange entirely.
But Diffie-Hellman only solves half the problem. You get a shared secret — great — but you have no idea who you derived it with. A man-in-the-middle could do DH with Alice, separately do DH with Bob, and sit in between relaying (and reading) traffic in both directions. Neither Alice nor Bob would notice. To prevent this, you need authentication — and that requires asymmetric cryptography used in a different way.
Diffie and Hellman weren’t actually first. In 1970 — six years earlier — James Ellis at Britain’s GCHQ independently discovered public-key cryptography. His colleague Clifford Cocks then invented what we now call RSA, also years before Rivest, Shamir, and Adleman. But it was all classified. The GCHQ work wasn’t declassified until 1997. Ellis died a month before the public announcement.
Asymmetric (public-key) cryptography flips the model. Instead of one shared key, you get a key pair: a public key you hand out to anyone who wants it, and a private key you never let anyone touch. The two keys are linked by a trapdoor function — a computation that’s cheap to do in one direction and essentially impossible to reverse.
RSA
The classic. Named after Rivest, Shamir, and Adleman (1977). The trapdoor is integer factorization:
- Pick two large random primes, p and q (each 1024+ bits).
- Compute n = p × q. This is your modulus — it’s public.
- Compute φ(n) = (p−1)(q−1). This requires knowing p and q.
- Choose a public exponent e (commonly 65537). Compute the private exponent d such that ed ≡ 1 (mod φ(n)).
- Public key: (n, e). Private key: (n, d).
The whole thing rests on one asymmetry: multiplying two 1024-bit primes takes microseconds. Factoring their product — given only n, a 2048-bit number — is something nobody on earth knows how to do efficiently. Not slow. Infeasible.
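The key-generation recipe above fits in a few lines with textbook-sized primes (never use numbers like these for anything real):

```python
# Toy RSA with tiny primes — real keys use 1024-bit+ primes.
p, q = 61, 53
n = p * q                    # modulus, public: 3233
phi = (p - 1) * (q - 1)      # 3120 — computing this requires knowing p and q
e = 17                       # public exponent (toy; 65537 in practice)
d = pow(e, -1, phi)          # private exponent: e*d ≡ 1 (mod phi)

m = 65                       # a "message" encoded as a number < n
c = pow(m, e, n)             # encrypt with the public key (n, e)
assert pow(c, d, n) == m     # decrypt with the private key (n, d)
```

With n this small you can factor it by hand and recover d. At 2048 bits, that factoring step is the wall nobody can climb.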
Elliptic Curves (ECDSA, Ed25519)
Instead of factoring, EC crypto relies on the Elliptic Curve Discrete Logarithm Problem. You have a curve, a base point G, and a random integer k (your private key). Your public key is Q = kG. Given G and Q, recovering k is computationally infeasible. The payoff: 256-bit EC keys give you the same security as 3072-bit RSA keys, which means smaller certs, faster handshakes, and less bandwidth.
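The group operation underneath is just modular arithmetic. A sketch of scalar multiplication on a tiny textbook curve (y² = x³ + 2x + 2 over the integers mod 17 — an illustration, not a real curve):

```python
# Toy elliptic-curve scalar multiplication. Real curves (P-256,
# Curve25519) use the same structure over ~256-bit fields.
P_MOD = 17   # field modulus
A_COEF = 2   # curve coefficient a in y^2 = x^3 + a*x + b

def ec_add(P, Q):
    """Add two points; None represents the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) = infinity
    if P == Q:       # tangent-line doubling
        m = (3 * x1 * x1 + A_COEF) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:            # chord through two distinct points
        m = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (m * m - x1 - x2) % P_MOD
    return (x3, (m * (x1 - x3) - y1) % P_MOD)

def scalar_mult(k, P):
    """Compute Q = k*P by double-and-add — the public key operation."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

G = (5, 1)                 # base point on this toy curve
Q = scalar_mult(7, G)      # public key for "private key" k = 7
```

Going from G to Q takes a handful of additions even for huge k; going from Q back to k is the discrete log problem.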
So what can you actually do with a key pair?
- Encrypt: Anyone can encrypt with your public key. Only your private key can decrypt.
- Sign: You can create a digital signature with your private key. Anyone with your public key can verify it.
For certificates, signing is the operation that matters. Encryption is a nice trick, but the entire PKI system is built on signatures.
Everything above has a ticking clock. Shor’s algorithm, run on a sufficiently powerful quantum computer, can factor large integers and solve discrete logarithms in polynomial time — breaking both RSA and elliptic curve crypto. “Harvest now, decrypt later” attacks are already a concern. NIST finalized its first post-quantum cryptography standards in 2024 (ML-KEM, ML-DSA, SLH-DSA). The next generation of certificates will use lattice-based math.
A digital signature answers two questions at once: who produced a piece of data, and has it been tampered with since?
1. Run the data through SHA-256. This produces a fixed-size 32-byte digest. It’s preimage-resistant (can’t reverse it) and collision-resistant (can’t find two inputs with the same hash).
2. Encrypt the digest with your private key. The result is the signature — a blob that could only have been produced by someone possessing that private key.
3. Send the original message, the signature, and your public key (or a certificate containing it).
4. The receiver independently hashes the message, decrypts the signature with your public key, and compares. If the hashes match, the signature is valid.
```mermaid
flowchart TB
subgraph sign ["✏️ SIGNING"]
direction LR
M1["Message"] --> H1["SHA-256"] --> D1["32-byte Hash"] --> S1["Sign with PRIVATE KEY"] --> SIG["Signature ✍️"]
end
subgraph verify ["✅ VERIFYING"]
direction LR
M2["Message"] --> H2["SHA-256"] --> D2["Hash"]
SIG2["Signature"] --> V["Decrypt with PUBLIC KEY"] --> D3["Hash"]
D2 --> CMP{"Match?"}
D3 --> CMP
end
```
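Hash-then-sign can be sketched with textbook RSA numbers (n=3233, e=17, d=2753). Real signing uses 2048-bit keys with proper padding (RSA-PSS) or Ed25519 — never raw exponentiation:

```python
import hashlib

# Toy hash-then-sign. The RSA numbers are textbook-sized; illustration only.
n, e, d = 3233, 17, 2753

message = b"deploy v1.2.3"
# 1. Hash the message, reduce into the toy modulus range.
digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n

# 2. "Encrypt" the digest with the private key — the signature.
signature = pow(digest, d, n)

# 3-4. Verifier: rehash independently, undo the signature with the
# public key, compare the two digests.
expected = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
assert pow(signature, e, n) == expected
```

Flip a single byte of the message and the verifier’s independently computed hash no longer matches what the signature decrypts to.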
This same mechanism shows up everywhere: TLS, code signing, JWTs, git commits, package managers, and — most relevant to us — certificates themselves.
With all that machinery in place, a certificate turns out to be a pretty simple thing.
A certificate is a signed statement: “I, the issuer, vouch that this public key belongs to this identity.”
A certificate binds a public key to an identity, and a trusted third party’s signature is what makes that binding worth anything. The standard format is X.509v3, and here’s what’s inside one:
| Field | Purpose | Why it matters |
|---|---|---|
| Subject | Who this cert identifies | For web: the domain. For K8s: the component identity. |
| SANs | Additional identities | Modern TLS uses SANs, not Subject CN, for hostname verification. A cert can cover multiple domains or IPs. |
| Issuer | Who signed this cert | Points up the chain of trust. If the issuer is trusted and the signature is valid, the cert is trusted. |
| Subject Public Key | The key being certified | The actual payload. The whole point of the cert is to vouch for this key. |
| Validity Period | Not Before / Not After | Limits exposure if a key is compromised. Let’s Encrypt: 90 days. K8s: 1 year. |
| Key Usage | Allowed operations | Digital Signature, Key Encipherment, Cert Sign. A leaf cert must NOT have Cert Sign. |
| Basic Constraints | Is this a CA? | CA:TRUE = can sign other certs. CA:FALSE = leaf. Critical security boundary. |
| Signature | Issuer’s digital signature | The proof. Hash all fields, sign with issuer’s private key. |
Anyone can create a certificate claiming anything. You could generate one right now saying “this key belongs to google.com.” Nothing stops you. What makes a certificate trustworthy isn’t its content — it’s the signature on it. And a signature is only meaningful if you trust whoever signed it. Which brings us to the obvious question: who do you trust, and why?
Since 2018, Chrome requires all publicly-trusted certificates to be logged in Certificate Transparency (CT) logs — public, append-only, cryptographically auditable ledgers. Anyone can monitor them. If a CA issues a cert for google.com that Google didn’t request, Google’s monitoring catches it within minutes. CT has already exposed mis-issuances by Symantec, WoSign, and others.
If you need a trusted third party to sign your cert, who signs their cert? And who signs that one? It’s turtles all the way down — until it isn’t. The chain stops at root Certificate Authorities.
A root CA is a certificate that signs itself. The issuer field points to itself. Yes, that’s circular. The reason you trust it anyway is that Apple, Microsoft, or Mozilla has pre-installed it into your operating system’s trust store — a curated list of roughly 150 root certificates that your machine trusts on sight, no questions asked.
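You can peek at the trust store your own machine uses; the count and contents vary by OS and OpenSSL build, but the idea is the same everywhere:

```python
import ssl

# Load the platform's default trust store and list a few roots.
# What you see depends entirely on your OS / OpenSSL configuration.
ctx = ssl.create_default_context()
roots = ctx.get_ca_certs()
print(f"{len(roots)} trusted root certificates loaded")

for cert in roots[:3]:
    # "subject" is a tuple of RDN tuples; flatten the first pair of each.
    subject = dict(item[0] for item in cert["subject"])
    print(" ", subject.get("organizationName", subject.get("commonName", "?")))
```

Every one of those entries is a key your machine will accept as the top of a chain, no questions asked.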
In practice, root CAs don’t directly sign your server’s certificate. The chain has three levels:
- Root CA — self-signed, CA:TRUE, long-lived. The private key stays offline.
- Intermediate CA — signed by the root, CA:TRUE. Does the day-to-day signing of leaf certificates.
- Leaf certificate — CA:FALSE, cannot sign other certs. Short-lived (90 days with Let’s Encrypt). This is what your server presents during the TLS handshake.

Why intermediates? The root CA’s private key is the single point of trust. If it’s compromised, every certificate in the chain is suspect, and there’s no fix short of replacing the root in every device’s trust store worldwide — billions of devices. So root keys stay offline. The intermediates do the daily work. If one gets compromised, the root signs a replacement, the old one gets revoked, and the damage stays contained.
Verification in practice
- Server sends its leaf cert + intermediate cert (root is omitted — you already have it locally).
- Check the leaf’s signature using the intermediate’s public key. ✓
- Check the intermediate’s signature using the root’s public key (from your trust store). ✓
- Root is trusted. Chain complete. Connection trusted.
- Also: validity dates, SANs match hostname, key usage is appropriate, not revoked.
In 2011, attackers compromised DigiNotar, a Dutch CA, and issued fraudulent certificates for over 500 domains including *.google.com. The fake certs were used to intercept Gmail traffic of Iranian dissidents. When discovered, every browser vendor revoked DigiNotar’s root. The company filed for bankruptcy within a month.
This is where it all comes together. Every primitive we’ve covered — symmetric encryption, key exchange, signatures, certificates, chain of trust — gets composed into a single protocol that runs every time you open a webpage. TLS 1.3 is the current version.
```mermaid
sequenceDiagram
participant C as Client (browser)
participant S as Server (website)
Note over C,S: 🔑 KEY EXCHANGE
C->>S: ClientHello + ECDHE key share + cipher suites
S->>C: ServerHello + ECDHE key share
Note over C,S: 🔒 ENCRYPTED FROM HERE
Note over C,S: 🛡️ AUTHENTICATION
S->>C: Certificate + CertificateVerify + Finished
C->>S: Finished
Note over C,S: 📦 APPLICATION DATA (AES-256-GCM)
C-->>S: encrypted data
S-->>C: encrypted data
```
Notice how everything layers: ECDHE handles key exchange (forward secrecy), certificates + signatures handle authentication, AES-GCM handles bulk encryption. Each primitive does the one thing it’s good at. After this handshake, everything you see in the browser — the padlock icon, the “Connection is secure” dialog — is the visible result of this process completing successfully.
CertificateVerify is a subtle but critical step: the server signs the entire handshake transcript with its private key. This proves the server actually possesses the private key matching the certificate, not just the certificate file itself. Without this step, anyone who got a copy of the .crt file (which is public information) could impersonate the server.
TLS 1.3 has a trick: 0-RTT. If a client has connected before, it can send data in the very first message — zero round trips. The catch: 0-RTT data is replayable. An attacker can capture and resend it. So 0-RTT should only be used for idempotent requests (GET, not POST). It’s a deliberate security/performance tradeoff.
HSTS — forcing HTTPS
TLS only protects you if you actually use it. When you type example.com into your browser, the first request often goes over plain HTTP. An attacker on your network can intercept that initial request and downgrade you to HTTP permanently — you’d never notice because you never had the padlock in the first place. This is called an SSL stripping attack.
HSTS (HTTP Strict Transport Security) fixes this. The server sends a header: Strict-Transport-Security: max-age=31536000; includeSubDomains. Your browser remembers this and refuses to connect over plain HTTP for an entire year. Even better, sites can submit to the HSTS preload list — hardcoded into browsers so the very first connection is forced to HTTPS, with no window for an attacker to exploit.
Before November 2015, getting a TLS certificate was a chore. Pay $50–$300 per year. Generate a CSR by hand. Email it to the CA. Wait a few days. Receive the cert via email (yes, email). Install it. Set a calendar reminder to do it all again next year. Most small sites just didn’t bother — HTTPS was a luxury for companies that could afford the overhead. Let’s Encrypt changed the equation by making certificates free, automated, and open.
How ACME works
The idea is dead simple: prove you control the domain, and the CA will sign your cert. No identity verification, no phone calls, no paperwork. Let’s Encrypt only does Domain Validation (DV) — it doesn’t care who you are, only that you can answer challenges for the domain you’re claiming.
```mermaid
sequenceDiagram
participant AC as ACME Client (certbot/caddy)
participant LE as Let's Encrypt (ACME CA)
participant YS as Your Server (example.com)
AC->>LE: 1. Order: cert for example.com
LE->>AC: 2. Challenge: put token at /.well-known/...
AC-->>YS: 3. Places token on server
LE->>YS: 4. HTTP GET token
YS->>LE: token ✓
AC->>LE: 5. CSR (public key + domain)
LE->>AC: 6. 🔒 Signed certificate
```
DNS-01 is the other challenge type: create a TXT record at _acme-challenge.example.com. It works for wildcard certs and doesn’t need any open ports, but you need API access to your DNS provider. Tools like Caddy and Traefik handle ACME natively — you point them at a domain and they do the rest, including renewal. If you’re doing local development and just need a trusted cert for localhost, mkcert creates a local CA and installs it in your trust store in one command.
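For HTTP-01, step 3 amounts to serving one string at one URL. A minimal sketch — the token and account thumbprint below are made-up placeholders, not values from a real ACME exchange:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical challenge values; a real ACME client gets the token from
# the CA and derives the key authorization from its account key.
TOKEN = "evaGxfADs6pSRb2LAv9IZ"
KEY_AUTH = TOKEN + ".hypothetical-account-thumbprint"

class ChallengeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == f"/.well-known/acme-challenge/{TOKEN}":
            body = KEY_AUTH.encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

# Validation requires port 80 on the domain being claimed:
# HTTPServer(("", 80), ChallengeHandler).serve_forever()
```

The CA fetches that URL from the outside; if the response matches, you’ve proven control of the domain and it signs your CSR.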
That covers web PKI end-to-end. Now let’s see what happens when Kubernetes takes these same primitives and runs them inside a cluster.
On the web, TLS is one-directional: the server proves its identity, the client stays anonymous. Your browser doesn’t present a certificate to Google. Kubernetes flips this — every component authenticates to every other component using mutual TLS (mTLS). Both sides present certificates. Both sides verify the other.
Here’s what it looks like in practice: the kubelet on worker-3 wants to report pod status. It connects to the API server and presents a client cert with CN=system:node:worker-3, O=system:nodes. The API server checks the signature against the cluster CA, pulls out the identity, runs it through RBAC. Meanwhile, the kubelet is also checking the API server’s cert. Neither side trusts the other until the math checks out.
```mermaid
flowchart TB
KCT["kubectl"] -->|"kubeconfig cert"| API
subgraph CP ["CONTROL PLANE"]
CM["controller-manager"] -->|"client cert"| API
SCH["scheduler"] -->|"client cert"| API
API["kube-apiserver :6443 HTTPS"]
API -->|"mTLS etcd CA"| ETCD["etcd — separate CA"]
API -->|"front-proxy"| MS["metrics-server"]
end
subgraph WN ["WORKER NODES"]
W1["worker-1 kubelet<br/>CN=system:node:worker-1"]
W2["worker-2 kubelet<br/>CN=system:node:worker-2"]
W3["worker-3 kubelet<br/>CN=system:node:worker-3"]
end
W1 -->|"mTLS cluster CA"| API
W2 -->|"mTLS cluster CA"| API
W3 -->|"mTLS cluster CA"| API
```
Every arrow in that diagram requires at least one certificate, usually two. If you’ve ever looked at /etc/kubernetes/pki/ and wondered why there are so many files in there — this is why. It’s not overengineering. It’s the minimum number of credentials needed for every component to verify every other component.
Run kubeadm init and the very first thing it does is generate the cluster CA:
/etc/kubernetes/pki/ca.crt # Root certificate (public, distributed everywhere)
/etc/kubernetes/pki/ca.key # Root private key (the crown jewel)
This CA is the root of trust for the entire cluster. Every other K8s certificate is either signed directly by this CA or by a subordinate signed by it. The ca.crt is embedded in every kubeconfig.
If ca.key is stolen, the attacker can sign any cert with any subject — create a cert with O=system:masters and get unrestricted cluster-admin access. In production, consider an external CA (HashiCorp Vault), where the root key lives in a hardware-backed secret store and signing happens through an auditable API.

The API server sits at the center of everything. Every other component talks to it, and it talks to several of them back. That means it needs certificates for both directions:
Server certificate (incoming connections)
/etc/kubernetes/pki/apiserver.crt
/etc/kubernetes/pki/apiserver.key
Presented to anything connecting to the API server. Its SANs must include every reachable name/IP: kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, the node’s hostname and IP, the cluster IP (10.96.0.1), and any load-balancer addresses.
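Hostname verification against that SAN list boils down to string matching. A simplified sketch — real clients also handle IP SANs and more edge cases:

```python
# Simplified SAN check: exact match, or a "*." wildcard that covers
# exactly one DNS label. Not a full RFC 6125 implementation.
def san_matches(sans: list[str], host: str) -> bool:
    for san in sans:
        if san == host:
            return True
        if san.startswith("*."):
            first, _, rest = host.partition(".")
            if first and rest == san[2:]:
                return True
    return False

apiserver_sans = [
    "kubernetes", "kubernetes.default", "kubernetes.default.svc",
    "kubernetes.default.svc.cluster.local", "10.96.0.1",
]
assert san_matches(apiserver_sans, "kubernetes.default.svc")
assert not san_matches(apiserver_sans, "kubernetes.example.com")
```

If the name a client dialed isn’t in the list, the handshake fails — no matter how valid the signature chain is.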
If a client connects to kubernetes.default.svc but the cert only lists 10.96.0.1, the connection fails. This is the most common source of “x509: certificate is valid for X, not Y” errors.

Client certificate (outgoing to kubelets)
/etc/kubernetes/pki/apiserver-kubelet-client.crt / .key
Used when the API server connects to kubelets (kubectl logs, kubectl exec). Subject: O=system:masters.
Each kubelet — one per node — needs its own pair of certificates.
Client certificate (kubelet → API server)
The subject IS the kubelet’s RBAC identity:
Subject: CN = system:node:worker-3
O = system:nodes
CN identifies the node. O maps to a K8s group. The system:nodes group is bound to the system:node ClusterRole, which scopes kubelet permissions to pods on its own node.
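The mapping can be sketched in a few lines — a simplified model of what the API server does, not actual kube-apiserver code:

```python
# Client-cert subjects map to Kubernetes identities:
# CN becomes the username, each O becomes a group.
def identity_from_subject(subject: dict) -> dict:
    return {
        "username": subject["CN"],
        "groups": list(subject.get("O", [])),
    }

ident = identity_from_subject(
    {"CN": "system:node:worker-3", "O": ["system:nodes"]}
)
assert ident["username"] == "system:node:worker-3"
assert "system:nodes" in ident["groups"]
```

RBAC then authorizes based on that username and those groups; the cert is both the credential and the identity record.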
Client-cert authentication always works this way: CN = username, O = groups. Consequence: you can’t revoke access without revoking the cert or waiting for expiry. This is why cert lifetimes and rotation matter so much.

Server certificate (API server → kubelet)
When the API server initiates connections to the kubelet (logs, exec, port-forward), the kubelet presents its server cert with the node’s IP and hostname as SANs.
Here’s the chicken-and-egg problem: a new node needs a client cert to talk to the API server. But to get one signed, it has to submit a CSR to the API server. Which requires talking to the API server. Which requires a cert.
TLS Bootstrap breaks the loop with a short-lived, low-privilege bootstrap token — just enough access to request a real certificate, and nothing more.
| Actor | Action | Why |
|---|---|---|
| Phase 1 — Preparation | | |
| admin | kubeadm token create — generates abcdef.0123456789abcdef. | Token is limited: 24h expiry, only permission to create CSRs. |
| admin | Provides the node a bootstrap kubeconfig: API server address, cluster CA cert, bootstrap token. | CA cert lets the node verify the API server. --discovery-token-ca-cert-hash prevents MITM during bootstrap. |
| Phase 2 — Initial Contact | | |
| kubelet | Connects to API server using the bootstrap token. Authenticated as system:bootstrappers. | Almost no permissions — only enough to submit a CSR. |
| Phase 3 — Certificate Request | | |
| kubelet | Generates a fresh key pair locally. Private key never leaves the node. | API server only receives the public key inside the CSR. |
| kubelet | Submits CertificateSigningRequest: CN=system:node:&lt;name&gt;, O=system:nodes, usage: client auth. | CSR is a standard K8s resource. Visible with kubectl get csr. |
| Phase 4 — Approval &amp; Signing | | |
| csrapproving | Policy check: from system:bootstrappers? Subject matches system:node:*? Only client auth? → auto-approve. | A request for O=system:masters from a bootstrap token would be rejected. |
| csrsigning | Signs the CSR with the cluster CA key. | This is where ca.key is used. |
| Phase 5 — Normal Operation | | |
| kubelet | Downloads signed cert, writes it to disk, reconnects as system:node:&lt;name&gt;. | Bootstrap token discarded. Full node-level permissions via RBAC. |
| kubelet | With --rotate-certificates=true: auto-submits new CSR before expiry. Zero-downtime rotation. | Continuous rotation for the node’s entire lifetime. |
kubeadm join wraps all of this. One command: connect with token, verify CA cert by hash, execute the full TLS bootstrap flow, start the kubelet. The ceremony, automated.

The clever bit is the privilege escalation ladder: start with a nearly-useless token, use it to get a real certificate, then throw the token away. At no point does the node hold more privilege than it needs for the current step.
The cluster CA isn’t the only CA in play. Kubernetes also maintains two more CAs and a signing key pair, each deliberately isolated into its own trust domain.
The etcd CA
/etc/kubernetes/pki/etcd/ca.crt # etcd root CA
/etc/kubernetes/pki/etcd/server.crt # etcd server cert
/etc/kubernetes/pki/etcd/peer.crt # etcd-to-etcd replication
etcd stores every piece of cluster state — secrets, RBAC rules, pod specs, everything. Its CA is deliberately separate from the cluster CA. The reason is blast radius.
The Front Proxy CA
For the API aggregation layer. When the API server proxies to an extension server (metrics-server), it presents the front-proxy-client cert and passes user identity via headers. The extension server trusts only the front-proxy CA. Third trust domain.
Service Account Key Pair
/etc/kubernetes/pki/sa.key # Signs JWTs
/etc/kubernetes/pki/sa.pub # Verifies JWTs
Not X.509. A raw key pair for signing/verifying ServiceAccount tokens (JWTs). Controller-manager signs with sa.key; API server verifies with sa.pub.
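The token shape is plain JWT: header.payload.signature, base64url-encoded. A sketch — the claims and key id below are illustrative, and the signature is a stub where a real token carries an RSA signature made with sa.key:

```python
import base64
import json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Illustrative values only — real tokens are minted by the API server.
header = {"alg": "RS256", "kid": "example-key-id"}
payload = {
    "iss": "kubernetes/serviceaccount",
    "sub": "system:serviceaccount:default:my-app",
    "aud": ["https://kubernetes.default.svc"],
}
token = ".".join([
    b64url(json.dumps(header).encode()),
    b64url(json.dumps(payload).encode()),
    b64url(b"stub-signature"),  # real: RSA signature over header.payload
])
assert token.count(".") == 2
```

The API server verifies the third segment against sa.pub; the payload’s sub claim becomes the request’s identity, no X.509 involved.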
Here is every certificate and key file on a kubeadm-provisioned control-plane node:
| File | Type | Signed By | Used By | Purpose |
|---|---|---|---|---|
| Cluster CA Trust Domain | | | | |
| ca.crt / ca.key | Root CA | Self-signed | Everything | Cluster root of trust |
| apiserver.crt | Server | Cluster CA | kube-apiserver | TLS for incoming connections |
| apiserver-kubelet-client.crt | Client | Cluster CA | kube-apiserver | API server → kubelets |
| kubelet client cert | Client | Cluster CA | kubelet | Kubelet → API server |
| kubelet server cert | Server | Cluster CA | kubelet | Kubelet HTTPS (port 10250) |
| scheduler.conf | Client | Cluster CA | kube-scheduler | Scheduler → API server |
| controller-manager.conf | Client | Cluster CA | controller-manager | CM → API server |
| admin.conf | Client | Cluster CA | kubectl | Cluster-admin (O=system:masters) |
| etcd CA Trust Domain | | | | |
| etcd/ca.crt | Root CA | Self-signed | etcd | Separate root for etcd |
| etcd/server.crt | Server | etcd CA | etcd | Client → etcd TLS |
| etcd/peer.crt | Peer | etcd CA | etcd | etcd ↔ etcd replication |
| apiserver-etcd-client.crt | Client | etcd CA | kube-apiserver | API server → etcd |
| Front Proxy CA Trust Domain | | | | |
| front-proxy-ca.crt | Root CA | Self-signed | Aggregation | API aggregation trust root |
| front-proxy-client.crt | Client | Front Proxy CA | kube-apiserver | Proxying to extension APIs |
| Service Account Keys (not X.509) | | | | |
| sa.key / sa.pub | Key pair | N/A | CM / apiserver | Sign &amp; verify SA JWTs |
~14 cert/key pairs + 1 SA key pair on a single control-plane node. In a 3-node HA setup: 30+ certificates.
Monitor expiry with kubeadm certs check-expiration or a Prometheus alert on apiserver_client_certificate_expiration_seconds.

If you’ve ever run kubeadm init and watched it spit out a wall of output, here’s what’s actually happening under the hood. Whether you use kubeadm, kubespray, or build it the hard way following Kelsey Hightower’s guide, the same sequence plays out. Understanding it explains all those files in /etc/kubernetes/pki/.
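The expiry arithmetic itself is trivial. A sketch using the notAfter timestamp format Python’s ssl module reports (the date below is a made-up example):

```python
import ssl
import time

# Example notAfter string, in the format ssl reports for certificates.
# The date is illustrative, not from any real cert.
not_after = "Jun  1 12:00:00 2031 GMT"

expires_at = ssl.cert_time_to_seconds(not_after)  # epoch seconds
days_left = (expires_at - time.time()) / 86400
print(f"{days_left:.0f} days until expiry")
```

Alert well before days_left hits zero — an expired API server or kubelet cert takes the cluster’s control plane down with it.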
The kubeadm init sequence
When you run kubeadm init, the following happens in order:
1. Three CAs are created: the cluster CA (ca.crt/ca.key), the etcd CA (etcd/ca.crt/ca.key), and the front-proxy CA (front-proxy-ca.crt/ca.key). Then all component certificates are signed — API server, kubelet client, etcd server/peer, front-proxy client, and the service account key pair.
2. Four kubeconfig files are created embedding client certificates: admin.conf, controller-manager.conf, scheduler.conf, and kubelet.conf. Each contains the cluster CA cert (for verifying the API server) and a client cert (for authenticating to it).
3. The API server, controller manager, scheduler, and etcd are defined as static pods — YAML manifests written directly to /etc/kubernetes/manifests/. The kubelet on the control-plane node watches this directory and starts them without needing an API server (because the API server doesn’t exist yet).
4. etcd starts first, presenting its server cert and requiring mTLS from any client. It creates the initial cluster state database. Nothing else can start until etcd is healthy.
5. The API server starts, connects to etcd (using apiserver-etcd-client.crt), and begins serving on port 6443. It loads the cluster CA to verify incoming client certs and the SA public key to verify JWT tokens.
6. The controller manager and scheduler start, each connecting to the API server with its own kubeconfig. The controller manager also loads ca.key — it needs this to sign CSRs for node bootstrap — and sa.key to sign SA tokens.
7. kubeadm creates a bootstrap token, sets up the system:bootstrappers ClusterRoleBindings, and configures the CSR auto-approval rules. The cluster is now ready to accept worker nodes.
8. CoreDNS and kube-proxy are deployed as cluster addons. A CNI plugin (Calico, Cilium, Flannel) must be installed separately — without it, pods cannot communicate across nodes and the cluster is not fully functional.
Before we go further, it’s worth stepping back and looking at all the moving pieces in a running cluster. You need to know what each component does — and how they talk to each other — or the certificate map won’t make much sense.
Control plane components
kube-apiserver is the hub. Everything talks to it — nothing talks directly to anything else (except etcd, which only the API server can reach). It exposes the Kubernetes API over HTTPS on port 6443, authenticates every request via client certs or bearer tokens, and runs it through RBAC.
etcd is the database. A distributed key-value store holding all cluster state: pod specs, service definitions, secrets, configmaps, RBAC policies, the lot. It runs Raft consensus for replication across control-plane nodes. Only the API server connects to it, over a separate CA. If etcd is lost and you don’t have backups, the cluster state is gone. Back up etcd.
kube-controller-manager runs the reconciliation loops. The node controller notices when nodes go dark. The deployment controller manages ReplicaSets. The endpoint controller populates Endpoints. Crucially for our topic, it also runs the CSR signing controller — the thing that actually signs kubelet certificate requests during TLS bootstrap.
kube-scheduler watches for pods with no assigned node and picks one based on resource requests, affinity, taints, and topology constraints. It only writes the nodeName field — the kubelet on that node picks it up from there.
Node components
kubelet is the node agent. It watches the API server for pods assigned to its node, tells the container runtime to start them, and reports status back. It also exposes its own HTTPS API on port 10250 — that’s how kubectl logs and kubectl exec work (the API server connects to the kubelet, not the other way around). Both its client cert and server cert are managed via TLS bootstrap and auto-rotation.
kube-proxy runs on every node and implements Service networking. When you create a Service, kube-proxy programs the node’s network rules so traffic to the Service’s ClusterIP gets load-balanced across the backing pods. Three modes:
- iptables mode (default): Creates iptables rules for DNAT. Fast for small clusters, O(n) rule updates for n services.
- IPVS mode: Uses the kernel’s IPVS load balancer. O(1) lookups regardless of service count. Better for large clusters.
- nftables mode (1.29+): Uses nftables, the successor to iptables. Atomic rule updates, better performance.
kube-proxy authenticates to the API server via a kubeconfig with a client cert or a ServiceAccount token.
Cilium — replacing kube-proxy with eBPF
Cilium is a CNI plugin that uses eBPF (extended Berkeley Packet Filter) to do networking, security, and observability directly in the Linux kernel. In a lot of production clusters, Cilium replaces kube-proxy entirely.
The difference is architectural. kube-proxy watches the API server for Service/Endpoint changes, then programs iptables or IPVS rules. Every packet hitting a Service IP walks through the iptables chain. Cilium skips all of that:
- eBPF programs attached to network hooks: Instead of iptables rules, Cilium compiles eBPF programs that run directly in the kernel at the socket and TC (traffic control) layers. Service load-balancing happens at the socket level — the packet never even gets an iptables chain.
- Identity-based security: Cilium assigns each pod a numeric identity based on its labels, not its IP address. Network policies are enforced by identity, which survives pod restarts and IP changes. This is fundamentally more robust than IP-based firewalling.
- Hubble observability: Cilium includes Hubble, a network observability platform that gives you flow logs, service maps, and DNS-aware visibility — all powered by eBPF, with near-zero overhead.
- Transparent encryption: Cilium can encrypt all pod-to-pod traffic using WireGuard or IPsec, without a service mesh sidecar. Node-to-node tunnels are established automatically.
Cilium agents run as a DaemonSet. Each agent authenticates to the API server (via ServiceAccount token or cert) and watches Pods, Services, Endpoints, and CiliumNetworkPolicies. When you deploy Cilium with kubeProxyReplacement=true, you can skip installing kube-proxy entirely.
CoreDNS
CoreDNS is the cluster’s internal DNS server. When a pod looks up my-service.default.svc.cluster.local, CoreDNS resolves it to the Service’s ClusterIP. It watches the API server for Service and Endpoint changes and serves DNS on the cluster DNS IP (typically 10.96.0.10). Every pod’s /etc/resolv.conf points there automatically.
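The naming convention CoreDNS serves is mechanical enough to sketch. This is an illustration of how a short Service name expands to the FQDN a pod's resolver tries, assuming the default cluster.local zone and search path; it is not CoreDNS's actual code:

```python
# Toy illustration of Kubernetes Service DNS naming. The cluster.local
# zone and the search-path behavior mirror the defaults; names are
# illustrative, not taken from any real cluster.

def service_fqdn(service: str, namespace: str, zone: str = "cluster.local") -> str:
    """Fully qualified DNS name for a Service, as CoreDNS serves it."""
    return f"{service}.{namespace}.svc.{zone}"

def search_candidates(short_name: str, pod_namespace: str,
                      zone: str = "cluster.local") -> list[str]:
    """Expansions a pod's resolver tries for a short name, per the
    default search entries in /etc/resolv.conf."""
    return [
        f"{short_name}.{pod_namespace}.svc.{zone}",  # same namespace first
        f"{short_name}.svc.{zone}",
        f"{short_name}.{zone}",
    ]

print(service_fqdn("my-service", "default"))
# my-service.default.svc.cluster.local
```

This is why a pod in the same namespace can reach a Service as plain my-service, while a pod elsewhere needs my-service.default.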
Container runtime (containerd)
containerd actually pulls images and runs containers. The kubelet talks to it over a local Unix socket using the CRI (Container Runtime Interface) protocol. No network involved, so no TLS — just filesystem permissions keeping unauthorized processes from talking to the socket.
flowchart TB
KCT["kubectl"] -->|"kubeconfig cert"| API
CM["controller-manager"] -->|"client cert"| API
SCH["scheduler"] -->|"client cert"| API
API["kube-apiserver :6443\nthe only hub"] -->|"mTLS etcd CA"| ETCD["etcd\nstate store"]
API -->|"front-proxy cert"| MS["metrics-server"]
API <-->|"mTLS cluster CA"| KL["kubelet :10250\nnode agent"]
KP["kube-proxy"] -->|"SA token"| API
CIL["cilium-agent\neBPF networking"] -->|"SA token"| API
KL -->|"CRI Unix socket"| CTD["containerd"]
DNS["CoreDNS"] -->|"SA token"| API
For small clusters, kube-proxy in iptables mode works fine. At scale (500+ services), iptables rule updates become a bottleneck — each Service change triggers a full iptables save/restore. Cilium’s eBPF approach scales to tens of thousands of services with constant-time lookups. The tradeoff: Cilium requires a Linux kernel ≥4.19 (ideally 5.10+) and adds operational complexity. Most managed Kubernetes offerings (GKE, EKS) now offer Cilium-based dataplanes as a first-class option.
kubeadm handles control-plane certs, but what about certs your applications need? Your Ingress controller needs TLS certs for your domains. Internal services might need certs for mTLS. You’re not going to SSH into a node and run openssl every 90 days. cert-manager is the answer — it’s become the de facto standard for certificate lifecycle management in Kubernetes.
cert-manager runs as a set of controllers inside the cluster. It watches for Certificate resources, talks to whatever CA you’ve configured, stores the signed certs as Kubernetes Secrets, and renews them before they expire. You define what you want; it handles the rest.
The resource model
cert-manager introduces four key CRDs:
- Issuer / ClusterIssuer — defines where to get certs from. An ACME issuer for Let’s Encrypt, a CA issuer for self-signed, a Vault issuer for HashiCorp Vault. Issuer is namespaced; ClusterIssuer is cluster-wide.
- Certificate — declares what cert you need: domain names, duration, renewal window, which issuer to use. cert-manager creates the cert and stores it in a Secret.
- CertificateRequest — the internal representation of a CSR. You rarely create these directly.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  secretName: example-com-tls   # cert stored here as a Secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - example.com
    - www.example.com
  duration: 2160h      # 90 days
  renewBefore: 360h    # renew 15 days before expiry
Apply that YAML and cert-manager takes over: creates an ACME order, solves the HTTP-01 or DNS-01 challenge, submits the CSR to Let’s Encrypt, stores the signed cert and private key in the example-com-tls Secret, and renews it every 75 days. Your Ingress controller picks up the Secret and starts serving TLS. You never think about it again.
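The renewal arithmetic implied by those two fields is worth making explicit. A simplified sketch (the real controller also accounts for clock skew and failed-renewal backoff):

```python
# cert-manager's renewal timing for the Certificate above: renewal
# fires when the cert's age reaches duration - renewBefore.
# Simplified model, not cert-manager's actual implementation.
from datetime import datetime, timedelta

duration = timedelta(hours=2160)      # 90 days
renew_before = timedelta(hours=360)   # 15 days

renew_after = duration - renew_before
print(renew_after.days)  # 75 -> renewed every 75 days

issued = datetime(2025, 1, 1)
renewal_time = issued + renew_after   # when the next renewal is attempted
```

That is where the "renews it every 75 days" cadence comes from: 90 days of validity minus the 15-day safety margin.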
With cert-manager’s ingress-shim integration, you can skip the Certificate resource entirely. Add cert-manager.io/cluster-issuer: letsencrypt-prod to your Ingress annotations, and cert-manager creates the Certificate automatically from the Ingress’s TLS configuration.
cert-manager has a companion project: trust-manager. While cert-manager distributes leaf certificates, trust-manager distributes CA bundles. It ensures that every namespace has an up-to-date trust bundle (ConfigMap) containing the CAs your workloads need to verify connections. Together, they handle both sides of the trust equation: “here’s my cert” and “here’s who I trust.”
Everything we’ve covered so far secures communication between infrastructure components. But what about the traffic between your actual application pods? By default, that traffic is unencrypted plaintext on the cluster network. If someone compromises a node or taps the network fabric, they can read all of it.
Service meshes fix this. Istio, Linkerd, and Cilium can extend mTLS to every pod-to-pod connection, transparently, without touching your application code.
Service meshes identify workloads with SPIFFE IDs, URIs like spiffe://cluster.local/ns/default/sa/payment-service. The cert’s SAN carries this URI. Identity is tied to the Kubernetes ServiceAccount, not the pod IP.
How Istio does it
- istiod runs as the mesh’s own CA (or delegates to an external CA like Vault).
- Each pod gets a sidecar proxy (Envoy) injected automatically. The sidecar intercepts all inbound/outbound traffic.
- On startup, the sidecar requests a short-lived certificate from istiod via SDS (Secret Discovery Service). The cert encodes the pod’s SPIFFE identity.
- Every connection between pods is automatically upgraded to mTLS. The sidecars handle the handshake — the application sees plain HTTP.
- Certificates are rotated automatically; Istio’s default lifetime is 24 hours, and it’s configurable.
So every pod-to-pod connection ends up encrypted and mutually authenticated, with identity tied to ServiceAccounts and certs that rotate every 24 hours. If an attacker compromises a pod, the cert they steal is valid for at most a day and can only identify as that one specific service.
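The SPIFFE ID format itself is simple enough to sketch. A toy builder and parser, assuming Istio's default cluster.local trust domain (illustrative only, not Istio's code):

```python
# SPIFFE ID scheme as Istio encodes it in workload cert SANs:
#   spiffe://<trust-domain>/ns/<namespace>/sa/<serviceaccount>
# "cluster.local" is Istio's default trust domain.

def spiffe_id(namespace: str, service_account: str,
              trust_domain: str = "cluster.local") -> str:
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"

def parse_spiffe_id(uri: str) -> dict:
    """Recover trust domain, namespace, and ServiceAccount from the SAN URI."""
    assert uri.startswith("spiffe://")
    trust_domain, _, ns, _, sa = uri.removeprefix("spiffe://").split("/")
    return {"trust_domain": trust_domain, "namespace": ns, "service_account": sa}

uri = spiffe_id("default", "payment-service")
print(uri)  # spiffe://cluster.local/ns/default/sa/payment-service
```

Because the identity lives in the namespace and ServiceAccount, a pod can be rescheduled with a new IP and still authenticate as the same workload.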
Service mesh mTLS is the practical implementation of zero-trust networking inside a cluster. The old model: “the network perimeter is secure, trust everything inside.” The new model: “verify every connection, regardless of source.” The 2020 SolarWinds attack — where attackers moved laterally through trusted internal networks for months — was the definitive proof that perimeter-based trust models fail. mTLS ensures that even if an attacker is inside the cluster, they can’t impersonate other services.
Kubernetes is extensible, and every extension point that talks over the network needs certificates. The most common case: admission webhooks.
When you create a ValidatingWebhookConfiguration or MutatingWebhookConfiguration, the API server has to make HTTPS calls to your webhook server for every matching request. The webhook needs to serve TLS, and the API server needs to trust its certificate. If either side is misconfigured, object creation starts failing across the cluster.
Three approaches to webhook certs
- cert-manager + CA Injector: cert-manager generates the cert. The cainjector component automatically patches the webhook configuration with the correct CA bundle. This is the recommended approach.
- Self-managed: Generate a self-signed CA, create a cert, mount it in the webhook pod, and set the caBundle field in the webhook config. Works but requires manual rotation.
- Kubernetes API: Use the CertificateSigningRequest API to get the cluster CA to sign your webhook cert. The webhook config can then reference the cluster CA with no caBundle needed.
Webhooks aren’t the only extension point with cert requirements. API aggregation (custom API servers via APIService) needs certs. External admission controllers like OPA Gatekeeper and Kyverno need certs. All of them need rotation. cert-manager is usually the answer.
A common gotcha: you create a webhook, cert-manager generates the cert, everything works. Six months later, cert-manager rotates to a new CA — but the caBundle in your webhook configuration still has the old CA. Suddenly every resource mutation fails because the API server can’t verify the webhook. The fix: always use cert-manager’s cainjector (annotate your webhook with cert-manager.io/inject-ca-from) so the CA bundle updates automatically when the cert rotates.
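That drift is easy to model. A toy comparison, assuming you have already fetched the webhook's pinned caBundle and the current CA PEM (in a real cluster both come from the Kubernetes API; the PEMs below are stand-ins):

```python
# Toy check for the stale-caBundle gotcha: compare the CA bundle pinned
# in a webhook configuration against the CA currently in use.
# The PEM contents here are placeholders, not real certificates.
import base64

def bundle_is_current(webhook_ca_bundle_b64: str, current_ca_pem: bytes) -> bool:
    """True if the webhook's pinned caBundle contains the current CA."""
    pinned = base64.b64decode(webhook_ca_bundle_b64)
    return current_ca_pem.strip() in pinned

old_ca = b"-----BEGIN CERTIFICATE-----\nOLD...\n-----END CERTIFICATE-----"
new_ca = b"-----BEGIN CERTIFICATE-----\nNEW...\n-----END CERTIFICATE-----"
pinned_b64 = base64.b64encode(old_ca).decode()  # what the webhook still pins

print(bundle_is_current(pinned_b64, old_ca))  # True  - bundle matches
print(bundle_is_current(pinned_b64, new_ca))  # False - CA rotated, drift
```

cainjector exists so nobody has to run a check like this by hand: it rewrites the caBundle whenever the source changes.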
Certificate errors are some of the most common and most frustrating things you’ll deal with in Kubernetes. The error messages are cryptic, the root causes are varied, and the fix is usually “a cert or CA file is wrong somewhere.” Here’s the field guide.
The error menagerie
| Error | Meaning | Fix |
|---|---|---|
| x509: certificate signed by unknown authority | The verifier doesn’t have the CA that signed this cert in its trust store. | Ensure --client-ca-file or --root-ca-file points to the correct CA bundle. Check if the CA was rotated. |
| x509: certificate is valid for X, not Y | The hostname you connected to doesn’t match any SAN in the cert. | Regenerate the cert with the missing SAN. For apiserver: kubeadm init --apiserver-cert-extra-sans=... |
| x509: certificate has expired | Current time is past the cert’s Not After date. | kubeadm certs renew all, then restart control-plane components. Check kubelet rotation is enabled. |
| tls: bad certificate | The server rejected the client’s cert (or vice versa). CA mismatch. | Verify both sides trust each other’s CA. Common when etcd CA and cluster CA are confused. |
| remote error: tls: internal error | The remote side crashed during the handshake. Often a misconfigured cert/key pair. | Verify the cert and key match: openssl x509 -noout -modulus -in cert \| md5 should equal openssl rsa -noout -modulus -in key \| md5. |
| certificate-authority-data is empty | kubeconfig is missing the CA cert. | Re-extract from /etc/kubernetes/pki/ca.crt and base64-encode into the kubeconfig. |
Essential debug commands
# Check all control-plane cert expiry dates at a glance
kubeadm certs check-expiration
# Inspect a specific cert in detail
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
# See what a live server presents (without trusting it)
openssl s_client -connect <api-server>:6443 -showcerts 2>/dev/null | \
openssl x509 -text -noout
# Verify a cert was signed by a specific CA
openssl verify -CAfile /etc/kubernetes/pki/ca.crt \
/etc/kubernetes/pki/apiserver.crt
# Check if cert and key match (compare modulus hashes; use md5sum on Linux)
diff <(openssl x509 -noout -modulus -in cert.crt | md5) \
<(openssl rsa -noout -modulus -in cert.key | md5)
# View pending and approved CSRs
kubectl get csr -o wide
# Check cert-manager certificate status
kubectl get certificates -A
kubectl describe certificate <name>
# Decode a cert from a K8s secret
kubectl get secret example-tls -o jsonpath='{.data.tls\.crt}' | \
base64 -d | openssl x509 -text -noout
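The same Secret decoding works programmatically too. A stdlib-only sketch, with a stand-in secret dict shaped like kubectl get secret -o json output:

```python
# Decode a TLS cert out of a Kubernetes Secret programmatically -
# the stdlib equivalent of the jsonpath + base64 pipeline above.
# The dict mirrors `kubectl get secret example-tls -o json`; the PEM
# contents are placeholders.
import base64

secret = {
    "metadata": {"name": "example-tls"},
    "type": "kubernetes.io/tls",
    "data": {  # values are base64-encoded, as the API returns them
        "tls.crt": base64.b64encode(b"-----BEGIN CERTIFICATE-----\n...").decode(),
        "tls.key": base64.b64encode(b"-----BEGIN PRIVATE KEY-----\n...").decode(),
    },
}

def secret_pem(secret: dict, key: str = "tls.crt") -> bytes:
    """Base64-decode one field of a Secret's data map."""
    return base64.b64decode(secret["data"][key])

print(secret_pem(secret).decode().splitlines()[0])
# -----BEGIN CERTIFICATE-----
```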
Alert on the API server metric apiserver_client_certificate_expiration_seconds < 604800 (7 days) to catch expiring certs before they break your cluster. cert-manager exports its own metrics: certmanager_certificate_expiration_timestamp_seconds.
A surprising number of Kubernetes outages are caused by expired certificates. kubeadm component certs expire after 1 year. If you don’t run kubeadm certs renew all (or upgrade, which auto-renews) before the anniversary, the API server stops accepting connections. The cluster is up but unreachable. Entirely preventable with monitoring — yet it catches teams every year, including large-scale production environments.
That’s the full stack. Trapdoor functions at the bottom, chains of trust built on top, TLS handshakes composing them into a protocol, ACME automating the paperwork, and Kubernetes running its own private PKI internally with the same primitives the web uses. cert-manager handles the lifecycle, service meshes push mTLS down to every pod, and webhook certs keep the extension layer locked down.
None of this is magic. It’s signed documents, verified by math, organized into trust hierarchies, and automated by protocols that took decades to get right. The whole system exists because the internet was built without authentication, and we’ve been bolting it on ever since.
A treatise on trust.