Certificates from
First Principles

— ✼ —
Part One
Cryptographic Foundations
01
The Adversary

Before we talk about certificates, we need to understand what world we live in without them. The internet is, by default, a postcard system. Every packet you send travels through dozens of routers, switches, and cables owned by people you don’t know and have no reason to trust. Your ISP can read your traffic. The coffee-shop WiFi operator can read your traffic. A government with a fiber tap can read your traffic. This isn’t paranoia — it’s physics. Data moves through shared infrastructure in plaintext unless you explicitly make it not.

In 2013, the Snowden disclosures revealed that the NSA’s MUSCULAR program was tapping the unencrypted links between Google’s data centers. Google engineers were furious — and immediately encrypted all inter-datacenter traffic. The threat model isn’t hypothetical.

This gives us two problems that certificates ultimately exist to solve:

Confidentiality. How do you prevent eavesdroppers from reading your data? You need encryption — a way to scramble the message so only the intended recipient can unscramble it.

Authentication. Even if you encrypt, how do you know you’re talking to the right server? An attacker could intercept your connection, present themselves as your bank, and you’d happily encrypt your password and send it straight to them. This is a man-in-the-middle attack, and encryption alone doesn’t prevent it. You need a way for the server to prove its identity.

Authentication is the harder problem. Encryption is “just math” — well-understood algorithms. But authentication requires trust infrastructure: someone, somewhere, needs to vouch for identities. That infrastructure is what we call PKI (Public Key Infrastructure), and certificates are its documents.

Certificates don’t do the encryption themselves. They solve the authentication problem — they let your browser verify that the public key it just received actually belongs to google.com and not to someone pretending to be Google. But to understand certificates, you need to understand the cryptographic primitives they’re built on. So let’s build up from nothing.

Tidbit — Firesheep

In 2010, a Firefox extension called Firesheep made the lack of HTTPS viscerally real. It let anyone on a coffee-shop WiFi click a button and hijack other people’s Facebook and Twitter sessions — no hacking skills required. Over a million people downloaded it in its first week. The resulting panic was one of the catalysts that pushed major sites to adopt HTTPS by default.

02
Symmetric Encryption

The oldest and most intuitive form of encryption: both parties share the same secret key. The sender uses the key to encrypt, the receiver uses the same key to decrypt. A lockbox with two identical keys.

Modern symmetric ciphers like AES-256-GCM are extraordinarily fast — billions of operations per second on CPUs with hardware AES-NI instructions. They’re also, as far as we know, unbreakable when used correctly. AES-256 has a keyspace of 2^256 possible keys. If a billion computers each tried a billion keys per second, exhausting the keyspace would take on the order of 10^51 years, more than 10^40 times the age of the universe.

Why GCM? AES is a block cipher — it encrypts 16 bytes at a time. You need a mode of operation to handle messages longer than 16 bytes. GCM (Galois/Counter Mode) is an AEAD (Authenticated Encryption with Associated Data): it encrypts and authenticates in one pass. If anyone tampers with even a single bit of the ciphertext, decryption fails entirely rather than producing corrupted plaintext.
Symmetric encryption — one key, two roles
flowchart LR
    A["ALICE<br/><i>plaintext 'hello bob'</i>"] --> E["AES-256-GCM<br/>ENCRYPT 🔑"]
    E --> C["a3f8c1...9e2b<br/><i>ciphertext</i>"]
    C --> D["AES-256-GCM<br/>DECRYPT 🔑"]
    D --> B["BOB<br/><i>plaintext 'hello bob'</i>"]
    E -.-|"same shared secret key"| D
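
The keyspace claim is worth checking with quick arithmetic. A back-of-the-envelope sketch; the attacker scale is an assumption chosen for round numbers:

```python
# Back-of-the-envelope brute-force estimate for AES-256.
keyspace = 2 ** 256                    # possible AES-256 keys
machines = 10 ** 9                     # a billion computers (assumption)
keys_per_second = 10 ** 9              # a billion guesses each, per second (assumption)

seconds = keyspace // (machines * keys_per_second)
years = seconds // (365 * 24 * 3600)
age_of_universe = 14 * 10 ** 9         # roughly 1.4e10 years

print(f"~{years:.1e} years to exhaust the keyspace")
print(f"~{years // age_of_universe:.1e} times the age of the universe")
```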

But symmetric encryption has a devastating bootstrapping problem: how do you get the shared key to both parties in the first place? If Alice wants to talk securely to Bob, she needs to somehow transmit the key to him. But if she sends it over the network, an eavesdropper captures it. If she could already communicate securely with Bob, she wouldn’t need the key. It’s circular.

For centuries, this meant encryption required physical key exchange — diplomatic couriers, codebooks distributed in person, sealed envelopes. That’s fine for embassies. It doesn’t work when you want to buy something from a website you’ve never visited before.

Tidbit — The AES Competition

AES wasn’t designed in a back room. In 1997, NIST held a public competition to replace the aging DES cipher. Fifteen algorithms from teams worldwide were submitted. After three years of public cryptanalysis, Rijndael (by Belgian cryptographers Joan Daemen and Vincent Rijmen) won. The open process was deliberate — a cipher hiding a backdoor would be caught by the global community.

03
The Key Exchange Problem

In 1976, Whitfield Diffie and Martin Hellman published “New Directions in Cryptography,” one of the most important papers in the history of computer science. They proposed something that sounded impossible: two strangers could agree on a shared secret over a public channel, even if an eavesdropper heard every word.

The classic analogy is paint mixing: Alice and Bob each stir a private color into a shared public base paint, then swap the mixtures in the open. Anyone can see the mixtures, but un-mixing paint is infeasible, so only Alice and Bob, each adding their private color to the other’s mixture, arrive at the same final blend. The mathematical equivalent is modular exponentiation or elliptic curve point multiplication — easy forward, computationally infeasible to reverse.
Diffie-Hellman — the paint-mixing analogy
sequenceDiagram
    participant Alice
    participant Bob
    Note over Alice,Bob: Public parameters: g, p (shared openly)
    Note left of Alice: secret a (private)
    Note right of Bob: secret b (private)
    Note left of Alice: Compute A = g^a mod p
    Note right of Bob: Compute B = g^b mod p
    Alice->>Bob: sends A (public value)
    Bob->>Alice: sends B (public value)
    Note over Alice,Bob: 👁 Eve sees A and B — but cannot compute the secret
    Note left of Alice: s = B^a mod p
    Note right of Bob: s = A^b mod p
    Note over Alice,Bob: ✅ SAME SHARED SECRET

This is the Diffie-Hellman key exchange. Modern TLS uses ECDHE — Elliptic Curve Diffie-Hellman Ephemeral — which achieves the same thing with smaller numbers and faster computation.
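
The exchange in the diagram fits in a few lines. A toy sketch with textbook group parameters; real deployments use 2048-bit primes or elliptic curves:

```python
import secrets

# Textbook Diffie-Hellman with tiny numbers (illustration only).
p, g = 23, 5                      # public parameters, shared openly

a = secrets.randbelow(p - 2) + 1  # Alice's private exponent
b = secrets.randbelow(p - 2) + 1  # Bob's private exponent

A = pow(g, a, p)                  # Alice sends A over the public channel
B = pow(g, b, p)                  # Bob sends B over the public channel

# Each side combines the other's public value with its own secret.
s_alice = pow(B, a, p)
s_bob = pow(A, b, p)

assert s_alice == s_bob           # same shared secret, never transmitted
```

Eve sees p, g, A, and B, but recovering a or b from them is the discrete logarithm problem.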

The “ephemeral” part is critical. Both sides generate new, temporary key pairs for every session. Even if a server’s long-term private key is compromised years later, an attacker who recorded past traffic can’t decrypt it, because the ephemeral keys are long gone. This property is called forward secrecy, and it’s why TLS 1.3 mandates ECDHE and dropped support for static RSA key exchange.

Why forward secrecy matters politically: Without it, a state actor can record all encrypted traffic today, steal a server’s private key next year, and decrypt everything retroactively. With forward secrecy, recorded ciphertext is permanently useless.

But Diffie-Hellman solves only half the problem. It gives you a shared secret — but it doesn’t tell you who you derived that secret with. A man-in-the-middle could perform DH with Alice, separately perform DH with Bob, and relay traffic between them. To prevent this, you need authentication — and that requires asymmetric cryptography used in a different way.

Tidbit — The Secret History

Diffie and Hellman weren’t actually first. In 1970 — six years earlier — James Ellis at Britain’s GCHQ independently discovered public-key cryptography. His colleague Clifford Cocks then invented what we now call RSA, also years before Rivest, Shamir, and Adleman. But it was all classified. The GCHQ work wasn’t declassified until 1997. Ellis died a month before the public announcement.

04
Asymmetric Cryptography

Asymmetric (public-key) cryptography uses a key pair: a public key you give to the world, and a private key you guard with your life. The two are mathematically linked by a trapdoor function — a computation that’s efficient in one direction and infeasible in the other.

RSA

The classic. Named after Rivest, Shamir, and Adleman (1977). The trapdoor is integer factorization:

  1. Pick two large random primes, p and q (each 1024+ bits).
  2. Compute n = p × q. This is your modulus — it’s public.
  3. Compute φ(n) = (p−1)(q−1). This requires knowing p and q.
  4. Choose a public exponent e (commonly 65537). Compute the private exponent d such that ed ≡ 1 (mod φ(n)).
  5. Public key: (n, e). Private key: (n, d).

The security rests on one fact: given n (a 2048-bit number), nobody knows how to efficiently find p and q. Multiplying two 1024-bit primes takes microseconds. Factoring their product takes longer than the universe will exist.
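
The five steps can be traced with deliberately tiny primes. A toy sketch using the classic textbook numbers; real keys use 1024+-bit primes and e = 65537:

```python
# RSA key generation, miniature edition.
p, q = 61, 53                      # step 1: two "large" random primes
n = p * q                          # step 2: modulus, public (3233)
phi = (p - 1) * (q - 1)            # step 3: needs p and q (3120)
e = 17                             # step 4: public exponent
d = pow(e, -1, phi)                # ...and private exponent: ed ≡ 1 (mod φ(n))

assert (e * d) % phi == 1          # step 5: (n, e) is public, (n, d) private

m = 42                             # round-trip: encrypt with the public key...
c = pow(m, e, n)
assert pow(c, d, n) == m           # ...decrypt with the private key
```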

Why RSA is fading: RSA keys need to be huge (2048–4096 bits). ECDSA and Ed25519 achieve equivalent security with 256-bit keys — smaller certificates, faster handshakes, less bandwidth. Most new systems prefer EC keys.

Elliptic Curves (ECDSA, Ed25519)

Instead of factoring, EC crypto relies on the Elliptic Curve Discrete Logarithm Problem. You have a curve, a base point G, and a random integer k (your private key). Your public key is Q = kG. Given G and Q, recovering k is computationally infeasible. The result: 256-bit keys as strong as 3072-bit RSA keys.
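
The one-way operation can be sketched on a tiny textbook curve (y² = x³ + 2x + 2 over GF(17), a standard teaching example; real systems use P-256 or Curve25519):

```python
# Toy elliptic curve arithmetic; illustration only.
p, a, b = 17, 2, 2
G = (5, 1)                                        # base point on the curve

def add(P, Q):
    """Point addition; None stands for the point at infinity."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                               # P + (-P) = infinity
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p)   # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p)          # chord slope
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def mul(k, P):
    """Double-and-add: kP is fast; recovering k from kP is the ECDLP."""
    R = None
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

k = 9                                             # private key
Q = mul(k, G)                                     # public key Q = kG
x, y = Q
assert (y * y - (x ** 3 + a * x + b)) % p == 0    # Q lies on the curve
```

Computing Q from k takes a handful of additions; recovering k from Q, on a real-size curve, is infeasible.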

What can you do with these keys? Two operations. Encryption: anyone can encrypt a message to your public key, and only the private key can decrypt it. Signing: only the private key can produce a signature over a message, and anyone can verify it with the public key.

Certificates care about the second operation — signing — far more than the first.

Tidbit — The Quantum Threat

Everything above has a ticking clock. Shor’s algorithm, run on a sufficiently powerful quantum computer, can factor large integers and solve discrete logarithms in polynomial time — breaking both RSA and elliptic curve crypto. “Harvest now, decrypt later” attacks are already a concern. NIST finalized its first post-quantum cryptography standards in 2024 (ML-KEM, ML-DSA, SLH-DSA). The next generation of certificates will use lattice-based math.

• • •
05
Digital Signatures

A digital signature proves two things simultaneously: who produced a piece of data, and that the data hasn’t been altered since it was signed.

Why hash first? Asymmetric operations are ~1000× slower than symmetric crypto. You don’t want to sign a 500MB file directly. Instead, you hash it to a 32-byte digest (SHA-256) and sign that. The hash is a faithful fingerprint — change one bit of the file, and the hash changes unpredictably.
Hash the message

Run the data through SHA-256. This produces a fixed-size 32-byte digest. It’s preimage-resistant (can’t reverse it) and collision-resistant (can’t find two inputs with the same hash).

Sign the hash

Encrypt the digest with your private key. (Strictly, this “encrypt with the private key” picture fits RSA; ECDSA and Ed25519 use different math, but the effect is the same.) The result is the signature — a blob that could only have been produced by someone possessing that private key.

Transmit message + signature

Send the original message, the signature, and your public key (or a certificate containing it).

Verify

The receiver independently hashes the message, decrypts the signature with your public key, and compares. If the hashes match, the signature is valid.

Digital signature — sign and verify
flowchart TB
    subgraph sign ["✏️ SIGNING"]
    direction LR
    M1["Message"] --> H1["SHA-256"] --> D1["32-byte Hash"] --> S1["Sign with PRIVATE KEY"] --> SIG["Signature ✍️"]
    end
    subgraph verify ["✅ VERIFYING"]
    direction LR
    M2["Message"] --> H2["SHA-256"] --> D2["Hash"]
    SIG2["Signature"] --> V["Decrypt with PUBLIC KEY"] --> D3["Hash"]
    D2 --> CMP{"Match?"}
    D3 --> CMP
    end

This mechanism underpins everything: TLS, code signing, JWTs, git commits, package managers, and certificates themselves.
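
The four-step flow above can be sketched end-to-end with toy RSA numbers (tiny primes, illustration only; real signatures use 2048-bit keys plus padding such as RSA-PSS, or ECDSA):

```python
import hashlib

# Hash-then-sign, miniature edition.
p, q, e = 61, 53, 17
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)                        # private exponent

def digest(msg: bytes) -> int:
    # Step 1: hash. Reduced mod n only because the toy modulus is tiny.
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes) -> int:
    # Step 2: "encrypt" the digest with the private key.
    return pow(digest(msg), d, n)

def verify(msg: bytes, sig: int) -> bool:
    # Step 4: recover the digest with the public key and compare.
    return pow(sig, e, n) == digest(msg)

msg = b"pay Bob $10"
sig = sign(msg)                            # step 3 transmits msg + sig
assert verify(msg, sig)
# Changing even one bit of msg changes the digest, so verification fails.
```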

When hash functions break: MD5 was standard until 2004, when researchers found practical collision attacks. SHA-1 fell in 2017 — Google’s “SHAttered” attack produced two different PDFs with the same SHA-1 hash. Both are now broken for signatures. This is why SHA-256 is everywhere.
06
Certificates

Now we can define what a certificate actually is, and it’s simpler than you might expect.

A certificate is a document that says: “I, the issuer, certify that this public key belongs to this identity.” And then the issuer signs that statement with their own private key.

That’s it. A certificate binds a public key to an identity, and a trusted third party’s signature makes that binding believable. The standard format is X.509v3:

Subject. Who this cert identifies. For web: the domain. For K8s: the component identity.

SANs. Additional identities. Modern TLS uses SANs, not the Subject CN, for hostname verification. A cert can cover multiple domains or IPs.

Issuer. Who signed this cert. Points up the chain of trust. If the issuer is trusted and the signature is valid, the cert is trusted.

Subject Public Key. The key being certified. The actual payload: the whole point of the cert is to vouch for this key.

Validity Period. Not Before / Not After. Limits exposure if a key is compromised. Let’s Encrypt: 90 days. K8s: 1 year.

Key Usage. Allowed operations: Digital Signature, Key Encipherment, Cert Sign. A leaf cert must NOT have Cert Sign.

Basic Constraints. Is this a CA? CA:TRUE = can sign other certs. CA:FALSE = leaf. A critical security boundary.

Signature. The issuer’s digital signature: the proof. Hash all fields, sign with the issuer’s private key.
Why X.509? It originated from the X.500 directory standard — an OSI-era attempt at a global directory that mostly failed. The format survived because it was already in use by the time the web needed it. It’s overly complex — ASN.1 encoding, DER vs PEM — but too entrenched to replace.

Anyone can create a certificate claiming anything. I can generate a cert right now saying “this key belongs to google.com.” What makes it trustworthy isn’t its content — it’s the signature. And the signature is only meaningful if you trust the entity that signed it.
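
Stripped of ASN.1, the idea reduces to a signed statement over named fields. A toy sketch (the field names are simplified and the RSA issuer key is tiny, for illustration):

```python
import hashlib, json

# A certificate, boiled down: fields plus the issuer's signature over them.
n, e = 61 * 53, 17                         # issuer's public key (toy)
d = pow(e, -1, 60 * 52)                    # issuer's PRIVATE key (toy)

cert_fields = {
    "subject": "example.com",
    "subject_public_key": "the key being certified (placeholder)",
    "issuer": "Toy Intermediate CA",
    "not_before": "2025-01-01",
    "not_after": "2025-04-01",
    "basic_constraints": "CA:FALSE",
}
tbs = json.dumps(cert_fields, sort_keys=True).encode()   # "to-be-signed" bytes

h = int.from_bytes(hashlib.sha256(tbs).digest(), "big") % n
signature = pow(h, d, n)                   # only the issuer can produce this

# Anyone holding the issuer's PUBLIC key can check the binding:
assert pow(signature, e, n) == h
```

Anyone can assemble the fields; only the holder of the issuer’s private key can produce the signature.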

Tidbit — Certificate Transparency

Since 2018, Chrome requires all publicly-trusted certificates to be logged in Certificate Transparency (CT) logs — public, append-only, cryptographically auditable ledgers. Anyone can monitor them. If a CA issues a cert for google.com that Google didn’t request, Google’s monitoring catches it within minutes. CT has already exposed mis-issuances by Symantec, WoSign, and others.

07
The Chain of Trust

If you need a trusted third party to sign your cert, who signs their cert? And who signs that entity’s cert? This regression stops at root Certificate Authorities.

A root CA is a certificate that signs itself. Its issuer is itself. This is obviously circular — so why trust it? Because your operating system’s vendor has pre-installed it into your trust store — a curated list of roughly 150 root certificates that your machine trusts implicitly.

Root store governance is serious. Getting accepted into Apple’s or Mozilla’s root program requires annual WebTrust audits, compliance with CA/Browser Forum Baseline Requirements, and financial stability assessments. Violate the rules: distrusted. See Symantec (2017) and CNNIC (2015).

In practice, root CAs don’t directly sign your server’s certificate. The chain has three levels:

The three-level chain
Root CA
ISRG Root X1 (self-signed)
Private key lives in an HSM inside a locked cage inside a secure facility, powered on a few times per year to sign intermediate CA certificates. Air-gapped. Ceremony-based access with multiple key holders.
signs (rarely, with ceremony)
Intermediate CA
Let’s Encrypt R10
Online, handles the day-to-day signing. If compromised, it can be revoked and replaced without disturbing the root — the root stays safe in its vault and signs a new intermediate.
signs (automated, per-request)
Leaf Certificate
example.com
CA:FALSE — cannot sign other certs. Short-lived (90 days with Let’s Encrypt). This is what your server presents during the TLS handshake.

Why intermediates? The root CA’s private key is the single point of trust. If compromised, every certificate in the chain becomes untrustworthy, and there’s no recovery short of replacing the root in every device’s trust store worldwide. So root keys are kept offline. The intermediates do the daily work. If one is compromised, the root signs a new one, the old gets revoked, damage contained. Defense in depth applied to trust infrastructure.

Revocation is messy. CRLs (lists of revoked serials — get huge, stale). OCSP (real-time check — adds latency, fails-open). OCSP Stapling (server pre-fetches status — best option, but incomplete adoption). Let’s Encrypt’s strategy: make certs so short-lived that revocation matters less.

Verification in practice

  1. Server sends its leaf cert + intermediate cert (root is omitted — you already have it locally).
  2. Check the leaf’s signature using the intermediate’s public key. ✓
  3. Check the intermediate’s signature using the root’s public key (from your trust store). ✓
  4. Root is trusted. Chain complete. Connection trusted.
  5. Also: validity dates, SANs match hostname, key usage is appropriate, not revoked.
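
The verification loop above can be sketched with toy RSA keys (tiny primes; a real validator also checks dates, SANs, key usage, and revocation, as step 5 notes):

```python
import hashlib

# Walk the chain bottom-up: verify each cert with its ISSUER's public key.
def keypair(p, q, e=17):
    n = p * q
    return (n, e), pow(e, -1, (p - 1) * (q - 1))

def sign(priv_d, n, data):
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(h, priv_d, n)

def verify(pub, data, sig):
    n, e = pub
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % n
    return pow(sig, e, n) == h

root_pub, root_priv = keypair(61, 53)      # lives in the local trust store
inter_pub, inter_priv = keypair(67, 71)
leaf_pub, _ = keypair(73, 79)

# Each "cert" = (subject, subject's public key, issuer's signature over both).
def issue(subject, subject_pub, issuer_priv, issuer_n):
    body = f"{subject}|{subject_pub}".encode()
    return (subject, subject_pub, sign(issuer_priv, issuer_n, body))

inter_cert = issue("Toy Intermediate", inter_pub, root_priv, root_pub[0])
leaf_cert = issue("example.com", leaf_pub, inter_priv, inter_pub[0])

def check(cert, issuer_pub):
    subject, pub, sig = cert
    return verify(issuer_pub, f"{subject}|{pub}".encode(), sig)

# Leaf checked against intermediate, intermediate against the trusted root.
trusted = check(leaf_cert, inter_pub) and check(inter_cert, root_pub)
assert trusted
```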
Tidbit — The DigiNotar Disaster

In 2011, attackers compromised DigiNotar, a Dutch CA, and issued fraudulent certificates for over 500 domains including *.google.com. The fake certs were used to intercept Gmail traffic of Iranian dissidents. When discovered, every browser vendor revoked DigiNotar’s root. The company filed for bankruptcy within a month.

08
The TLS Handshake

This is where every concept snaps together into a single coherent protocol. TLS 1.3 is the current version — faster and more secure than its predecessors.

TLS 1.3 removed a graveyard: RSA key exchange (no forward secrecy), CBC ciphers (BEAST, Lucky 13), compression (CRIME), renegotiation. TLS 1.2 needed 2 round-trips; 1.3 does it in 1. Every removed feature had a CVE history.
TLS 1.3 handshake — 1 round-trip
sequenceDiagram
    participant C as Client (browser)
    participant S as Server (website)
    Note over C,S: 🔑 KEY EXCHANGE
    C->>S: ClientHello + ECDHE key share + cipher suites
    S->>C: ServerHello + ECDHE key share
    Note over C,S: 🔒 ENCRYPTED FROM HERE
    Note over C,S: 🛡️ AUTHENTICATION
    S->>C: Certificate + CertificateVerify + Finished
    C->>S: Finished
    Note over C,S: 📦 APPLICATION DATA (AES-256-GCM)
    C-->>S: encrypted data
    S-->>C: encrypted data

Notice the layering: ECDHE for key exchange (forward secrecy), certificates + signatures for authentication, AES-GCM for bulk encryption. Each primitive does what it’s best at.

CertificateVerify deserves special attention: the server signs the handshake transcript with its private key. This proves the server possesses the private key, not just the (public) certificate. Without it, an attacker who obtained the cert file (but not the key) could impersonate the server.
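
Conceptually, CertificateVerify is a signature over the transcript hash. A sketch; RFC 8446 also mixes in a context string and padding, and the hash covers the exact handshake bytes. A toy RSA key stands in for the server’s real key:

```python
import hashlib

# The running hash of every handshake message so far (placeholder bytes).
transcript = b"ClientHello|ServerHello|EncryptedExtensions|Certificate"
transcript_hash = hashlib.sha256(transcript).digest()

n, e = 61 * 53, 17                     # server's public key (toy)
d = pow(e, -1, 60 * 52)                # server's private key (toy)

h = int.from_bytes(transcript_hash, "big") % n
certificate_verify = pow(h, d, n)      # proves possession of the private key

# The client recomputes the transcript hash and checks the signature:
assert pow(certificate_verify, e, n) == h
```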

Tidbit — 0-RTT Resumption

TLS 1.3 has a trick: 0-RTT. If a client has connected before, it can send data in the very first message — zero round trips. The catch: 0-RTT data is replayable. An attacker can capture and resend it. So 0-RTT should only be used for idempotent requests (GET, not POST). It’s a deliberate security/performance tradeoff.

09
Let’s Encrypt & ACME

Before November 2015, getting a TLS certificate meant: paying $50–$300/year, generating a CSR manually, emailing it, waiting days, receiving the cert via email, installing it, and setting a calendar reminder. Let’s Encrypt changed everything by making certificates free, automated, and open.

Why 90-day certs? Three reasons. (1) Limiting damage: compromised key exposure is at most 90 days. (2) Forcing automation: you can’t manually manage 90-day certs, so you must automate. (3) Agility: if a vulnerability is found, the fleet rotates within 90 days with zero human intervention.

How ACME works

The core idea: prove you control the domain before the CA signs. Let’s Encrypt only does Domain Validation (DV).

ACME protocol — domain validation and certificate issuance
sequenceDiagram
    participant AC as ACME Client (certbot/caddy)
    participant LE as Let's Encrypt (ACME CA)
    participant YS as Your Server (example.com)
    AC->>LE: 1. Order: cert for example.com
    LE->>AC: 2. Challenge: put token at /.well-known/...
    AC-->>YS: 3. Places token on server
    LE->>YS: 4. HTTP GET token
    YS->>LE: token ✓
    AC->>LE: 5. CSR (public key + domain)
    LE->>AC: 6. 🔒 Signed certificate
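
The comparison in step 4 can be sketched from the CA’s side. Per RFC 8555 the expected HTTP-01 response body is the token joined to a hash of the account key; the JWK below is a placeholder, not a real key:

```python
import base64, hashlib, json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def key_authorization(token: str, account_jwk: dict) -> str:
    # Thumbprint: SHA-256 over canonical JSON of the JWK (simplified from
    # RFC 7638, which hashes only the required members).
    canonical = json.dumps(account_jwk, sort_keys=True, separators=(",", ":"))
    return f"{token}.{b64url(hashlib.sha256(canonical.encode()).digest())}"

jwk = {"kty": "EC", "crv": "P-256", "x": "...", "y": "..."}  # placeholder
token = "evaGxfADs6pSRb2LAv9IZ"

expected = key_authorization(token, jwk)
# The CA fetches http://example.com/.well-known/acme-challenge/<token>
# and compares the body against `expected`.
served_body = expected            # what a correctly configured client serves
assert served_body == expected
```

Because the thumbprint is derived from the ACME account key, an attacker who can place files on your server but doesn’t hold your account key still can’t pass validation.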
Let’s Encrypt’s chain: Root = ISRG Root X1 (RSA 4096) and ISRG Root X2 (ECDSA P-384). Intermediates = R10/R11 (RSA) and E5/E6 (ECDSA). They’ve issued billions of certificates, securing over 360 million websites.

DNS-01 is the other challenge type: create a TXT record at _acme-challenge.example.com. Advantage: works for wildcard certs, no ports needed. Disadvantage: requires DNS API access.

— ◆ —

That is the complete web PKI story. From trapdoor functions, through chains of trust, through TLS handshakes, to ACME automation. Now let’s see how Kubernetes takes these same primitives and uses them internally.

— ☵ —
Part Two
Kubernetes PKI
10
Why Kubernetes Needs Its Own PKI

On the web, TLS is usually one-directional: the server proves its identity, but the client stays anonymous. Kubernetes is different. Every component must authenticate to every other component, using mutual TLS (mTLS) — both sides present certificates.

Why mTLS, not tokens? Defense in depth. A bearer token can be stolen and replayed from anywhere. A client certificate proves the holder possesses a private key that never left the node. K8s also supports token auth, but cert-based auth is primary for infrastructure components because it provides identity at the transport layer.

The kubelet on worker-3 reports pod status. The API server checks: is this a legitimate kubelet? The kubelet presents a client cert with CN=system:node:worker-3, O=system:nodes. The API server verifies it against the cluster CA, extracts the identity, applies RBAC. Meanwhile, the kubelet verifies the API server’s cert too.

Kubernetes cluster — certificate-authenticated connections
flowchart TB
    KCT["kubectl"] -->|"kubeconfig cert"| API
    subgraph CP ["CONTROL PLANE"]
        CM["controller-manager"] -->|"client cert"| API
        SCH["scheduler"] -->|"client cert"| API
        API["kube-apiserver :6443 HTTPS"]
        API -->|"mTLS etcd CA"| ETCD["etcd — separate CA"]
        API -->|"front-proxy"| MS["metrics-server"]
    end
    subgraph WN ["WORKER NODES"]
        W1["worker-1 kubelet\nCN=system:node:worker-1"]
        W2["worker-2 kubelet\nCN=system:node:worker-2"]
        W3["worker-3 kubelet\nCN=system:node:worker-3"]
    end
    W1 -->|"mTLS cluster CA"| API
    W2 -->|"mTLS cluster CA"| API
    W3 -->|"mTLS cluster CA"| API

Every arrow requires at least one, usually two, certificates. This is why a cluster has so many — not overengineering, but the minimum machinery for authenticated communication.

11
The Cluster CA

When you initialize a cluster with kubeadm init, the first thing generated is the cluster CA:

/etc/kubernetes/pki/ca.crt    # Root certificate (public, distributed everywhere)
/etc/kubernetes/pki/ca.key    # Root private key (the crown jewel)

This CA is the root of trust for the entire cluster. Every other K8s certificate is either signed directly by this CA or by a subordinate signed by it. The ca.crt is embedded in every kubeconfig.

If ca.key is stolen: The attacker signs any cert with any subject — create a cert with O=system:masters and get unrestricted cluster-admin access. In production, consider an external CA (HashiCorp Vault), where the root key lives in a hardware-backed secret store and signing happens through an auditable API.
12
API Server Certificates

The API server is the nexus — every component talks to it. It needs certificates for both directions:

Server certificate (incoming connections)

/etc/kubernetes/pki/apiserver.crt
/etc/kubernetes/pki/apiserver.key

Presented to anything connecting to the API server. Its SANs must include every reachable name/IP: kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, the node’s hostname and IP, the cluster IP (10.96.0.1), and any load-balancer addresses.

Why so many SANs? TLS hostname verification checks that the hostname matches a SAN. If a pod connects to kubernetes.default.svc but the cert only lists 10.96.0.1, the connection fails. This is the most common source of “x509: certificate is valid for X, not Y” errors.
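
Hostname-vs-SAN matching, simplified. The full rules live in RFC 6125; this sketch ignores IP-address SANs and the restriction that a wildcard may only replace the leftmost label:

```python
# Simplified DNS-SAN matcher for illustration.
def hostname_matches(hostname: str, sans: list[str]) -> bool:
    host_labels = hostname.lower().split(".")
    for san in sans:
        san_labels = san.lower().split(".")
        if len(san_labels) != len(host_labels):
            continue                      # wildcards never span label counts
        if all(s == "*" or s == h for s, h in zip(san_labels, host_labels)):
            return True
    return False

api_sans = ["kubernetes", "kubernetes.default", "kubernetes.default.svc",
            "kubernetes.default.svc.cluster.local"]

assert hostname_matches("kubernetes.default.svc", api_sans)
assert not hostname_matches("evil.example.com", api_sans)   # -> x509 error
```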

Client certificate (outgoing to kubelets)

/etc/kubernetes/pki/apiserver-kubelet-client.crt / .key

Used when the API server connects to kubelets (kubectl logs, kubectl exec). Subject: O=system:masters.

13
Kubelet Certificates

Each kubelet — one per node — needs its own pair of certificates.

Client certificate (kubelet → API server)

The subject IS the kubelet’s RBAC identity:

Subject: CN = system:node:worker-3
         O  = system:nodes

CN identifies the node. O maps to a K8s group. The system:nodes group is bound to the system:node ClusterRole, which scopes kubelet permissions to pods on its own node.

Identity in cert subjects: K8s uses certificate fields directly as the authenticated identity — no separate user database. CN = username, O = groups. Consequence: you can’t revoke access without revoking the cert or waiting for expiry. This is why cert lifetimes and rotation matter so much.
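
The mapping is mechanical enough to sketch. Conceptual only; the real API server parses the X.509 subject structure, not a dict:

```python
# Cert subject -> authenticated identity: CN is the username, each O a group.
def identity_from_subject(subject: dict) -> tuple[str, list[str]]:
    username = subject["CN"]
    groups = subject.get("O", [])
    return username, groups

subject = {"CN": "system:node:worker-3", "O": ["system:nodes"]}
user, groups = identity_from_subject(subject)
assert user == "system:node:worker-3"
assert "system:nodes" in groups   # RBAC binds this group to the node role
```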

Server certificate (API server → kubelet)

When the API server initiates connections to the kubelet (logs, exec, port-forward), the kubelet presents its server cert with the node’s IP and hostname as SANs.

14
Node Bootstrapping

Here is the chicken-and-egg problem that makes K8s cert management genuinely interesting: a new node needs a client cert to talk to the API server. To get one signed, it must submit a CSR to the API server. To talk to the API server, it needs a cert.

The solution: TLS Bootstrap — a protocol using a short-lived, low-privilege bootstrap token to break the circularity.

Why not pre-generate certs? You could, for small static clusters. But in a cloud-native world with autoscaling, spot instances, and thousands of nodes churning, you need nodes to self-provision certificates.
TLS Bootstrap — complete node joining sequence
Phase 1 — Preparation
admin: kubeadm token create generates abcdef.0123456789abcdef. The token is limited: 24h expiry, only permission to create CSRs.
admin: provides the node a bootstrap kubeconfig: API server address, cluster CA cert, bootstrap token. The CA cert lets the node verify the API server; --discovery-token-ca-cert-hash prevents MITM during bootstrap.
Phase 2 — Initial Contact
kubelet: connects to the API server using the bootstrap token. Authenticated as system:bootstrappers, with almost no permissions: only enough to submit a CSR.
Phase 3 — Certificate Request
kubelet: generates a fresh key pair locally. The private key never leaves the node; the API server only receives the public key inside the CSR.
kubelet: submits a CertificateSigningRequest: CN=system:node:<name>, O=system:nodes, usage: client auth. The CSR is a standard K8s resource, visible with kubectl get csr.
Phase 4 — Approval & Signing
csrapproving: policy check: from system:bootstrappers? Subject matches system:node:*? Only client auth? If so, auto-approve. A request for O=system:masters from a bootstrap token would be rejected.
csrsigning: signs the CSR with the cluster CA key. This is where ca.key is used.
Phase 5 — Normal Operation
kubelet: downloads the signed cert, writes it to disk, reconnects as system:node:<name>. The bootstrap token is discarded; full node-level permissions via RBAC.
kubelet: with --rotate-certificates=true, auto-submits a new CSR before expiry. Zero-downtime, continuous rotation for the node’s entire lifetime.
kubeadm join wraps all of this. One command: connect with token, verify CA cert by hash, execute the full TLS bootstrap flow, start the kubelet. The ceremony, automated.

The beauty of TLS Bootstrap is how it decomposes a chicken-and-egg problem into minimal privilege escalations: start with a low-value token, use it to obtain a proper certificate, discard the token.
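
The Phase 4 policy check is essentially a predicate. A conceptual sketch, not the real controller code, which inspects a CertificateSigningRequest object:

```python
# Auto-approval rules for bootstrap CSRs, as a pure function.
def auto_approve(requestor_groups, cn, orgs, usages) -> bool:
    return (
        "system:bootstrappers" in requestor_groups   # came in via bootstrap token
        and cn.startswith("system:node:")            # node identity only
        and orgs == ["system:nodes"]                 # no privileged groups
        and usages == ["client auth"]                # can't request a server/CA cert
    )

assert auto_approve(["system:bootstrappers"], "system:node:worker-3",
                    ["system:nodes"], ["client auth"])
# A bootstrap token asking for O=system:masters is refused:
assert not auto_approve(["system:bootstrappers"], "system:node:worker-3",
                        ["system:masters"], ["client auth"])
```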

15
etcd, Front Proxy & Service Accounts

Beyond the cluster CA, Kubernetes maintains two additional CAs and a signing key pair, each for a distinct trust domain.

The etcd CA

/etc/kubernetes/pki/etcd/ca.crt      # etcd root CA
/etc/kubernetes/pki/etcd/server.crt  # etcd server cert
/etc/kubernetes/pki/etcd/peer.crt    # etcd-to-etcd replication

etcd stores every piece of cluster state. Its CA is deliberately separate — blast radius reduction.

Why separate? Compromised cluster CA = impersonate K8s components, but can’t impersonate etcd (different CA). Compromised etcd CA = read/write cluster state, but can’t impersonate K8s components. Two independent breaches required for full compromise.

The Front Proxy CA

For the API aggregation layer. When the API server proxies to an extension server (metrics-server), it presents the front-proxy-client cert and passes user identity via headers. The extension server trusts only the front-proxy CA. Third trust domain.

Service Account Key Pair

/etc/kubernetes/pki/sa.key    # Signs JWTs
/etc/kubernetes/pki/sa.pub    # Verifies JWTs

Not X.509. A raw key pair for signing/verifying ServiceAccount tokens (JWTs). Controller-manager signs with sa.key; API server verifies with sa.pub.

Bound tokens (1.22+): Old K8s mounted long-lived, non-expiring JWTs. Since 1.22, the TokenRequest API issues bound tokens: audience-restricted, time-limited (1h default, auto-refreshed), deleted with the pod.
Three independent trust domains
Cluster CA
ca.crt / ca.key
apiserver.crt
apiserver-kubelet-client.crt
kubelet client & server certs
scheduler.conf
controller-manager.conf
admin.conf
etcd CA
etcd/ca.crt / ca.key
etcd/server.crt
etcd/peer.crt
apiserver-etcd-client.crt
Front Proxy CA
front-proxy-ca.crt
front-proxy-client.crt

+ sa.key/sa.pub (JWT, not X.509)
16
The Complete Certificate Map

Here is every certificate and key file on a kubeadm-provisioned control-plane node:

Cluster CA Trust Domain
ca.crt / ca.key: root CA, self-signed, used by everything. The cluster root of trust.
apiserver.crt: server cert, signed by the cluster CA, used by kube-apiserver. TLS for incoming connections.
apiserver-kubelet-client.crt: client cert, signed by the cluster CA, used by kube-apiserver. API server → kubelets.
kubelet client cert: client cert, signed by the cluster CA, used by the kubelet. Kubelet → API server.
kubelet server cert: server cert, signed by the cluster CA, used by the kubelet. Kubelet HTTPS (port 10250).
scheduler.conf: client cert, signed by the cluster CA, used by kube-scheduler. Scheduler → API server.
controller-manager.conf: client cert, signed by the cluster CA, used by the controller-manager. CM → API server.
admin.conf: client cert, signed by the cluster CA, used by kubectl. Cluster-admin (O=system:masters).
etcd CA Trust Domain
etcd/ca.crt: root CA, self-signed, used by etcd. The separate root for etcd.
etcd/server.crt: server cert, signed by the etcd CA, used by etcd. Client → etcd TLS.
etcd/peer.crt: peer cert, signed by the etcd CA, used by etcd. etcd ↔ etcd replication.
apiserver-etcd-client.crt: client cert, signed by the etcd CA, used by kube-apiserver. API server → etcd.
Front Proxy CA Trust Domain
front-proxy-ca.crt: root CA, self-signed, used by the aggregation layer. API aggregation trust root.
front-proxy-client.crt: client cert, signed by the front proxy CA, used by kube-apiserver. Proxying to extension APIs.
Service Account Keys (not X.509)
sa.key / sa.pub: raw key pair, used by the controller-manager and API server. Sign & verify SA JWTs.

~14 cert/key pairs + 1 SA key pair on a single control-plane node. In a 3-node HA setup: 30+ certificates.

Lifetimes: CA = 10 years. Components = 1 year. Kubelet auto-rotates (default since 1.19). Monitor with kubeadm certs check-expiration or Prometheus alert on apiserver_client_certificate_expiration_seconds.

17
Bootstrapping a Cluster

Building a Kubernetes cluster from scratch reveals the careful choreography of PKI generation, component startup, and trust establishment. Whether you use kubeadm, kubespray, or do it the hard way, the same sequence plays out. Understanding it demystifies the ~20 files that appear in /etc/kubernetes/pki/.

The kubeadm init sequence

When you run kubeadm init, the following happens in order:

Generate the PKI

Three CAs are created: the cluster CA (ca.crt/ca.key), the etcd CA (etcd/ca.crt/ca.key), and the front-proxy CA (front-proxy-ca.crt/ca.key). Then all component certificates are signed — API server, kubelet client, etcd server/peer, front-proxy client, and the service account key pair.

Generate kubeconfigs

Four kubeconfig files are created embedding client certificates: admin.conf, controller-manager.conf, scheduler.conf, and kubelet.conf. Each contains the cluster CA cert (for verifying the API server) and a client cert (for authenticating to it).
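All four files share the same shape — a sketch of admin.conf with the base64 payloads elided:

```yaml
apiVersion: v1
kind: Config
clusters:
- name: kubernetes
  cluster:
    server: https://<control-plane-ip>:6443
    certificate-authority-data: <base64 ca.crt>   # verifies the API server
contexts:
- name: kubernetes-admin@kubernetes
  context: {cluster: kubernetes, user: kubernetes-admin}
current-context: kubernetes-admin@kubernetes
users:
- name: kubernetes-admin
  user:
    client-certificate-data: <base64 cert>        # authenticates the client
    client-key-data: <base64 key>
```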

Write static pod manifests

The API server, controller manager, scheduler, and etcd are defined as static pods — YAML manifests written directly to /etc/kubernetes/manifests/. The kubelet on the control-plane node watches this directory and starts them without needing an API server (because the API server doesn’t exist yet).

Start etcd

etcd starts first, presenting its server cert and requiring mTLS from any client. It creates the initial cluster state database. Nothing else can start until etcd is healthy.

Start the API server

The API server starts, connects to etcd (using apiserver-etcd-client.crt), and begins serving on port 6443. It loads the cluster CA to verify incoming client certs and the SA public key to verify JWT tokens.

Start controller manager & scheduler

Both connect to the API server using their respective kubeconfigs. The controller manager also loads ca.key — it needs this to sign CSRs for node bootstrap and to sign SA tokens with sa.key.

Bootstrap token & RBAC

kubeadm creates a bootstrap token, sets up the system:bootstrappers ClusterRoleBindings, and configures the CSR auto-approval rules. The cluster is now ready to accept worker nodes.
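The token travels alongside a hash of the cluster CA's public key, which joining nodes use to verify they're talking to the right cluster. That hash is computable from ca.crt alone — a self-contained sketch (it generates a throwaway CA so it runs anywhere; on a real node, point it at /etc/kubernetes/pki/ca.crt):

```shell
# Throwaway CA standing in for /etc/kubernetes/pki/ca.crt
openssl req -x509 -newkey rsa:2048 -nodes -keyout demo-ca.key -out demo-ca.crt \
  -days 1 -subj "/CN=kubernetes"

# kubeadm's --discovery-token-ca-cert-hash is sha256 over the CA's
# DER-encoded public key (SPKI), not over the whole certificate
openssl x509 -pubkey -in demo-ca.crt -noout |
  openssl pkey -pubin -outform der |
  openssl dgst -sha256 | awk '{print "sha256:"$NF}'
```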

Install addons

CoreDNS and kube-proxy are deployed as cluster addons. A CNI plugin (Calico, Cilium, Flannel) must be installed separately — without it, pods cannot communicate across nodes and the cluster is not fully functional.

The static pod trick: Kubernetes needs a kubelet to run pods, but the kubelet needs an API server to get pod specs. Static pods break this by letting the kubelet read manifests from a local directory. The control-plane bootstraps itself this way — the kubelet starts the API server as a pod, then connects to it.
18
Kubernetes Components & Networking

A running Kubernetes cluster is a system of cooperating components, each with a distinct role. Understanding what each does — and how they authenticate to each other — is essential for diagnosing issues and securing the cluster.

Control plane components

kube-apiserver is the central hub. Every other component communicates through it — no component talks directly to another (except etcd, which only the API server reaches). It exposes the Kubernetes API over HTTPS on port 6443, authenticates every request via client certificates or bearer tokens, and authorizes via RBAC.

etcd is the cluster’s brain — a distributed key-value store that holds all cluster state: pod specs, service definitions, secrets, configmaps, RBAC policies. It runs its own Raft consensus protocol for replication across control-plane nodes. Only the API server talks to etcd, using a separate CA for mTLS. If etcd is lost and unrecoverable, the cluster state is gone.

kube-controller-manager runs the control loops that reconcile desired state with actual state. The node controller detects unreachable nodes. The deployment controller manages ReplicaSets. The endpoint controller populates Endpoints. It also runs the CSR signing controller that signs kubelet certificate requests during TLS bootstrap.

kube-scheduler watches for newly created pods with no assigned node and selects the best node based on resource requests, affinity rules, taints, and topology. It only writes the nodeName field — the kubelet on that node then does the actual work.

All roads through the API server: The scheduler doesn’t tell the kubelet to run a pod. It writes to the API server. The kubelet watches the API server. This hub-and-spoke model means there’s exactly one component to secure, audit, and rate-limit: the API server.

Node components

kubelet is the node agent. It watches the API server for pods assigned to its node, tells the container runtime to start them, reports status back, and exposes a local HTTPS API on port 10250 (for kubectl logs, kubectl exec). It presents a client cert to the API server and a server cert to incoming connections. Both are managed via TLS bootstrap and auto-rotation.

kube-proxy runs on every node and implements Kubernetes Service networking. When you create a Service, kube-proxy programs the node’s network rules so that traffic to the Service’s ClusterIP gets load-balanced across the backing pods. It has three modes: userspace (legacy, long deprecated — traffic proxied through a userspace process), iptables (the long-time default — Service rules written as iptables chains), and IPVS (the kernel’s IP Virtual Server, hash-based lookups for clusters with many Services).

kube-proxy authenticates to the API server via a kubeconfig with a client cert or a ServiceAccount token.

Cilium — replacing kube-proxy with eBPF

Cilium is a CNI plugin that uses eBPF (extended Berkeley Packet Filter) to implement networking, security, and observability directly in the Linux kernel. In many production clusters, Cilium replaces kube-proxy entirely.

Traditional kube-proxy operates in userspace-adjacent mode: it watches the API server for Service/Endpoint changes, then programs iptables/IPVS rules. Every packet hitting a Service IP traverses the iptables chain. Cilium takes a radically different approach: it attaches eBPF programs directly in the kernel’s networking path and resolves Service IPs to backend pods with constant-time hash-map lookups — no iptables chains at all.

Why eBPF matters: iptables rules are evaluated linearly. With 10,000 services, that’s 10,000+ rules per packet. eBPF hash-maps give O(1) lookups. In benchmarks, Cilium shows 2–4x throughput improvement over iptables-based kube-proxy at scale. Google’s GKE Dataplane V2 is built on Cilium.

Cilium agents run as a DaemonSet. Each agent authenticates to the API server (via ServiceAccount token or cert) and watches Pods, Services, Endpoints, and CiliumNetworkPolicies. When you deploy Cilium with kubeProxyReplacement=true, you can skip installing kube-proxy entirely.
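A minimal sketch of the relevant Helm values (field names per Cilium’s chart — verify against the version you deploy):

```yaml
# values.yaml fragment: Cilium as a full kube-proxy replacement
kubeProxyReplacement: true
k8sServiceHost: <api-server-address>   # must be reachable without kube-proxy
k8sServicePort: 6443
```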

CoreDNS

CoreDNS provides cluster-internal DNS. When a pod looks up my-service.default.svc.cluster.local, CoreDNS resolves it to the Service’s ClusterIP. It runs as a Deployment with a ServiceAccount, watches the API server for Service and Endpoint changes, and serves DNS on the cluster DNS IP (typically 10.96.0.10). Every pod’s /etc/resolv.conf points to this IP.
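From inside a pod, that wiring looks like this — a typical /etc/resolv.conf (the exact IP and search domains depend on cluster configuration):

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

The search list is why a pod can resolve the short name my-service: the resolver appends default.svc.cluster.local before querying CoreDNS.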

Container runtime (containerd)

containerd is the container runtime that actually pulls images and runs containers. The kubelet communicates with it over a local Unix socket using the CRI (Container Runtime Interface) protocol. This connection is local (not network), so it doesn’t use TLS — it relies on filesystem permissions for security.

Component interactions — every arrow is authenticated
flowchart TB
    KCT["kubectl"] -->|"kubeconfig cert"| API
    CM["controller-manager"] -->|"client cert"| API
    SCH["scheduler"] -->|"client cert"| API
    API["kube-apiserver :6443\nthe only hub"] -->|"mTLS etcd CA"| ETCD["etcd\nstate store"]
    API -->|"front-proxy cert"| MS["metrics-server"]
    API <-->|"mTLS cluster CA"| KL["kubelet :10250\nnode agent"]
    KP["kube-proxy"] -->|"SA token"| API
    CIL["cilium-agent\neBPF networking"] -->|"SA token"| API
    KL -->|"CRI Unix socket"| CTD["containerd"]
    DNS["CoreDNS"] -->|"SA token"| API
Tidbit — The kube-proxy vs. Cilium Decision

For small clusters, kube-proxy in iptables mode works fine. At scale (500+ services), iptables rule updates become a bottleneck — each Service change triggers a full iptables save/restore. Cilium’s eBPF approach scales to tens of thousands of services with constant-time lookups. The tradeoff: Cilium requires a Linux kernel ≥4.19 (ideally 5.10+) and adds operational complexity. Most managed Kubernetes offerings (GKE, EKS) now offer Cilium-based dataplanes as a first-class option.


19
cert-manager

kubeadm manages control-plane certificates, but what about certificates your applications need? Ingress controllers need TLS certs for your domains. Internal services need certs for mTLS. This is where cert-manager comes in — the de facto standard for certificate lifecycle management in Kubernetes.

cert-manager runs as a set of controllers inside the cluster. It watches for Certificate resources you define, requests certs from configured issuers, stores them as Kubernetes Secrets, and automatically renews them before expiry.

cert-manager is not a CA. It’s a certificate lifecycle controller. It talks to CAs (Let’s Encrypt, Vault, self-signed, Venafi, AWS Private CA) on your behalf. It handles the ACME dance, CSR generation, renewal, and secret distribution — so you don’t.

The resource model

cert-manager introduces four key CRDs: Issuer and ClusterIssuer describe how to obtain certificates (namespace-scoped and cluster-wide, respectively); Certificate declares a cert you want to exist; and CertificateRequest is the low-level, one-shot object the controllers create for each issuance. The one you’ll write most often is Certificate:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  secretName: example-com-tls     # cert stored here as a Secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - example.com
    - www.example.com
  duration: 2160h                 # 90 days
  renewBefore: 360h               # renew 15 days before expiry

When this resource is created, cert-manager automatically: creates an ACME order, solves the challenge (HTTP-01 or DNS-01), submits a CSR to Let’s Encrypt, stores the signed cert + private key in the example-com-tls Secret, and renews it every 75 days. Your Ingress controller picks up the Secret and serves TLS. Zero human involvement.
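The letsencrypt-prod ClusterIssuer referenced above looks roughly like this (a sketch assuming HTTP-01 challenges solved through an nginx ingress class):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                 # expiry notices go here
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # ACME account key stored as a Secret
    solvers:
    - http01:
        ingress:
          class: nginx
```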

Ingress annotations shortcut: For simple cases, you can skip the Certificate resource entirely. Add cert-manager.io/cluster-issuer: letsencrypt-prod to your Ingress annotations, and cert-manager creates the Certificate automatically from the Ingress’s TLS configuration.
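A sketch of that shortcut (hostnames and names are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod  # triggers Certificate creation
spec:
  tls:
  - hosts: [example.com]
    secretName: example-com-tls    # cert-manager populates this Secret
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service: {name: example, port: {number: 80}}
```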
Tidbit — The trust-manager companion

cert-manager has a companion project: trust-manager. While cert-manager distributes leaf certificates, trust-manager distributes CA bundles. It ensures that every namespace has an up-to-date trust bundle (ConfigMap) containing the CAs your workloads need to verify connections. Together, they handle both sides of the trust equation: “here’s my cert” and “here’s who I trust.”

20
Service Mesh PKI

Kubernetes PKI secures communication between infrastructure components. But what about the traffic between your application pods? By default, pod-to-pod traffic is unencrypted plaintext traveling over the cluster network. Anyone who compromises a node — or taps the network fabric — can read it all.

This is the problem service meshes solve. Istio, Linkerd, and Cilium extend mTLS to every pod-to-pod connection, transparently and without application code changes.

SPIFFE identities: Service meshes use the SPIFFE standard (Secure Production Identity Framework for Everyone) to assign identities. A SPIFFE ID looks like spiffe://cluster.local/ns/default/sa/payment-service. The cert’s SAN carries this URI. Identity is tied to the Kubernetes ServiceAccount, not the pod IP.
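You can forge a toy SVID locally to see exactly where the identity lives — a sketch using openssl (1.1.1+ for -addext); real meshes issue these from their own CA, not self-signed:

```shell
# Self-signed toy SVID carrying a SPIFFE URI SAN
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes \
  -keyout svid.key -out svid.crt -days 1 -subj "/CN=payment-service" \
  -addext "subjectAltName=URI:spiffe://cluster.local/ns/default/sa/payment-service"

# The identity is in the SAN, not the subject
openssl x509 -in svid.crt -noout -ext subjectAltName
```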

How Istio does it

  1. istiod runs as the mesh’s own CA (or delegates to an external CA like Vault).
  2. Each pod gets a sidecar proxy (Envoy) injected automatically. The sidecar intercepts all inbound/outbound traffic.
  3. On startup, the sidecar requests a short-lived certificate from istiod via SDS (Secret Discovery Service). The cert encodes the pod’s SPIFFE identity.
  4. Every connection between pods is automatically upgraded to mTLS. The sidecars handle the handshake — the application sees plain HTTP.
  5. Certificates are rotated automatically — every 24 hours by default in Istio, and the interval is configurable.

The result: every pod-to-pod connection is encrypted and mutually authenticated, with cryptographic identity tied to ServiceAccounts, and certificates that rotate on a 24-hour cadence. An attacker who compromises a pod gets a cert that’s valid for at most a day and can only identify as that specific service.

Linkerd takes a different approach: Instead of Envoy sidecars, it uses its own ultra-lightweight Rust-based proxy (linkerd2-proxy). Cert rotation is similarly aggressive — 24 hours by default, configurable down to minutes. Linkerd’s design prioritizes simplicity and low resource overhead over Istio’s feature breadth.
Tidbit — Zero-Trust Networking

Service mesh mTLS is the practical implementation of zero-trust networking inside a cluster. The old model: “the network perimeter is secure, trust everything inside.” The new model: “verify every connection, regardless of source.” The 2020 SolarWinds attack — where attackers moved laterally through trusted internal networks for months — was the definitive proof that perimeter-based trust models fail. mTLS ensures that even if an attacker is inside the cluster, they can’t impersonate other services.

21
Webhook & Extension Certificates

Kubernetes is deeply extensible, and every extension point that involves network communication needs certificates. The most common: admission webhooks.

When you create a ValidatingWebhookConfiguration or MutatingWebhookConfiguration, the API server needs to make HTTPS calls to your webhook server. This requires the webhook to serve TLS, and the API server needs to trust the webhook’s certificate.

Why not plain HTTP? Admission webhooks see every object being created, updated, or deleted in the cluster. A MITM on this connection could silently modify resources, inject containers, or escalate privileges. TLS is non-negotiable here.

Three approaches to webhook certs

  1. Manual: generate a CA and serving cert yourself (openssl, cfssl), mount the key pair into the webhook pod, and paste the base64-encoded CA into the configuration’s caBundle field. Works everywhere; rotation is entirely on you.
  2. cert-manager + cainjector: declare a Certificate for the webhook Service’s DNS name and annotate the webhook configuration so the cainjector keeps caBundle in sync automatically. The standard choice.
  3. Install-time generation: a chart hook or init job (e.g., the kube-webhook-certgen tool used by ingress-nginx) creates the cert and patches caBundle at deploy time. Common in Helm charts; rotation means re-running the job.

Other extension points with certificate requirements: API aggregation (custom API servers registered via APIService), CRI (container runtime interface, kubelet-to-runtime TLS), and external admission controllers (OPA Gatekeeper, Kyverno). Each needs a cert the API server trusts, and each must be rotated.

Tidbit — The caBundle Trap

A common gotcha: you create a webhook, cert-manager generates the cert, everything works. Six months later, cert-manager rotates to a new CA — but the caBundle in your webhook configuration still has the old CA. Suddenly every resource mutation fails because the API server can’t verify the webhook. The fix: always use cert-manager’s cainjector (annotate your webhook with cert-manager.io/inject-ca-from) so the CA bundle updates automatically when the cert rotates.
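The fix from the tidbit looks like this in practice — a sketch with placeholder names (the annotation value is namespace/Certificate-name):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-webhook
  annotations:
    cert-manager.io/inject-ca-from: default/example-webhook-cert
webhooks:
- name: validate.example.com
  clientConfig:
    service: {namespace: default, name: example-webhook, path: /validate}
    # caBundle: injected and kept current by cainjector
  rules:
  - apiGroups: [""]
    apiVersions: [v1]
    operations: [CREATE, UPDATE]
    resources: [pods]
  admissionReviewVersions: [v1]
  sideEffects: None
```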

22
Troubleshooting Certificate Issues

Certificate errors in Kubernetes are among the most common and most confusing operational issues. Here’s a field guide to the errors you’ll encounter and what they actually mean.

The error menagerie

| Error | Meaning | Fix |
|---|---|---|
| x509: certificate signed by unknown authority | The verifier doesn’t have the CA that signed this cert in its trust store. | Ensure --client-ca-file or --root-ca-file points to the correct CA bundle. Check if the CA was rotated. |
| x509: certificate is valid for X, not Y | The hostname you connected to doesn’t match any SAN in the cert. | Regenerate the cert with the missing SAN. For apiserver: kubeadm init --apiserver-cert-extra-sans=... |
| x509: certificate has expired | Current time is past the cert’s Not After date. | kubeadm certs renew all, then restart control-plane components. Check kubelet rotation is enabled. |
| tls: bad certificate | The server rejected the client’s cert (or vice versa). CA mismatch. | Verify both sides trust each other’s CA. Common when etcd CA and cluster CA are confused. |
| remote error: tls: internal error | The remote side crashed during the handshake. Often a misconfigured cert/key pair. | Verify the cert and key match: openssl x509 -noout -modulus -in cert \| md5 should equal openssl rsa -noout -modulus -in key \| md5. |
| certificate-authority-data is empty | kubeconfig is missing the CA cert. | Re-extract from /etc/kubernetes/pki/ca.crt and base64-encode into the kubeconfig. |

Essential debug commands

# Check all control-plane cert expiry dates at a glance
kubeadm certs check-expiration

# Inspect a specific cert in detail
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout

# See what a live server presents (without trusting it)
openssl s_client -connect <api-server>:6443 -showcerts 2>/dev/null | \
  openssl x509 -text -noout

# Verify a cert was signed by a specific CA
openssl verify -CAfile /etc/kubernetes/pki/ca.crt \
  /etc/kubernetes/pki/apiserver.crt

# Check if cert and key match (RSA only — compare modulus hashes; use md5sum on Linux)
diff <(openssl x509 -noout -modulus -in cert.crt | md5) \
     <(openssl rsa -noout -modulus -in cert.key | md5)

# View pending and approved CSRs
kubectl get csr -o wide

# Check cert-manager certificate status
kubectl get certificates -A
kubectl describe certificate <name>

# Decode a cert from a K8s secret
kubectl get secret example-tls -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -text -noout
Monitoring tip: Set a Prometheus alert on apiserver_client_certificate_expiration_seconds < 604800 (7 days) to catch expiring certs before they break your cluster. cert-manager exports its own metrics: certmanager_certificate_expiration_timestamp_seconds.
Tidbit — The 1-Year Cert Cliff

A surprising number of Kubernetes outages are caused by expired certificates. kubeadm component certs expire after 1 year. If you don’t run kubeadm certs renew all (or upgrade, which auto-renews) before the anniversary, the API server stops accepting connections. The cluster is up but unreachable. Entirely preventable with monitoring — yet it catches teams every year, including large-scale production environments.
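The anniversary check can be scripted with openssl’s -checkend flag — a self-contained sketch (it generates a throwaway 30-day cert; point it at a real cert in practice):

```shell
# Throwaway 30-day cert standing in for a control-plane cert
openssl req -x509 -newkey rsa:2048 -nodes -keyout demo.key -out demo.crt \
  -days 30 -subj "/CN=demo" 2>/dev/null

# -checkend N exits 0 iff the cert is still valid N seconds from now
if openssl x509 -in demo.crt -noout -checkend 604800 >/dev/null; then
  echo "valid for at least 7 more days"
else
  echo "EXPIRING within 7 days - renew now"
fi
```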


That’s the complete picture. From trapdoor functions, through chains of trust, through TLS handshakes, through ACME automation, to a new Kubernetes node bootstrapping itself with nothing but a temporary token and a CA hash — and beyond, to cert-manager automating certificate lifecycle, service meshes extending mTLS to every pod, and webhook certificates securing the extension layer. Every layer built on the one below it. Every design decision motivated by a specific threat or operational reality.

Certificates aren’t magic — they’re signed documents, verified by math, organized by trust hierarchies, and automated by protocols. The complexity is the minimum machinery required to establish trust between strangers over hostile networks.

Typeset in Literata. Code in JetBrains Mono.
Diagrams rendered as inline SVG.

A treatise on trust.