Hardening Kubernetes for Production

A default Kubernetes cluster is a soft target: anonymous API access in some distributions, containers running as root with a broad default set of Linux capabilities, flat pod networking where any compromised workload can reach any other, and no verification of where your images came from. Production hardening is not a single setting; it is defence in depth across four layers: the cluster control plane, the workloads, the supply chain that produces those workloads, and the runtime. Compromise of one layer should not hand an attacker the rest.

Cluster hardening

Start at the control plane. Benchmark every cluster against the CIS Kubernetes Benchmark using kube-bench, and treat failures as findings to remediate, not noise to ignore.

# Run CIS benchmark checks against the cluster
kube-bench run --targets master,node,etcd,policies \
  --benchmark cis-1.9 --json | jq '.Totals'

The non-negotiables for the API server:

No anonymous auth — set --anonymous-auth=false. Unauthenticated requests should be rejected outright.
RBAC, least privilege — no wildcard verbs/resources in production roles, no binding users to cluster-admin, and scope ServiceAccount permissions to exactly what each workload needs.
Audit logging — enable an audit policy so every API call is recorded. Without it you cannot investigate a breach.

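# Minimal API server audit policy: metadata for everything, full bodies only for RBAC changes.
# Secrets and ConfigMaps stay at Metadata so their contents never land in the audit log.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages: ["RequestReceived"]
rules:
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
  - level: Metadata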

Audit RBAC regularly — permissions accrete over time. kubectl auth can-i --list --as=system:serviceaccount:default:my-sa quickly shows what a given identity can actually do.
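As a rough illustration of the least-privilege guidance above, a workload identity can be bound to a namespaced Role rather than a ClusterRole. The names here (my-sa, the default namespace, and the app-config ConfigMap) are hypothetical, and the rule set is only an assumption about what such a workload might need:

```yaml
# Hypothetical example: read-only access to one named ConfigMap, nothing else
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["app-config"]   # assumed name; limits access to a single object
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: config-reader-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: my-sa
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: config-reader
```

Re-running the kubectl auth can-i --list command above for this ServiceAccount should then show this rule alongside the small set of defaults every authenticated identity receives.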

Workload hardening

A container that runs as root with a writable root filesystem and all capabilities is one escape away from owning the node. Enforce a locked-down securityContext on every pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }   # must match the selector above
    spec:
      automountServiceAccountToken: false   # off unless the pod calls the API
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: api
          image: registry.example.com/api@sha256:...   # pin by digest
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            privileged: false
            capabilities:
              drop: ["ALL"]
          resources:
            limits: { cpu: "1", memory: "512Mi" }
            requests: { cpu: "250m", memory: "256Mi" }

Enforce this cluster-wide rather than trusting every author to remember it. The built-in Pod Security Admission controller enforces the Pod Security Standards, and a namespace label opts that namespace into the restricted profile:

apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted

The restricted profile rejects privileged pods, host namespace sharing, and root execution — closing the most common workload misconfigurations by default.
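One practical consequence of readOnlyRootFilesystem: true in the Deployment above is that any path the application writes to needs an explicit writable volume. A possible fragment to merge into that pod spec, assuming the application only needs scratch space at /tmp (the path and the size limit are illustrative assumptions):

```yaml
# Fragment of spec.template.spec; /tmp and the 64Mi cap are assumptions
containers:
  - name: api
    volumeMounts:
      - name: tmp
        mountPath: /tmp
volumes:
  - name: tmp
    emptyDir:
      sizeLimit: 64Mi   # bounds how much scratch data the pod can write
```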

Network policies with default-deny

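By default, every pod can talk to every other pod. That flat network means one compromised front-end can reach your database directly. Adopt a default-deny posture per namespace, then allow only required flows. NetworkPolicy objects only take effect if the cluster's network plugin enforces them (Calico and Cilium do, for example); on a plugin without policy support they are silently ignored.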

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}            # every pod in the namespace
  policyTypes: [Ingress, Egress]
# no ingress/egress rules = deny all
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow
  namespace: payments
spec:
  podSelector:
    matchLabels: { app: api }
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector: { matchLabels: { app: gateway } }
      ports:
        - { protocol: TCP, port: 8080 }
  egress:
    - to:
        - podSelector: { matchLabels: { app: postgres } }
      ports:
        - { protocol: TCP, port: 5432 }
    - to:                    # allow DNS
        - namespaceSelector: {}
          podSelector: { matchLabels: { k8s-app: kube-dns } }
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }   # DNS falls back to TCP for large responses

A default-deny policy that omits DNS egress will break name resolution for every pod in the namespace. Always explicitly allow port 53 (UDP and TCP) to kube-dns when you lock down egress.
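The DNS rule can be tightened further by scoping it to the namespace where cluster DNS runs. The sketch below assumes the DNS pods live in kube-system and carry the k8s-app: kube-dns label, which is common but varies by distribution. The kubernetes.io/metadata.name label is set automatically on namespaces in recent Kubernetes versions. The policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress        # hypothetical name
  namespace: payments
spec:
  podSelector: {}               # applies to every pod in the namespace
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system   # assumed DNS namespace
          podSelector:
            matchLabels: { k8s-app: kube-dns }
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
```

Because NetworkPolicies are additive, a policy like this can sit alongside default-deny-all so that individual workload policies need not repeat the DNS rule.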

Secrets

Kubernetes Secrets are base64-encoded, not encrypted by default: anyone who can read etcd, or who has get or list permission on Secrets in the namespace, can decode them. Two things are mandatory:

Encryption at rest with KMS envelope encryption. Configure an EncryptionConfiguration with a KMS provider (AWS KMS, Azure Key Vault, GCP KMS) so the data encryption key is itself wrapped by a key you control and rotate.
Never store secrets in Git, plaintext manifests, or ConfigMaps. Pull them from an external store at runtime using the External Secrets Operator, which syncs from AWS Secrets Manager, Vault, or Azure Key Vault into Kubernetes Secrets without the secret material living in your repo.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: payments
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: db-credentials
  data:
    - secretKey: password
      remoteRef:
        key: prod/payments/db
        property: password
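For the first requirement, encryption at rest, a self-managed API server reads an EncryptionConfiguration file passed via --encryption-provider-config. A minimal sketch, assuming a KMS v2 plugin is already running on the control-plane node; the plugin name and socket path below are placeholders, not values from any particular provider:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: ["secrets"]
    providers:
      - kms:
          apiVersion: v2
          name: example-kms-plugin                   # placeholder name
          endpoint: unix:///var/run/kms/plugin.sock  # placeholder socket path
          timeout: 3s
      - identity: {}   # allows reading data written before encryption was enabled
```

Provider order matters: the first provider encrypts new writes, and the rest are only tried for decryption. Existing Secrets stay unencrypted in etcd until they are rewritten. Managed Kubernetes services typically expose this as a cluster setting rather than a file.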

Supply chain

You must trust what you run. Three controls form the chain:

Scan images for known CVEs with Trivy in CI, and fail the build on high/critical findings.
Sign images with cosign so their provenance is verifiable.
Enforce admission control so that anything unsigned, unscanned, or risky is rejected before it is admitted.

# Scan and fail the pipeline on serious vulnerabilities
trivy image --severity HIGH,CRITICAL --exit-code 1 \
  registry.example.com/api:$GIT_SHA

# Sign the image after a clean scan (keyless, OIDC-backed)
cosign sign --yes registry.example.com/api@sha256:$DIGEST

Then enforce verification at admission. A Kyverno policy that rejects images lacking a valid signature:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: require-signed-images
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences: ["registry.example.com/*"]
          attestors:
            - entries:
                - keyless:
                    issuer: "https://token.actions.githubusercontent.com"
                    subject: "https://github.com/example/api/.github/workflows/release.yml@refs/heads/main"

Either Kyverno or OPA Gatekeeper works here; Kyverno's native image-verification rules are more ergonomic for cosign, while Gatekeeper's Rego is more flexible for arbitrary policy. Pick one and standardise.

Control	Tool	Stops
Vulnerability scanning	Trivy	Running known-vulnerable images
Image signing/verify	cosign + Kyverno	Tampered or untrusted images
Policy admission	Kyverno / Gatekeeper	Misconfigured or non-compliant resources
Runtime detection	Falco	Live exploitation after admission
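As one example of the policy-admission row, a Kyverno validate rule can require that images are referenced by digest, matching the pinning used in the Deployment earlier. This is a sketch in the pattern style used by Kyverno's sample policies; the policy and rule names are hypothetical, and it starts in Audit mode:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-digest       # hypothetical name
spec:
  validationFailureAction: Audit   # report only; switch to Enforce after triage
  background: true
  rules:
    - name: images-use-digest
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Images must be referenced by digest (image@sha256:...)."
        pattern:
          spec:
            containers:
              - image: "*@sha256:*"
```

As written it checks only spec.containers; init and ephemeral containers would need their own pattern entries.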

Runtime detection and node hardening

Admission control stops bad things being admitted; it cannot see what a workload does once running. Falco watches kernel syscalls and alerts on suspicious behaviour — a shell spawned in a container, an unexpected outbound connection, a write to a sensitive path. Wire its alerts into the same on-call pipeline as everything else.
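A custom Falco rule for the first of those behaviours might look like the following. It is a simplified sketch that assumes the spawned_process and container macros from Falco's default ruleset are loaded; the rule name and the list of shells are illustrative:

```yaml
- rule: Shell Spawned in Container (example)
  desc: Detect a shell process starting inside any container
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh, ash)
  output: >
    Shell started in container
    (user=%user.name container=%container.name
    image=%container.image.repository cmd=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```

Falco's bundled rules already include a more thorough shell-in-container detection, so in practice a rule like this would be tuned from the defaults rather than written from scratch.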

Finally, harden the nodes themselves: use a minimal, hardened OS image, keep the kernel and kubelet patched, disable the read-only kubelet port, restrict SSH, and run nodes in private subnets behind a bastion or session-manager equivalent. A workload's blast radius is bounded by how locked down the host underneath it is.
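Several of these node settings live in the kubelet configuration. A minimal sketch, assuming the kubelet is configured through a KubeletConfiguration file rather than command-line flags; the client CA path is a placeholder:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
readOnlyPort: 0              # disable the unauthenticated read-only port
authentication:
  anonymous:
    enabled: false           # reject unauthenticated requests to the kubelet API
  webhook:
    enabled: true            # delegate bearer-token authentication to the API server
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt   # placeholder path
authorization:
  mode: Webhook              # authorise kubelet API calls through the API server
protectKernelDefaults: true  # refuse to start if kernel tunables differ from expected values
```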

Treat nodes as cattle, not pets. Patch by replacing the node image and rolling the fleet, not by SSHing in to run apt upgrade. An immutable, frequently-recycled node is both easier to patch and harder to persist on.

Sequencing the rollout

Do not try to land all of this at once on a live cluster — you will break workloads and erode trust in the programme. Sequence it. Begin in audit or warn mode for both Pod Security Standards and your admission policies (validationFailureAction: Audit in Kyverno) so you can see what would be blocked without blocking it. Triage the violations, fix the offending workloads, and only then flip to Enforce. Apply the same discipline to network policies: deploy default-deny in a non-production namespace first, confirm DNS and inter-service traffic still flow, and roll forward namespace by namespace. The goal is a cluster that is locked down by default and stays that way as new workloads arrive — enforcement is what makes hardening durable rather than a snapshot that decays the moment the next team deploys.
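For Pod Security Standards, the warn-first stage described above could look like this on a namespace that is not yet compliant. The namespace name is hypothetical, and enforcing baseline while auditing restricted is one possible stepping stone rather than a requirement:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: orders                 # hypothetical namespace mid-migration
  labels:
    pod-security.kubernetes.io/enforce: baseline    # block only the most dangerous settings for now
    pod-security.kubernetes.io/audit: restricted    # record restricted-profile violations in the audit log
    pod-security.kubernetes.io/warn: restricted     # show warnings to whoever applies the manifest
```

Once audit annotations and warnings stop appearing for the namespace, the enforce label can be raised to restricted.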

Defence in depth is a programme, not a one-off audit. If you want i2zone to benchmark, harden, and continuously enforce security across your Kubernetes estate, speak to our team.