Joe Thompson


In IT since my first job helping out with computers in my high school in 1994

Past employers: Mesosphere, Capital One, CoreOS, Red Hat, among others

Exposed to Kubernetes in early 2015 and working with it full-time since late 2015

Currently a Solutions Engineer for (we're hiring!)


Pronouns: he/him
Blood type: Caffeine-positive

Contact info:

What do I mean by "Avoid Spikes"?

We're going Back to the Future... er, the '80s!

In the arcade game Tempest, you have to fight aliens, then...

...while warping to the next level, avoid the spikes some of the aliens leave behind

IT Engineering is kind of like that:

First, you blast aliens (er, set things up)...


...then you deploy to prod and see if you actually did it right

Kubernetes is also kind of like that

(Kubernetes is not the only thing like that of course)


Kubernetes project spikes

The Kubernetes safe zone

We're going Back... to 2015!

What makes Kubernetes happy?


  • Stateless, 12-factor-style apps (see the sketch after this list)
    • One container to a pod
    • No special privileges needed
  • Compute plane made up of identical nodes
    • On a flat network
  • Outside that home turf... spikes
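That home turf, in manifest form, looks something like this minimal sketch (the names and image here are illustrative): a stateless Deployment, one container per pod, no special privileges.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web                 # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web                                   # one container to a pod
        image: nginxinc/nginx-unprivileged:stable   # illustrative stateless image, no root needed
        ports:
        - containerPort: 8080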

Kubernetes moves fast...

Image credit: Photo of Car on Expressway

...actually more like this:

Image credit: User WikiImages on Pixabay

Kubernetes releases three times a year (lately)

Teams that move fast love features that release fast

Rolling your own sounds like a great way to get that spiffy new feature you need...

...but the release treadmill can take away as fast as it gives

Case in point: PodSecurityPolicy

  • Beta release July 2016 in Kubernetes 1.3
  • Stayed in beta for 18 releases
  • Deprecated in April 2021 with Kubernetes 1.21
  • Will be removed entirely in Kubernetes 1.25 (~August 2022)
  • All support for versions that provide PSP will cease by ~June 2023

What the release treadmill gives, it gives quickly:

  • Beta features are enabled by default
  • Once a beta behavior change proceeds to GA, the clock is ticking on support for the old behavior

How Kubernetes reaches outside its safe zone

New use cases tend to be handled by providing facilities for plugins or layers

  • (this is good)

Plugins and layers may not have the same quality or freshness as the core code

Plugins and layers may implement their own extensions that can make migrating to an alternative difficult if you come to depend on them

  • (these sound kinda bad, but really it's just life with extensible systems)

Kubernetes cluster ops spikes

Kubernetes abstracts the details of compute from the deployment of workloads

However...

"All non-trivial abstractions, to some degree, are leaky."

-- Spolsky's Law of Leaky Abstractions

A concrete example: kube-dns ndots

search default.svc.cluster.local svc.cluster.local cluster.local us-east-2.compute.internal
nameserver 10.43.0.10
options ndots:5

  • Default setting: 5
  • Non-FQDNs with fewer than 5 dots are looked up as relative names against the search-domain list first
  • Problem: short names that could be treated as FQDNs trigger a bunch of relative lookups instead, because they don't contain enough dots
  • Solutions:
    • Change the ndots setting in kube-dns (or per pod, as sketched below)
    • Strictly use FQDN trailing-dot notation
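The per-pod version of the first option uses the pod spec's dnsConfig field; a minimal sketch (the pod name and the ndots value are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: ndots-demo            # illustrative name
spec:
  containers:
  - name: app
    image: busybox
    args: ["sleep", "3600"]
  dnsConfig:
    options:
    - name: ndots
      value: "1"              # names with at least one dot get tried as absolute (FQDN) first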

(thanks to Adam Kozłowski, from whose GrapeUp blog post I mined this example and a couple of the preceding ones)

Oh where, oh where have my Kube pod logs gone?

Kubernetes logs lots of info about your deployments

...but not all in one place

  • Pod scheduling event info is in kubectl describe pod [podname] and kubectl get events
  • Pod output is in kubectl logs [podname]
    • Sort of... each container's logs in a multi-container pod are separate too
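    • Use kubectl logs [podname] -c [containername] to pick out a single container's logs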

Some cluster ops mistakes can really cost you... literally

"...autoscaling is essentially connected to your credit card"

-- Peter Nickolov, Opsani
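One guardrail is to give autoscaling a hard ceiling; a hedged HorizontalPodAutoscaler sketch (the target Deployment, replica counts, and CPU target are all illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa               # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # illustrative target
  minReplicas: 2
  maxReplicas: 10             # the ceiling your credit card is counting on
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70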

Kubernetes application deployment spikes

Kubernetes sees the world in terms of resource manifests

...even when the world is more than the manifests describing it

  • Kubernetes only knows a workload has changed if its manifest changes
  • Not the attributes of a controlling resource like a Deployment...
  • ...the actual pod spec of the workload pods themselves

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: cpustress-new
  name: cpustress
  annotations:
    test-data: "my test data"
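    # changing only this Deployment-level annotation does not alter the pod template, so no rollout happens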
spec:
  selector:
    matchLabels:
      run: cpustress
  template:
    metadata:
      annotations:
        test-data: "my test data"
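        # changing this pod-template annotation changes the pod spec, so it does trigger a rollout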
      labels:
        run: cpustress
    spec:
      containers:
      - image: busybox
        name: busybox
        resources: {}
        args:
        - "sleep"
        - "3600"

Things don't happen in a particular order unless you make them

  • Run ordering is not enforced just because things were declared in a given order*

YOU can enforce ordering

  • by handling ordered deployment from outside the cluster
  • by making workloads aware of each other and deferring startup until prerequisites are met (sketched below)
* except for init containers and StatefulSet pods
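A common pattern for the second approach is an init container that blocks the pod's main container until a prerequisite answers; a hedged sketch (the Service name my-db is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: web-needs-db          # illustrative name
spec:
  initContainers:
  - name: wait-for-db
    image: busybox
    # keep retrying until the prerequisite Service resolves in DNS
    command: ["sh", "-c", "until nslookup my-db; do echo waiting for my-db; sleep 2; done"]
  containers:
  - name: web
    image: nginx              # starts only after the init container succeeds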

Breaking up with root is hard to do

Every security pro: "Run your containers as a non-root user and explicitly grant them any capabilities they need"
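In manifest form, that advice looks roughly like this hedged sketch (the UID, image, and capability shown are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo          # illustrative name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000           # any unprivileged UID your image supports
  containers:
  - name: app
    image: my-app:latest      # illustrative image
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]   # grant only what the workload actually needs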

Workarounds when that's not straightforward:

  • Use a wrapper entrypoint script in a root container to run the workload as non-root and set its capabilities
  • Set the capabilities on the workload binary at container build time

Things aren't special unless you make them special

To Kubernetes, a sidecar is just another container: if the main workload fails but the sidecar stays running, odd behavior can result

Workarounds:

  • Use health probes (sketched after this list)
  • Create a way for a sidecar to stop itself if the main workload fails
  • Don't use sidecars on workloads prone to this issue
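For the health-probe workaround, a hedged sketch (images, path, port, and timings are illustrative); if the workload container's probe starts failing, the kubelet restarts that container instead of leaving a healthy-looking sidecar fronting a dead workload:

apiVersion: v1
kind: Pod
metadata:
  name: probe-demo            # illustrative name
spec:
  containers:
  - name: workload
    image: my-app:latest      # illustrative image
    livenessProbe:
      httpGet:
        path: /healthz        # illustrative health endpoint
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15       # kubelet restarts this container if the probe keeps failing
  - name: sidecar
    image: my-log-shipper:latest   # illustrative sidecar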

Rolling updates aren't free

Rolling updates require extra resources to get rolling

Rolling update overhead for large workloads can be significant

  • Tweak maxSurge to control the overhead (sketched below)
  • Be aware that lowering maxSurge makes rollouts take longer
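A hedged sketch of those knobs on a large Deployment (name, image, and percentages are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: big-workload          # illustrative name
spec:
  replicas: 100
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 10%           # at most 10 extra pods' worth of resources during the rollout
      maxUnavailable: 0       # but never dip below the desired replica count
  selector:
    matchLabels:
      app: big-workload
  template:
    metadata:
      labels:
        app: big-workload
    spec:
      containers:
      - name: app
        image: my-app:latest  # illustrative image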

Statefulness has side-effects

  • StatefulSet pods leave their volumes behind on termination (on purpose)
  • If you're sure you don't need the volume any more, clean it up manually

Kubernetes network spikes

NetworkPolicy is spiky and you need to plan for it

Default deny is a spike on purpose: once a pod is subject to a deployed NetworkPolicy, it is "isolated" and will only accept traffic allowed by some policy

Pods not selected by any policy are not isolated even if policies exist that isolate other pods

Other parts are spiky because of Kubernetes' own layered design: NetworkPolicy enforcement is implemented by the network-layer provider (your CNI plugin), not by Kubernetes itself
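A hedged sketch of a namespace-wide default-deny-ingress policy (the namespace is illustrative); every pod matched by the empty podSelector becomes isolated:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app           # illustrative namespace
spec:
  podSelector: {}             # matches every pod in the namespace
  policyTypes:
  - Ingress                   # no ingress rules follow, so all inbound pod traffic is denied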

Network debugging is like peeling an onion

The first layer is the node network, on interfaces like eth0 and ens3

The second layer is the container or pod network, on interfaces like cni0 and veth*

The third layer is the service network, on interfaces... wait, what?

  • Kubernetes' service network is a fictitious network that does not have interfaces of its own
  • Created entirely through kernel packet rules (iptables/IPVS)

Final thoughts

Assume Kubernetes is changing in ways that outdate your knowledge constantly -- you will rarely be wrong about this

Story time!

  • ReplicationControllers: we hardly knew ye
  • Endpoints: A scaling tale

Staying up to date: The Kubernetes community has your back

Preventing the unexpected: a test environment is not optional

A test environment is not optional

A test environment is NOT optional

Tell your family, tell your friends, but most importantly tell your boss: A test environment is NOT OPTIONAL

Thank you!

Slides: https://bit.ly/3CWu9r4+

Further info

Info and further reading

Errata

(none currently)