Built-In Locks for Kubernetes Workloads
July 31, 2023
I’ve been working on a project that interacts with an external web endpoint to query a list of available resources, select a free resource, and then mark it as reserved. The upstream system doesn’t have any type of built-in read locking, so this process is naturally prone to the following race condition:
- User A looks up a list of available resources
- While User A is still reviewing the list, User B also looks up a list of available resources.
- User A selects resource with ID 123
- User B also selects resource with ID 123
- A race condition ensues, and the last writer will “win” the resource (but the first user will be none the wiser)
This is a pretty classic problem, and it could be solved in a variety of ways if I’m interacting directly with a database that supports transactions (and most should). However, it’s a bit more challenging when the upstream, web-based system has no concept of such transactional locking.
In my case, the script I’m writing will only ever run from a single system. Therefore, the easiest way to solve this is with a simple, file-based lock:
#!/bin/bash
LOCKFILE="/var/run/my-script.lock"
if [[ -f "$LOCKFILE" ]]; then
  echo "Cannot run as lockfile exists at $LOCKFILE." >&2
else
  touch "$LOCKFILE"
  # Do some work
  rm "$LOCKFILE"
fi
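(For what it’s worth, the test-then-touch pattern above has its own tiny race window between the check and the touch. If that ever mattered, flock(1) makes acquisition atomic; a minimal sketch:)
#!/bin/bash
LOCKFILE="/var/run/my-script.lock"

# Open the lock file on file descriptor 200 and try to take an exclusive,
# non-blocking lock on it. The lock is released automatically when the
# script exits and the descriptor is closed.
exec 200>"$LOCKFILE"
if ! flock -n 200; then
  echo "Cannot run as another instance holds $LOCKFILE." >&2
  exit 1
fi

# Do some work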
A file-based lock is absolutely fine for this use case: I have a script that runs infrequently, is very unlikely to be run by multiple callers at the same time, and (most importantly) runs from a single system. I’m a big fan of avoiding premature optimization, but it’s also good to think about the future, even if it’s just for the benefit of the thought experiment.
I could use a distributed locking mechanism, such as Apache Zookeeper or a database system with strong write guarantees. However, I already anticipated this workload would run in Kubernetes at some point, so I began to wonder if Kubernetes has a built-in primitive to handle this use case. And it turns out it does: leases!
From the documentation:
Distributed systems often have a need for leases, which provide a mechanism to lock shared resources and coordinate activity between members of a set. In Kubernetes, the lease concept is represented by Lease objects in the coordination.k8s.io API Group, which are used for system-critical capabilities such as node heartbeats and component-level leader election.
In this article, I’ll discuss how to define a Kubernetes lease and supporting role-based access control mechanisms to support a simple distributed lock.
Role-Based Access Control
First, I’ll start with security (this is probably the first time I’ve ever written that sentence). Role-based access control (RBAC) is how Kubernetes decides who may call which parts of its API, and like many Kubernetes things, it takes some getting used to.
Workloads in Kubernetes can access the Kubernetes API using a service account token that is automatically mounted within a Pod. This isn’t mandatory: if a workload has no reason to access the Kubernetes API, then it can be configured without a service account token. However, the default behavior is generally to mount a token within the Pod at /var/run/secrets/kubernetes.io/serviceaccount/token.
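(As an aside, opting out of the automatic mount is a single field on the Pod spec; a minimal sketch, with a made-up Pod name:)
apiVersion: v1
kind: Pod
metadata:
  name: no-api-access   # illustrative name
spec:
  automountServiceAccountToken: false
  containers:
  - name: app
    image: ubuntu:latest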
The basic flow for granting a workload permissions to the Kubernetes API consists of four steps:
- Define a ServiceAccount. This is just a friendly name for the service account assigned to a workload.
- Define a Role. This defines the actual API permissions, such as permitted endpoints and the HTTP verbs that can be executed against them.
- Define a RoleBinding. This ties the ServiceAccount to the Role.
- Assign the ServiceAccount to a workload, such as a Pod.

The result of this ~~spaghetti~~ elegant configuration is a token that workloads inside of a Pod can use to access the Kubernetes API with whatever permissions are defined by the Role.
ServiceAccount, Role, and RoleBinding resources are namespaced. ClusterRole and ClusterRoleBinding are also available for non-namespaced resources that exist across the entire cluster. See the documentation for more information.
The creation order doesn’t matter for these resources, but I like to start with the ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: lease-demo
The Role is a bit more complex. The workload must be able to create a new Lease, as well as delete the Lease it created. Unfortunately, the API endpoint to create a lease is /apis/coordination.k8s.io/v1/namespaces/{namespace}/leases, and it doesn’t support restricting the resourceName that can be created. This is kind of silly (I’m sure there’s a technical reason for it), and it means the permissions for Lease creation must be overly permissive: they allow for the creation of any Lease. However, the delete permissions can be restricted.
The following Role allows for the creation of any Lease, but it only allows a Lease named demo-lock to be deleted:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: lease-demo
rules:
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  resourceNames: ["demo-lock"]
  verbs: ["delete"]
Finally, the RoleBinding ties the lease-demo ServiceAccount to the lease-demo Role. I’ve named everything the same here, but the names don’t fundamentally matter as long as they are listed correctly in the roleRef and the subjects section of the RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: lease-demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: lease-demo
subjects:
- kind: ServiceAccount
  name: lease-demo
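Before wiring this into a workload, a quick sanity check doesn’t hurt. Assuming everything above was created in the default namespace (and that your own user is allowed to impersonate service accounts), kubectl auth can-i can confirm the binding behaves as intended:
# Can the service account create any Lease in the namespace? (expect "yes")
kubectl auth can-i create leases.coordination.k8s.io \
  -n default --as=system:serviceaccount:default:lease-demo

# Can it delete the demo-lock Lease specifically? (expect "yes")
kubectl auth can-i delete leases.coordination.k8s.io/demo-lock \
  -n default --as=system:serviceaccount:default:lease-demo

# Deleting any other Lease should be denied (expect "no")
kubectl auth can-i delete leases.coordination.k8s.io/other-lock \
  -n default --as=system:serviceaccount:default:lease-demo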
I now have a lease-demo ServiceAccount with all of the permissions necessary to manage a simple lock. Next, it’s time to apply it to a workload.
The Workload
Now that the Kubernetes-specific configuration is in place to support Lease management, we can build a workload that leverages Leases for locking purposes. A simple web request using the service account token (from /var/run/secrets/kubernetes.io/serviceaccount/token) is enough to create and delete a Lease. My actual implementation uses a Python script, but a simple cURL example is sufficient for demonstration purposes:
curl \
  --fail \
  -X POST \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  -H "Content-Type: application/json" \
  --data '{"apiVersion":"coordination.k8s.io/v1","kind":"Lease","metadata":{"name":"demo-lock","namespace":"default"}}' \
  https://kubernetes.default.svc.cluster.local/apis/coordination.k8s.io/v1/namespaces/default/leases
To break down this cURL request for those who have never interacted directly with the Kubernetes API:
- --fail - Causes curl to exit non-zero if it doesn’t receive a successful HTTP status code
- -X POST - Send a POST request (this is a create operation)
- --cacert - Verify the server’s certificate using the CA certificate that is automatically mounted in the Pod alongside the service account token
- -H "Authorization: ..." - Pass an HTTP header called “Authorization” with a value of Bearer ${TOKEN}, where ${TOKEN} is the actual service account token from /var/run/secrets/kubernetes.io/serviceaccount/token
- -H "Content-Type: application/json" - This is a JSON request
- --data - The JSON data sent to the API. This contains the parameters to create the Lease, and is the JSON representation of the same YAML that can be used to create a Lease using the kubectl command (see the sketch after this list).
- https://kubernetes.default.svc.cluster.local/apis/coordination.k8s.io/v1/namespaces/default/leases - The Lease API endpoint
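Since the --data payload is just the JSON form of an ordinary Lease manifest, here’s a rough sketch of the same create and delete operations done with kubectl instead of raw HTTP (kubectl create, rather than apply, mirrors the POST’s fail-if-it-already-exists behavior):
# Create the same demo-lock Lease with kubectl; like the POST above,
# this fails if the Lease already exists, which is what makes it usable as a lock.
cat <<'EOF' | kubectl create -f -
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: demo-lock
  namespace: default
EOF

# Release the lock again
kubectl delete lease demo-lock -n default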
Deleting a Lease is similar: just send a DELETE request to the lease’s endpoint:
curl \
  --fail \
  -X DELETE \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  -H "Content-Type: application/json" \
  https://kubernetes.default.svc.cluster.local/apis/coordination.k8s.io/v1/namespaces/default/leases/demo-lock
Using these fundamental concepts, I built a reasonably robust script to simulate acquiring a Lease, running a workload, and releasing the Lease:
#!/bin/bash
# Install curl. Never actually do this in a production container.
if ! curl --version >/dev/null 2>&1
then
  apt update >/dev/null 2>&1
  apt install -y curl >/dev/null 2>&1
fi
delete_lease() {
  # If we don't currently hold the Lease, then don't do anything.
  # This check allows us to use this function in an exit trap also, since
  # we only want to delete a lease if we currently hold it.
  if ! $LEASE_HELD
  then
    return 0
  fi

  if curl \
    --fail \
    -X DELETE \
    --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
    -H "Content-Type: application/json" \
    https://kubernetes.default.svc.cluster.local/apis/coordination.k8s.io/v1/namespaces/default/leases/demo-lock \
    >/dev/null 2>&1
  then
    echo "Lock deleted"
    return 0
  else
    echo "Unable to delete lock"
    return 1
  fi
}
delete_lease_and_exit() {
  delete_lease
  exit $?
}
LEASE_HELD=false
TIMEOUT_SECONDS=120
TIME_WAITED=0
while [ "$TIME_WAITED" -lt "$TIMEOUT_SECONDS" ]
do
  # Acquire a lease
  if curl \
    --fail \
    -X POST \
    --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
    -H "Content-Type: application/json" \
    --data '{"apiVersion":"coordination.k8s.io/v1","kind":"Lease","metadata":{"name":"demo-lock","namespace":"default"}}' \
    https://kubernetes.default.svc.cluster.local/apis/coordination.k8s.io/v1/namespaces/default/leases \
    >/dev/null 2>&1
  then
    # Record that we hold the lease before installing the exit trap, so the
    # trap's delete_lease call actually fires if the script is killed mid-run.
    LEASE_HELD=true

    # Catch exits and ensure the lock is cleaned up (e.g., if the script is killed)
    # to reduce the chance of deadlocks.
    trap delete_lease_and_exit EXIT

    # Simulate doing some work for a random number of seconds between 1 and 15
    SLEEP_TIME=$(shuf -i 1-15 -n 1)
    echo "Got lease, doing some work for $SLEEP_TIME seconds"
    sleep "$SLEEP_TIME"

    # Delete the lease, clear the exit trap (so it doesn't fire when we exit), and exit
    delete_lease
    EXIT_CODE=$?
    trap - EXIT
    exit $EXIT_CODE
  else
    # If we can't get a lease (someone else is holding the lock), then wait for a bit.
    echo "Unable to get lease. Sleeping for 5 seconds then retrying"
    sleep 5
    TIME_WAITED=$(( TIME_WAITED + 5 ))
  fi
done
echo "Unable to acquire lease after $TIMEOUT_SECONDS seconds."
exit 1
I put this entire script into a ConfigMap for demonstration purposes. In practice, I would build this into the actual container image, but mounting a ConfigMap is sufficient for this example:
$ kubectl create configmap --from-file=script.sh lease-script
I then define a simple Job to execute the script. The Job uses the ServiceAccount with the permissions needed to create and delete the lease:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: lease-demo
spec:
  template:
    spec:
      serviceAccountName: lease-demo
      restartPolicy: OnFailure
      containers:
      - image: ubuntu:latest
        name: script-pod
        command:
        - bash
        - /script/script.sh
        volumeMounts:
        - name: lease-script
          mountPath: "/script"
          readOnly: true
      volumes:
      - name: lease-script
        configMap:
          name: lease-script
  backoffLimit: 4
I can try to run two instances of this Job simultaneously. However, the locking mechanism ensures that only one Job can run at a time, while the other Job simply waits for the Lease to be available:
# Create a job called lease-demo, then create a duplicate job called lease-demo2
$ kubectl apply -f Job.yaml && sed 's/ name: lease-demo/ name: lease-demo2/' Job.yaml | kubectl apply -f -
# Check on the status of the lease-demo2 Pod. It looks like it won the lease!
$ kubectl logs job/lease-demo2
Got lease, doing some work for 6 seconds
Lock deleted
# The lease-demo Pod lost the race, so it needs to wait. Finally, it acquires the lease and does some work.
$ kubectl logs job/lease-demo
Unable to get lease. Sleeping for 5 seconds then retrying
Unable to get lease. Sleeping for 5 seconds then retrying
Got lease, doing some work for 10 seconds
Lock deleted
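While one of the Jobs is holding the lock, the Lease is visible like any other namespaced resource, which also makes it easy to spot (and, if a Pod ever dies without cleaning up, manually clear) a stale lock:
# Inspect the lock while it is held (default namespace assumed)
kubectl get lease demo-lock

# Clear a stale lock by hand if something went wrong with cleanup
kubectl delete lease demo-lock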
Wrapping Up
I’m generally very bearish about writing software that is locked to a platform, such as a cloud provider’s services or the Kubernetes API. However, in this case, the tradeoff likely makes sense: the use case is too simple to build and maintain a separate distributed locking mechanism, the impact of vendor lock-in is minimal since this is only a basic script, and the provided Kubernetes API meets all of my needs. It’s always nice to find primitives that are included with your chosen platform, and I was pleasantly surprised to discover that Kubernetes includes this helpful Lease mechanism.