# Writing a Kubernetes Operator for Multi-Tenancy
Managing Kubernetes namespaces manually is a rite of passage for platform
engineers. You start with a few `kubectl create ns team-a` commands. Then you
add ResourceQuotas because one team accidentally consumed the entire cluster.
Then you’re juggling RoleBindings for different groups.
Eventually, you realize you’re just a human API for YAML generation. That’s when you write an Operator.
I recently built a Private Tenant Operator to solve exactly this problem. Bootstrapped with Kubebuilder, the goal was simple: define a “Tenant” in a high-level CRD, and let the controller handle the dirty work of provisioning namespaces, RBAC, quotas, and even Vault integration.
You can find a full walkthrough and demo of the operator in the deep dive documentation.
## The Tenant Abstraction
Instead of asking teams to file a ticket for a namespace, we want them (or us)
to define a Tenant resource. This abstracts away the low-level Kubernetes
primitives.
Here’s what our API looks like:
```go
type TenantSpec struct {
	// adminGroups is the list of group identifiers (e.g. "github:checkout-maintainers")
	// that will receive admin RBAC in the tenant namespace.
	AdminGroups []string `json:"adminGroups"`

	// size determines the resource quota tier for this tenant.
	// Allowed values: small, medium, large.
	Size TenantSize `json:"size"`

	// vaultPath is the path in Vault where this tenant's secrets are stored
	// (e.g. "secret/data/teams/checkout").
	VaultPath string `json:"vaultPath"`
}
```

This simple struct drives everything. A small tenant gets 1 CPU / 4Gi RAM. A
large one gets 16 CPU / 64Gi. No more arguing about custom requests; you get
a T-shirt size.
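The size-to-quota mapping boils down to a simple lookup. Here's a minimal sketch; only the small and large numbers come from the text above, while the medium tier values, function name, and fallback behavior are my assumptions:

```go
package main

import "fmt"

// TenantSize mirrors the CRD field; allowed values are small, medium, large.
type TenantSize string

const (
	SizeSmall  TenantSize = "small"
	SizeMedium TenantSize = "medium"
	SizeLarge  TenantSize = "large"
)

// quotaFor maps a T-shirt size to its CPU / memory quota. Small and large
// match the tiers described above; medium is an assumed midpoint.
func quotaFor(size TenantSize) (cpu, memory string) {
	switch size {
	case SizeMedium:
		return "4", "16Gi" // assumption: the post doesn't specify medium
	case SizeLarge:
		return "16", "64Gi"
	default:
		// small, or anything unrecognized, falls back to the smallest tier
		return "1", "4Gi"
	}
}

func main() {
	cpu, mem := quotaFor(SizeLarge)
	fmt.Printf("%s CPU / %s RAM\n", cpu, mem)
}
```

Because the tiers are data rather than user input, changing a quota later is a one-line diff that the operator rolls out to every tenant on the next reconcile.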
## The Reconciliation Loop
The heart of any operator is the Reconcile function. It observes the state of
the world and drives it towards the desired state.
For our tenant operator, the loop looks like this:
- Namespace: Ensure `tenant-<name>` exists with the correct labels (`managed-by: tenant-operator`).
- RBAC: Sync `RoleBindings`. If `AdminGroups` changes, we add new bindings and remove stale ones.
- Quotas: Apply `ResourceQuota` and `LimitRange` based on the `Size` field.
- Secrets: Configure Vault access (more on this below).
- Status: Update the `Tenant` status with conditions (e.g., `NamespaceReady`, `QuotasReady`).
If any step fails, we error out and retry. If everything succeeds, we mark the tenant as Ready.
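The first step reduces to deterministic naming and labeling. A stdlib-only sketch; the helper names and the extra `tenant` label are my assumptions, and only `managed-by: tenant-operator` comes from the list above:

```go
package main

import "fmt"

// namespaceFor derives the namespace name from the Tenant's name,
// following the tenant-<name> convention.
func namespaceFor(tenant string) string {
	return "tenant-" + tenant
}

// labelsFor returns the labels the operator stamps on the namespace so it
// can recognize the resources it owns.
func labelsFor(tenant string) map[string]string {
	return map[string]string{
		"managed-by": "tenant-operator",
		"tenant":     tenant, // assumed convenience label for selectors
	}
}

func main() {
	fmt.Println(namespaceFor("checkout"), labelsFor("checkout"))
}
```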
```go
// From internal/controller/tenant_controller.go
func (r *TenantReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... fetch tenant ...

	// Reconcile the namespace
	if err := r.reconcileNamespace(ctx, &tenant); err != nil {
		return ctrl.Result{}, err
	}

	// Reconcile RBAC
	if err := r.reconcileRBAC(ctx, &tenant); err != nil {
		return ctrl.Result{}, err
	}

	// ... reconcile quotas, vault, etc ...

	return ctrl.Result{}, nil
}
```

## Automating “Secret Zero” with Vault
The most interesting part of this operator is how it handles secrets. We use HashiCorp Vault and the External Secrets Operator (ESO).
Usually, setting up Vault access for a new namespace involves a dance of creating policies, roles, and service accounts. The operator automates this entire chain:
- Vault Policy: It creates a read-only policy in Vault scoped strictly to `spec.vaultPath`.
- Kubernetes Auth Role: It creates a Vault role binding that policy to a specific ServiceAccount in the new tenant namespace.
- SecretStore: Finally, it creates an ESO `SecretStore` in the tenant namespace, pointing to that Vault role.
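The auth-role step amounts to writing a payload to Vault's Kubernetes auth mount. A sketch of what that payload could look like; the field names follow Vault's Kubernetes auth method, while the ServiceAccount name, TTL, and helper name are my assumptions:

```go
package main

import "fmt"

// rolePayload builds the body the operator would write to the Vault
// Kubernetes auth role for a tenant. The ServiceAccount name and TTL
// are illustrative, not from the original post.
func rolePayload(tenant, policy string) map[string]interface{} {
	return map[string]interface{}{
		"bound_service_account_names":      []string{"external-secrets"}, // assumed SA used by ESO
		"bound_service_account_namespaces": []string{"tenant-" + tenant},
		"policies":                         []string{policy},
		"ttl":                              "1h",
	}
}

func main() {
	fmt.Println(rolePayload("checkout", "tenant-checkout-read"))
}
```

Binding the role to a single ServiceAccount in a single namespace is what keeps the setup least-privilege: a pod in another tenant's namespace simply cannot authenticate against this role.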
This setup eliminates the need for manual Vault configuration tickets. Tenants get a ready-to-use environment where they can safely sync secrets using the External Secrets Operator, with all the necessary least-privilege policies already baked in.
```go
// We dynamically generate the policy based on the CRD
hcl := fmt.Sprintf(
	"path %q { capabilities = [\"read\",\"list\"] }\npath %q { capabilities = [\"read\",\"list\"] }",
	vaultPath, vaultPath+"/*",
)
```

## Is It Overkill?
Writing an operator might seem like overkill for “just creating namespaces.” But the value lies in standardization and governance.
- Consistency: Every tenant has the exact same baseline configuration.
- Safety: Quotas are enforced by default.
- Self-Service: You can wrap this CRD in a platform API or a GitOps workflow, allowing teams to provision their own environments without waiting on ops.
If you’re finding yourself repeating the same `kubectl` commands day after day,
it might be time to fire up `kubebuilder init`.