
# Writing a Kubernetes Operator for Multi-Tenancy


Managing Kubernetes namespaces manually is a rite of passage for platform engineers. You start with a few kubectl create ns team-a commands. Then you add ResourceQuotas because one team accidentally consumed the entire cluster. Then you’re juggling RoleBindings for different groups.

Eventually, you realize you’re just a human API for YAML generation. That’s when you write an Operator.

I recently built a Private Tenant Operator to solve exactly this problem. I bootstrapped it with Kubebuilder, and the goal was simple: define a “Tenant” in a high-level CRD and let the controller handle the dirty work of provisioning namespaces, RBAC, quotas, and even Vault integration.

You can find a full walkthrough and demo of the operator in the deep dive documentation.

## The Tenant Abstraction

Instead of asking teams to file a ticket for a namespace, we want them (or us) to define a Tenant resource. This abstracts away the low-level Kubernetes primitives.

Here’s what our API looks like:

    type TenantSpec struct {
        // adminGroups is the list of group identifiers (e.g. "github:checkout-maintainers")
        // that will receive admin RBAC in the tenant namespace.
        AdminGroups []string `json:"adminGroups"`

        // size determines the resource quota tier for this tenant.
        // Allowed values: small, medium, large.
        Size TenantSize `json:"size"`

        // vaultPath is the path in Vault where this tenant's secrets are stored
        // (e.g. "secret/data/teams/checkout").
        VaultPath string `json:"vaultPath"`
    }

This simple struct drives everything. A small tenant gets 1 CPU / 4Gi RAM. A large one gets 16 CPU / 64Gi. No more arguing about custom requests; you get a T-shirt size.
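One way to picture the size tiers is as a lookup table from size to quota values. The sketch below is an illustration, not the operator's actual code: `QuotaTier`, `quotaTiers`, and `quotaFor` are hypothetical names, and the table holds only the two tiers whose numbers are given above.

```go
package main

import "fmt"

// TenantSize mirrors the type of the Size field on TenantSpec.
type TenantSize string

const (
	SizeSmall TenantSize = "small"
	SizeLarge TenantSize = "large"
)

// QuotaTier is a hypothetical helper type holding Kubernetes-style
// quantity strings for one T-shirt size.
type QuotaTier struct {
	CPU    string
	Memory string
}

// quotaTiers lists only the tiers whose numbers appear in the text.
var quotaTiers = map[TenantSize]QuotaTier{
	SizeSmall: {CPU: "1", Memory: "4Gi"},
	SizeLarge: {CPU: "16", Memory: "64Gi"},
}

// quotaFor returns the tier for a size, or an error for any size
// that is not in the table.
func quotaFor(size TenantSize) (QuotaTier, error) {
	tier, ok := quotaTiers[size]
	if !ok {
		return QuotaTier{}, fmt.Errorf("no quota tier defined for size %q", size)
	}
	return tier, nil
}

func main() {
	for _, size := range []TenantSize{SizeSmall, SizeLarge, "xxl"} {
		tier, err := quotaFor(size)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: cpu=%s memory=%s\n", size, tier.CPU, tier.Memory)
	}
}
```

Returning an error for an unknown size, rather than falling back to a default tier, keeps a typo in the spec from silently granting the wrong quota.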

## The Reconciliation Loop

The heart of any operator is the Reconcile function. It observes the state of the world and drives it towards the desired state.

For our tenant operator, the loop looks like this:

  1. Namespace: Ensure tenant-<name> exists with the correct labels (managed-by: tenant-operator).
  2. RBAC: Sync RoleBindings. If AdminGroups changes, we add new bindings and remove stale ones.
  3. Quotas: Apply ResourceQuota and LimitRange based on the Size field.
  4. Secrets: Configure Vault access (more on this below).
  5. Status: Update the Tenant status with conditions (e.g., NamespaceReady, QuotasReady).

If any step fails, we return the error and controller-runtime requeues the request with backoff. If everything succeeds, we mark the tenant as Ready.
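The RBAC step is the one that has to delete things as well as create them. A minimal sketch of that diff, assuming group identifiers can be compared as plain strings: `diffGroups` is a hypothetical helper, and every group name except `github:checkout-maintainers` is made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// diffGroups compares the groups requested in the spec with the groups
// that currently have a RoleBinding. It returns the groups that need a
// new binding and the groups whose binding is stale.
func diffGroups(desired, existing []string) (toAdd, toRemove []string) {
	want := make(map[string]bool, len(desired))
	for _, g := range desired {
		want[g] = true
	}
	have := make(map[string]bool, len(existing))
	for _, g := range existing {
		have[g] = true
	}
	for g := range want {
		if !have[g] {
			toAdd = append(toAdd, g)
		}
	}
	for g := range have {
		if !want[g] {
			toRemove = append(toRemove, g)
		}
	}
	// Sort so the result is deterministic despite map iteration order.
	sort.Strings(toAdd)
	sort.Strings(toRemove)
	return toAdd, toRemove
}

func main() {
	desired := []string{"github:checkout-maintainers", "github:platform"}
	existing := []string{"github:platform", "github:old-team"}

	add, remove := diffGroups(desired, existing)
	fmt.Println("add:", add)       // add: [github:checkout-maintainers]
	fmt.Println("remove:", remove) // remove: [github:old-team]
}
```

Because the diff is computed from scratch on every reconcile, running it twice with the same inputs yields no further changes, which is what makes a retry safe.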

    // From internal/controller/tenant_controller.go
    func (r *TenantReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        // ... fetch tenant ...

        // Reconcile the namespace
        if err := r.reconcileNamespace(ctx, &tenant); err != nil {
            return ctrl.Result{}, err
        }

        // Reconcile RBAC
        if err := r.reconcileRBAC(ctx, &tenant); err != nil {
            return ctrl.Result{}, err
        }

        // ... reconcile quotas, vault, etc ...

        return ctrl.Result{}, nil
    }
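The status step can be thought of as folding the per-step conditions into a single Ready flag. The sketch below uses a simplified `Condition` struct as a stand-in for `metav1.Condition` so that it runs without any Kubernetes dependencies. `setCondition`, `isReady`, and the reason strings are hypothetical; a real controller would more likely use the condition helpers in `k8s.io/apimachinery`.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status bool
	Reason string
}

// setCondition replaces the condition with the same Type, or appends it
// if no condition of that Type exists yet.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// isReady reports whether every required condition is present and true.
func isReady(conds []Condition, required ...string) bool {
	status := make(map[string]bool, len(conds))
	for _, c := range conds {
		status[c.Type] = c.Status
	}
	for _, t := range required {
		if !status[t] {
			return false
		}
	}
	return true
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "NamespaceReady", Status: true, Reason: "Created"})
	conds = setCondition(conds, Condition{Type: "QuotasReady", Status: false, Reason: "ApplyFailed"})
	fmt.Println(isReady(conds, "NamespaceReady", "QuotasReady")) // false

	// A later reconcile succeeds and overwrites the failed condition.
	conds = setCondition(conds, Condition{Type: "QuotasReady", Status: true, Reason: "Applied"})
	fmt.Println(isReady(conds, "NamespaceReady", "QuotasReady")) // true
}
```

A missing condition counts as not ready, so a tenant whose reconcile stopped early is never reported as Ready by accident.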

## Automating “Secret Zero” with Vault

The most interesting part of this operator is how it handles secrets. We use HashiCorp Vault and the External Secrets Operator (ESO).

Usually, setting up Vault access for a new namespace involves a dance of creating policies, roles, and service accounts. The operator automates this entire chain:

  1. Vault Policy: It creates a read-only policy in Vault scoped strictly to spec.vaultPath.
  2. Kubernetes Auth Role: It creates a Vault role binding that policy to a specific ServiceAccount in the new tenant namespace.
  3. SecretStore: Finally, it creates an ESO SecretStore in the tenant namespace, pointing to that Vault role.

This setup eliminates the need for manual Vault configuration tickets. Tenants get a ready-to-use environment where they can safely sync secrets using the External Secrets Operator, with all the necessary least-privilege policies already baked in.

    // We dynamically generate the policy based on the CRD
    hcl := fmt.Sprintf(
        "path %q { capabilities = [\"read\",\"list\"] }\npath %q { capabilities = [\"read\",\"list\"] }",
        vaultPath, vaultPath+"/*",
    )
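To see what that produces, here is a self-contained version of the same format string wrapped in a function. The name `buildReadOnlyPolicy` and the input checks (trimming slashes, rejecting empty paths and Vault's `*` and `+` wildcards) are assumptions added for illustration, not necessarily what the operator does.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildReadOnlyPolicy renders a Vault policy that grants read and list on
// vaultPath and on everything beneath it. The validation rules here are
// illustrative assumptions.
func buildReadOnlyPolicy(vaultPath string) (string, error) {
	p := strings.Trim(vaultPath, "/")
	if p == "" {
		return "", errors.New("vaultPath must not be empty")
	}
	// Reject wildcards so a tenant cannot widen its own policy via the spec.
	if strings.ContainsAny(p, "*+") {
		return "", fmt.Errorf("vaultPath %q must not contain wildcard characters", vaultPath)
	}
	return fmt.Sprintf(
		"path %q { capabilities = [\"read\",\"list\"] }\npath %q { capabilities = [\"read\",\"list\"] }",
		p, p+"/*",
	), nil
}

func main() {
	hcl, err := buildReadOnlyPolicy("secret/data/teams/checkout")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(hcl)
	// Output:
	// path "secret/data/teams/checkout" { capabilities = ["read","list"] }
	// path "secret/data/teams/checkout/*" { capabilities = ["read","list"] }
}
```

Since the path comes from a user-editable resource, validating it before interpolating it into a policy is worth the few extra lines.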

## Is It Overkill?

Writing an operator might seem like overkill for “just creating namespaces.” But the value lies in standardization and governance.

  • Consistency: Every tenant has the exact same baseline configuration.
  • Safety: Quotas are enforced by default.
  • Self-Service: You can wrap this CRD in a platform API or a GitOps workflow, allowing teams to provision their own environments without waiting on ops.

If you’re finding yourself repeating the same kubectl commands day after day, it might be time to fire up kubebuilder init.

JP Fontenele

Thanks for reading! Feel free to check out my other posts or reach out via GitHub.

