Best Practices Series

Building Production Kubernetes Operators

A deep dive into Kubebuilder patterns — from finalizers to idempotency, conflict resolution, webhooks, and everything you need before shipping an operator to production.

10 patterns · Kubebuilder v3+ · Go · 9-step template
Pattern 01

Finalizer Pattern

Finalizers are plain strings stored in the metadata.finalizers list. Kubernetes applies one rule to them: while that list is not empty, the API server will not remove the object. A delete request only sets deletionTimestamp, and the object then waits, indefinitely if necessary, until every finalizer has been removed.

Key Concept
When to Use Finalizers

Only when your controller creates resources outside of Kubernetes — database users, cloud buckets, DNS records, external API entries. For k8s-only child resources, use ownerReferences instead and let garbage collection handle it.

Finalizer Deletion Flow
kubectl delete postgresuser alice
         |
         v
k8s checks metadata.finalizers
         |
   +-----+-----+
   |             |
EMPTY        NOT EMPTY
   |             |
delete now   set deletionTimestamp
             object frozen, waiting
                   |
                   v
        your controller Reconcile() called
          sees deletionTimestamp is set
          runs handleDeletion()
          DROP USER alice in postgres
          removes finalizer string
                   |
                   v
          finalizers = []  ->  k8s deletes

The key insight: deletionTimestamp is not a timer. Nothing happens automatically after it. It is simply a record of when someone requested deletion. The object can stay frozen for hours, days, forever — until your controller removes the finalizer.
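
The deletion flow above can be modeled in a few lines of plain Go. This is a toy sketch, not Kubernetes code: the store type, its methods, and the example.com/cleanup finalizer name are all hypothetical, and the only thing the model implements is the single rule that an object with a non-empty finalizer list is marked rather than removed.

```go
package main

import (
	"fmt"
	"time"
)

// object is a hypothetical, stripped-down stand-in for a Kubernetes object.
type object struct {
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// store is a toy in-memory "API server" that applies the finalizer rule.
type store struct {
	objects map[string]*object
}

func newStore() *store { return &store{objects: map[string]*object{}} }

func (s *store) create(name string, finalizers ...string) {
	s.objects[name] = &object{Finalizers: finalizers}
}

// requestDelete removes the object at once when it has no finalizers.
// Otherwise it only records when deletion was requested.
func (s *store) requestDelete(name string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	if len(obj.Finalizers) == 0 {
		delete(s.objects, name)
		return
	}
	if obj.DeletionTimestamp == nil {
		now := time.Now()
		obj.DeletionTimestamp = &now
	}
}

// removeFinalizer drops one finalizer. An object already marked for deletion
// disappears as soon as its finalizer list becomes empty.
func (s *store) removeFinalizer(name, finalizer string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	kept := make([]string, 0, len(obj.Finalizers))
	for _, f := range obj.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.Finalizers = kept
	if obj.DeletionTimestamp != nil && len(obj.Finalizers) == 0 {
		delete(s.objects, name)
	}
}

func (s *store) exists(name string) bool {
	_, ok := s.objects[name]
	return ok
}

func main() {
	s := newStore()
	s.create("alice", "example.com/cleanup")

	s.requestDelete("alice")
	fmt.Println("exists after delete request:", s.exists("alice")) // true

	s.removeFinalizer("alice", "example.com/cleanup")
	fmt.Println("exists after finalizer removed:", s.exists("alice")) // false
}
```

Nothing in the model ever looks at the clock after the timestamp is written, which is the point: only the removal of the last finalizer completes the deletion.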

What the controller creates            | Pattern
Database users, schemas, tables        | Use finalizer
Cloud resources (S3, RDS, DNS)         | Use finalizer
External APIs (Stripe, Auth0, Vault)   | Use finalizer
k8s Deployments, Services, ConfigMaps  | Use ownerReferences
Read-only / observer controllers       | Nothing needed

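Stuck objects: If your controller crashes, or is uninstalled, before it removes the finalizer, the object stays frozen indefinitely. You can unstick it manually with kubectl patch postgresuser alice -p '{"metadata":{"finalizers":[]}}' --type=merge. This bypasses your cleanup code, so remove any leftover external resources by hand.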


Pattern 02

Idempotency

The reconcile loop runs on any change to any watched object — not just on creation. It can also be retried automatically on failures, or triggered by unrelated annotation changes. It is never called exactly once. Every action inside Reconcile must be safe to repeat without side effects.

Dangerous
Blindly Creates
// will ERROR on every run
// after the first
db.Exec(`CREATE USER alice
  WITH PASSWORD '123'`)
Safe
Check First, Then Act
// same result on every run
db.Exec(`DO $$ BEGIN
  IF NOT EXISTS (
    SELECT FROM pg_roles
    WHERE rolname = 'alice')
  THEN CREATE USER alice
    WITH PASSWORD '123';
  END IF;
END $$;`)

Rule of thumb: Every external call should follow the pattern "check if it exists → create/update only if needed". Think of your entire Reconcile function as a kubectl apply for the real world.
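
As a minimal sketch of that rule, the example below runs the same "ensure" step three times against an in-memory stand-in for the external system. The fakeDB type and the ensureUser function are hypothetical names invented for illustration; a real controller would run the equivalent check against postgres.

```go
package main

import "fmt"

// fakeDB is a hypothetical in-memory stand-in for the external system.
type fakeDB struct {
	users       map[string]string // username -> password
	createCalls int               // how many times a user was actually created
}

// ensureUser follows "check if it exists, then create or update only if needed".
// It reports whether it had to change anything.
func ensureUser(db *fakeDB, name, password string) (changed bool) {
	current, exists := db.users[name]
	switch {
	case !exists:
		db.users[name] = password
		db.createCalls++
		return true
	case current != password:
		db.users[name] = password
		return true
	default:
		return false // already in the desired state: do nothing
	}
}

func main() {
	db := &fakeDB{users: map[string]string{}}
	for run := 1; run <= 3; run++ {
		changed := ensureUser(db, "alice", "123")
		fmt.Printf("run %d: changed=%v users=%d creates=%d\n",
			run, changed, len(db.users), db.createCalls)
	}
}
```

Only the first run reports a change. Every later run observes the desired state and leaves the system alone, which is the property Reconcile needs.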


Pattern 03

ctrl.Result & Requeue

Every reconcile function returns a ctrl.Result and an error. Together they control exactly when the controller will call your function again.

go
// 1. Done — don't call me again unless something changes
return ctrl.Result{}, nil

// 2. Requeue immediately — run again right now
return ctrl.Result{Requeue: true}, nil

// 3. Requeue after a delay — useful for polling external systems
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil

// 4. Error — controller retries with exponential backoff automatically
return ctrl.Result{}, fmt.Errorf("connection failed: %w", err)

Situation                                  | What to Return
Everything done, nothing more to do        | Result{}, nil
Just added finalizer, need to continue     | Result{Requeue: true}, nil
External system not ready yet, poll again  | Result{RequeueAfter: 10s}, nil
Transient error (network, conflict)        | Result{}, err → exponential backoff
Permanent error (bad user spec)            | Update status to Failed, return Result{}, nil

Permanent vs transient errors: If the user wrote an invalid username, retrying every 5 seconds forever is wasteful. Detect permanent errors, set status.phase = "Failed" with a clear message, and return nil so the controller stops retrying.
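
One way to keep that decision in a single place is to mark permanent failures with their own error type and branch on it. The sketch below uses only the standard library: permanentError, outcome, and decide are hypothetical names, and outcome is a simplified stand-in for the (ctrl.Result, error) pair rather than the real controller-runtime types.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// permanentError marks failures that retrying cannot fix (hypothetical type).
type permanentError struct{ reason string }

func (e *permanentError) Error() string { return e.reason }

// errConnRefused is an example of a transient failure.
var errConnRefused = errors.New("connection refused")

// outcome is a simplified stand-in for the (ctrl.Result, error) pair.
type outcome struct {
	Phase        string
	RequeueAfter time.Duration
	Err          error
}

// decide maps the result of one reconcile attempt onto the table above.
func decide(err error, ready bool) outcome {
	var perm *permanentError
	switch {
	case err == nil && ready:
		return outcome{Phase: "Ready"} // done, wait for the next change
	case err == nil:
		return outcome{Phase: "Creating", RequeueAfter: 10 * time.Second} // poll again
	case errors.As(err, &perm):
		return outcome{Phase: "Failed"} // no error returned, so no retries
	default:
		return outcome{Phase: "Creating", Err: err} // error returned, so backoff retry
	}
}

func main() {
	cases := []struct {
		name  string
		err   error
		ready bool
	}{
		{"all done", nil, true},
		{"external system not ready", nil, false},
		{"network blip", fmt.Errorf("dial postgres: %w", errConnRefused), false},
		{"bad spec", fmt.Errorf("validate: %w", &permanentError{reason: "invalid username"}), false},
	}
	for _, c := range cases {
		o := decide(c.err, c.ready)
		fmt.Printf("%-26s phase=%-9s requeueAfter=%-4v returnsErr=%v\n",
			c.name, o.Phase, o.RequeueAfter, o.Err != nil)
	}
}
```

Because decide uses errors.As, a permanent error is still recognized after it has been wrapped with %w further up the call stack.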


Pattern 04

Watches

By default your controller only watches its own CRD. But your CRD depends on other resources — like the Secret that holds the postgres password. If someone updates that Secret, your controller never knows about it.

go
func (r *PostgresUserReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&dbv1alpha1.PostgresUser{}).     // primary: watch our CRD
        Watches(                              // secondary: also watch Secrets
            &corev1.Secret{},
            handler.EnqueueRequestsFromMapFunc(r.findPostgresUsersForSecret),
        ).
        Complete(r)
}

// map Secret -> find which PostgresUsers reference it -> enqueue them
func (r *PostgresUserReconciler) findPostgresUsersForSecret(
    ctx context.Context, secret client.Object,
) []reconcile.Request {
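    pgUserList := &dbv1alpha1.PostgresUserList{}
    if err := r.List(ctx, pgUserList, client.InNamespace(secret.GetNamespace())); err != nil {
        // a map function cannot return an error: log it and enqueue nothing
        log.FromContext(ctx).Error(err, "listing PostgresUsers", "secret", secret.GetName())
        return nil
    }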

    requests := []reconcile.Request{}
    for _, user := range pgUserList.Items {
        // only enqueue users that reference this specific secret
        if user.Spec.PasswordSecretRef.Name == secret.GetName() {
            requests = append(requests, reconcile.Request{
                NamespacedName: types.NamespacedName{
                    Name:      user.Name,
                    Namespace: user.Namespace,
                },
            })
        }
    }
    return requests
}
Key Concept
One Controller, One Kind

A single controller should only reconcile one Kind. The official Kubebuilder docs strongly recommend against having one controller manage multiple CRDs — it hurts the Single Responsibility Principle and makes debugging much harder. Use watches to react to other resources, not to own them.
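
The Secret-to-PostgresUser mapping shown earlier is easier to unit test when the filtering step is a pure function with no client calls. The following is a minimal sketch of that idea using plain structs: postgresUser, request, and requestsForSecret are hypothetical stand-ins for the generated API type, reconcile.Request, and the body of the map function.

```go
package main

import "fmt"

// postgresUser is a hypothetical, reduced view of the PostgresUser resource.
type postgresUser struct {
	Namespace          string
	Name               string
	PasswordSecretName string
}

// request is a stand-in for reconcile.Request (namespace + name).
type request struct {
	Namespace string
	Name      string
}

// requestsForSecret returns one request per user that references the given
// Secret. Users in other namespaces or pointing at other Secrets are skipped.
func requestsForSecret(users []postgresUser, secretNamespace, secretName string) []request {
	var out []request
	for _, u := range users {
		if u.Namespace == secretNamespace && u.PasswordSecretName == secretName {
			out = append(out, request{Namespace: u.Namespace, Name: u.Name})
		}
	}
	return out
}

func main() {
	users := []postgresUser{
		{Namespace: "prod", Name: "alice", PasswordSecretName: "pg-credentials"},
		{Namespace: "prod", Name: "bob", PasswordSecretName: "other-secret"},
		{Namespace: "dev", Name: "carol", PasswordSecretName: "pg-credentials"},
	}
	for _, r := range requestsForSecret(users, "prod", "pg-credentials") {
		fmt.Printf("enqueue %s/%s\n", r.Namespace, r.Name)
	}
}
```

With this split, the real map function only lists the objects and converts the result into reconcile.Request values, while the matching rule can be covered by ordinary table tests.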


Pattern 05

Status Conditions

A single Phase string works for simple cases, but the Kubernetes ecosystem has a richer standard: a list of typed conditions. Each condition has a type, a status (True, False, or Unknown), a reason code, and a human-readable message. Tools like Argo CD, Flux, and kubectl wait understand this pattern natively.

go
type PostgresUserStatus struct {
    // +listType=map
    // +listMapKey=type
    Conditions []metav1.Condition `json:"conditions,omitempty"`

    // keep phase too for simple kubectl get column
    Phase string `json:"phase,omitempty"`
}

// in your controller:
meta.SetStatusCondition(&pgUser.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionTrue,
    Reason:             "UserCreated",
    Message:            "Postgres user created successfully",
    LastTransitionTime: metav1.Now(),
})

meta.SetStatusCondition(&pgUser.Status.Conditions, metav1.Condition{
    Type:               "DatabaseConnected",
    Status:             metav1.ConditionFalse,
    Reason:             "ConnectionFailed",
    Message:            "Cannot reach postgres at host:5432",
    LastTransitionTime: metav1.Now(),
})

Multi-dimensional health: With conditions, kubectl get postgresusers can show multiple independent health dimensions at once, and you can use kubectl wait --for=condition=Ready postgresuser/alice in CI pipelines.
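
To make the bookkeeping concrete, here is a simplified model of what a helper like meta.SetStatusCondition does: it keeps one entry per condition type and moves the transition time only when the status value actually flips. The condition struct, setCondition, and the at helper are hypothetical and standard-library only; the real helper also handles fields such as ObservedGeneration that this sketch leaves out.

```go
package main

import (
	"fmt"
	"time"
)

// condition is a reduced, hypothetical version of metav1.Condition.
type condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// at builds a deterministic timestamp for the example.
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

// setCondition upserts c by Type and reports whether anything changed.
// LastTransitionTime moves only when Status changes.
func setCondition(conds *[]condition, c condition, now time.Time) (changed bool) {
	for i := range *conds {
		existing := &(*conds)[i]
		if existing.Type != c.Type {
			continue
		}
		if existing.Status != c.Status {
			existing.Status = c.Status
			existing.LastTransitionTime = now
			changed = true
		}
		if existing.Reason != c.Reason {
			existing.Reason = c.Reason
			changed = true
		}
		if existing.Message != c.Message {
			existing.Message = c.Message
			changed = true
		}
		return changed
	}
	c.LastTransitionTime = now
	*conds = append(*conds, c)
	return true
}

func main() {
	var conds []condition
	setCondition(&conds, condition{Type: "Ready", Status: "False", Reason: "Creating", Message: "working"}, at(1))
	setCondition(&conds, condition{Type: "Ready", Status: "False", Reason: "Creating", Message: "still working"}, at(2))
	fmt.Println(len(conds), conds[0].Message, conds[0].LastTransitionTime.Unix()) // 1 still working 1

	setCondition(&conds, condition{Type: "Ready", Status: "True", Reason: "UserCreated", Message: "done"}, at(3))
	fmt.Println(len(conds), conds[0].Status, conds[0].LastTransitionTime.Unix()) // 1 True 3
}
```

The returned changed flag is useful in a controller: when it is false, the status update call can be skipped entirely.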


Pattern 06

Conflict Errors

Kubernetes uses resource versions to detect concurrent writes. If you fetched the object at time T and something else updated it before you call r.Update(), you get a 409 Conflict error.

Risky
Stale Object
// fetched 5 seconds ago
// object may have changed
pgUser.Status.Phase = "Ready"
r.Status().Update(ctx, pgUser)
// possible 409 Conflict
Safe
Re-fetch First
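// re-fetch the latest version first
if err := r.Get(ctx, req.NamespacedName, pgUser); err != nil {
    return ctrl.Result{}, err
}
pgUser.Status.Phase = "Ready"
if err := r.Status().Update(ctx, pgUser); err != nil {
    return ctrl.Result{}, err // conflict is now unlikely, but still possible
}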

Retry on Conflict Pattern

go
// for critical updates: retry automatically on conflict
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
    // always re-fetch inside the retry loop
    if err := r.Get(ctx, req.NamespacedName, pgUser); err != nil {
        return err
    }
    pgUser.Status.Phase = "Ready"
    pgUser.Status.Created = true
    return r.Status().Update(ctx, pgUser)
})
if err != nil {
    return ctrl.Result{}, err
}
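
The mechanism behind the 409 can be sketched without a cluster. In the toy model below, versionedStore, record, and updateWithRetry are hypothetical names: the store rejects any write whose resource version is stale, and the retry helper re-reads before each attempt, which is the same shape as the retry-on-conflict pattern above.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("409 conflict: resourceVersion is stale")

// record is a hypothetical object carrying a resource version.
type record struct {
	ResourceVersion int
	Phase           string
}

// versionedStore accepts a write only if it is based on the current version.
type versionedStore struct{ current record }

func (s *versionedStore) get() record { return s.current }

func (s *versionedStore) update(r record) error {
	if r.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	r.ResourceVersion++
	s.current = r
	return nil
}

// updateWithRetry re-reads the latest record before every attempt and retries
// only on conflicts. attempts is expected to be at least 1.
func updateWithRetry(s *versionedStore, attempts int, mutate func(*record)) error {
	var err error
	for i := 0; i < attempts; i++ {
		latest := s.get()
		mutate(&latest)
		if err = s.update(latest); !errors.Is(err, errConflict) {
			return err // success, or an error that retrying will not fix
		}
	}
	return err
}

func main() {
	s := &versionedStore{}

	stale := s.get() // read at version 0
	fresh := s.get()
	fresh.Phase = "Creating"
	fmt.Println("first writer:", s.update(fresh)) // <nil>

	stale.Phase = "Ready"
	fmt.Println("stale writer:", s.update(stale)) // 409 conflict: resourceVersion is stale

	err := updateWithRetry(s, 3, func(r *record) { r.Phase = "Ready" })
	fmt.Println("with retry:", err, s.get().Phase) // <nil> Ready
}
```

The mutation is passed as a function so that it is re-applied to a freshly read copy on every attempt instead of being replayed from a stale one.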

Pattern 07

Never Modify Spec

Kubernetes has a strict ownership model: spec belongs to the user, status belongs to the controller. Your controller must never write to spec. Doing so silently overwrites the user's declared intent, and because every write to the object triggers another reconcile, it can also cause endless reconcile loops.

go
// NEVER — you are overwriting the user's declared intent
pgUser.Spec.Username = "something_else"
r.Update(ctx, pgUser)

// Controllers write to status...
pgUser.Status.Phase = "Ready"
r.Status().Update(ctx, pgUser)   // separate API call: status subresource

// ...and to metadata (finalizers), but never to spec
controllerutil.AddFinalizer(pgUser, myFinalizer)
r.Update(ctx, pgUser)           // r.Update() for metadata, not r.Status().Update()
Key Concept
Two Different Update Calls

r.Update(ctx, obj): updates spec and metadata (including finalizers). With the status subresource enabled, any status changes in the object are ignored.

r.Status().Update(ctx, obj): updates ONLY the status block. Does NOT update spec or metadata.

They are intentionally separated so that user writes and controller writes never collide.
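
The split can be pictured with a toy model, assuming the status subresource is enabled for the CRD. The resource and apiServer types below are hypothetical: update persists spec and finalizers and drops any status it is given, while updateStatus persists status and nothing else.

```go
package main

import "fmt"

// resource is a hypothetical object with the three parts that matter here.
type resource struct {
	Spec       string
	Finalizers []string
	Status     string
}

// apiServer is a toy model of a server with the status subresource enabled.
type apiServer struct{ stored resource }

// update models r.Update: spec and metadata are persisted, status is ignored.
func (a *apiServer) update(obj resource) {
	a.stored.Spec = obj.Spec
	a.stored.Finalizers = append([]string(nil), obj.Finalizers...)
}

// updateStatus models r.Status().Update: only status is persisted.
func (a *apiServer) updateStatus(obj resource) {
	a.stored.Status = obj.Status
}

func main() {
	api := &apiServer{stored: resource{Spec: "username=alice"}}

	// The controller adds a finalizer and sets status on its local copy.
	local := api.stored
	local.Finalizers = []string{"example.com/cleanup"}
	local.Status = "Ready"

	api.update(local)
	fmt.Printf("after update:       finalizers=%d status=%q\n",
		len(api.stored.Finalizers), api.stored.Status) // finalizers=1 status=""

	api.updateStatus(local)
	fmt.Printf("after updateStatus: finalizers=%d status=%q\n",
		len(api.stored.Finalizers), api.stored.Status) // finalizers=1 status="Ready"
}
```

A practical consequence is that setting a finalizer and a status field on the same local object takes two calls, one of each kind, before both changes are stored.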


Pattern 08

Validation Webhooks

Instead of letting bad specs reach your controller and failing with a cryptic status error, webhooks let you reject them at kubectl apply time with a clear, instant error message. The user sees it before anything is persisted.

bash
kubebuilder create webhook \
  --group db --version v1alpha1 --kind PostgresUser \
  --programmatic-validation
go
// ValidateCreate — runs on kubectl apply (new objects)
func (r *PostgresUser) ValidateCreate() (admission.Warnings, error) {
    if strings.Contains(r.Spec.Username, "-") {
        return nil, fmt.Errorf("username cannot contain hyphens: postgres rejects it")
    }
    if len(r.Spec.Username) > 63 {
        return nil, fmt.Errorf("username too long: postgres identifiers are limited to 63 bytes")
    }
    return nil, nil
}

// ValidateUpdate — runs on kubectl apply (existing objects)
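func (r *PostgresUser) ValidateUpdate(old runtime.Object) (admission.Warnings, error) {
    // checked assertion: an unexpected type should be rejected, not panic
    oldUser, ok := old.(*PostgresUser)
    if !ok {
        return nil, fmt.Errorf("expected a PostgresUser, got %T", old)
    }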

    // username is immutable — changing it would need DROP + CREATE
    // better to force the user to delete and recreate explicitly
    if r.Spec.Username != oldUser.Spec.Username {
        return nil, fmt.Errorf(
            "username is immutable after creation (old: %s, new: %s)",
            oldUser.Spec.Username, r.Spec.Username,
        )
    }
    return nil, nil
}

// ValidateDelete — runs on kubectl delete (optional)
func (r *PostgresUser) ValidateDelete() (admission.Warnings, error) {
    return nil, nil
}

Defaulting webhooks are also available (--defaulting flag). Use them to fill in sensible defaults on creation so users don't have to specify every optional field.
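
The validation rules themselves do not need the webhook machinery to be tested. A minimal sketch, assuming the same two username rules and the immutability rule from the webhook above, is to keep them in plain functions that the Validate methods call. The function names validateUsername and validateUsernameUnchanged are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// maxUsernameLen mirrors the 63-byte identifier limit used in the webhook above.
const maxUsernameLen = 63

// validateUsername applies the creation-time rules to a username.
func validateUsername(name string) error {
	if strings.Contains(name, "-") {
		return fmt.Errorf("username %q cannot contain hyphens", name)
	}
	if len(name) > maxUsernameLen {
		return fmt.Errorf("username is %d bytes, limit is %d", len(name), maxUsernameLen)
	}
	return nil
}

// validateUsernameUnchanged enforces immutability on update.
func validateUsernameUnchanged(oldName, newName string) error {
	if oldName != newName {
		return fmt.Errorf("username is immutable after creation (old: %s, new: %s)",
			oldName, newName)
	}
	return nil
}

func main() {
	for _, name := range []string{"alice", "app-user", strings.Repeat("a", 64)} {
		fmt.Printf("create %.12q: %v\n", name, validateUsername(name))
	}
	fmt.Println("update:", validateUsernameUnchanged("alice", "bob"))
}
```

ValidateCreate and ValidateUpdate then become thin wrappers that pass r.Spec.Username into these functions and return the error unchanged.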


Pattern 09

RBAC Markers

The // +kubebuilder:rbac comment markers are not documentation. They generate the actual Kubernetes RBAC ClusterRole rules when you run make manifests. If your controller touches a resource without a matching marker, the API server answers with 403 Forbidden, which usually surfaces only as an error in the controller logs.

go
// your CRD — full access
// +kubebuilder:rbac:groups=db.example.com,resources=postgresusers,verbs=get;list;watch;create;update;patch;delete

// status subresource — separate permission required
// +kubebuilder:rbac:groups=db.example.com,resources=postgresusers/status,verbs=get;update;patch

// finalizers — needs update permission on the main resource's finalizers
// +kubebuilder:rbac:groups=db.example.com,resources=postgresusers/finalizers,verbs=update

// Secrets — read-only (we only read the password, never write)
// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;watch

// Events — optional but lets you emit events visible in kubectl describe
// +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch

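Don't forget to regenerate: After changing any // +kubebuilder:rbac marker, always run make manifests to regenerate the YAML, then re-apply the RBAC manifests under config/rbac/ to the cluster (in a standard Kubebuilder project, make deploy does this through kustomize). Forgetting this step is a very common cause of unexplained 403 errors.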


Pattern 10

Full Reconcile Template

This is the canonical structure to use as your starting point for any controller that manages external resources. Every step is numbered and annotated.

go
func (r *PostgresUserReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    // 1. FETCH
    // req only has name + namespace. r.Get fills our empty struct
    // with all the data from the k8s API server.
    pgUser := &dbv1alpha1.PostgresUser{}
    if err := r.Get(ctx, req.NamespacedName, pgUser); err != nil {
        if errors.IsNotFound(err) {
            return ctrl.Result{}, nil // deleted before we ran
        }
        return ctrl.Result{}, err
    }

    // 2. DELETION CHECK
    // DeletionTimestamp is set by k8s when kubectl delete is called
    // and a finalizer is present. If it's set, we must clean up.
    if !pgUser.DeletionTimestamp.IsZero() {
        return r.handleDeletion(ctx, pgUser)
    }

    // 3. ADD FINALIZER
    // Done BEFORE any real work so that if we crash mid-way,
    // deletion will still go through our cleanup code.
    if !controllerutil.ContainsFinalizer(pgUser, postgresUserFinalizer) {
        log.Info("Adding finalizer", "name", pgUser.Name)
        controllerutil.AddFinalizer(pgUser, postgresUserFinalizer)
        if err := r.Update(ctx, pgUser); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{Requeue: true}, nil
    }

    // 4. EARLY EXIT IF ALREADY DONE
    // Reconcile runs on ANY change (labels, annotations, anything).
    // If the postgres user already exists, skip all work.
    if pgUser.Status.Phase == "Ready" && pgUser.Status.Created {
        return ctrl.Result{}, nil
    }

    // 5. SET IN-PROGRESS STATUS
    pgUser.Status.Phase = "Creating"
    pgUser.Status.Message = "Creating postgres user..."
    if err := r.Status().Update(ctx, pgUser); err != nil {
        return ctrl.Result{}, err
    }

    // 6. FETCH DEPENDENCIES
    // Get the password from the referenced k8s Secret
    password, err := r.getPasswordFromSecret(ctx, pgUser)
    if err != nil {
        return r.setFailedStatus(ctx, pgUser, "cannot read password secret: "+err.Error())
    }

    // 7. CONNECT TO EXTERNAL SYSTEM
    db, err := r.connectToPostgres()
    if err != nil {
        // transient error — return err so controller retries with backoff
        return ctrl.Result{}, fmt.Errorf("cannot connect to postgres: %w", err)
    }
    defer db.Close()

    // 8. IDEMPOTENT REAL-WORLD ACTION
    // IF NOT EXISTS makes this safe to call on every reconcile run
    if err := r.createPostgresUser(db, pgUser.Spec.Username, password,
        pgUser.Spec.Database, pgUser.Spec.Privileges); err != nil {
        return r.setFailedStatus(ctx, pgUser, "failed to create postgres user: "+err.Error())
    }

    // 9. UPDATE STATUS TO READY
    // r.Status().Update() only touches the status block
    now := metav1.Now()
    pgUser.Status.Created = true
    pgUser.Status.Phase = "Ready"
    pgUser.Status.Message = "User created successfully"
    pgUser.Status.ProvisionedUsername = pgUser.Spec.Username
    pgUser.Status.ProvisionedDatabase = pgUser.Spec.Database
    pgUser.Status.CreatedAt = &now
    pgUser.Status.LastUpdated = &now
    if err := r.Status().Update(ctx, pgUser); err != nil {
        return ctrl.Result{}, err
    }

    log.Info("PostgresUser reconciled successfully", "username", pgUser.Spec.Username)
    return ctrl.Result{}, nil
}

Handle Deletion

go
// Called when deletionTimestamp is set. Cleans up postgres then
// removes the finalizer to unblock k8s deletion.
func (r *PostgresUserReconciler) handleDeletion(ctx context.Context, pgUser *dbv1alpha1.PostgresUser) (ctrl.Result, error) {
    if controllerutil.ContainsFinalizer(pgUser, postgresUserFinalizer) {

        pgUser.Status.Phase = "Deleting"
        if err := r.Status().Update(ctx, pgUser); err != nil {
            return ctrl.Result{}, err
        }

        db, err := r.connectToPostgres()
        if err != nil {
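            // can't connect: keep the finalizer so the controller retries
            return ctrl.Result{}, fmt.Errorf("cannot connect for cleanup: %w", err)
        }
        defer db.Close()

        // Must revoke before drop. Best effort: the error is ignored because
        // the role or database may already be gone on a retry.
        // NOTE: identifiers are interpolated into SQL here, so Spec.Username
        // and Spec.Database must be validated (see the webhook) or quoted.
        _, _ = db.Exec(fmt.Sprintf("REVOKE ALL PRIVILEGES ON DATABASE %s FROM %s",
            pgUser.Spec.Database, pgUser.Spec.Username))

        if _, err := db.Exec(fmt.Sprintf("DROP USER IF EXISTS %s",
            pgUser.Spec.Username)); err != nil {
            return ctrl.Result{}, fmt.Errorf("failed to drop user: %w", err)
        }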

        // cleanup done — remove finalizer — k8s will now complete deletion
        controllerutil.RemoveFinalizer(pgUser, postgresUserFinalizer)
        if err := r.Update(ctx, pgUser); err != nil {
            return ctrl.Result{}, err
        }
    }
    return ctrl.Result{}, nil
}

Helper: Set Failed Status

go
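// Records the failure in status, then returns an error, which triggers
// automatic retry with exponential backoff. For permanent errors, return
// nil instead so the controller stops retrying.
func (r *PostgresUserReconciler) setFailedStatus(ctx context.Context, pgUser *dbv1alpha1.PostgresUser, msg string) (ctrl.Result, error) {
    now := metav1.Now()
    pgUser.Status.Phase = "Failed"
    pgUser.Status.Message = msg
    pgUser.Status.LastUpdated = &now
    if err := r.Status().Update(ctx, pgUser); err != nil {
        return ctrl.Result{}, err
    }
    // "%s" keeps a message containing % from being read as a format string
    return ctrl.Result{}, fmt.Errorf("%s", msg)
}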
Checklist
The 9-Step Mental Model for Any Reconcile

1. Fetch the object — handle NotFound gracefully

2. Check DeletionTimestamp — delegate to handleDeletion if set

3. Add finalizer if missing — requeue immediately after

4. Early exit if already in desired state

5. Set in-progress status

6. Fetch all dependencies (secrets, configmaps)

7. Connect to external system

8. Perform idempotent real-world action

9. Update status to reflect observed outcome