A deep dive into Kubebuilder patterns — from finalizers to idempotency, conflict resolution, webhooks, and everything you need before shipping an operator to production.
A finalizer is a string in the metadata.finalizers list. Kubernetes applies one rule to that list: while it is not empty, the object is never fully deleted. A delete request only sets deletionTimestamp, and the object then waits, indefinitely if necessary, until every entry has been removed from the list.
Use a finalizer only when your controller creates resources outside of Kubernetes: database users, cloud buckets, DNS records, external API entries. For k8s-only child resources, use ownerReferences instead and let garbage collection handle it.
What happens after kubectl delete postgresuser alice:

1. k8s checks metadata.finalizers.
2. If the list is EMPTY, the object is deleted right away.
3. If the list is NOT EMPTY, k8s sets deletionTimestamp and the object is frozen, waiting.
4. Your controller's Reconcile() is called, sees that deletionTimestamp is set, and runs handleDeletion(): it executes DROP USER alice in postgres and removes the finalizer string.
5. With finalizers = [], k8s deletes the object.
The key insight: deletionTimestamp is not a timer. Nothing happens automatically after it. It is simply a record of when someone requested deletion. The object can stay frozen for hours, days, forever — until your controller removes the finalizer.
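The rule is small enough to model in plain Go. The following is a toy sketch with no Kubernetes dependency; the Object and Store types and the requestDelete and removeFinalizer helpers are hypothetical names made up for illustration, not API server code.

```go
package main

import "time"

// Object is a hypothetical stand-in for the metadata of a Kubernetes object.
type Object struct {
	Name              string
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// Store is a toy in-memory "API server" keyed by object name.
type Store map[string]*Object

// requestDelete mimics a delete request: an object without finalizers is
// removed at once, otherwise it is only marked with a deletion timestamp.
func (s Store) requestDelete(name string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	if len(obj.Finalizers) == 0 {
		delete(s, name)
		return
	}
	if obj.DeletionTimestamp == nil {
		now := time.Now()
		obj.DeletionTimestamp = &now
	}
}

// removeFinalizer drops one finalizer. The object disappears only when it
// is already marked for deletion and the list has become empty.
func (s Store) removeFinalizer(name, finalizer string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	kept := obj.Finalizers[:0]
	for _, f := range obj.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.Finalizers = kept
	if obj.DeletionTimestamp != nil && len(obj.Finalizers) == 0 {
		delete(s, name)
	}
}
```

Note that nothing in the sketch looks at how old the timestamp is: only an empty list lets the object go.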
| What the controller manages | Pattern |
|---|---|
| Database users, schemas, tables | Use finalizer |
| Cloud resources (S3, RDS, DNS) | Use finalizer |
| External APIs (Stripe, Auth0, Vault) | Use finalizer |
| k8s Deployments, Services, ConfigMaps | Use ownerReferences |
| Read-only / observer controllers | Nothing needed |
Stuck objects: If your controller crashes and never removes the finalizer, the object stays frozen. You can unstick it manually with kubectl patch postgresuser alice -p '{"metadata":{"finalizers":[]}}' --type=merge. This skips your cleanup code, so anything the controller created outside the cluster has to be removed by hand.
The reconcile loop runs on any change to any watched object — not just on creation. It can also be retried automatically on failures, or triggered by unrelated annotation changes. It is never called exactly once. Every action inside Reconcile must be safe to repeat without side effects.
```go
// Not idempotent: will ERROR on every run after the first
db.Exec(`CREATE USER alice WITH PASSWORD '123'`)
```
```go
// Idempotent: same result on every run
db.Exec(`DO $$ BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'alice') THEN
    CREATE USER alice WITH PASSWORD '123';
  END IF;
END $$;`)
```
Rule of thumb: Every external call should follow the pattern "check if it exists → create/update only if needed". Think of your entire Reconcile function as a kubectl apply for the real world.
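The "check, then create or update only if needed" pattern can be sketched without a real database. The fakeDB type and the ensureUser function below are hypothetical stand-ins for illustration; a real controller would query postgres instead of a map.

```go
package main

import "errors"

// fakeDB is a hypothetical in-memory stand-in for the postgres role catalog.
type fakeDB struct {
	roles   map[string]string // role name -> password
	creates int               // number of CREATE operations performed
}

// ensureUser brings the database to the desired state and reports what it
// had to do. Calling it again with the same input changes nothing.
func ensureUser(db *fakeDB, name, password string) (string, error) {
	if name == "" {
		return "", errors.New("user name must not be empty")
	}
	current, exists := db.roles[name]
	switch {
	case !exists:
		db.roles[name] = password
		db.creates++
		return "created", nil
	case current != password:
		db.roles[name] = password
		return "updated", nil
	default:
		return "unchanged", nil
	}
}
```

Running ensureUser twice with the same arguments yields "created" and then "unchanged", which is the property every step of Reconcile should have.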
Every reconcile function returns a ctrl.Result and an error. Together they control exactly when the controller will call your function again.
```go
// 1. Done: don't call me again unless something changes
return ctrl.Result{}, nil

// 2. Requeue: run again as soon as the rate limiter allows
return ctrl.Result{Requeue: true}, nil

// 3. Requeue after a delay: useful for polling external systems
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil

// 4. Error: controller retries with exponential backoff automatically
return ctrl.Result{}, fmt.Errorf("connection failed: %w", err)
```
| Situation | What to Return |
|---|---|
| Everything done, nothing more to do | Result{}, nil |
| Just added finalizer, need to continue | Result{Requeue: true}, nil |
| External system not ready yet, poll again | Result{RequeueAfter: 10s}, nil |
| Transient error (network, conflict) | Result{}, err → exponential backoff |
| Permanent error (bad user spec) | Update status to Failed, return Result{}, nil |
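To get a feel for what "exponential backoff" means for a failing object, here is a minimal sketch of the delay schedule. It assumes a base delay of 5ms and a cap of 1000s, which are believed to be the defaults of the per-item rate limiter in client-go's workqueue; check the values for the version you run. The function and constant names are made up for this example.

```go
package main

import "time"

// Assumed defaults; verify against your client-go version.
const (
	defaultBase = 5 * time.Millisecond
	defaultMax  = 1000 * time.Second
)

// backoffDelay returns base * 2^failures, capped at max.
// failures is the number of consecutive failed reconciles so far.
func backoffDelay(base, max time.Duration, failures int) time.Duration {
	delay := base
	if delay > max {
		return max
	}
	for i := 0; i < failures; i++ {
		// Stop doubling once the next step would reach or pass the cap.
		if delay >= max/2 {
			return max
		}
		delay *= 2
	}
	return delay
}
```

Under these assumptions the first retries come within milliseconds of each other, and an object that keeps failing ends up being retried roughly every 17 minutes.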
Permanent vs transient errors: If the user wrote an invalid username, retrying every 5 seconds forever is wasteful. Detect permanent errors, set status.phase = "Failed" with a clear message, and return nil so the controller stops retrying.
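One way to make that distinction explicit is a sentinel error that marks failures retrying cannot fix. This is a sketch of the idea, not the only option; errInvalidSpec, errConnRefused, checkSpec, and decide are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidSpec marks errors caused by the user's spec (hypothetical sentinel).
var errInvalidSpec = errors.New("invalid spec")

// errConnRefused stands in for a transient infrastructure error.
var errConnRefused = errors.New("connection refused")

// checkSpec wraps the sentinel so callers can detect it with errors.Is.
func checkSpec(username string) error {
	if username == "" {
		return fmt.Errorf("%w: username is empty", errInvalidSpec)
	}
	return nil
}

// decide returns the phase to record in status and the error to hand back
// to the controller. A nil error means "do not retry".
func decide(err error) (phase string, retErr error) {
	switch {
	case err == nil:
		return "Ready", nil
	case errors.Is(err, errInvalidSpec):
		// Permanent: record the failure and stop retrying.
		return "Failed", nil
	default:
		// Transient: return the error so the work queue backs off and retries.
		return "Pending", err
	}
}
```

Because the user has to edit the spec to fix a permanent error, that edit itself triggers the next reconcile; no retry is needed in the meantime.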
By default your controller only watches its own CRD. But your CRD depends on other resources — like the Secret that holds the postgres password. If someone updates that Secret, your controller never knows about it.
```go
func (r *PostgresUserReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&dbv1alpha1.PostgresUser{}). // primary: watch our CRD
		Watches( // secondary: also watch Secrets
			&corev1.Secret{},
			handler.EnqueueRequestsFromMapFunc(r.findPostgresUsersForSecret),
		).
		Complete(r)
}

// map Secret -> find which PostgresUsers reference it -> enqueue them
func (r *PostgresUserReconciler) findPostgresUsersForSecret(
	ctx context.Context, secret client.Object,
) []reconcile.Request {
	pgUserList := &dbv1alpha1.PostgresUserList{}
	if err := r.List(ctx, pgUserList, client.InNamespace(secret.GetNamespace())); err != nil {
		// a map function cannot return an error: log it and enqueue nothing
		log.FromContext(ctx).Error(err, "cannot list PostgresUsers", "secret", secret.GetName())
		return nil
	}

	requests := []reconcile.Request{}
	for _, user := range pgUserList.Items {
		// only enqueue users that reference this specific secret
		if user.Spec.PasswordSecretRef.Name == secret.GetName() {
			requests = append(requests, reconcile.Request{
				NamespacedName: types.NamespacedName{
					Name:      user.Name,
					Namespace: user.Namespace,
				},
			})
		}
	}
	return requests
}
```
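The matching step of such a map function is pure logic, so it can be pulled out and unit-tested without a cluster. A minimal sketch, using hypothetical trimmed-down types (userRef, key) in place of the real PostgresUser and types.NamespacedName:

```go
package main

// userRef is a hypothetical, trimmed-down view of a PostgresUser.
type userRef struct {
	Namespace  string
	Name       string
	SecretName string // name of the Secret the user references
}

// key identifies one object to reconcile, similar to a namespaced name.
type key struct {
	Namespace string
	Name      string
}

// usersForSecret returns the keys of all users that live in the secret's
// namespace and reference the secret by name.
func usersForSecret(users []userRef, secretNamespace, secretName string) []key {
	var out []key
	for _, u := range users {
		if u.Namespace == secretNamespace && u.SecretName == secretName {
			out = append(out, key{Namespace: u.Namespace, Name: u.Name})
		}
	}
	return out
}
```

A Secret that nobody references maps to an empty result, so updating it costs one list call and no reconciles.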
A single controller should only reconcile one Kind. The official Kubebuilder docs strongly recommend against having one controller manage multiple CRDs: it violates the Single Responsibility Principle and makes debugging much harder. Use watches to react to other resources, not to own them.
A single Phase string works for simple cases, but the Kubernetes ecosystem has a richer standard: a list of typed conditions. Each condition has a type, a status (True, False, or Unknown), a reason code, and a human message. Tools like ArgoCD, Flux, and kubectl wait understand this pattern natively.
```go
type PostgresUserStatus struct {
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// keep phase too for simple kubectl get column
	Phase string `json:"phase,omitempty"`
}

// in your controller:
meta.SetStatusCondition(&pgUser.Status.Conditions, metav1.Condition{
	Type:               "Ready",
	Status:             metav1.ConditionTrue,
	Reason:             "UserCreated",
	Message:            "Postgres user created successfully",
	LastTransitionTime: metav1.Now(),
})

meta.SetStatusCondition(&pgUser.Status.Conditions, metav1.Condition{
	Type:               "DatabaseConnected",
	Status:             metav1.ConditionFalse,
	Reason:             "ConnectionFailed",
	Message:            "Cannot reach postgres at host:5432",
	LastTransitionTime: metav1.Now(),
})
```
Multi-dimensional health: With conditions, kubectl get postgresusers can show multiple independent health dimensions at once, and you can use kubectl wait --for=condition=Ready postgresuser/alice in CI pipelines.
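The useful property of a condition list is that each type appears once and its transition time moves only when the status actually flips. The sketch below models that behavior in plain Go; it is an approximation of what a helper like meta.SetStatusCondition is understood to do, with a hypothetical Condition type and an integer clock to keep it dependency-free.

```go
package main

// Condition is a simplified, hypothetical version of a status condition.
type Condition struct {
	Type           string
	Status         string // "True", "False" or "Unknown"
	Reason         string
	Message        string
	LastTransition int64 // unix seconds in this sketch
}

// setCondition inserts or updates the condition with the same Type.
// LastTransition changes only when Status changes. It reports whether
// anything was modified, so callers can skip a needless status update.
func setCondition(conds *[]Condition, c Condition, now int64) bool {
	for i := range *conds {
		existing := &(*conds)[i]
		if existing.Type != c.Type {
			continue
		}
		changed := false
		if existing.Status != c.Status {
			existing.Status = c.Status
			existing.LastTransition = now
			changed = true
		}
		if existing.Reason != c.Reason || existing.Message != c.Message {
			existing.Reason, existing.Message = c.Reason, c.Message
			changed = true
		}
		return changed
	}
	c.LastTransition = now
	*conds = append(*conds, c)
	return true
}
```

Using the returned flag to skip unchanged writes also avoids triggering a fresh reconcile from your own status update.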
Kubernetes uses resource versions to detect concurrent writes. If you fetched the object at time T and something else updated it before you call r.Update(), you get a 409 Conflict error.
```go
// fetched 5 seconds ago
// object may have changed
pgUser.Status.Phase = "Ready"
r.Status().Update(ctx, pgUser) // possible 409 Conflict
```
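```go
// re-fetch latest version first
if err := r.Get(ctx, req.NamespacedName, pgUser); err != nil {
	return ctrl.Result{}, err
}
pgUser.Status.Phase = "Ready"
// far less likely to conflict, but a write can still slip in: check the error
if err := r.Status().Update(ctx, pgUser); err != nil {
	return ctrl.Result{}, err
}
```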
```go
// for critical updates: retry automatically on conflict
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
	// always re-fetch inside the retry loop
	if err := r.Get(ctx, req.NamespacedName, pgUser); err != nil {
		return err
	}
	pgUser.Status.Phase = "Ready"
	pgUser.Status.Created = true
	return r.Status().Update(ctx, pgUser)
})
if err != nil {
	return ctrl.Result{}, err
}
```
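The mechanics behind the 409 are easy to reproduce in a toy model. The sketch below is not client-go code: store, record, errConflict, and retryOnConflict are hypothetical names that imitate a version-checked write and a retry loop that re-reads before each attempt.

```go
package main

import "errors"

// errConflict plays the role of the API server's 409 Conflict.
var errConflict = errors.New("conflict: resource version is stale")

// record is a hypothetical object with a version counter.
type record struct {
	ResourceVersion int
	Phase           string
}

// store holds the single current copy of the record.
type store struct{ current record }

func (s *store) get() record { return s.current }

// update succeeds only when the caller's copy carries the current version.
func (s *store) update(r record) error {
	if r.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	r.ResourceVersion++
	s.current = r
	return nil
}

// retryOnConflict re-runs fn while it reports a conflict, up to attempts times.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}
```

The important detail is that the closure passed to retryOnConflict must read the record itself; retrying with the same stale copy would fail every time.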
Kubernetes has a strict ownership model: spec belongs to the user, status belongs to the controller. Your controller must never write to spec. Doing so would silently overwrite the user's intent and cause infinite reconcile loops.
```go
// NEVER: you are overwriting the user's declared intent
pgUser.Spec.Username = "something_else"
r.Update(ctx, pgUser)

// Controllers write to status...
pgUser.Status.Phase = "Ready"
r.Status().Update(ctx, pgUser) // separate API call: status subresource

// ...and to metadata (finalizers), nothing else
controllerutil.AddFinalizer(pgUser, myFinalizer)
r.Update(ctx, pgUser) // r.Update() for metadata, not r.Status().Update()
```
r.Update(ctx, obj) — updates spec + metadata (for finalizers). Does NOT update status.
r.Status().Update(ctx, obj) — updates ONLY the status block. Does NOT update spec or metadata.
They are intentionally separated so that user writes and controller writes never collide.
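A toy model makes the split concrete. The types below are hypothetical and only imitate the behavior described here for a resource with the status subresource enabled: one write path ignores status, the other ignores everything except status.

```go
package main

// Spec, Status and PostgresUser are simplified, hypothetical types.
type Spec struct{ Username string }
type Status struct{ Phase string }

type PostgresUser struct {
	Finalizers []string
	Spec       Spec
	Status     Status
}

// apiServer is a toy store holding one object.
type apiServer struct{ stored PostgresUser }

// update persists metadata and spec but keeps the stored status,
// like a write to the main resource.
func (a *apiServer) update(obj PostgresUser) {
	obj.Status = a.stored.Status
	obj.Finalizers = append([]string(nil), obj.Finalizers...) // defensive copy
	a.stored = obj
}

// updateStatus persists only the status block, like a write to the
// status subresource.
func (a *apiServer) updateStatus(obj PostgresUser) {
	a.stored.Status = obj.Status
}
```

A status change passed to update is dropped, and a spec change passed to updateStatus is dropped as well, which is why forgetting the second call leaves status looking stale.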
Instead of letting bad specs reach your controller and failing with a cryptic status error, webhooks let you reject them at kubectl apply time with a clear, instant error message. The user sees it before anything is persisted.
```bash
kubebuilder create webhook \
  --group db --version v1alpha1 --kind PostgresUser \
  --programmatic-validation
```
```go
// ValidateCreate runs on kubectl apply (new objects)
func (r *PostgresUser) ValidateCreate() (admission.Warnings, error) {
	if strings.Contains(r.Spec.Username, "-") {
		return nil, fmt.Errorf("username cannot contain hyphens: postgres rejects them in unquoted identifiers")
	}
	if len(r.Spec.Username) > 63 {
		return nil, fmt.Errorf("username too long: postgres limit is 63 chars")
	}
	return nil, nil
}

// ValidateUpdate runs on kubectl apply (existing objects)
func (r *PostgresUser) ValidateUpdate(old runtime.Object) (admission.Warnings, error) {
	oldUser, ok := old.(*PostgresUser)
	if !ok {
		return nil, fmt.Errorf("expected a PostgresUser but got %T", old)
	}
	// username is immutable: changing it would need DROP + CREATE,
	// better to force the user to delete and recreate explicitly
	if r.Spec.Username != oldUser.Spec.Username {
		return nil, fmt.Errorf(
			"username is immutable after creation (old: %s, new: %s)",
			oldUser.Spec.Username, r.Spec.Username,
		)
	}
	return nil, nil
}

// ValidateDelete runs on kubectl delete (optional)
func (r *PostgresUser) ValidateDelete() (admission.Warnings, error) {
	return nil, nil
}
```
Defaulting webhooks are also available (--defaulting flag). Use them to fill in sensible defaults on creation so users don't have to specify every optional field.
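Validation rules are easiest to keep correct when they live in plain functions that the webhook methods call. The following sketch shows that shape; validateUsername, validateUsernameUpdate, and maxIdentifierLen are hypothetical names, and the rules simply mirror the ones used in this article.

```go
package main

import (
	"fmt"
	"strings"
)

// maxIdentifierLen is the postgres identifier limit used in this article.
const maxIdentifierLen = 63

// validateUsername checks a username on its own.
func validateUsername(name string) error {
	switch {
	case name == "":
		return fmt.Errorf("username must not be empty")
	case len(name) > maxIdentifierLen:
		return fmt.Errorf("username is %d bytes long, limit is %d", len(name), maxIdentifierLen)
	case strings.Contains(name, "-"):
		return fmt.Errorf("username %q must not contain hyphens", name)
	}
	return nil
}

// validateUsernameUpdate additionally enforces immutability.
func validateUsernameUpdate(oldName, newName string) error {
	if oldName != newName {
		return fmt.Errorf("username is immutable after creation (old: %s, new: %s)", oldName, newName)
	}
	return validateUsername(newName)
}
```

With this split, ValidateCreate and ValidateUpdate become thin wrappers, and the rules can be covered by ordinary table-driven tests.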
The // +kubebuilder:rbac comment markers are not documentation: they generate actual Kubernetes RBAC ClusterRole rules when you run make manifests. If you touch a resource without a marker, the API server answers with 403 Forbidden, and the only place that shows up is your controller's logs.
```go
// your CRD: full access
// +kubebuilder:rbac:groups=db.example.com,resources=postgresusers,verbs=get;list;watch;create;update;patch;delete

// status subresource: separate permission required
// +kubebuilder:rbac:groups=db.example.com,resources=postgresusers/status,verbs=get;update;patch

// finalizers: needs update permission on the main resource's finalizers
// +kubebuilder:rbac:groups=db.example.com,resources=postgresusers/finalizers,verbs=update

// Secrets: read-only (we only read the password, never write)
// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;watch

// Events: optional but lets you emit events visible in kubectl describe
// +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch
```
Don't forget to regenerate: After changing any // +kubebuilder:rbac marker, always run make manifests to regenerate the YAML, then re-apply the generated RBAC manifests to the cluster (for example with make deploy). Forgetting this step is a very common failure that is easy to miss.
This is the canonical structure to use as your starting point for any controller that manages external resources. Every step is numbered and annotated.
```go
func (r *PostgresUserReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// 1. FETCH
	// req only has name + namespace. r.Get fills our empty struct
	// with all the data from the k8s API server.
	pgUser := &dbv1alpha1.PostgresUser{}
	if err := r.Get(ctx, req.NamespacedName, pgUser); err != nil {
		if errors.IsNotFound(err) {
			return ctrl.Result{}, nil // deleted before we ran
		}
		return ctrl.Result{}, err
	}

	// 2. DELETION CHECK
	// DeletionTimestamp is set by k8s when kubectl delete is called
	// and a finalizer is present. If it's set, we must clean up.
	if !pgUser.DeletionTimestamp.IsZero() {
		return r.handleDeletion(ctx, pgUser)
	}

	// 3. ADD FINALIZER
	// Done BEFORE any real work so that if we crash mid-way,
	// deletion will still go through our cleanup code.
	if !controllerutil.ContainsFinalizer(pgUser, postgresUserFinalizer) {
		logger.Info("Adding finalizer", "name", pgUser.Name)
		controllerutil.AddFinalizer(pgUser, postgresUserFinalizer)
		if err := r.Update(ctx, pgUser); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{Requeue: true}, nil
	}

	// 4. EARLY EXIT IF ALREADY DONE
	// Reconcile runs on ANY change (labels, annotations, anything).
	// If the postgres user already exists, skip all work.
	if pgUser.Status.Phase == "Ready" && pgUser.Status.Created {
		return ctrl.Result{}, nil
	}

	// 5. SET IN-PROGRESS STATUS
	pgUser.Status.Phase = "Creating"
	pgUser.Status.Message = "Creating postgres user..."
	if err := r.Status().Update(ctx, pgUser); err != nil {
		return ctrl.Result{}, err
	}

	// 6. FETCH DEPENDENCIES
	// Get the password from the referenced k8s Secret
	password, err := r.getPasswordFromSecret(ctx, pgUser)
	if err != nil {
		return r.setFailedStatus(ctx, pgUser, "cannot read password secret: "+err.Error())
	}

	// 7. CONNECT TO EXTERNAL SYSTEM
	db, err := r.connectToPostgres()
	if err != nil {
		// transient error: return err so controller retries with backoff
		return ctrl.Result{}, fmt.Errorf("cannot connect to postgres: %w", err)
	}
	defer db.Close()

	// 8. IDEMPOTENT REAL-WORLD ACTION
	// IF NOT EXISTS makes this safe to call on every reconcile run
	if err := r.createPostgresUser(db, pgUser.Spec.Username, password,
		pgUser.Spec.Database, pgUser.Spec.Privileges); err != nil {
		return r.setFailedStatus(ctx, pgUser, "failed to create postgres user: "+err.Error())
	}

	// 9. UPDATE STATUS TO READY
	// r.Status().Update() only touches the status block
	now := metav1.Now()
	pgUser.Status.Created = true
	pgUser.Status.Phase = "Ready"
	pgUser.Status.Message = "User created successfully"
	pgUser.Status.ProvisionedUsername = pgUser.Spec.Username
	pgUser.Status.ProvisionedDatabase = pgUser.Spec.Database
	pgUser.Status.CreatedAt = &now
	pgUser.Status.LastUpdated = &now
	if err := r.Status().Update(ctx, pgUser); err != nil {
		return ctrl.Result{}, err
	}

	logger.Info("PostgresUser reconciled successfully", "username", pgUser.Spec.Username)
	return ctrl.Result{}, nil
}
```
```go
// Called when deletionTimestamp is set. Cleans up postgres then
// removes the finalizer to unblock k8s deletion.
func (r *PostgresUserReconciler) handleDeletion(ctx context.Context, pgUser *dbv1alpha1.PostgresUser) (ctrl.Result, error) {
	if controllerutil.ContainsFinalizer(pgUser, postgresUserFinalizer) {
		pgUser.Status.Phase = "Deleting"
		if err := r.Status().Update(ctx, pgUser); err != nil {
			return ctrl.Result{}, err
		}

		db, err := r.connectToPostgres()
		if err != nil {
			// can't connect: don't remove finalizer, controller retries
			return ctrl.Result{}, fmt.Errorf("cannot connect for cleanup: %w", err)
		}
		defer db.Close()

		// NOTE: identifiers are interpolated into SQL here; validate or
		// quote them so a crafted spec cannot inject statements.

		// must revoke before drop; the error is ignored on purpose because
		// the role may already be gone on a retried cleanup
		db.Exec(fmt.Sprintf("REVOKE ALL PRIVILEGES ON DATABASE %s FROM %s",
			pgUser.Spec.Database, pgUser.Spec.Username))

		if _, err := db.Exec(fmt.Sprintf("DROP USER IF EXISTS %s", pgUser.Spec.Username)); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to drop user: %w", err)
		}

		// cleanup done: remove finalizer, k8s will now complete deletion
		controllerutil.RemoveFinalizer(pgUser, postgresUserFinalizer)
		if err := r.Update(ctx, pgUser); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
```
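The cleanup code builds SQL by formatting identifiers into the statement, which is only safe if those identifiers are validated or quoted. Below is a minimal sketch of postgres-style identifier quoting (wrap in double quotes, double any embedded double quote, cut at a NUL byte). The quoteIdentifier and dropUserStatement helpers are hypothetical; some drivers ship an equivalent function, which would be preferable if yours does.

```go
package main

import "strings"

// quoteIdentifier returns name as a double-quoted SQL identifier.
// Embedded double quotes are doubled; anything after a NUL byte is dropped.
func quoteIdentifier(name string) string {
	if i := strings.IndexByte(name, 0); i >= 0 {
		name = name[:i]
	}
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// dropUserStatement builds the cleanup statement with a quoted identifier.
func dropUserStatement(username string) string {
	return "DROP USER IF EXISTS " + quoteIdentifier(username)
}
```

Keep in mind that quoted identifiers are case-sensitive in postgres, so the create and drop paths have to quote consistently.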
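```go
// Records the failure in status, then returns an error, which triggers
// automatic retry with exponential backoff. For permanent errors,
// return nil instead so the controller stops retrying.
func (r *PostgresUserReconciler) setFailedStatus(ctx context.Context, pgUser *dbv1alpha1.PostgresUser, msg string) (ctrl.Result, error) {
	now := metav1.Now()
	pgUser.Status.Phase = "Failed"
	pgUser.Status.Message = msg
	pgUser.Status.LastUpdated = &now
	if err := r.Status().Update(ctx, pgUser); err != nil {
		return ctrl.Result{}, err
	}
	// "%s" keeps msg from being interpreted as a format string
	return ctrl.Result{}, fmt.Errorf("%s", msg)
}
```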
1. Fetch the object — handle NotFound gracefully
2. Check DeletionTimestamp — delegate to handleDeletion if set
3. Add finalizer if missing — requeue immediately after
4. Early exit if already in desired state
5. Set in-progress status
6. Fetch all dependencies (secrets, configmaps)
7. Connect to external system
8. Perform idempotent real-world action
9. Update status to reflect observed outcome