Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces — what we learned the hard way


We tried to roll out a multi-tenant “shared ALB” architecture for our production EKS cluster and the apply blew up with three apparently unrelated errors. Untangling them turned into a tour of EKS Auto Mode internals, the AWS Load Balancer Controller’s Ingress-group feature, and the seams that appear when one Kubernetes cluster is provisioned by two or more separate Terraform Cloud workspaces. This post writes down the findings so the next person doesn’t burn a day on it.

The cluster is EKS 1.34 with Auto Mode enabled, two general-purpose Auto Mode nodes, and four “team” namespaces (analytics, design, platform, qa). Each namespace gets its own ExternalDNS instance pointed at a delegated Route53 zone under tools.drivemode.com. The intent is one shared internet-facing ALB serving <name>.<team>.tools.drivemode.com for any team. It uses SNI-based certificate selection across four wildcard ACM certs, and a ValidatingAdmissionPolicy enforces team isolation.

The three errors, and why they were misleading

A single terraform apply produced:

Error: 3 errors occurred:
  * clusterroles.rbac.authorization.k8s.io "external-dns" already exists
  * clusterrolebindings.rbac.authorization.k8s.io "external-dns-viewer" already exists
  * Internal error occurred: failed calling webhook "mservice.elbv2.k8s.aws":
    no endpoints available for service "aws-load-balancer-webhook-service"

Error: Cannot create resource that already exists
  resource "/alb" already exists
  module.tools_dns.module.alb_ingressclass.kubernetes_manifest.ingressclass

It would be easy to read these as three bugs in our code. They’re not. They’re three independent failure modes that surfaced together because we built one design on top of another stack we didn’t fully understand.

Two different ALB controllers, one cluster

EKS Auto Mode bundles its own ALB controller as a managed capability — exposed via kubernetesNetworkConfig.elasticLoadBalancing.enabled. There is no pod for it; the control plane runs it for you. Its IngressClass controller string is eks.amazonaws.com/alb and its IngressClassParams live in the eks.amazonaws.com/v1 CRD group.
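For orientation, here is roughly where that switch lives when the cluster is declared directly with the AWS provider’s aws_eks_cluster resource. This is a minimal sketch, not our actual cluster module. The variable names and cluster name are hypothetical, and it assumes an AWS provider version with Auto Mode support. In that provider, compute, block storage and load balancing are enabled together.

```hcl
# Hypothetical inputs; the real cluster is built by our own module.
variable "cluster_role_arn" { type = string }
variable "node_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_eks_cluster" "tools" {
  name     = "tools" # hypothetical cluster name
  role_arn = var.cluster_role_arn
  version  = "1.34"

  # Mirrors the AWS provider's Auto Mode example.
  bootstrap_self_managed_addons = false

  access_config {
    authentication_mode = "API"
  }

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  # Auto Mode: compute, block storage and load balancing are switched on together.
  compute_config {
    enabled       = true
    node_pools    = ["general-purpose"]
    node_role_arn = var.node_role_arn
  }

  storage_config {
    block_storage {
      enabled = true
    }
  }

  # The managed load balancer controller (IngressClass controller
  # eks.amazonaws.com/alb). No pod for it runs inside the cluster.
  kubernetes_network_config {
    elastic_load_balancing {
      enabled = true
    }
  }
}
```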

Separately, you can install the standalone AWS Load Balancer Controller (LBC) via the Helm chart aws-load-balancer-controller. That ships its own pod, its own webhooks, its own CRDs in elbv2.k8s.aws, and its IngressClass controller string is ingress.k8s.aws/alb.

These are not the same controller. They are not the same product. They are not interchangeable. They differ in their feature set, their CRD group, and the API contract for annotations.

Our cluster had both installed. We only discovered this when we noticed two distinct IngressClassParams CRDs in two different API groups. We also saw that the live IngressClass alb carried Helm annotations from the standalone LBC release. That release owned it, not the EKS-managed controller our tools_dns module was hard-coded against.

Why both? Git archaeology answered it. In March we shipped a staging-only design that used Auto Mode’s built-in controller and per-gateway ALBs (one ALB per gateway, no sharing). In May we shipped a different production design that uses a shared ALB across teams via the alb.ingress.kubernetes.io/group.name annotation — and that annotation is a feature of the standalone LBC, not Auto Mode. So the same PR that introduced the shared-ALB design also flipped on enable_aws_load_balancer_controller = true. Both controllers ended up coexisting on the cluster.

The IMDS hop-limit ambush

The standalone LBC pods entered CrashLoopBackOff with:

unable to initialize AWS cloud: failed to introspect vpcID from EC2Metadata
or Node name, specify --aws-vpc-id instead if EC2Metadata is unavailable:
EC2MetadataError: failed to make EC2Metadata request status code: 401

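Pods can’t reach IMDSv2 from inside their own network namespace. The response to the token PUT has to travel one extra network hop to get there, but the node enforces HttpPutResponseHopLimit=1. The pod therefore never gets a session token, and its metadata request is refused with the 401 above. AWS documents this constraint explicitly for Auto Mode: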

“EKS Auto Mode enforces IMDSv2 with a hop limit of 1 by default, adhering to AWS security best practices. This default configuration cannot be modified in Auto Mode.”
— https://docs.aws.amazon.com/eks/latest/userguide/automode-learn-instances.html

The same page also documents the official workaround:

“For add-ons that typically require IMDS access, supply parameters (such as AWS region) during installation to avoid IMDS lookups.”

And the AWS LBC Helm install guide is specific about which values to set when the nodes have IMDS restricted:

“If you’re deploying the controller to Amazon EC2 nodes that have restricted access to the Amazon EC2 instance metadata service (IMDS), or if you’re deploying to Fargate or Amazon EKS Hybrid Nodes, then add the following flags to the helm command that follows:

  • --set region={{region-code}}
  • --set vpcId={{vpc-xxxxxxxx}}”

— https://docs.aws.amazon.com/eks/latest/userguide/lbc-helm.html#lbc-helm-install

So the fix for LBC on Auto Mode is to inject both values it would have learned from IMDS as explicit Helm values:

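aws_load_balancer_controller = {
  # Hand LBC the values it would otherwise look up via IMDS,
  # which pods on Auto Mode nodes cannot reach (hop limit 1).
  set = [
    { name = "vpcId", value = module.network.vpc_id },         # rendered as --aws-vpc-id
    { name = "region", value = data.aws_region.current.name }, # rendered as --aws-region
  ]
}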

The LBC chart renders these into --aws-vpc-id=<value> and --aws-region=<value> on the Deployment, the IMDS lookup is skipped, and pods come Ready.
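You may install the chart with a plain helm_release rather than through the map-style module input above. In that case the equivalent looks roughly like this. It is a sketch under assumptions. The variable names and IRSA role are hypothetical, and the release goes into kube-system. The `set` blocks use helm provider 2.x syntax; 3.x takes a list of objects instead.

```hcl
# Hypothetical inputs supplied by the workspace that owns the cluster.
variable "cluster_name" { type = string }
variable "vpc_id" { type = string }
variable "lbc_irsa_role_arn" { type = string }

data "aws_region" "current" {}

resource "helm_release" "aws_load_balancer_controller" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  namespace  = "kube-system"

  set {
    name  = "clusterName"
    value = var.cluster_name
  }

  # Rendered as --aws-vpc-id; without it the controller asks IMDS and crashes.
  set {
    name  = "vpcId"
    value = var.vpc_id
  }

  # Rendered as --aws-region.
  set {
    name  = "region"
    value = data.aws_region.current.name
  }

  set {
    name  = "serviceAccount.name"
    value = "aws-load-balancer-controller"
  }

  # IRSA: AWS credentials come from this role, not from the node.
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = var.lbc_irsa_role_arn
  }
}
```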

Note: we initially set only vpcId. Our reasoning was that IRSA already injects the AWS_REGION environment variable into the pod, so the AWS SDK would pick up the region without IMDS. That reasoning may be true for SDK calls, but the AWS doc explicitly tells you to set both for this exact scenario. We followed the doc. Don’t second-guess AWS guidance with inferred reasoning. They wrote that line for a reason, and setting a redundant value costs nothing.

This is not a hack. This is the AWS-documented pattern for any IMDS-using add-on running on Auto Mode (and for Fargate, and for EKS Hybrid Nodes — same restriction shape). Internalise it — every IRSA workload you add to an Auto Mode cluster needs to take its region/VPC/account as parameters, not from IMDS. The day you forget, you’ll spend an hour debugging “why does this pod hang on startup”.

The webhook cascade — why one crashing pod broke unrelated Helm releases

The standalone LBC ships a MutatingWebhookConfiguration with three webhooks: mpod.elbv2.k8s.aws, mservice.elbv2.k8s.aws, and mtargetgroupbinding.elbv2.k8s.aws. All three are failurePolicy: Fail. The mservice webhook fires on every Service create, cluster-wide.

When LBC pods crash, the webhook service has zero endpoints. Every Service create — including the Service that the ExternalDNS Helm chart wants to make in the team’s namespace — is rejected because the cluster can’t call the webhook. So a controller crash that has nothing to do with ExternalDNS still breaks our ExternalDNS rollout. The Helm releases retry, partially apply, and leave the cluster in a half-state.

If you ever see “webhook failed” or “no endpoints available for service” for aws-load-balancer-webhook-service, don’t look at the resource you were trying to create. Look at the LBC pods.
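One way to get that signal out of Terraform itself is a check block (Terraform 1.5+). It reads the webhook Service’s Endpoints and warns when none are ready. This is a sketch under several assumptions:

- It uses the kubernetes provider’s generic kubernetes_resource data source.
- It assumes LBC runs in kube-system.
- It looks only at the first endpoint subset, which is enough for a single-port Service.

Check assertions are reported as warnings rather than failing the run. They are evaluated at the end of plan, so the warning appears before the apply that would hit the webhook. The v1 Endpoints API is deprecated in favour of EndpointSlice from Kubernetes 1.33 but is still served.

```hcl
check "lbc_webhook_has_endpoints" {
  # Scoped data source: only referenceable inside this check block.
  data "kubernetes_resource" "lbc_webhook_endpoints" {
    api_version = "v1"
    kind        = "Endpoints"

    metadata {
      name      = "aws-load-balancer-webhook-service"
      namespace = "kube-system" # assumption: where the LBC chart is installed
    }
  }

  assert {
    # Ready pod IPs are listed under subsets[0].addresses. Pods that exist but
    # are not Ready only appear under notReadyAddresses, so a missing or empty
    # list means fail-closed webhooks will reject matching writes.
    condition     = try(length(data.kubernetes_resource.lbc_webhook_endpoints.object.subsets[0].addresses), 0) > 0
    error_message = "aws-load-balancer-webhook-service has no ready endpoints; check the LBC pods before creating Services."
  }
}
```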

The group.name annotation is the whole reason we run standalone LBC

Our tools_dns design wants one ALB serving every team, with each team owning its own Ingresses. The mechanism that lets multiple Ingresses converge onto a single ALB is the standalone LBC’s IngressGroup feature. You stamp every Ingress that should land on the same ALB with the same alb.ingress.kubernetes.io/group.name annotation and the controller merges them.

An “anchor” Ingress holds listener-level config that can’t be expressed per-rule — listen ports, SSL policy, the cert ARN list for SNI, a catch-all 404 action for hosts no team rule matches. We use group.order = 1000 on the anchor so its catch-all 404 sits last in the merged rule list, after all team rules.
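A minimal sketch of such an anchor, written with the kubernetes provider’s kubernetes_ingress_v1 resource. Several names are assumptions for illustration: the tools-dns namespace, the tools-shared group name, the wildcard_cert_arns variable and the particular SSL policy. The annotation keys themselves are the standalone LBC’s.

```hcl
variable "wildcard_cert_arns" {
  type        = list(string)
  description = "The wildcard ACM certificate ARNs served via SNI (hypothetical input)"
}

resource "kubernetes_ingress_v1" "shared_alb_anchor" {
  metadata {
    name      = "shared-alb-anchor" # hypothetical
    namespace = "tools-dns"         # hypothetical

    annotations = {
      "alb.ingress.kubernetes.io/group.name" = "tools-shared" # hypothetical group name
      # Highest order in the group, so these rules are evaluated after every team rule.
      "alb.ingress.kubernetes.io/group.order"  = "1000"
      "alb.ingress.kubernetes.io/scheme"       = "internet-facing"
      "alb.ingress.kubernetes.io/listen-ports" = jsonencode([{ HTTPS = 443 }])
      "alb.ingress.kubernetes.io/ssl-policy"   = "ELBSecurityPolicy-TLS13-1-2-2021-06"
      # Comma-separated; the first cert is the listener default, the rest serve SNI.
      "alb.ingress.kubernetes.io/certificate-arn" = join(",", var.wildcard_cert_arns)
      # Named action, referenced below through the "use-annotation" port.
      "alb.ingress.kubernetes.io/actions.default-404" = jsonencode({
        type = "fixed-response"
        fixedResponseConfig = {
          contentType = "text/plain"
          statusCode  = "404"
          messageBody = "no route for this host"
        }
      })
    }
  }

  spec {
    ingress_class_name = "alb"

    # Host-less rule: matches any request that no team rule claimed first.
    rule {
      http {
        path {
          path      = "/"
          path_type = "Prefix"

          backend {
            service {
              name = "default-404"
              port {
                name = "use-annotation"
              }
            }
          }
        }
      }
    }
  }
}
```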

This pattern is documented for the standalone LBC:

“You can add an order number of your ingress resource. alb.ingress.kubernetes.io/group.order: '10' … Duplicate rules with a higher number can overwrite rules with a lower number. By default, the rule order between ingresses within the same ingress group is determined lexicographically based namespace and name.”
— https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html

EKS Auto Mode supports “shared ALB” — but not the same way

Auto Mode supports a similar concept, but the API is different:

“alb.ingress.kubernetes.io/group.name | Not supported | Specify groups in IngressClass only”
— https://docs.aws.amazon.com/eks/latest/userguide/auto-configure-alb.html

Instead, you put group.name on the IngressClassParams — once, at the class level. Every Ingress using that class is implicitly in that group. That works for “all teams share one ALB”. It does not work for finer-grained group structure inside the same class, and it does not preserve the per-Ingress patterns LBC supports:

  • alb.ingress.kubernetes.io/group.order — docs silent on Auto Mode. Could be supported, could be silently ignored. Nobody has officially said.
  • alb.ingress.kubernetes.io/actions.<name> — docs silent. The Auto Mode considerations page mentions conditions.* as a supported per-Ingress annotation pattern, but explicitly not actions.*. Our anchor’s actions.default-404 may not work.
  • alb.ingress.kubernetes.io/ssl-policy — docs silent for the per-Ingress annotation. Listener attributes in general are explicitly not supported: “You cannot set ListenerAttribute with EKS Auto Mode.”
  • alb.ingress.kubernetes.io/listen-ports and alb.ingress.kubernetes.io/load-balancer-name: docs silent.
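For comparison, the Auto Mode version of “all teams share one ALB” is declared once at the class level. This is a sketch under two assumptions. First, it uses a separate class with the hypothetical name alb-auto, so it cannot collide with the alb class owned by the standalone LBC chart. Second, the Auto Mode IngressClassParams schema is assumed to expose group.name the way the documentation quoted above implies. Verify the field against the CRD in your cluster before relying on it.

```hcl
resource "kubernetes_manifest" "auto_mode_params" {
  manifest = {
    apiVersion = "eks.amazonaws.com/v1"
    kind       = "IngressClassParams"
    metadata = {
      name = "alb-auto" # hypothetical
    }
    spec = {
      scheme = "internet-facing"
      # Class-level grouping: every Ingress using this class joins one ALB.
      # Assumption: confirm the field exists with
      # `kubectl explain ingressclassparams.spec --api-version=eks.amazonaws.com/v1`.
      group = {
        name = "tools-shared" # hypothetical
      }
    }
  }
}

resource "kubernetes_ingress_class_v1" "auto_mode_alb" {
  metadata {
    name = "alb-auto"
  }

  spec {
    controller = "eks.amazonaws.com/alb"

    parameters {
      api_group = "eks.amazonaws.com"
      kind      = "IngressClassParams"
      name      = "alb-auto"
    }
  }
}
```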

Migrating our design to Auto Mode would mean betting that the undocumented behaviours either work or have workarounds we don’t know. We chose not to take that bet right now. We keep the standalone LBC.

This is the part most people miss: “Auto Mode supports shared ALBs” is technically true, but Auto Mode supports a strictly smaller subset of the LBC’s annotation API. If your design relies on any per-Ingress annotation Auto Mode lists as unsupported (or silent on), you cannot migrate that design verbatim.

Coexistence is officially supported

For people in the same position as us — already running standalone LBC on top of an Auto Mode cluster and wondering whether to roll back — AWS explicitly blesses this setup:

“You can install the AWS Load Balancer Controller on an Amazon EKS Auto Mode cluster. Use the IngressClass or loadBalancerClass options to associate Service and Ingress resources with either the Load Balancer Controller or EKS Auto Mode.”
— https://docs.aws.amazon.com/eks/latest/userguide/migrate-auto.html

The two controllers are distinguished by their IngressClass spec.controller value (ingress.k8s.aws/alb vs. eks.amazonaws.com/alb). You can keep both installed and steer Ingresses to whichever you want by setting ingressClassName appropriately. Auto Mode’s controller sits idle if no IngressClass references it — no cost, no failure mode.

The Terraform Cloud / multi-workspace trap

This is the part the title promises and the part that bit us hardest.

Our infra is split across separate TFC workspaces by ownership: devops-core owns the EKS cluster, the VPC, the LBC helm release, and the cross-team certificates; devops-kubernetes owns team namespaces, ExternalDNS, and the tools_dns shared-ALB design. Two state files. Two pipelines. Two PR review queues.

This split is reasonable at the platform level — different teams own different layers — but it interacts badly with the shared-ALB design in three concrete ways.

1. Both workspaces wanted to own IngressClass alb

The LBC Helm chart creates IngressClass alb automatically as part of its install. The terraform-modules-aws//ingressclass-alb submodule we used in tools_dns also creates IngressClass alb. Each workspace believes it is the authoritative source. Each apply that runs in either workspace tries to either create or modify the object. The first one wins; the second one errors with “Cannot create resource that already exists”.

This is not a Kubernetes problem and not really a Terraform problem either. It’s a coordination problem: a single cluster-scoped object cannot be co-owned by two state files. The only sustainable answer is to pick one workspace as the owner and document it. We chose: LBC chart (i.e. devops-core) owns the IngressClass; tools_dns references it by name. That removes the duplicate definition from devops-kubernetes entirely.

Lesson: for any cluster-scoped object created by an upstream Helm chart (IngressClass, ClusterRole, CRD, MutatingWebhookConfiguration, PriorityClass), check the chart before declaring it elsewhere in Terraform. If the chart creates it, do not.
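On the tools_dns side, “references it by name” can be as small as a local plus a check block. The check warns when the class is missing or owned by something else. This sketch assumes Terraform 1.5+ and the kubernetes provider’s kubernetes_resource data source. The release name in the second assertion is an assumption about how devops-core names the LBC release. Helm 3 stamps meta.helm.sh/release-name on every object it manages, which is what makes ownership visible.

```hcl
locals {
  ingress_class_name = "alb" # owned by the LBC Helm release in devops-core
}

check "alb_ingressclass_owned_by_lbc" {
  data "kubernetes_resource" "alb_class" {
    api_version = "networking.k8s.io/v1"
    kind        = "IngressClass"

    metadata {
      name = local.ingress_class_name # cluster-scoped, so no namespace
    }
  }

  assert {
    condition     = try(data.kubernetes_resource.alb_class.object.spec.controller, "") == "ingress.k8s.aws/alb"
    error_message = "IngressClass ${local.ingress_class_name} is missing or not served by the standalone LBC."
  }

  assert {
    # Assumption: devops-core names the release aws-load-balancer-controller.
    condition     = try(data.kubernetes_resource.alb_class.object.metadata.annotations["meta.helm.sh/release-name"], "") == "aws-load-balancer-controller"
    error_message = "IngressClass ${local.ingress_class_name} is not managed by the LBC Helm release."
  }
}
```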

2. Cross-workspace install ordering is implicit

devops-kubernetes assumes LBC is installed and healthy. There is no terraform dependency that enforces this. If devops-core is applied second, or LBC pods are crashing for any reason, devops-kubernetes apply fails — not with “LBC is missing”, but with cascading webhook errors and “IngressClass not found” and assorted incoherent symptoms.

We worked around this by treating the cluster’s state as the source of truth. Before any devops-kubernetes apply, we manually verify that the LBC pods are Ready and that the webhook Service has endpoints. That’s brittle and won’t scale. A better answer is an explicit outputs contract. devops-core would publish an output such as lbc_ready = true, set only after a null_resource has waited for the Deployment to become Available. devops-kubernetes would then read it through a tfe_outputs data source. We haven’t done this yet, but it’s the right direction.

Lesson: a split-workspace setup requires an explicit handshake between the workspaces, not “we know devops-core applies first most of the time”.
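A sketch of that handshake, split by workspace. Everything here is hypothetical: the resource names, the TFC organization, and the helm_release address in devops-core. It also assumes that devops-core’s TFC runner has kubectl configured against the cluster. The producing side uses null_resource for the wait, as proposed above, plus a plain output as the contract. The consuming side uses the tfe provider’s tfe_outputs data source and a precondition on terraform_data (Terraform 1.4+). Note that lbc_ready only records that the wait succeeded during devops-core’s last apply. It does not prove the controller is healthy at the moment devops-kubernetes runs.

```hcl
# ---- devops-core (sketch) ----
# Assumes a helm_release named aws_load_balancer_controller in this workspace.
resource "null_resource" "lbc_ready" {
  depends_on = [helm_release.aws_load_balancer_controller]

  # Re-run the wait whenever the chart version changes.
  triggers = {
    chart_version = helm_release.aws_load_balancer_controller.version
  }

  provisioner "local-exec" {
    command = "kubectl -n kube-system wait --for=condition=Available deployment/aws-load-balancer-controller --timeout=300s"
  }
}

output "lbc_ready" {
  value      = true
  depends_on = [null_resource.lbc_ready]
}

# ---- devops-kubernetes (sketch) ----
data "tfe_outputs" "devops_core" {
  organization = "example-org" # hypothetical TFC organization
  workspace    = "devops-core"
}

# Resources that need LBC (anchor Ingress, ExternalDNS releases) depend on this.
resource "terraform_data" "require_lbc" {
  lifecycle {
    precondition {
      condition     = try(data.tfe_outputs.devops_core.nonsensitive_values.lbc_ready, false) == true
      error_message = "devops-core has not published lbc_ready = true; apply devops-core first."
    }
  }
}
```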

3. The IngressGroup itself crosses workspace boundaries

This is the subtlest one and the most important to internalise.

An LBC IngressGroup is a set of Ingresses identified by their shared group.name annotation. The ALB they all share is a single AWS resource. Its listener rule list is the merged union of every Ingress’s rules. The certificate list on its listener is the union of every Ingress’s certificate-arn value. Its listener attributes (SSL policy, listen ports) are merged from all member Ingresses. Conflicting values, such as two different SSL policies on the same port, fail reconciliation for the whole group.
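To make the merge concrete, this is roughly what one team’s contribution looks like. It is a hypothetical example: the analytics namespace, the dashboard Service and the tools-shared group name are illustrative. It assumes the platform’s anchor Ingress carries the HTTPS listener settings. It also assumes the backing Service is ClusterIP, which is why it uses ip targets.

```hcl
resource "kubernetes_ingress_v1" "dashboard" {
  metadata {
    name      = "dashboard" # hypothetical team service
    namespace = "analytics"

    annotations = {
      # Joins the shared ALB; must match the anchor's group name.
      "alb.ingress.kubernetes.io/group.name" = "tools-shared"
      # Attach rules to the anchor's HTTPS listener. Without a certificate of
      # its own, an Ingress would otherwise default to an HTTP:80 listener.
      "alb.ingress.kubernetes.io/listen-ports" = jsonencode([{ HTTPS = 443 }])
      # Route straight to pod IPs behind a ClusterIP Service.
      "alb.ingress.kubernetes.io/target-type" = "ip"
      # Deliberately absent: certificate-arn, ssl-policy, group.order. Those are
      # merged into the shared listener and would affect every team.
    }
  }

  spec {
    ingress_class_name = "alb"

    rule {
      host = "dashboard.analytics.tools.drivemode.com"

      http {
        path {
          path      = "/"
          path_type = "Prefix"

          backend {
            service {
              name = "dashboard"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
```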

In our design, the anchor Ingress lives in devops-kubernetes’s tools_dns module and holds the listener-level config (SSL policy, listen ports, cert ARN list, catch-all 404). The team Ingresses are created by team workloads, which in our case will live in yet more TFC workspaces (one per team service). So the ALB ends up co-owned by:

  • devops-core (LBC helm release, controller config, IngressClass)
  • devops-kubernetes (anchor Ingress, IRSA roles for ExternalDNS, admission policy)
  • N team workspaces (their own Ingresses, each contributing rules to the merged listener)

Any of those workspaces can break the shared ALB by adding an Ingress with a conflicting group.order, a different ssl-policy value, an annotation the anchor relies on being absent, or a certificate-arn that displaces another team’s cert from the list. There is no terraform graph that catches this — the conflict only surfaces at the LBC reconciliation, and only as a controller log line nobody is paging on.

A few things we put in place to make this less dangerous:

  • ValidatingAdmissionPolicy per team that pins ingressClassName, requires the team’s group.name, forbids alb.ingress.kubernetes.io/certificate-arn (so teams can’t add certs to the shared listener and intercept SNI traffic for other teams), and restricts hostnames to the team’s apex.
  • IRSA scoping per team so a team’s ExternalDNS can only mutate its own Route53 zone — even if it lies about hostnames, AWS denies the write.
  • A single anchor Ingress whose group.order = 1000 puts it last; teams default to order=0, so a misconfigured team rule won’t accidentally shadow another team’s rule (though tie-breaking by namespace/name is still a foot-gun).
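The admission-policy bullet can be sketched with kubernetes_manifest, using one policy and one binding per team. This illustrates the idea under assumptions and is not our deployed policy. The policy names and the tools-shared group name are hypothetical. The hostname rule follows the <name>.<team>.tools.drivemode.com scheme described at the top. It requires the GA admissionregistration.k8s.io/v1 API (Kubernetes 1.30+).

```hcl
locals {
  teams     = toset(["analytics", "design", "platform", "qa"])
  alb_group = "tools-shared" # hypothetical shared group name
}

resource "kubernetes_manifest" "team_ingress_policy" {
  for_each = local.teams

  manifest = {
    apiVersion = "admissionregistration.k8s.io/v1"
    kind       = "ValidatingAdmissionPolicy"
    metadata   = { name = "ingress-guardrails-${each.key}" }
    spec = {
      failurePolicy = "Fail"
      matchConstraints = {
        resourceRules = [{
          apiGroups   = ["networking.k8s.io"]
          apiVersions = ["v1"]
          operations  = ["CREATE", "UPDATE"]
          resources   = ["ingresses"]
        }]
      }
      # CEL expressions; each must evaluate to true or the request is denied.
      validations = [
        {
          expression = "has(object.spec.ingressClassName) && object.spec.ingressClassName == 'alb'"
          message    = "ingressClassName must be alb"
        },
        {
          expression = "has(object.metadata.annotations) && 'alb.ingress.kubernetes.io/group.name' in object.metadata.annotations && object.metadata.annotations['alb.ingress.kubernetes.io/group.name'] == '${local.alb_group}'"
          message    = "group.name must be ${local.alb_group}"
        },
        {
          expression = "!has(object.metadata.annotations) || !('alb.ingress.kubernetes.io/certificate-arn' in object.metadata.annotations)"
          message    = "certificate-arn is managed by the platform anchor Ingress"
        },
        {
          expression = "!has(object.spec.rules) || object.spec.rules.all(r, has(r.host) && r.host.endsWith('.${each.key}.tools.drivemode.com'))"
          message    = "every host must be under ${each.key}.tools.drivemode.com"
        },
      ]
    }
  }
}

resource "kubernetes_manifest" "team_ingress_policy_binding" {
  for_each = local.teams

  manifest = {
    apiVersion = "admissionregistration.k8s.io/v1"
    kind       = "ValidatingAdmissionPolicyBinding"
    metadata   = { name = "ingress-guardrails-${each.key}" }
    spec = {
      policyName        = "ingress-guardrails-${each.key}"
      validationActions = ["Deny"]
      # Apply each team's policy only in that team's namespace.
      matchResources = {
        namespaceSelector = {
          matchLabels = { "kubernetes.io/metadata.name" = each.key }
        }
      }
    }
  }
}
```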

What we did not put in place — and what we should — is a contract test in CI that, when any team workspace tries to add an Ingress, validates the merged outcome against the platform’s invariants (no listener-attribute drift, no cert-list mutation, no group.order < 1, no host outside the team’s apex). The admission policy catches some of this at apply time, but a CI gate catches it before TFC even runs the plan.
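We have not built that gate. One cheap approximation is to have team workspaces go through a shared module whose input validations encode the per-Ingress invariants, so `terraform plan` fails before anything reaches the cluster. This swaps in a simpler technique than the merged-outcome test described above. It checks each Ingress in isolation and cannot see the merged listener. The module interface below is hypothetical and assumes Terraform 1.9+, because the hosts validation refers to another variable.

```hcl
# Hypothetical "team-ingress" module inputs. The module would set group.name and
# listen-ports itself and never expose group.order, so team Ingresses keep the
# default order.

variable "team" {
  type = string

  validation {
    condition     = contains(["analytics", "design", "platform", "qa"], var.team)
    error_message = "team must be one of analytics, design, platform, qa."
  }
}

variable "hosts" {
  type = list(string)

  validation {
    # Cross-variable reference (var.team) needs Terraform 1.9+.
    condition     = alltrue([for h in var.hosts : endswith(h, ".${var.team}.tools.drivemode.com")])
    error_message = "Every host must be under <team>.tools.drivemode.com."
  }
}

variable "extra_annotations" {
  type    = map(string)
  default = {}

  validation {
    # Listener-level and group-level annotations belong to the platform anchor.
    condition = alltrue([
      for k in keys(var.extra_annotations) : !contains([
        "alb.ingress.kubernetes.io/certificate-arn",
        "alb.ingress.kubernetes.io/ssl-policy",
        "alb.ingress.kubernetes.io/listen-ports",
        "alb.ingress.kubernetes.io/group.name",
        "alb.ingress.kubernetes.io/group.order",
      ], k)
    ])
    error_message = "extra_annotations may not set listener-level or group annotations."
  }
}
```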

Lesson: when an AWS resource (an ALB, here) is the merged output of multiple TF workspaces, the workspace that owns the anchor is not the workspace that owns the resource. Defensive design at the boundary (admission policy, IRSA scoping, schema contracts) is mandatory, not nice-to-have.

ExternalDNS multi-instance — a smaller version of the same problem

Four team namespaces, four ExternalDNS Helm releases. The chart’s external-dns.fullname helper, by default, returns external-dns for any release named external-dns. The ClusterRole and ClusterRoleBinding are named off fullname. So all four releases tried to create ClusterRole/external-dns and ClusterRoleBinding/external-dns-viewer — same names, same cluster.

Helm doesn’t reconcile that automatically. One release wins, the others fail with “already exists”. The “winner” varies by apply order. State diverges across runs.

Fix: set fullnameOverride = "external-dns-<team>" per release. The chart’s helper respects the override and renders unique names for ClusterRole, ClusterRoleBinding, Service, and Deployment. The ServiceAccount stays named external-dns (because we set serviceAccount.name explicitly), so the IRSA trust policy — which targets the SA name in the team’s namespace — keeps working.
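Here is a sketch of the per-team releases with that override, written as one helm_release per team via for_each. Several pieces are assumptions: the variable, the role ARNs, the txtOwnerId convention and the chart’s values layout. provider.name is the layout of recent kubernetes-sigs external-dns chart versions; older ones take a plain provider string.

```hcl
variable "external_dns_role_arns" {
  type        = map(string) # team name => IRSA role ARN scoped to that team's zone
  description = "Hypothetical per-team IRSA roles for ExternalDNS"
}

resource "helm_release" "external_dns" {
  for_each = var.external_dns_role_arns

  name       = "external-dns" # same release name in every team namespace
  namespace  = each.key
  repository = "https://kubernetes-sigs.github.io/external-dns/"
  chart      = "external-dns"

  values = [yamlencode({
    # Drives unique names for ClusterRole, ClusterRoleBinding, Service, Deployment.
    fullnameOverride = "external-dns-${each.key}"

    # Fixed ServiceAccount name, so each IRSA trust policy keeps matching
    # system:serviceaccount:<team>:external-dns.
    serviceAccount = {
      name = "external-dns"
      annotations = {
        "eks.amazonaws.com/role-arn" = each.value
      }
    }

    provider      = { name = "aws" } # assumption: recent chart layout
    domainFilters = ["${each.key}.tools.drivemode.com"]
    txtOwnerId    = "external-dns-${each.key}" # hypothetical ownership tag
  })]
}
```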

This wasn’t a TFC problem per se; it was a multi-instance Helm chart problem. But it’s the same shape: an object with a globally-scoped name (ClusterRole is cluster-scoped, so its name is unique across the cluster) being declared in multiple places that don’t know about each other.

Takeaways

  1. Don’t pick a controller by what your design uses today — pick by what you’re committing to operationally. The standalone LBC and Auto Mode’s built-in ALB do related but non-identical jobs. Migrating between them is not a refactor; it’s a redesign. Picking by feature (“we need group.name annotation”) locks you into the standalone LBC. Picking by operational profile (“we want AWS to manage the controller”) locks you into Auto Mode. You don’t get to be ambivalent.
  2. AWS-managed runtimes have invisible constraints. EKS Auto Mode’s IMDS hop-limit=1 was not in the design doc we wrote when we enabled Auto Mode, and we found it by reading a crash log. Read the whole Auto Mode user guide before deploying any new IRSA-using add-on, especially the “Instance Metadata Service” section.
  3. Pre-existing webhooks with failurePolicy: Fail will block unrelated work when their backing controller is down. Audit your MutatingWebhookConfigurations and ValidatingWebhookConfigurations and know which ones fail-closed. If any controller you depend on is down, anything matching its webhook scope is too.
  4. For multi-workspace setups, name the owner of every shared object. IngressClasses, ClusterRoles, CRDs, webhook configs — these don’t have a Terraform mechanism to express co-ownership. Pick one workspace per object and document it.
  5. alb.ingress.kubernetes.io/group.name is a powerful feature with a serious operational cost when the group spans multiple workspaces. The convenience of “one ALB, many teams” comes with the risk that any contributor can mis-shape the merged listener. Compensate with admission policies, IRSA scoping, and CI contract tests — preferably all three.
  6. Helm charts like external-dns, cert-manager, aws-load-balancer-controller, etc., were not designed for multi-instance use on a single cluster. If you’re installing the same chart more than once, the first thing to check is what fullnameOverride does in that chart’s templates. The second thing to check is which cluster-scoped objects it creates that you’ll have to disambiguate.

What we shipped

  • devops-core: added vpcId and region Helm values to the LBC release so pods can come Ready on Auto Mode nodes. AWS-documented fix per https://docs.aws.amazon.com/eks/latest/userguide/lbc-helm.html#lbc-helm-install.
  • devops-kubernetes: removed the duplicate IngressClass alb from tools_dns — LBC’s Helm chart is the single owner. Added fullnameOverride per team to the ExternalDNS releases so cluster-scoped objects don’t collide.

Three diffs across two repos and a long Friday afternoon. The diffs themselves are small. The understanding is what took the time. If this saves you the same Friday, the post did its job.
