{"id":2904,"date":"2026-05-26T06:00:40","date_gmt":"2026-05-26T06:00:40","guid":{"rendered":"https:\/\/sreschool.com\/blog\/?p=2904"},"modified":"2026-05-26T06:00:40","modified_gmt":"2026-05-26T06:00:40","slug":"sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way","status":"publish","type":"post","link":"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/","title":{"rendered":"Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way"},"content":{"rendered":"\n<p>We tried to roll out a multi-tenant &#8220;shared ALB&#8221; architecture for our production EKS cluster and the apply blew up with three apparently unrelated errors. Untangling them turned into a tour of EKS Auto Mode internals, the AWS Load Balancer Controller&#8217;s Ingress-group feature, and the seams that appear when one Kubernetes cluster is provisioned by two or more separate Terraform Cloud workspaces. This post writes down the findings so the next person doesn&#8217;t burn a day on it.<\/p>\n\n\n\n<p>The cluster is EKS 1.34 with Auto Mode enabled, two&nbsp;<code>general-purpose<\/code>&nbsp;Auto Mode nodes, and four &#8220;team&#8221; namespaces (<code>analytics<\/code>,&nbsp;<code>design<\/code>,&nbsp;<code>platform<\/code>,&nbsp;<code>qa<\/code>) each getting their own ExternalDNS instance pointed at a delegated Route53 zone under&nbsp;<code>tools.drivemode.com<\/code>. 
The intent is one shared internet-facing ALB serving&nbsp;<code>&lt;name&gt;.&lt;team&gt;.tools.drivemode.com<\/code>&nbsp;for any team, with cert SNI across four wildcard ACM certs and team isolation enforced by a&nbsp;<code>ValidatingAdmissionPolicy<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-three-errors-and-why-they-were-misleading\">The three errors, and why they were misleading<\/h2>\n\n\n\n<p>A single&nbsp;<code>terraform apply<\/code>&nbsp;produced:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Error: 3 errors occurred:\n  * clusterroles.rbac.authorization.k8s.io \"external-dns\" already exists\n  * clusterrolebindings.rbac.authorization.k8s.io \"external-dns-viewer\" already exists\n  * Internal error occurred: failed calling webhook \"mservice.elbv2.k8s.aws\":\n    no endpoints available for service \"aws-load-balancer-webhook-service\"\n\nError: Cannot create resource that already exists\n  resource \"\/alb\" already exists\n  module.tools_dns.module.alb_ingressclass.kubernetes_manifest.ingressclass\n<\/code><\/pre>\n\n\n\n<p>It would be easy to read these as three bugs in our code. They&#8217;re not. They&#8217;re three independent failure modes that surfaced together because we built one design on top of another stack we didn&#8217;t fully understand.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"two-different-alb-controllers-one-cluster\">Two different ALB controllers, one cluster<\/h2>\n\n\n\n<p>EKS Auto Mode bundles its own ALB controller as a managed capability \u2014 exposed via&nbsp;<code>kubernetesNetworkConfig.elasticLoadBalancing.enabled<\/code>. There is no pod for it; the control plane runs it for you. 
Its IngressClass controller string is&nbsp;<code>eks.amazonaws.com\/alb<\/code>&nbsp;and its IngressClassParams use the&nbsp;<code>eks.amazonaws.com\/v1<\/code>&nbsp;API version (CRD group&nbsp;<code>eks.amazonaws.com<\/code>).<\/p>\n\n\n\n<p>Separately, you can install the standalone AWS Load Balancer Controller (LBC) via the Helm chart&nbsp;<code>aws-load-balancer-controller<\/code>. That ships its own pod, its own webhooks, its own CRDs in&nbsp;<code>elbv2.k8s.aws<\/code>, and its IngressClass controller string is&nbsp;<code>ingress.k8s.aws\/alb<\/code>.<\/p>\n\n\n\n<p>These are not the same controller. They are not the same product. They are not interchangeable. They differ in their feature set, their CRD group, and the API contract for annotations.<\/p>\n\n\n\n<p>Our cluster had both installed. We discovered this only when we noticed that the cluster had two distinct IngressClassParams CRDs in two different API groups, and that the live&nbsp;<code>IngressClass alb<\/code>&nbsp;carried Helm annotations marking it as owned by the standalone LBC release, not by the EKS-managed controller our&nbsp;<code>tools_dns<\/code>&nbsp;module was hard-coded against.<\/p>\n\n\n\n<p>Why both? Git archaeology answered it. In March we shipped a staging-only design that used Auto Mode&#8217;s built-in controller and per-gateway ALBs (one ALB per gateway, no sharing). In May we shipped a&nbsp;<em>different<\/em>&nbsp;production design that uses a shared ALB across teams via the&nbsp;<code>alb.ingress.kubernetes.io\/group.name<\/code>&nbsp;annotation \u2014 and that annotation is a feature of the standalone LBC, not Auto Mode. So the same PR that introduced the shared-ALB design also flipped on&nbsp;<code>enable_aws_load_balancer_controller = true<\/code>. 
Both controllers ended up coexisting on the cluster.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-imds-hop-limit-ambush\">The IMDS hop-limit ambush<\/h2>\n\n\n\n<p>The standalone LBC pods entered CrashLoopBackOff with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>unable to initialize AWS cloud: failed to introspect vpcID from EC2Metadata\nor Node name, specify --aws-vpc-id instead if EC2Metadata is unavailable:\nEC2MetadataError: failed to make EC2Metadata request status code: 401\n<\/code><\/pre>\n\n\n\n<p>Pods can&#8217;t obtain an IMDSv2 token: the response to the token PUT has to cross an extra network hop to reach the pod, and the node enforces&nbsp;<code>HttpPutResponseHopLimit=1<\/code>, so the response never arrives. AWS documents this constraint explicitly for Auto Mode:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;EKS Auto Mode enforces IMDSv2 with a hop limit of 1 by default, adhering to AWS security best practices.&nbsp;<strong>This default configuration cannot be modified in Auto Mode.<\/strong>&#8221;<br>\u2014&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/automode-learn-instances.html\">https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/automode-learn-instances.html<\/a><\/p>\n<\/blockquote>\n\n\n\n<p>The same page also documents the official workaround:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;For add-ons that typically require IMDS access, supply parameters (such as AWS region) during installation to avoid IMDS lookups.&#8221;<\/p>\n<\/blockquote>\n\n\n\n<p>And the AWS LBC Helm install guide is specific about which values to set when the nodes have IMDS restricted:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;If you&#8217;re deploying the controller to Amazon EC2 nodes that have&nbsp;<strong>restricted access to the Amazon EC2 instance metadata service (IMDS)<\/strong>, or if you&#8217;re deploying 
to Fargate or Amazon EKS Hybrid Nodes, then add the following flags to the&nbsp;<code>helm<\/code>&nbsp;command that follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>--set region={{region-code}}<\/code><\/li>\n\n\n\n<li><code>--set vpcId={{vpc-xxxxxxxx}}<\/code>&#8221;<\/li>\n<\/ul>\n\n\n\n<p>\u2014&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/lbc-helm.html#lbc-helm-install\">https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/lbc-helm.html#lbc-helm-install<\/a><\/p>\n<\/blockquote>\n\n\n\n<p>So the fix for LBC on Auto Mode is to inject&nbsp;<strong>both<\/strong>&nbsp;values it would have learned from IMDS as explicit Helm values:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>aws_load_balancer_controller = {\n  set = &#91;\n    { name = \"vpcId\", value = module.network.vpc_id },\n    { name = \"region\", value = data.aws_region.current.name },\n  ]\n}\n<\/code><\/pre>\n\n\n\n<p>The LBC chart renders these into&nbsp;<code>--aws-vpc-id=&lt;value&gt;<\/code>&nbsp;and&nbsp;<code>--aws-region=&lt;value&gt;<\/code>&nbsp;on the Deployment, the IMDS lookup is skipped, and pods come Ready.<\/p>\n\n\n\n<p>Note: we initially set only&nbsp;<code>vpcId<\/code>, reasoning that IRSA already injects the&nbsp;<code>AWS_REGION<\/code>&nbsp;environment variable into the pod, so the AWS SDK would pick up the region without IMDS. That reasoning may be true for SDK calls, but the AWS doc explicitly tells you to set both for this exact scenario. We followed the doc. Don&#8217;t second-guess AWS guidance with inferred reasoning \u2014 they wrote that line for a reason, and the cost of setting a redundant value is zero.<\/p>\n\n\n\n<p>This is not a hack. This is the AWS-documented pattern for any IMDS-using add-on running on Auto Mode (and for Fargate, and for EKS Hybrid Nodes \u2014 same restriction shape). Internalise it: every IRSA workload you add to an Auto Mode cluster needs to take its region, VPC, and account ID as explicit parameters rather than reading them from IMDS. 
The day you forget, you&#8217;ll spend an hour debugging &#8220;why does this pod hang on startup&#8221;.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-webhook-cascade--why-one-crashing-pod-broke-unrelated-helm-releases\">The webhook cascade \u2014 why one crashing pod broke unrelated Helm releases<\/h2>\n\n\n\n<p>The standalone LBC ships a&nbsp;<code>MutatingWebhookConfiguration<\/code>&nbsp;with three webhooks:&nbsp;<code>mpod.elbv2.k8s.aws<\/code>,&nbsp;<code>mservice.elbv2.k8s.aws<\/code>,&nbsp;<code>mtargetgroupbinding.elbv2.k8s.aws<\/code>. All three are&nbsp;<code>failurePolicy: Fail<\/code>. The&nbsp;<code>mservice<\/code>&nbsp;webhook fires on every Service create, cluster-wide.<\/p>\n\n\n\n<p>When LBC pods crash, the webhook service has zero endpoints. Every Service create \u2014 including the Service that the ExternalDNS Helm chart wants to make in the team&#8217;s namespace \u2014 is rejected because the cluster can&#8217;t call the webhook. So a controller crash that has nothing to do with ExternalDNS still breaks our ExternalDNS rollout. The Helm releases retry, partially apply, and leave the cluster in a half-state.<\/p>\n\n\n\n<p>If you ever see &#8220;webhook failed&#8221; or &#8220;no endpoints available for service&#8221; for&nbsp;<code>aws-load-balancer-webhook-service<\/code>, don&#8217;t look at the resource you were trying to create. Look at the LBC pods.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-groupname-annotation-is-the-whole-reason-we-run-standalone-lbc\">The&nbsp;<code>group.name<\/code>&nbsp;annotation is the whole reason we run standalone LBC<\/h2>\n\n\n\n<p>Our&nbsp;<code>tools_dns<\/code>&nbsp;design wants one ALB serving every team, with each team owning its own Ingresses. The mechanism that lets multiple Ingresses converge onto a single ALB is the standalone LBC&#8217;s&nbsp;<code>IngressGroup<\/code>&nbsp;feature. 
You stamp every Ingress that should land on the same ALB with the same&nbsp;<code>alb.ingress.kubernetes.io\/group.name<\/code>&nbsp;annotation and the controller merges them.<\/p>\n\n\n\n<p>An &#8220;anchor&#8221; Ingress holds listener-level config that can&#8217;t be expressed per-rule \u2014 listen ports, SSL policy, the cert ARN list for SNI, a catch-all 404 action for hosts no team rule matches. We use&nbsp;<code>group.order = 1000<\/code>&nbsp;on the anchor so its catch-all 404 sits last in the merged rule list, after all team rules.<\/p>\n\n\n\n<p>This pattern is documented for the standalone LBC:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;You can add an order number of your ingress resource.&nbsp;<code>alb.ingress.kubernetes.io\/group.order: '10'<\/code>&nbsp;&#8230; Duplicate rules with a higher number can overwrite rules with a lower number. By default, the rule order between ingresses within the same ingress group is determined lexicographically based namespace and name.&#8221;<br>\u2014&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/alb-ingress.html\">https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/alb-ingress.html<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"eks-auto-mode-supports-shared-alb--but-not-the-same-way\">EKS Auto Mode supports &#8220;shared ALB&#8221; \u2014 but not the same way<\/h2>\n\n\n\n<p>Auto Mode supports a similar concept, but the API is different:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;<code>alb.ingress.kubernetes.io\/group.name<\/code>&nbsp;\u2014 Not supported \u2014 Specify groups in IngressClass only&#8221;<br>\u2014&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/auto-configure-alb.html\">https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/auto-configure-alb.html<\/a><\/p>\n<\/blockquote>\n\n\n\n<p>Instead, you 
put&nbsp;<code>group.name<\/code>&nbsp;on the&nbsp;<code>IngressClassParams<\/code>&nbsp;\u2014 once, at the class level. Every Ingress using that class is implicitly in that group. That works for &#8220;all teams share one ALB&#8221;. It does&nbsp;<strong>not<\/strong>&nbsp;work for finer-grained group structure inside the same class, and it is&nbsp;<strong>not<\/strong>&nbsp;documented to preserve the per-Ingress patterns LBC supports:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>alb.ingress.kubernetes.io\/group.order<\/code>\u00a0\u2014 docs silent on Auto Mode. Could be supported, could be silently ignored. Nobody has officially said.<\/li>\n\n\n\n<li><code>alb.ingress.kubernetes.io\/actions.&lt;name&gt;<\/code>\u00a0\u2014 docs silent. The Auto Mode considerations page mentions\u00a0<code>conditions.*<\/code>\u00a0as a supported per-Ingress annotation pattern, but says nothing about\u00a0<code>actions.*<\/code>. Our anchor&#8217;s\u00a0<code>actions.default-404<\/code>\u00a0may not work.<\/li>\n\n\n\n<li><code>alb.ingress.kubernetes.io\/ssl-policy<\/code>\u00a0\u2014 docs silent for the per-Ingress annotation. Listener attributes in general are explicitly\u00a0<strong>not<\/strong>\u00a0supported: &#8220;You cannot set ListenerAttribute with EKS Auto Mode.&#8221;<\/li>\n\n\n\n<li><code>alb.ingress.kubernetes.io\/listen-ports<\/code>,\u00a0<code>load-balancer-name<\/code>\u00a0\u2014 docs silent.<\/li>\n<\/ul>\n\n\n\n<p>Migrating our design to Auto Mode would mean betting that the undocumented behaviours either work or have workarounds we don&#8217;t know about. We chose not to take that bet right now. We keep the standalone LBC.<\/p>\n\n\n\n<p>This is the part most people miss: &#8220;Auto Mode supports shared ALBs&#8221; is technically true, but Auto Mode supports a&nbsp;<em>strictly smaller subset<\/em>&nbsp;of the LBC&#8217;s annotation API. 
If your design relies on any per-Ingress annotation that the Auto Mode docs list as unsupported (or don&#8217;t mention at all), you cannot count on migrating that design verbatim.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"coexistence-is-officially-supported\">Coexistence is officially supported<\/h2>\n\n\n\n<p>For people in the same position as us \u2014 already running standalone LBC on top of an Auto Mode cluster and wondering whether to roll back \u2014 AWS explicitly blesses this setup:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;You can install the AWS Load Balancer Controller on an Amazon EKS Auto Mode cluster. Use the&nbsp;<code>IngressClass<\/code>&nbsp;or&nbsp;<code>loadBalancerClass<\/code>&nbsp;options to associate Service and Ingress resources with either the Load Balancer Controller or EKS Auto Mode.&#8221;<br>\u2014&nbsp;<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/migrate-auto.html\">https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/migrate-auto.html<\/a><\/p>\n<\/blockquote>\n\n\n\n<p>The two controllers are distinguished by their IngressClass&nbsp;<code>spec.controller<\/code>&nbsp;value (<code>ingress.k8s.aws\/alb<\/code>&nbsp;vs.&nbsp;<code>eks.amazonaws.com\/alb<\/code>). You can keep both installed and steer Ingresses to whichever you want by setting&nbsp;<code>ingressClassName<\/code>&nbsp;appropriately. 
Auto Mode&#8217;s controller sits idle if no IngressClass references it \u2014 no cost, no failure mode.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-terraform-cloud--multi-workspace-trap\">The Terraform Cloud \/ multi-workspace trap<\/h2>\n\n\n\n<p>This is the part the title promises and the part that bit us hardest.<\/p>\n\n\n\n<p>Our infra is split across separate TFC workspaces by ownership:&nbsp;<code>devops-core<\/code>&nbsp;owns the EKS cluster, the VPC, the LBC helm release, and the cross-team certificates;&nbsp;<code>devops-kubernetes<\/code>&nbsp;owns team namespaces, ExternalDNS, and the&nbsp;<code>tools_dns<\/code>&nbsp;shared-ALB design. Two state files. Two pipelines. Two PR review queues.<\/p>\n\n\n\n<p>This split is reasonable at the platform level \u2014 different teams own different layers \u2014 but it interacts badly with the shared-ALB design in three concrete ways.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1-both-workspaces-wanted-to-own-ingressclass-alb\">1. Both workspaces wanted to own&nbsp;<code>IngressClass alb<\/code><\/h3>\n\n\n\n<p>The LBC Helm chart creates&nbsp;<code>IngressClass alb<\/code>&nbsp;automatically as part of its install. The&nbsp;<code>terraform-modules-aws\/\/ingressclass-alb<\/code>&nbsp;submodule we used in&nbsp;<code>tools_dns<\/code>&nbsp;also creates&nbsp;<code>IngressClass alb<\/code>. Each workspace believes it is the authoritative source. Each apply that runs in either workspace tries to either create or modify the object. The first one wins; the second one errors with &#8220;Cannot create resource that already exists&#8221;.<\/p>\n\n\n\n<p>This is not a Kubernetes problem and not really a Terraform problem either. It&#8217;s a coordination problem: a single cluster-scoped object cannot be co-owned by two state files. The only sustainable answer is to pick one workspace as the owner and document it. 
We chose: LBC chart (i.e.&nbsp;<code>devops-core<\/code>) owns the IngressClass;&nbsp;<code>tools_dns<\/code>&nbsp;references it by name. That removes the duplicate definition from&nbsp;<code>devops-kubernetes<\/code>&nbsp;entirely.<\/p>\n\n\n\n<p>Lesson:&nbsp;<strong>for any cluster-scoped object created by an upstream Helm chart (IngressClass, ClusterRole, CRD, MutatingWebhookConfiguration, PriorityClass), check the chart before declaring it elsewhere in Terraform.<\/strong>&nbsp;If the chart creates it, don&#8217;t declare it again.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"2-cross-workspace-install-ordering-is-implicit\">2. Cross-workspace install ordering is implicit<\/h3>\n\n\n\n<p><code>devops-kubernetes<\/code>&nbsp;assumes LBC is installed and healthy. There is no Terraform dependency that enforces this. If&nbsp;<code>devops-core<\/code>&nbsp;is applied second, or LBC pods are crashing for any reason, the&nbsp;<code>devops-kubernetes<\/code>&nbsp;apply fails \u2014 not with &#8220;LBC is missing&#8221;, but with cascading webhook errors, &#8220;IngressClass not found&#8221;, and assorted incoherent symptoms.<\/p>\n\n\n\n<p>We worked around this by treating the cluster&#8217;s state as the source of truth: before any&nbsp;<code>devops-kubernetes<\/code>&nbsp;apply, manually verify LBC pods are Ready and the webhook has endpoints. That&#8217;s brittle and won&#8217;t scale. A better answer is to expose a TFC outputs contract from&nbsp;<code>devops-core<\/code>&nbsp;(e.g.&nbsp;<code>lbc_ready = true<\/code>&nbsp;only after a&nbsp;<code>null_resource<\/code>&nbsp;waits for the deployment to be Available) and read it in&nbsp;<code>devops-kubernetes<\/code>&nbsp;through a&nbsp;<code>data.tfe_outputs<\/code>&nbsp;data source. 
We didn&#8217;t do this yet but it&#8217;s the right direction.<\/p>\n\n\n\n<p>Lesson:&nbsp;<strong>a split-workspace setup requires an explicit handshake between the workspaces, not &#8220;we know&nbsp;<code>devops-core<\/code>&nbsp;applies first most of the time&#8221;.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3-the-ingressgroup-itself-crosses-workspace-boundaries\">3. The&nbsp;<code>IngressGroup<\/code>&nbsp;itself crosses workspace boundaries<\/h3>\n\n\n\n<p>This is the subtlest one and the most important to internalise.<\/p>\n\n\n\n<p>An LBC&nbsp;<code>IngressGroup<\/code>&nbsp;is a set of Ingresses identified by their shared&nbsp;<code>group.name<\/code>&nbsp;annotation. The ALB they all share is a single AWS resource. Its listener rule list is the merged union of every Ingress&#8217;s rules. The certificate list on its listener is the union of every Ingress&#8217;s&nbsp;<code>certificate-arn<\/code>&nbsp;value. Its listener attributes (SSL policy, listen ports) are negotiated across all member Ingresses.<\/p>\n\n\n\n<p>In our design, the anchor Ingress lives in&nbsp;<code>devops-kubernetes<\/code>&#8216;s&nbsp;<code>tools_dns<\/code>&nbsp;module and holds the listener-level config (SSL policy, listen ports, cert ARN list, catch-all 404). The team Ingresses are created by&nbsp;<em>team workloads<\/em>&nbsp;\u2014 which in our case will live in&nbsp;<em>yet more<\/em>&nbsp;TFC workspaces (one per team service). 
So the ALB ends up co-owned by:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>devops-core<\/code>\u00a0(LBC Helm release, controller config, IngressClass)<\/li>\n\n\n\n<li><code>devops-kubernetes<\/code>\u00a0(anchor Ingress, IRSA roles for ExternalDNS, admission policy)<\/li>\n\n\n\n<li>N team workspaces (their own Ingresses, each contributing rules to the merged listener)<\/li>\n<\/ul>\n\n\n\n<p>Any of those workspaces can break the shared ALB by adding an Ingress with a conflicting&nbsp;<code>group.order<\/code>, a different&nbsp;<code>ssl-policy<\/code>&nbsp;value, an annotation the anchor relies on being absent, or a&nbsp;<code>certificate-arn<\/code>&nbsp;that displaces another team&#8217;s cert from the list. There is no Terraform dependency graph that catches this \u2014 the conflict only surfaces during LBC reconciliation, and only as a controller log line nobody is paging on.<\/p>\n\n\n\n<p>A few things we put in place to make this less dangerous:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A\u00a0<code>ValidatingAdmissionPolicy<\/code>\u00a0per team that pins\u00a0<code>ingressClassName<\/code>, requires the team&#8217;s\u00a0<code>group.name<\/code>, forbids\u00a0<code>alb.ingress.kubernetes.io\/certificate-arn<\/code>\u00a0(so teams can&#8217;t add certs to the shared listener and intercept SNI traffic for other teams), and restricts hostnames to the team&#8217;s apex.<\/li>\n\n\n\n<li>IRSA scoping per team so a team&#8217;s ExternalDNS can only mutate its own Route53 zone \u2014 even if it lies about hostnames, AWS denies the write.<\/li>\n\n\n\n<li>A single anchor Ingress whose\u00a0<code>group.order = 1000<\/code>\u00a0puts it last; teams default to order=0, so the anchor&#8217;s catch-all 404 can&#8217;t shadow a team rule (though ordering between team rules still falls back to namespace\/name tie-breaking, which is a foot-gun).<\/li>\n<\/ul>\n\n\n\n<p>What we did&nbsp;<strong>not<\/strong>&nbsp;put in place \u2014 and what we should \u2014 is a contract test in CI that, when any team workspace tries 
to add an Ingress, validates the merged outcome against the platform&#8217;s invariants (no listener-attribute drift, no cert-list mutation, no group.order &lt; 1, no host outside the team&#8217;s apex). The admission policy catches some of this at apply time, but a CI gate catches it before TFC even runs the plan.<\/p>\n\n\n\n<p>Lesson:&nbsp;<strong>when an AWS resource (an ALB, here) is the merged output of multiple TF workspaces, the workspace that owns the&nbsp;<em>anchor<\/em>&nbsp;is not the workspace that owns the&nbsp;<em>resource<\/em>. Defensive design at the boundary (admission policy, IRSA scoping, schema contracts) is mandatory, not nice-to-have.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"externaldns-multi-instance--a-smaller-version-of-the-same-problem\">ExternalDNS multi-instance \u2014 a smaller version of the same problem<\/h2>\n\n\n\n<p>Four team namespaces, four ExternalDNS Helm releases. The chart&#8217;s&nbsp;<code>external-dns.fullname<\/code>&nbsp;helper, by default, returns&nbsp;<code>external-dns<\/code>&nbsp;for any release named&nbsp;<code>external-dns<\/code>. The ClusterRole and ClusterRoleBinding are named off&nbsp;<code>fullname<\/code>. So all four releases tried to create&nbsp;<code>ClusterRole\/external-dns<\/code>&nbsp;and&nbsp;<code>ClusterRoleBinding\/external-dns-viewer<\/code>&nbsp;\u2014 same names, same cluster.<\/p>\n\n\n\n<p>Helm doesn&#8217;t reconcile that automatically. One release wins, the others fail with &#8220;already exists&#8221;. The &#8220;winner&#8221; varies by apply order. State diverges across runs.<\/p>\n\n\n\n<p>Fix: set&nbsp;<code>fullnameOverride = \"external-dns-&lt;team&gt;\"<\/code>&nbsp;per release. The chart&#8217;s helper respects the override and renders unique names for ClusterRole, ClusterRoleBinding, Service, and Deployment. 
The ServiceAccount stays named&nbsp;<code>external-dns<\/code>&nbsp;(because we set&nbsp;<code>serviceAccount.name<\/code>&nbsp;explicitly), so the IRSA trust policy \u2014 which targets the SA name in the team&#8217;s namespace \u2014 keeps working.<\/p>\n\n\n\n<p>This wasn&#8217;t a TFC problem per se; it was a multi-instance Helm chart problem. But it&#8217;s the same shape: an object with a globally-scoped name (ClusterRole is cluster-scoped, so its name is unique across the cluster) being declared in multiple places that don&#8217;t know about each other.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"takeaways\">Takeaways<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Don&#8217;t pick a controller by what your design uses today \u2014 pick by what you&#8217;re committing to operationally.<\/strong>\u00a0The standalone LBC and Auto Mode&#8217;s built-in ALB do related but non-identical jobs. Migrating between them is not a refactor; it&#8217;s a redesign. Picking by feature (&#8220;we need group.name annotation&#8221;) locks you into the standalone LBC. Picking by operational profile (&#8220;we want AWS to manage the controller&#8221;) locks you into Auto Mode. You don&#8217;t get to be ambivalent.<\/li>\n\n\n\n<li><strong>AWS-managed runtimes have invisible constraints.<\/strong>\u00a0EKS Auto Mode&#8217;s IMDS hop-limit=1 was not in the design doc we wrote when we enabled Auto Mode, and we found it by reading a crash log. Read the\u00a0<em>whole<\/em>\u00a0Auto Mode user guide before deploying any new IRSA-using add-on. Especially the &#8220;Instance Metadata Service&#8221; section.<\/li>\n\n\n\n<li><strong>Pre-existing webhooks with\u00a0<code>failurePolicy: Fail<\/code>\u00a0will block unrelated work when their backing controller is down.<\/strong>\u00a0Audit your\u00a0<code>MutatingWebhookConfiguration<\/code>s and\u00a0<code>ValidatingWebhookConfiguration<\/code>s and know which ones fail-closed. 
If any controller you depend on is down, anything matching its webhook scope is too.<\/li>\n\n\n\n<li><strong>For multi-workspace setups, name the owner of every shared object.<\/strong>\u00a0IngressClasses, ClusterRoles, CRDs, webhook configs \u2014 these don&#8217;t have a Terraform mechanism to express co-ownership. Pick one workspace per object and document it.<\/li>\n\n\n\n<li><strong><code>alb.ingress.kubernetes.io\/group.name<\/code>\u00a0is a powerful feature with a serious operational cost<\/strong>\u00a0when the group spans multiple workspaces. The convenience of &#8220;one ALB, many teams&#8221; comes with the risk that any contributor can mis-shape the merged listener. Compensate with admission policies, IRSA scoping, and CI contract tests \u2014 preferably all three.<\/li>\n\n\n\n<li><strong>Helm charts named\u00a0<code>external-dns<\/code>,\u00a0<code>cert-manager<\/code>,\u00a0<code>aws-load-balancer-controller<\/code>, etc., were not designed for multi-instance use on a single cluster.<\/strong>\u00a0If you&#8217;re installing the same chart more than once, the first thing to check is what\u00a0<code>fullnameOverride<\/code>\u00a0does in that chart&#8217;s templates. The second thing to check is which cluster-scoped objects it creates that you&#8217;ll have to disambiguate.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-we-shipped\">What we shipped<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>devops-core<\/code>: added\u00a0<code>vpcId<\/code>\u00a0and\u00a0<code>region<\/code>\u00a0Helm values to the LBC release so pods can come Ready on Auto Mode nodes. 
AWS-documented fix per\u00a0<a href=\"https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/lbc-helm.html#lbc-helm-install\">https:\/\/docs.aws.amazon.com\/eks\/latest\/userguide\/lbc-helm.html#lbc-helm-install<\/a>.<\/li>\n\n\n\n<li><code>devops-kubernetes<\/code>: removed the duplicate\u00a0<code>IngressClass alb<\/code>\u00a0from\u00a0<code>tools_dns<\/code>\u00a0\u2014 LBC&#8217;s Helm chart is the single owner. Added\u00a0<code>fullnameOverride<\/code>\u00a0per team to the ExternalDNS releases so cluster-scoped objects don&#8217;t collide.<\/li>\n<\/ul>\n\n\n\n<p>Three diffs across two repos and a long Friday afternoon. The diffs themselves are small. The understanding is what took the time. If this saves you the same Friday, the post did its job.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We tried to roll out a multi-tenant &#8220;shared ALB&#8221; architecture for our production EKS cluster and the apply blew up [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2904","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way - SRE School<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" 
content=\"article\" \/>\n<meta property=\"og:title\" content=\"Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way - SRE School\" \/>\n<meta property=\"og:description\" content=\"We tried to roll out a multi-tenant &#8220;shared ALB&#8221; architecture for our production EKS cluster and the apply blew up [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/\" \/>\n<meta property=\"og:site_name\" content=\"SRE School\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-26T06:00:40+00:00\" \/>\n<meta name=\"author\" content=\"Rajesh Kumar\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Rajesh Kumar\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/\",\"url\":\"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/\",\"name\":\"Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way - SRE 
School\",\"isPartOf\":{\"@id\":\"https:\/\/sreschool.com\/blog\/#website\"},\"datePublished\":\"2026-05-26T06:00:40+00:00\",\"author\":{\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201\"},\"breadcrumb\":{\"@id\":\"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/#breadcrumb\"},\"inLanguage\":\"en\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/sreschool.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/sreschool.com\/blog\/#website\",\"url\":\"https:\/\/sreschool.com\/blog\/\",\"name\":\"SRESchool\",\"description\":\"Master SRE. Build Resilient Systems. 
Lead the Future of Reliability\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/sreschool.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201\",\"name\":\"Rajesh Kumar\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\/\/sreschool.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g\",\"caption\":\"Rajesh Kumar\"},\"sameAs\":[\"http:\/\/sreschool.com\/blog\"],\"url\":\"https:\/\/sreschool.com\/blog\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way - SRE School","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/","og_locale":"en_US","og_type":"article","og_title":"Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way - SRE School","og_description":"We tried to roll out a multi-tenant &#8220;shared ALB&#8221; architecture for our production EKS cluster and the apply blew up [&hellip;]","og_url":"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/","og_site_name":"SRE School","article_published_time":"2026-05-26T06:00:40+00:00","author":"Rajesh Kumar","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Rajesh Kumar","Est. 
reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/","url":"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/","name":"Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way - SRE School","isPartOf":{"@id":"https:\/\/sreschool.com\/blog\/#website"},"datePublished":"2026-05-26T06:00:40+00:00","author":{"@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201"},"breadcrumb":{"@id":"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/#breadcrumb"},"inLanguage":"en","potentialAction":[{"@type":"ReadAction","target":["https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/sreschool.com\/blog\/sharing-an-alb-across-teams-on-eks-auto-mode-and-split-terraform-workspaces-what-we-learned-the-hard-way\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/sreschool.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Sharing an ALB across teams on EKS Auto Mode and split Terraform workspaces \u2014 what we learned the hard way"}]},{"@type":"WebSite","@id":"https:\/\/sreschool.com\/blog\/#website","url":"https:\/\/sreschool.com\/blog\/","name":"SRESchool","description":"Master SRE. Build Resilient Systems. 
Lead the Future of Reliability","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/sreschool.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en"},{"@type":"Person","@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/0ffe446f77bb2589992dbe3a7f417201","name":"Rajesh Kumar","image":{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/sreschool.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f901a4f2929fa034a291a8363d589791d5a3c1f6a051c22e744acb8bfc8e022a?s=96&d=mm&r=g","caption":"Rajesh Kumar"},"sameAs":["http:\/\/sreschool.com\/blog"],"url":"https:\/\/sreschool.com\/blog\/author\/admin\/"}]}},"_links":{"self":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/2904","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/comments?post=2904"}],"version-history":[{"count":1,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/2904\/revisions"}],"predecessor-version":[{"id":2905,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/posts\/2904\/revisions\/2905"}],"wp:attachment":[{"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/media?parent=2904"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/categories?post=2904"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sreschool.com\/blog\/wp-json\/wp\/v2\/tags?post=2904"}]
,"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}