Terraform Data Block Deep Dive — From Real World Experience

Terraform’s data block is often misunderstood or underused until you hit real-world scenarios where your configuration has to reference existing resources.
This article will not rehash basic documentation. Instead, we will walk through why, when, and how to use data blocks, supported with practical scenarios and alternatives — all based on day-to-day usage.

📌 Why do we need `data` blocks at all?

Let’s begin with a simple but common scenario:
You want to associate an Azure Network Security Group (NSG) with a subnet.

In Terraform, to associate them, you need their IDs:

- subnet_id
- network_security_group_id

But here are the problems:

1) The resource may not have been created yet. It might be part of the same Terraform apply operation. For example, your Terraform configuration may create the Virtual Network, Subnets, NSGs, and then associate NSGs with Subnets — all in a single run. In such cases, how would you provide the resource ID in advance?

2) Yes, technically you can guess the resource ID format in advance, as Azure follows a pre-defined structure. But for a normal user, figuring out and constructing the exact resource ID is difficult, and we do not want to put this burden on the user.

3) Even if the resource already exists, we do not want our users to struggle by going to the JSON View in Azure Portal to manually fetch the resource ID. That’s tedious and error-prone. [Although, if you ask me, I would prefer providing the resource ID in tfvars rather than a handful of inputs like resource name, resource group name, etc. But somehow, people do not like resource IDs in tfvars, and I have to go by industry standards.]

4) Instead, we want users to provide only the simple details which are readily available to them — like resource name, resource group name, virtual network name, etc. The rest should be taken care of automatically by Terraform using data blocks.

This is where data blocks become invaluable.

Instead of asking users to input IDs in your tfvars, you ask them for just:

subnet_name    = "backend-subnet"
subnet_rg_name = "app-network-rg"
vnet_name      = "app-vnet"
nsg_name       = "backend-nsg"
nsg_rg_name    = "security-rg"

And Terraform will take care of fetching the corresponding id automatically using data blocks.

✅ Example: Using `data` block for Azure Subnet and NSG

data "azurerm_subnet" "selected" {
  name                 = var.subnet_name
  virtual_network_name = var.vnet_name
  resource_group_name  = var.subnet_rg_name
}

data "azurerm_network_security_group" "selected" {
  name                = var.nsg_name
  resource_group_name = var.nsg_rg_name
}

resource "azurerm_subnet_network_security_group_association" "association" {
  subnet_id                 = data.azurerm_subnet.selected.id
  network_security_group_id = data.azurerm_network_security_group.selected.id
}

👉 Here, users only provide subnet name and NSG name (and RG, VNet).

👉 Terraform fetches IDs using data blocks internally.
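
For completeness, the five inputs used above need matching variable declarations. This is a minimal sketch that assumes plain string inputs; the descriptions are illustrative and not taken from any particular module.

```hcl
# Inputs consumed by the two data blocks above.
# Each one is a simple name the user can read straight from the Azure Portal.
variable "subnet_name" {
  type        = string
  description = "Name of the existing subnet."
}

variable "subnet_rg_name" {
  type        = string
  description = "Resource group that contains the virtual network and subnet."
}

variable "vnet_name" {
  type        = string
  description = "Name of the virtual network that contains the subnet."
}

variable "nsg_name" {
  type        = string
  description = "Name of the existing network security group."
}

variable "nsg_rg_name" {
  type        = string
  description = "Resource group that contains the network security group."
}
```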

📚 How do you know what values `data` block can return?

Every Terraform provider (like azurerm) clearly defines:

- Supported data blocks
- The required input arguments (i.e. what you need to provide)
- The exported attributes (i.e. what you can fetch)

📌 Important

If a resource type does NOT have a corresponding data source in the provider → you CANNOT use a data block for it.

If a property (attribute) is NOT exported by the data source → you CANNOT fetch it.
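
As an illustration, any attribute a data block exports can be referenced the same way as id. The sketch below assumes the azurerm_subnet data block from the earlier example, and assumes your provider version exports address_prefixes and network_security_group_id for it; confirm the names in the documentation for your version. The output names are hypothetical.

```hcl
# Reads two exported attributes from the subnet data block declared earlier.
output "selected_subnet_prefixes" {
  description = "Address prefixes of the looked-up subnet."
  value       = data.azurerm_subnet.selected.address_prefixes
}

output "selected_subnet_current_nsg_id" {
  description = "ID of the NSG currently attached to the subnet, if any."
  value       = data.azurerm_subnet.selected.network_security_group_id
}
```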

Also, when referring to Terraform documentation for a resource or data block, always make sure you are looking at the documentation version that matches your azurerm provider version.

For example, if you are using azurerm provider version 3.114.0 but checking documentation for 4.27.0, there may be differences and inconsistencies which can cause confusion and errors.
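
One way to keep the documentation and the code in step is to pin the provider version in the configuration, so there is no doubt about which documentation version applies. A minimal sketch, using the 3.114.0 version from the example above:

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Exact pin: read the registry documentation for this same version.
      version = "= 3.114.0"
    }
  }
}

provider "azurerm" {
  # The azurerm provider requires this block, even when it is empty.
  features {}
}
```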

🔄 Using `for_each` with `data` block

In real-world scenarios, you don’t deal with one subnet or NSG — you deal with many.

Maybe multiple subnets need to be associated with different NSGs.

For such cases, data block supports for_each — just like resource block.

variable "subnets" {}

data "azurerm_subnet" "selected" {
  for_each             = var.subnets
  name                 = each.value.subnet_name
  virtual_network_name = each.value.vnet_name
  resource_group_name  = each.value.rg_name
}

data "azurerm_network_security_group" "selected" {
  for_each            = var.subnets
  name                = each.value.nsg_name
  resource_group_name = each.value.nsg_rg_name
}

resource "azurerm_subnet_network_security_group_association" "association" {
  for_each                  = var.subnets
  subnet_id                 = data.azurerm_subnet.selected[each.key].id
  network_security_group_id = data.azurerm_network_security_group.selected[each.key].id
}

Note: When referencing a data block that uses for_each, index it with each.key, not each.value (which you typically use for arguments inside a block). The data block instances are keyed by the keys of the for_each map ("app1", "app2", and so on), whereas each.value is the whole object of inputs for that entry. An object is not a valid index, so data.azurerm_subnet.selected[each.value] will result in an error.
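
To make the keying visible, the data block instances can be turned into a map of IDs with a for expression. This is a small sketch built on the data block above; the output name is hypothetical.

```hcl
# data.azurerm_subnet.selected is a map of instances keyed like var.subnets,
# so iterating over it yields the same keys ("app1", "app2", ...).
output "subnet_ids_by_key" {
  value = {
    for key, subnet in data.azurerm_subnet.selected : key => subnet.id
  }
}
```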

✅ Why use for_each?

If your resource block uses for_each, your data blocks should use for_each over the same map, so that every subnet, NSG, and association instance shares the same key.

Otherwise, Terraform won’t know which subnet → which NSG → which association. This can lead to errors or incorrect mappings during plan and apply.

✅ What goes in tfvars?

Now that you are using iteration (for_each) in both the resource block and the data blocks, you can put multiple input blocks in tfvars, as shown below.

Here, app1 and app2 are considered two iterations.

subnets = {
  "app1" = {
    subnet_name = "app1-subnet"
    vnet_name   = "app-vnet"
    rg_name     = "app-network-rg"
    nsg_name    = "app1-nsg"
    nsg_rg_name = "security-rg"
  },
  "app2" = {
    subnet_name = "app2-subnet"
    vnet_name   = "app-vnet"
    rg_name     = "app-network-rg"
    nsg_name    = "app2-nsg"
    nsg_rg_name = "security-rg"
  }
}
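
The empty variable "subnets" {} declaration accepts any value. A typed declaration catches a missing or misspelled key at plan time. The sketch below mirrors the tfvars structure above; the description text is illustrative.

```hcl
variable "subnets" {
  description = "Subnet-to-NSG associations, keyed by an arbitrary label such as app1."
  type = map(object({
    subnet_name = string
    vnet_name   = string
    rg_name     = string
    nsg_name    = string
    nsg_rg_name = string
  }))
}
```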

🚦 Handling dependency: `depends_on` with `data` and `resource` blocks

In most cases, Terraform has implicit dependency handling — it automatically understands the inter-dependencies between data blocks and resource blocks, and executes them in the correct order.

However, there are scenarios where dependencies need to be explicitly defined, especially when there are no direct references or when timing between resources matters.

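Best Practice: Rely on references wherever one exists, and add depends_on for the hidden dependencies that Terraform cannot infer on its own. An explicit depends_on does make your intent visible, but it has a cost on a data block: Terraform defers the read until apply whenever the dependency has pending changes, so values from that data block show up as "known after apply" in the plan.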

📌 If `data` depends on `resource`

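If you create a resource and read it back through a data block in the same configuration, the data block must be read only after the resource exists. Otherwise, the lookup fails because the resource is not there yet.

data "azurerm_virtual_network" "selected" {
  # These references already tell Terraform to create the VNet first.
  name                = azurerm_virtual_network.my_vnet.name
  resource_group_name = azurerm_virtual_network.my_vnet.resource_group_name

  # Redundant while the references above exist. Required only when the
  # arguments come from plain variables or literals.
  depends_on = [azurerm_virtual_network.my_vnet]
}

Without either a reference or depends_on, Terraform might try to fetch it before creation → error.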
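
The explicit depends_on matters most when the data block has no reference to the resource at all, for example when both take the name from the same variable. Here is a minimal sketch of that case; var.vnet_rg_name, var.location, and the address space are hypothetical.

```hcl
resource "azurerm_virtual_network" "my_vnet" {
  name                = var.vnet_name
  resource_group_name = var.vnet_rg_name
  location            = var.location
  address_space       = ["10.0.0.0/16"]
}

data "azurerm_virtual_network" "selected" {
  # Only variables are used here, so Terraform sees no link to my_vnet.
  name                = var.vnet_name
  resource_group_name = var.vnet_rg_name

  # Hidden dependency made explicit: read only after the VNet is created.
  depends_on = [azurerm_virtual_network.my_vnet]
}
```

In a configuration like this, referencing azurerm_virtual_network.my_vnet.id directly is usually simpler than reading the resource back through a data block.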

📌 If `resource` depends on `data`

This is more common → like in our subnet association example.

resource "azurerm_subnet_network_security_group_association" "association" {
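  subnet_id                 = data.azurerm_subnet.selected.id
  network_security_group_id = data.azurerm_network_security_group.selected.id

  # Optional: the two references above already make Terraform read both
  # data blocks first. Listed here only to make the intent visible.
  depends_on = [
    data.azurerm_subnet.selected,
    data.azurerm_network_security_group.selected,
  ]
}

👉 Because the arguments reference both data blocks, Terraform reads the IDs before creating the association, with or without depends_on.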

🔗 Multiple `data` blocks can be used freely

Terraform modules often use many data blocks together to fetch:

- Subnets
- NSGs
- Resource Groups
- Storage Accounts
- Private DNS Zones

All these can be used together in any resource block.

subnet_id = data.azurerm_subnet.selected.id
nsg_id    = data.azurerm_network_security_group.selected.id
dns_zone  = data.azurerm_private_dns_zone.internal.name

No restriction — just best practices to avoid clutter.
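
As a rough sketch of how several lookups sit side by side, the block below declares a resource group and a private DNS zone data block and collects their values in one place. The variable names and the local name are hypothetical.

```hcl
data "azurerm_resource_group" "network" {
  name = var.network_rg_name # hypothetical input
}

data "azurerm_private_dns_zone" "internal" {
  name                = var.dns_zone_name # hypothetical input
  resource_group_name = var.dns_rg_name   # hypothetical input
}

locals {
  # Values from different data blocks can be mixed freely.
  lookup_summary = {
    network_rg_location = data.azurerm_resource_group.network.location
    dns_zone_id         = data.azurerm_private_dns_zone.internal.id
    dns_zone_name       = data.azurerm_private_dns_zone.internal.name
  }
}
```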

⚡ When NOT to use `data` block → Resource ID Format Alternative

One of the most common use cases for a data block is fetching the resource ID of an Azure resource.

A classic example is associating an existing subnet with an existing NSG, where we need both the subnet ID and the NSG ID. So we are using data blocks here to read the subnet and NSG IDs.

But is a data block the only way to dynamically fetch the resource ID?

The answer is NO. In fact, there is a simpler approach.

Instead of using data block, you can directly construct the resource ID using:

/subscriptions/${data.azurerm_client_config.current.subscription_id}/resourceGroups/${var.resource_group_name}/providers/Microsoft.Network/virtualNetworks/${var.vnet_name}/subnets/${var.subnet_name}
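
Put together, the constructed-ID approach for the subnet and NSG association could look like the sketch below. It assumes the subnet lives in var.resource_group_name and reuses var.nsg_rg_name and var.nsg_name from the earlier example. The local names are hypothetical.

```hcl
# Subscription of the credentials Terraform is currently running with.
data "azurerm_client_config" "current" {}

locals {
  subscription_prefix = "/subscriptions/${data.azurerm_client_config.current.subscription_id}"

  # Built from names only: no per-resource lookup against Azure.
  subnet_id = "${local.subscription_prefix}/resourceGroups/${var.resource_group_name}/providers/Microsoft.Network/virtualNetworks/${var.vnet_name}/subnets/${var.subnet_name}"

  nsg_id = "${local.subscription_prefix}/resourceGroups/${var.nsg_rg_name}/providers/Microsoft.Network/networkSecurityGroups/${var.nsg_name}"
}

resource "azurerm_subnet_network_security_group_association" "association" {
  subnet_id                 = local.subnet_id
  network_security_group_id = local.nsg_id
}
```

One trade-off to keep in mind: a constructed ID is just a string, so Terraform does not check during plan that the subnet or NSG actually exists. A typo in a name shows up only when apply calls Azure.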

✅ Benefits:

- No extra data block per resource
- No dependency management
- No extra API lookups during plan

In this case:

- subscription_id comes dynamically from data.azurerm_client_config.current.subscription_id
- resource_group_name comes from var.resource_group_name

👉 If resource moves to another RG → user updates only var.resource_group_name → everything works
👉 If resource moves to another subscription → user runs Terraform from that subscription context → data.azurerm_client_config.current.subscription_id changes → everything works

✅ No hardcoding → no manual change needed inside Terraform code.

Note:

I am not saying that this is a full substitute for data block. There might be some use cases where you still need to use data blocks. However, if it is just about fetching resource IDs, I would prefer this approach over data block due to its simplicity.

And yes, I have tested it in a real environment, and it works without any glitch!

🎯 Final Words — Data Block is Powerful, but not Always Needed

In summary:

- A data block is essential when you need to fetch properties of existing resources.
- For repeated resources → use for_each.
- For dependency order → rely on references, and use depends_on where there are none.
- For just resource IDs → consider constructing the resource ID directly.

The right decision comes from real-world experience:

Overusing data blocks → bloated code.

Avoiding them when needed → broken references.

So, use it smartly.
