
Synapse Managed VNET with Data Exfiltration Protection

Azure, Synapse, Networking, Security, Private Endpoints

Azure Synapse with Data Exfiltration Protection (DEP) is the most secure configuration, but it blocks all outbound traffic from Spark pools unless that traffic goes through a managed private endpoint. Here's how to make it work.

What Data Exfiltration Protection Does

When enabled:

  • All outbound traffic from the managed VNET is blocked by default
  • Spark pools can only reach resources via approved managed private endpoints
  • There is no internet access, so pip install and calls to external APIs fail
  • Even Azure services need a managed private endpoint

The Catch-22

You want security, but your Spark jobs need to:

  • Read from Azure Storage
  • Write to SQL databases
  • Log to Log Analytics
  • Maybe call external APIs

Without proper configuration, everything fails with connection timeouts.

Enabling DEP

resource "azurerm_synapse_workspace" "this" {
  name                                 = "syn-production"
  resource_group_name                  = azurerm_resource_group.this.name
  location                             = azurerm_resource_group.this.location
  storage_data_lake_gen2_filesystem_id = azurerm_storage_data_lake_gen2_filesystem.this.id

  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = true  # This is the key setting

  identity {
    type = "SystemAssigned"
  }
}

Warning: DEP can only be enabled at workspace creation. You cannot enable or disable it later.
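Because the setting is fixed at creation, changing the flag on an existing workspace in Terraform would presumably be planned as a destroy and recreate. Below is a minimal sketch of a CI guard that catches this, assuming the plan has been exported with `terraform show -json`. The function name and the trimmed sample plan are hypothetical.

```python
import json


def workspaces_to_be_replaced(plan: dict) -> list[str]:
    """Return addresses of Synapse workspaces the plan would destroy and recreate."""
    replaced = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "azurerm_synapse_workspace":
            continue
        actions = rc.get("change", {}).get("actions", [])
        # Terraform reports a replacement as both "delete" and "create"
        if "delete" in actions and "create" in actions:
            replaced.append(rc["address"])
    return replaced


# Hypothetical, heavily trimmed plan document for illustration only
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "azurerm_synapse_workspace.this",
     "type": "azurerm_synapse_workspace",
     "change": {"actions": ["delete", "create"]}},
    {"address": "azurerm_storage_account.data",
     "type": "azurerm_storage_account",
     "change": {"actions": ["update"]}}
  ]
}
""")

print(workspaces_to_be_replaced(sample_plan))
# ['azurerm_synapse_workspace.this']
```

Failing the pipeline when this list is non-empty is one way to avoid recreating a production workspace by accident.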

Creating Managed Private Endpoints

For each resource your Spark pools need to access:

Storage Account

resource "azurerm_synapse_managed_private_endpoint" "storage" {
  name                 = "pe-storage-blob"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  target_resource_id   = azurerm_storage_account.data.id
  subresource_name     = "blob"
}

resource "azurerm_synapse_managed_private_endpoint" "storage_dfs" {
  name                 = "pe-storage-dfs"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  target_resource_id   = azurerm_storage_account.data.id
  subresource_name     = "dfs"
}

Key Vault

resource "azurerm_synapse_managed_private_endpoint" "keyvault" {
  name                 = "pe-keyvault"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  target_resource_id   = azurerm_key_vault.this.id
  subresource_name     = "vault"
}

SQL Database

resource "azurerm_synapse_managed_private_endpoint" "sql" {
  name                 = "pe-sql"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  target_resource_id   = azurerm_mssql_server.this.id
  subresource_name     = "sqlServer"
}

Event Hub

resource "azurerm_synapse_managed_private_endpoint" "eventhub" {
  name                 = "pe-eventhub"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  target_resource_id   = azurerm_eventhub_namespace.this.id
  subresource_name     = "namespace"  # Note: not "eventhub"
}
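The subresource names are easy to get wrong, so it can help to keep them in one lookup. The sketch below collects only the names used in the examples above, keyed by ARM resource type. The helper names and the sample resource ID are hypothetical.

```python
# Subresource names taken from the Terraform examples above
SUBRESOURCES = {
    "microsoft.storage/storageaccounts": ["blob", "dfs"],
    "microsoft.keyvault/vaults": ["vault"],
    "microsoft.sql/servers": ["sqlServer"],
    "microsoft.eventhub/namespaces": ["namespace"],
}


def resource_type(resource_id: str) -> str:
    """Extract 'provider/type' from an ARM resource ID, lower-cased."""
    parts = resource_id.strip("/").split("/")
    idx = [p.lower() for p in parts].index("providers")
    return f"{parts[idx + 1]}/{parts[idx + 2]}".lower()


def required_subresources(resource_id: str) -> list[str]:
    """Return the subresource names to create endpoints for, or [] if unknown."""
    return SUBRESOURCES.get(resource_type(resource_id), [])


# Hypothetical resource ID for illustration
storage_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-data/providers/Microsoft.Storage"
    "/storageAccounts/mystorageaccount"
)
print(required_subresources(storage_id))
# ['blob', 'dfs']
```

An empty result means the resource type is not in the table, not that it needs no endpoint.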

Approving Private Endpoints

Managed private endpoints require approval on the target resource:

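# Auto-approval doesn't work for managed private endpoints
# You need to approve them manually or via automation

resource "null_resource" "approve_storage_pe" {
  depends_on = [azurerm_synapse_managed_private_endpoint.storage]

  provisioner "local-exec" {
    # The connection name on the target resource is generated by Azure and
    # generally differs from the managed private endpoint name, so look up
    # the pending connection first.
    # Assumes only one connection is pending on this storage account.
    command = <<-EOT
      conn_id=$(az network private-endpoint-connection list \
        --resource-group ${azurerm_resource_group.this.name} \
        --name ${azurerm_storage_account.data.name} \
        --type Microsoft.Storage/storageAccounts \
        --query "[?properties.privateLinkServiceConnectionState.status=='Pending'].id | [0]" \
        --output tsv)
      az network private-endpoint-connection approve \
        --id "$conn_id" \
        --description "Approved for Synapse"
    EOT
  }
}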
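If approval is scripted outside Terraform, one option is to list the connections on the target resource and approve only those still pending. The sketch below filters the JSON that `az network private-endpoint-connection list` returns. The field layout (`properties.privateLinkServiceConnectionState.status`) is an assumption about that output, and the sample IDs are hypothetical.

```python
def pending_connection_ids(connections: list[dict]) -> list[str]:
    """Return the IDs of private endpoint connections awaiting approval."""
    ids = []
    for conn in connections:
        state = conn.get("properties", {}).get(
            "privateLinkServiceConnectionState", {}
        )
        if state.get("status") == "Pending":
            ids.append(conn["id"])
    return ids


# Hypothetical, trimmed CLI output for illustration only
sample_connections = [
    {
        "id": "/subscriptions/0000/resourceGroups/rg-data/providers/Microsoft.Storage"
              "/storageAccounts/mystorageaccount/privateEndpointConnections/conn-a",
        "properties": {"privateLinkServiceConnectionState": {"status": "Approved"}},
    },
    {
        "id": "/subscriptions/0000/resourceGroups/rg-data/providers/Microsoft.Storage"
              "/storageAccounts/mystorageaccount/privateEndpointConnections/conn-b",
        "properties": {"privateLinkServiceConnectionState": {"status": "Pending"}},
    },
]

for conn_id in pending_connection_ids(sample_connections):
    print(conn_id)
```

Each returned ID can then be passed to `az network private-endpoint-connection approve --id`. Review the list before approving, since a pending request may come from something other than Synapse.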

Handling External Dependencies

Python Packages

Without internet access, pip install fails. Options:

  1. Pre-built Spark pool image with required packages
  2. Private PyPI mirror with managed private endpoint
  3. Upload packages to linked storage and install from there

# Load a package from storage (addPyFile ships the file to the job; it does not run pip)
spark.sparkContext.addPyFile("abfss://<container>@<account>.dfs.core.windows.net/mypackage.whl")
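An abfss:// URI has the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`. Because it uses the dfs endpoint, the "dfs" managed private endpoint has to be approved before the path is reachable. A small helper, sketched here with hypothetical container and account names, keeps the URI consistent across notebooks.

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an ABFS URI for a file in an ADLS Gen2 storage account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical container, account, and path for illustration
wheel_uri = abfss_uri("packages", "mystorageaccount", "/wheels/mypackage.whl")
print(wheel_uri)
```

The result can be passed to `spark.sparkContext.addPyFile` in place of a hand-written string.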

Workspace Packages

Upload packages to the workspace:

resource "azurerm_synapse_workspace_package" "pandas" {
  name                 = "pandas-1.5.0-py3-none-any.whl"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  link                 = azurerm_storage_blob.pandas_wheel.url
}

Logging to Log Analytics

For Spark pool diagnostics with DEP, you need:

  1. Log Analytics in the same region
  2. Managed private endpoint to an Azure Monitor Private Link Scope (AMPLS) that includes the workspace

resource "azurerm_synapse_managed_private_endpoint" "log_analytics" {
  name                 = "pe-log-analytics"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  # Target the AMPLS rather than the Log Analytics workspace itself
  target_resource_id   = azurerm_monitor_private_link_scope.this.id
  subresource_name     = "azuremonitor"
}

Note: The Log Analytics workspace must be linked to the AMPLS, for example with azurerm_monitor_private_link_scoped_service.

VPN and ExpressRoute Access

DEP blocks traffic to on-premises too. For hybrid scenarios:

  1. NAT VM approach: Route through a VM in your VNET
  2. Azure Firewall: Allow specific destinations
  3. Private endpoints on-prem: Complex but possible

Most organisations accept that Spark pools can't reach on-prem with DEP enabled.

Debugging Connection Issues

When Spark jobs fail with timeout errors:

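# In a notebook, test connectivity
import socket

def test_connection(host, port):
    """Return True if a TCP connection to host:port succeeds within 5 seconds."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(5)
            return sock.connect_ex((host, port)) == 0
    except OSError:
        # Covers DNS resolution failures (socket.gaierror) and other socket errors
        return False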

# Test your endpoints
endpoints = [
    ("mystorageaccount.blob.core.windows.net", 443),
    ("mystorageaccount.dfs.core.windows.net", 443),
    ("mykeyvault.vault.azure.net", 443),
]

for host, port in endpoints:
    status = "OK" if test_connection(host, port) else "BLOCKED"
    print(f"{host}:{port} - {status}")
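A TCP check alone does not show whether traffic is actually using the private endpoint. With an approved managed private endpoint, the service hostname would normally be expected to resolve to a private address from inside the managed VNET. That is an assumption about typical private endpoint DNS behaviour, not something confirmed above. A minimal sketch that reports how a host resolves, with hypothetical function names:

```python
import ipaddress
import socket


def is_private_ip(address: str) -> bool:
    """Return True if the address falls in a private (non-routable) range."""
    return ipaddress.ip_address(address).is_private


def resolution_report(host: str) -> str:
    """Resolve host and describe whether it maps to a private or public address."""
    try:
        address = socket.gethostbyname(host)
    except socket.gaierror:
        return f"{host} - DNS FAILED"
    kind = "private" if is_private_ip(address) else "public"
    return f"{host} -> {address} ({kind})"
```

Calling `resolution_report(host)` for each entry in the endpoints list above, and seeing a public address for a resource that should have a managed private endpoint, suggests the endpoint is missing or not yet approved.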

Comparison: DEP vs Standard Managed VNET

Feature             DEP Enabled                  Standard Managed VNET
Outbound internet   Blocked                      Allowed
Azure services      Private endpoint required    Direct access
pip install         Blocked                      Works
On-premises         Blocked                      Works
Security            Highest                      High
Complexity          Higher                       Lower

Need help securing your Synapse environment? Get in touch - we help organisations implement secure data platforms.
