Azure
4 min read

Site-to-Site VPN with Synapse Data Exfiltration Protection

Azure, Synapse, VPN, Networking, On-Premises

You've enabled Data Exfiltration Protection (DEP) on your Synapse workspace for security. Now your Spark pools can't reach on-premises data sources over the VPN. Here's what you can do.

The Problem

With DEP enabled:

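  • All outbound traffic from Spark pools is blocked, except through managed private endpoints
  • Managed private endpoints can only target resources in approved Microsoft Entra ID (Azure AD) tenants
  • VPN traffic goes through your hub VNET
  • Managed private endpoints connect to Azure resources and Private Link Services, so there's no way to point one directly at an on-premises server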
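For reference, this is roughly how DEP and the approved tenant list appear on the workspace resource. It is a sketch with the other required workspace arguments elided; `linking_allowed_for_aad_tenant_ids` is the azurerm argument for the approved tenants, and allowing only the current tenant is an assumption for illustration:

```hcl
data "azurerm_client_config" "current" {}

resource "azurerm_synapse_workspace" "this" {
  # ... name, location, storage filesystem, identity, etc.

  # DEP only works together with the managed VNET
  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = true

  # Managed private endpoints may only target resources in these tenants
  linking_allowed_for_aad_tenant_ids = [data.azurerm_client_config.current.tenant_id]
}
```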

Understanding the Traffic Flow

Without DEP:
Spark Pool → Managed VNET → Peering → Hub VNET → VPN → On-Prem
(Works)

With DEP:
Spark Pool → Managed VNET → BLOCKED
(All egress denied except managed private endpoints)

Option 1: NAT VM Workaround

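Put a small Linux VM in your hub VNET (or a peered spoke that uses the hub's VPN gateway) to forward traffic to on-premises. Synapse reaches it through a managed private endpoint, a Private Link Service and a Standard Load Balancer:

Spark → Managed PE → Private Link Service → Load Balancer → NAT VM → VPN → On-Prem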

Deploy NAT VM

resource "azurerm_network_interface" "nat" {
  name                = "nic-nat-vm"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  ip_configuration {
    name                          = "internal"
    subnet_id                     = azurerm_subnet.nat.id
    private_ip_address_allocation = "Static"
    private_ip_address            = "10.0.5.10"
  }

  # Enable IP forwarding for NAT (renamed ip_forwarding_enabled in azurerm v4.x)
  enable_ip_forwarding = true
}

resource "azurerm_linux_virtual_machine" "nat" {
  name                = "vm-nat"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  size                = "Standard_B2s"
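  admin_username      = "adminuser"

  # Required: password login is disabled by default (key path is an assumption)
  admin_ssh_key {
    username   = "adminuser"
    public_key = file("~/.ssh/id_rsa.pub")
  }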

  network_interface_ids = [azurerm_network_interface.nat.id]

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

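  # cloud-init runs this once, on first boot. Persist the rules (for example
  # with iptables-persistent) if they must survive a reboot.
  custom_data = base64encode(<<-EOF
    #!/bin/bash
    ONPREM_IP=192.168.10.20   # placeholder: on-prem data source
    PORT=1433                 # placeholder: e.g. SQL Server
    sysctl -w net.ipv4.ip_forward=1
    # Send traffic arriving from the load balancer on to the on-prem server
    iptables -t nat -A PREROUTING -p tcp --dport $PORT -j DNAT --to-destination $ONPREM_IP:$PORT
    # Rewrite the source so replies come back through this VM
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
    iptables -A FORWARD -i eth0 -j ACCEPT
    EOF
  )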
}
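Both resources above reference `azurerm_subnet.nat`, which the post doesn't define. A minimal sketch, assuming the subnet lives in the hub VNET (hypothetical `azurerm_virtual_network.hub`) so it picks up the VPN gateway's routes to on-premises. The same subnet is used later for the Private Link Service, which is why its network policies are disabled:

```hcl
resource "azurerm_subnet" "nat" {
  name                 = "snet-nat"
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.hub.name # hypothetical hub VNET with the VPN gateway
  address_prefixes     = ["10.0.5.0/24"]                   # contains the NAT VM's 10.0.5.10

  # Private Link Service can't take NAT IPs from this subnet
  # unless its network policies are disabled
  private_link_service_network_policies_enabled = false
}
```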

Create Managed Private Endpoint to NAT VM

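Expose the NAT VM through a Private Link Service, which needs a Standard Load Balancer with the VM in its backend pool, then create a managed private endpoint in Synapse that targets the Private Link Service. The resulting connection must be approved on the Private Link Service before Spark can use it.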
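The Private Link Service below references `azurerm_lb.nat`, which isn't defined in the post. A minimal sketch of that load balancer, assuming a single forwarded on-prem port (1433, SQL Server; change it to whatever the NAT VM forwards) and an SSH health probe against the VM. All resource names are hypothetical:

```hcl
resource "azurerm_lb" "nat" {
  name                = "lb-nat-vm"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  sku                 = "Standard" # Private Link Service requires a Standard LB

  frontend_ip_configuration {
    name                          = "frontend"
    subnet_id                     = azurerm_subnet.nat.id
    private_ip_address_allocation = "Dynamic"
  }
}

resource "azurerm_lb_backend_address_pool" "nat" {
  name            = "be-nat-vm"
  loadbalancer_id = azurerm_lb.nat.id
}

# Health probe on SSH, which the Ubuntu image listens on by default
resource "azurerm_lb_probe" "nat" {
  name            = "probe-ssh"
  loadbalancer_id = azurerm_lb.nat.id
  protocol        = "Tcp"
  port            = 22
}

# Pass the forwarded port through to the NAT VM unchanged
resource "azurerm_lb_rule" "nat" {
  name                           = "rule-1433"
  loadbalancer_id                = azurerm_lb.nat.id
  protocol                       = "Tcp"
  frontend_port                  = 1433
  backend_port                   = 1433
  frontend_ip_configuration_name = "frontend"
  backend_address_pool_ids       = [azurerm_lb_backend_address_pool.nat.id]
  probe_id                       = azurerm_lb_probe.nat.id
}

# Put the NAT VM's NIC into the backend pool
resource "azurerm_network_interface_backend_address_pool_association" "nat" {
  network_interface_id    = azurerm_network_interface.nat.id
  ip_configuration_name   = "internal"
  backend_address_pool_id = azurerm_lb_backend_address_pool.nat.id
}
```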

# Private Link Service for NAT VM
resource "azurerm_private_link_service" "nat" {
  name                = "pls-nat-vm"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location

  load_balancer_frontend_ip_configuration_ids = [
    azurerm_lb.nat.frontend_ip_configuration[0].id
  ]

  nat_ip_configuration {
    name      = "primary"
    subnet_id = azurerm_subnet.nat.id
    primary   = true
  }
}

# Managed private endpoint in Synapse
resource "azurerm_synapse_managed_private_endpoint" "nat" {
  name                 = "pe-nat"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  target_resource_id   = azurerm_private_link_service.nat.id
  subresource_name     = ""  # Private Link Service doesn't use subresource
}

Option 2: Data Movement via Linked Services

Instead of having Spark reach on-premises directly, copy the data into Azure first:

On-Prem → Self-Hosted IR → Data Factory → ADLS → Synapse Spark

  1. Deploy Self-Hosted Integration Runtime on-premises
  2. Use Data Factory to copy data to Azure Data Lake
  3. Spark reads from ADLS (via managed private endpoint)
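Step 3 relies on a managed private endpoint from the workspace to the data lake. A minimal sketch, assuming a separate landing account (hypothetical `azurerm_storage_account.datalake`); `dfs` is the Data Lake Storage sub-resource. If the landing zone is the workspace's primary storage account, Synapse normally creates this endpoint for you. As with any managed private endpoint, the connection then has to be approved on the storage account:

```hcl
resource "azurerm_synapse_managed_private_endpoint" "adls" {
  name                 = "pe-adls"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
  target_resource_id   = azurerm_storage_account.datalake.id # hypothetical ADLS Gen2 account
  subresource_name     = "dfs"
}
```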

# Data Factory with managed VNET
resource "azurerm_data_factory" "this" {
  name                = "adf-data-ingestion"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location

  managed_virtual_network_enabled = true
}

# Self-hosted IR for on-prem connectivity
resource "azurerm_data_factory_integration_runtime_self_hosted" "onprem" {
  name            = "ir-onprem"
  data_factory_id = azurerm_data_factory.this.id
}
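The self-hosted IR only becomes usable once a node installed on-premises registers with its authentication key. A sketch that exposes the key as a sensitive output (the output name is hypothetical), plus the equivalent IR on the Synapse workspace itself, in case you'd rather run the copy in Synapse pipelines than in a separate Data Factory:

```hcl
# Key to paste into the Integration Runtime Configuration Manager
# on the on-premises machine during installation
output "shir_auth_key" {
  value     = azurerm_data_factory_integration_runtime_self_hosted.onprem.primary_authorization_key
  sensitive = true
}

# Alternative: a self-hosted IR owned by the Synapse workspace
resource "azurerm_synapse_integration_runtime_self_hosted" "onprem" {
  name                 = "ir-onprem-synapse"
  synapse_workspace_id = azurerm_synapse_workspace.this.id
}
```

Read the key with `terraform output -raw shir_auth_key` rather than printing it in CI logs.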

Option 3: ExpressRoute with Microsoft Peering

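ExpressRoute replaces the internet VPN with a dedicated circuit. Microsoft peering carries traffic to the public endpoints of Azure PaaS services such as ADLS; private endpoints are reached over the circuit's private peering:

On-Prem → ExpressRoute → Microsoft Peering → PaaS public endpoint (or Private Peering → Private Endpoint)

This is complex and expensive. It keeps the path from on-premises into Azure off the internet, but it does not by itself let DEP-enabled Spark pools open connections to on-premises.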

Option 4: Accept the Limitation

Sometimes the right answer is to not use DEP:

  • If on-prem connectivity is essential
  • If the data isn't highly sensitive
  • If other controls provide adequate protection
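
resource "azurerm_synapse_workspace" "this" {
  # ...

  managed_virtual_network_enabled      = true
  # false allows outbound traffic. DEP can only be set when the workspace is
  # created, so switching it means deploying a new workspace.
  data_exfiltration_protection_enabled = false
}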

Security Compensating Controls

If you disable DEP, implement other controls:

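  1. NSGs on the subnets you control (hub, forwarding VM) - Restrict destinations; the Synapse managed VNET itself can't take NSGs or route tables
  2. Azure Firewall in the hub - Inspect and log traffic to and from on-premises
  3. Managed private endpoints - Keep access to Azure resources such as ADLS off public endpoints
  4. Activity monitoring - Send workspace and storage diagnostic logs to Log Analytics to audit data access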
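A sketch of controls 1 and 4, assuming the on-prem traffic passes through a subnet you manage (the hypothetical `azurerm_subnet.nat` is used here), that 192.168.10.20:1433 is a placeholder for the one on-prem server the workloads need, and that a Log Analytics workspace (hypothetical `azurerm_log_analytics_workspace.this`) already exists. The diagnostic categories shown are examples; `az monitor diagnostic-settings categories list --resource <workspace-id>` lists the valid ones:

```hcl
resource "azurerm_network_security_group" "onprem_egress" {
  name                = "nsg-onprem-egress"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  # Allow only the single on-prem server and port the workloads need
  security_rule {
    name                       = "allow-onprem-sql"
    priority                   = 100
    direction                  = "Outbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "1433"
    source_address_prefix      = "*"
    destination_address_prefix = "192.168.10.20/32" # placeholder on-prem server
  }

  # Block every other outbound destination from this subnet
  security_rule {
    name                       = "deny-all-other-outbound"
    priority                   = 4000
    direction                  = "Outbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

resource "azurerm_subnet_network_security_group_association" "onprem_egress" {
  subnet_id                 = azurerm_subnet.nat.id
  network_security_group_id = azurerm_network_security_group.onprem_egress.id
}

# Send Synapse workspace logs to Log Analytics
resource "azurerm_monitor_diagnostic_setting" "synapse" {
  name                       = "diag-synapse"
  target_resource_id         = azurerm_synapse_workspace.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category = "SynapseRbacOperations"
  }

  enabled_log {
    category = "IntegrationPipelineRuns"
  }
}
```

The deny rule also blocks the subnet's own outbound traffic, such as OS package updates, so add explicit allow rules for anything else the VMs there need.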

Comparison

Approach | Complexity | Cost | Security
--- | --- | --- | ---
NAT VM | High | Medium | Good
Data Movement | Medium | Low-Medium | Good
ExpressRoute MS Peering | Very High | High | Best
Disable DEP | Low | Low | Lower

Recommendation

For most organisations:

  1. Use the Data Movement approach for scheduled data loads
  2. Keep DEP enabled for sensitive workloads
  3. Consider a separate Synapse workspace, without DEP, for workloads that need on-premises connectivity
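Point 3 comes down to two workspace resources, one per DEP setting, rather than a toggle on one workspace. A sketch in the same style as Option 4, with the other required arguments elided and both resource names hypothetical:

```hcl
# Workspace for sensitive data: DEP on
resource "azurerm_synapse_workspace" "sensitive" {
  # ...
  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = true
}

# Second workspace for on-prem connected workloads: DEP off
resource "azurerm_synapse_workspace" "hybrid" {
  # ...
  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = false
}
```

Giving each workspace its own data lake filesystem helps keep the hybrid workspace from becoming a route to the sensitive data.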

Need help with Synapse networking and on-premises integration? Get in touch - we help organisations design secure hybrid data platforms.
