
Synapse Spark Pool Logging with Private Endpoints


Synapse Spark pools generate valuable logs: execution metrics, errors, and custom application output. But with data exfiltration protection (DEP) enabled, getting those logs into Log Analytics requires specific configuration.

The Challenge

With DEP enabled:

  • Spark pools have no internet access
  • Outbound traffic is blocked except to approved private endpoints
  • Log Analytics ingestion endpoint isn't reachable by default
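To see concretely what DEP blocks, here is a sketch of the public Log Analytics FQDNs a Spark pool would normally reach; the workspace ID below is a placeholder, and the helper function is illustrative, not part of any SDK:

```python
# Sketch: the public Log Analytics FQDNs that are unreachable under DEP
# until private endpoints exist. The workspace ID is a placeholder.

def log_analytics_endpoints(workspace_id: str) -> list[str]:
    """FQDNs used for Log Analytics ingestion and agent traffic."""
    return [
        f"{workspace_id}.ods.opinsights.azure.com",       # data ingestion
        f"{workspace_id}.oms.opinsights.azure.com",       # agent management
        f"{workspace_id}.agentsvc.azure-automation.net",  # agent service
    ]

for fqdn in log_analytics_endpoints("00000000-0000-0000-0000-000000000000"):
    print(fqdn)
```

With AMPLS in place, these same names resolve to private IPs via the privatelink DNS zones set up below.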

Solution: Azure Monitor Private Link Scope

An Azure Monitor Private Link Scope (AMPLS) groups your Log Analytics workspace behind a single private endpoint, making the ingestion and query endpoints reachable from your network.

1. Create AMPLS

resource "azurerm_monitor_private_link_scope" "this" {
  name                = "ampls-synapse"
  resource_group_name = azurerm_resource_group.this.name

  # "Open" allows ingestion over both private and public paths;
  # "PrivateOnly" blocks ingestion from networks outside the AMPLS
  ingestion_access_mode = "Open"
  query_access_mode     = "Open"
}

resource "azurerm_monitor_private_link_scoped_service" "workspace" {
  name                = "link-to-law"
  resource_group_name = azurerm_resource_group.this.name
  scope_name          = azurerm_monitor_private_link_scope.this.name
  linked_resource_id  = azurerm_log_analytics_workspace.this.id
}

2. Create Private Endpoint for AMPLS

resource "azurerm_private_endpoint" "ampls" {
  name                = "pe-ampls"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "psc-ampls"
    private_connection_resource_id = azurerm_monitor_private_link_scope.this.id
    subresource_names              = ["azuremonitor"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "dns-group"
    private_dns_zone_ids = [for zone in azurerm_private_dns_zone.monitor : zone.id]
  }
}

3. Create Required DNS Zones

locals {
  monitor_zones = [
    "privatelink.monitor.azure.com",
    "privatelink.oms.opinsights.azure.com",
    "privatelink.ods.opinsights.azure.com",
    "privatelink.agentsvc.azure-automation.net",
    "privatelink.blob.core.windows.net"  # For data collection
  ]
}

resource "azurerm_private_dns_zone" "monitor" {
  for_each            = toset(local.monitor_zones)
  name                = each.value
  resource_group_name = azurerm_resource_group.this.name
}
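Once the zones are linked, you can spot-check resolution from a Synapse notebook. A minimal sketch using only the standard library; the hostname is a placeholder, and `check_resolution` needs network access so it is left commented out:

```python
# Sketch: verify that privatelink FQDNs resolve to private IPs from
# inside the VNET. The FQDN below is a placeholder.
import ipaddress
import socket

def is_private(ip: str) -> bool:
    """True if the address is in a private (RFC 1918 / link-local) range."""
    return ipaddress.ip_address(ip).is_private

def check_resolution(fqdn: str) -> None:
    # Resolve the name and flag anything that still resolves publicly
    ip = socket.gethostbyname(fqdn)
    status = "OK (private)" if is_private(ip) else "WARNING: public IP"
    print(f"{fqdn} -> {ip} {status}")

# Run from inside the VNET:
# check_resolution("workspace-id.ods.opinsights.azure.com")
```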

4. Link DNS Zones to Synapse Managed VNET

This is the tricky part: the Synapse managed VNET doesn't appear in your subscription, so you can't link private DNS zones to it directly.

Option A: Hub-spoke with DNS forwarding

If traffic from your Synapse workspace egresses through your hub network (for example via a self-hosted DNS forwarder), link the zones to the hub VNET:

resource "azurerm_private_dns_zone_virtual_network_link" "hub" {
  for_each              = toset(local.monitor_zones)
  name                  = "link-hub"
  resource_group_name   = azurerm_resource_group.this.name
  private_dns_zone_name = azurerm_private_dns_zone.monitor[each.key].name
  virtual_network_id    = azurerm_virtual_network.hub.id
}

Option B: Managed VNET DNS configuration

Configure in Synapse:

resource "azurerm_synapse_workspace" "this" {
  # ... other config ...

  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = true

  # The managed VNET uses Azure-provided DNS; privatelink names resolve
  # through the managed private endpoints Synapse creates
}

Enabling Diagnostic Settings

Configure Synapse to send Spark logs:

resource "azurerm_monitor_diagnostic_setting" "synapse" {
  name                       = "synapse-diagnostics"
  target_resource_id         = azurerm_synapse_workspace.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category = "SynapseRbacOperations"
  }

  enabled_log {
    category = "GatewayApiRequests"
  }

  enabled_log {
    category = "BuiltinSqlReqsEnded"
  }

  enabled_log {
    category = "IntegrationPipelineRuns"
  }

  enabled_log {
    category = "IntegrationActivityRuns"
  }

  enabled_log {
    category = "IntegrationTriggerRuns"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Spark Pool Specific Logging

resource "azurerm_monitor_diagnostic_setting" "spark" {
  name                       = "spark-diagnostics"
  target_resource_id         = azurerm_synapse_spark_pool.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category = "BigDataPoolAppsEnded"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Custom Logging from Spark

In your notebooks/jobs:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Access the JVM-side Log4j through py4j (_jvm is an internal PySpark handle)
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("MyApp")

# These messages land in the Spark driver logs
logger.info("Processing started")
logger.warn("Missing data in partition")
logger.error("Failed to connect to source")

Query in Log Analytics:

SynapseSparkPoolLogs
| where SparkPoolName_s == "sparkpool"
| where LogLevel_s == "ERROR"
| project TimeGenerated, Message_s, ApplicationId_s
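The same query can be run programmatically with the azure-monitor-query SDK (`pip install azure-monitor-query azure-identity`). A sketch, assuming `DefaultAzureCredential` can authenticate and the query traffic flows over the AMPLS private endpoint; the workspace ID and pool name are placeholders:

```python
# Sketch: run the Spark error query via the azure-monitor-query SDK.
from datetime import timedelta

def spark_error_query(pool_name: str) -> str:
    """Build the KQL query shown above for a given Spark pool."""
    return (
        "SynapseSparkPoolLogs\n"
        f'| where SparkPoolName_s == "{pool_name}"\n'
        '| where LogLevel_s == "ERROR"\n'
        "| project TimeGenerated, Message_s, ApplicationId_s"
    )

def run_query(workspace_id: str, pool_name: str):
    # Imported lazily so the query builder stays usable without the SDK
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    client = LogsQueryClient(DefaultAzureCredential())
    return client.query_workspace(
        workspace_id,
        spark_error_query(pool_name),
        timespan=timedelta(hours=24),
    )
```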

Troubleshooting

Logs Not Appearing

  1. Check the private endpoint connection is approved
  2. Verify DNS resolution from inside the VNET:

     import socket
     socket.gethostbyname("workspace-id.ods.opinsights.azure.com")
     # Should return a private IP, not a public one

  3. Check the AMPLS includes your Log Analytics workspace

Connection Timeouts

  • Verify managed private endpoint to Log Analytics exists
  • Check AMPLS mode is "Open" not "PrivateOnly"

Partial Logs

  • Ensure all required DNS zones are linked
  • Check diagnostic setting includes all categories

Need help with Synapse logging and monitoring? Get in touch - we help organisations build observable data platforms.
