
Synapse Spark Pool Logging with Private Endpoints


Synapse Spark pools generate valuable logs: execution metrics, errors, and custom application output. But with data exfiltration protection (DEP) enabled, getting those logs into Log Analytics requires specific configuration.

The Challenge

With DEP enabled:

  • Spark pools have no internet access
  • Outbound traffic is blocked except to approved private endpoints
  • Log Analytics ingestion endpoint isn't reachable by default
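To see concretely what DEP blocks, here is a sketch of the public Log Analytics FQDNs a Spark pool would normally reach; the workspace ID below is a placeholder, and the helper function is illustrative, not part of any SDK:

```python
# Sketch: the public Log Analytics FQDNs that are unreachable under DEP
# until private endpoints exist. The workspace ID is a placeholder.

def log_analytics_endpoints(workspace_id: str) -> list[str]:
    """FQDNs used for Log Analytics ingestion and agent traffic."""
    return [
        f"{workspace_id}.ods.opinsights.azure.com",       # data ingestion
        f"{workspace_id}.oms.opinsights.azure.com",       # agent management
        f"{workspace_id}.agentsvc.azure-automation.net",  # agent service
    ]

for fqdn in log_analytics_endpoints("00000000-0000-0000-0000-000000000000"):
    print(fqdn)
```

With AMPLS in place, these same names resolve to private IPs via the privatelink DNS zones set up below.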

Solution: Azure Monitor Private Link Scope

An Azure Monitor Private Link Scope (AMPLS) groups your Log Analytics workspace behind a single private endpoint, making the ingestion and query endpoints reachable from your network.

1. Create AMPLS

resource "azurerm_monitor_private_link_scope" "this" {
  name                = "ampls-synapse"
  resource_group_name = azurerm_resource_group.this.name

  # "Open" allows ingestion over both private and public paths;
  # "PrivateOnly" blocks ingestion from networks outside the AMPLS
  ingestion_access_mode = "Open"
  query_access_mode     = "Open"
}

resource "azurerm_monitor_private_link_scoped_service" "workspace" {
  name                = "link-to-law"
  resource_group_name = azurerm_resource_group.this.name
  scope_name          = azurerm_monitor_private_link_scope.this.name
  linked_resource_id  = azurerm_log_analytics_workspace.this.id
}

2. Create Private Endpoint for AMPLS

resource "azurerm_private_endpoint" "ampls" {
  name                = "pe-ampls"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "psc-ampls"
    private_connection_resource_id = azurerm_monitor_private_link_scope.this.id
    subresource_names              = ["azuremonitor"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "dns-group"
    private_dns_zone_ids = [for zone in azurerm_private_dns_zone.monitor : zone.id]
  }
}

3. Create Required DNS Zones

locals {
  monitor_zones = [
    "privatelink.monitor.azure.com",
    "privatelink.oms.opinsights.azure.com",
    "privatelink.ods.opinsights.azure.com",
    "privatelink.agentsvc.azure-automation.net",
    "privatelink.blob.core.windows.net"  # For data collection
  ]
}

resource "azurerm_private_dns_zone" "monitor" {
  for_each            = toset(local.monitor_zones)
  name                = each.value
  resource_group_name = azurerm_resource_group.this.name
}
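Once the zones are linked, you can spot-check resolution from a Synapse notebook. A minimal sketch using only the standard library; the hostname is a placeholder, and `check_resolution` needs network access so it is left commented out:

```python
# Sketch: verify that privatelink FQDNs resolve to private IPs from
# inside the VNET. The FQDN below is a placeholder.
import ipaddress
import socket

def is_private(ip: str) -> bool:
    """True if the address is in a private (RFC 1918 / link-local) range."""
    return ipaddress.ip_address(ip).is_private

def check_resolution(fqdn: str) -> None:
    # Resolve the name and flag anything that still resolves publicly
    ip = socket.gethostbyname(fqdn)
    status = "OK (private)" if is_private(ip) else "WARNING: public IP"
    print(f"{fqdn} -> {ip} {status}")

# Run from inside the VNET:
# check_resolution("workspace-id.ods.opinsights.azure.com")
```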

4. Link DNS Zones to Synapse Managed VNET

This is the tricky part: the Synapse managed VNET doesn't appear in your subscription, so you can't link private DNS zones to it directly.

Option A: Hub-spoke with DNS forwarding

If traffic from your Synapse workspace egresses through your hub network (for example via a self-hosted DNS forwarder), link the zones to the hub VNET:

resource "azurerm_private_dns_zone_virtual_network_link" "hub" {
  for_each              = toset(local.monitor_zones)
  name                  = "link-hub"
  resource_group_name   = azurerm_resource_group.this.name
  private_dns_zone_name = azurerm_private_dns_zone.monitor[each.key].name
  virtual_network_id    = azurerm_virtual_network.hub.id
}

Option B: Managed VNET DNS configuration

Configure in Synapse:

resource "azurerm_synapse_workspace" "this" {
  # ... other config ...

  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = true

  # The managed VNET uses Azure-provided DNS; privatelink names resolve
  # through the managed private endpoints Synapse creates
}

Enabling Diagnostic Settings

Configure Synapse to send Spark logs:

resource "azurerm_monitor_diagnostic_setting" "synapse" {
  name                       = "synapse-diagnostics"
  target_resource_id         = azurerm_synapse_workspace.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category = "SynapseRbacOperations"
  }

  enabled_log {
    category = "GatewayApiRequests"
  }

  enabled_log {
    category = "BuiltinSqlReqsEnded"
  }

  enabled_log {
    category = "IntegrationPipelineRuns"
  }

  enabled_log {
    category = "IntegrationActivityRuns"
  }

  enabled_log {
    category = "IntegrationTriggerRuns"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Spark Pool Specific Logging

resource "azurerm_monitor_diagnostic_setting" "spark" {
  name                       = "spark-diagnostics"
  target_resource_id         = azurerm_synapse_spark_pool.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category = "BigDataPoolAppsEnded"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Custom Logging from Spark

In your notebooks/jobs:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Access the JVM-side Log4j through py4j (_jvm is an internal PySpark handle)
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("MyApp")

# These messages land in the Spark driver logs
logger.info("Processing started")
logger.warn("Missing data in partition")
logger.error("Failed to connect to source")

Query in Log Analytics:

SynapseSparkPoolLogs
| where SparkPoolName_s == "sparkpool"
| where LogLevel_s == "ERROR"
| project TimeGenerated, Message_s, ApplicationId_s
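The same query can be run programmatically with the azure-monitor-query SDK (`pip install azure-monitor-query azure-identity`). A sketch, assuming `DefaultAzureCredential` can authenticate and the query traffic flows over the AMPLS private endpoint; the workspace ID and pool name are placeholders:

```python
# Sketch: run the Spark error query via the azure-monitor-query SDK.
from datetime import timedelta

def spark_error_query(pool_name: str) -> str:
    """Build the KQL query shown above for a given Spark pool."""
    return (
        "SynapseSparkPoolLogs\n"
        f'| where SparkPoolName_s == "{pool_name}"\n'
        '| where LogLevel_s == "ERROR"\n'
        "| project TimeGenerated, Message_s, ApplicationId_s"
    )

def run_query(workspace_id: str, pool_name: str):
    # Imported lazily so the query builder stays usable without the SDK
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    client = LogsQueryClient(DefaultAzureCredential())
    return client.query_workspace(
        workspace_id,
        spark_error_query(pool_name),
        timespan=timedelta(hours=24),
    )
```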

Troubleshooting

Logs Not Appearing

  1. Check the private endpoint connection is approved
  2. Verify DNS resolution from inside the VNET:

     import socket
     socket.gethostbyname("workspace-id.ods.opinsights.azure.com")
     # Should return a private IP, not a public one

  3. Check the AMPLS includes your Log Analytics workspace

Connection Timeouts

  • Verify managed private endpoint to Log Analytics exists
  • Check AMPLS mode is "Open" not "PrivateOnly"

Partial Logs

  • Ensure all required DNS zones are linked
  • Check diagnostic setting includes all categories

Need help with Synapse logging and monitoring? Get in touch - we help organisations build observable data platforms.
