Synapse Spark pools generate valuable logs: execution metrics, errors, and custom application logging. But with data exfiltration protection (DEP) enabled, getting those logs into Log Analytics requires specific configuration.
The Challenge
With DEP enabled:
- Spark pools have no internet access
- Outbound traffic is blocked except to approved private endpoints
- Log Analytics ingestion endpoint isn't reachable by default
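Concretely, each Log Analytics workspace gets its own per-workspace ingestion hostnames, and it is these that a DEP-locked pool cannot reach by default. A minimal sketch of the naming pattern (the helper function and the GUID are illustrative, not an Azure API):

```python
# Sketch: per-workspace Azure Monitor hostnames a Spark pool needs to reach.
# The workspace ID used below is a placeholder, not a real workspace.
def ingestion_endpoints(workspace_id: str) -> list[str]:
    """Log Analytics hostnames used for data ingestion and agent traffic."""
    return [
        f"{workspace_id}.ods.opinsights.azure.com",  # data ingestion
        f"{workspace_id}.oms.opinsights.azure.com",  # agent/management
    ]

print(ingestion_endpoints("00000000-0000-0000-0000-000000000000"))
```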
Solution: Azure Monitor Private Link Scope
Azure Monitor Private Link Scope (AMPLS) puts the Log Analytics ingestion and query endpoints behind a private endpoint that your workloads can reach.
1. Create AMPLS
resource "azurerm_monitor_private_link_scope" "this" {
  name                = "ampls-synapse"
  resource_group_name = azurerm_resource_group.this.name

  # "Open" accepts both Private Link and public traffic; switch to
  # "PrivateOnly" once every client reaches the workspace over Private Link
  ingestion_access_mode = "Open"
  query_access_mode     = "Open"
}
resource "azurerm_monitor_private_link_scoped_service" "workspace" {
  name                = "link-to-law"
  resource_group_name = azurerm_resource_group.this.name
  scope_name          = azurerm_monitor_private_link_scope.this.name
  linked_resource_id  = azurerm_log_analytics_workspace.this.id
}
2. Create Private Endpoint for AMPLS
resource "azurerm_private_endpoint" "ampls" {
  name                = "pe-ampls"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "psc-ampls"
    private_connection_resource_id = azurerm_monitor_private_link_scope.this.id
    subresource_names              = ["azuremonitor"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name = "dns-group"
    # The zones are created with for_each, so collect every instance's ID
    private_dns_zone_ids = [for zone in azurerm_private_dns_zone.monitor : zone.id]
  }
}
3. Create Required DNS Zones
locals {
  monitor_zones = [
    "privatelink.monitor.azure.com",
    "privatelink.oms.opinsights.azure.com",
    "privatelink.ods.opinsights.azure.com",
    "privatelink.agentsvc.azure-automation.net",
    "privatelink.blob.core.windows.net", # for data collection
  ]
}

resource "azurerm_private_dns_zone" "monitor" {
  for_each            = toset(local.monitor_zones)
  name                = each.value
  resource_group_name = azurerm_resource_group.this.name
}
4. Link DNS Zones to Synapse Managed VNET
This is the tricky part. Synapse managed VNETs don't appear in your subscription, but they need DNS resolution.
Option A: Hub-spoke with DNS forwarding
If your Synapse workspace's managed VNET peers with your hub:
resource "azurerm_private_dns_zone_virtual_network_link" "hub" {
  for_each              = toset(local.monitor_zones)
  name                  = "link-hub"
  resource_group_name   = azurerm_resource_group.this.name
  private_dns_zone_name = azurerm_private_dns_zone.monitor[each.key].name
  virtual_network_id    = azurerm_virtual_network.hub.id
}
Option B: Managed VNET DNS configuration
Configure in Synapse:
resource "azurerm_synapse_workspace" "this" {
  # ... other config ...
  managed_virtual_network_enabled      = true
  data_exfiltration_protection_enabled = true

  # The managed VNET uses Azure DNS by default; resolution of the
  # privatelink zones flows through the private endpoint's DNS records
}
Enabling Diagnostic Settings
Configure Synapse to send Spark logs:
resource "azurerm_monitor_diagnostic_setting" "synapse" {
  name                       = "synapse-diagnostics"
  target_resource_id         = azurerm_synapse_workspace.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  dynamic "enabled_log" {
    for_each = toset([
      "SynapseRbacOperations",
      "GatewayApiRequests",
      "BuiltinSqlReqsEnded",
      "IntegrationPipelineRuns",
      "IntegrationActivityRuns",
      "IntegrationTriggerRuns",
    ])
    content {
      category = enabled_log.value
    }
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}
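Deployed settings can drift from what the Terraform intends, so it's handy to diff a live setting against the category list above. A sketch that parses diagnostic-settings JSON; the `{"logs": [{"category": ..., "enabled": ...}]}` shape is an assumption based on typical `az monitor diagnostic-settings show` output, so verify it against your CLI version:

```python
import json

# The six log categories enabled in the diagnostic setting above
REQUIRED_CATEGORIES = {
    "SynapseRbacOperations",
    "GatewayApiRequests",
    "BuiltinSqlReqsEnded",
    "IntegrationPipelineRuns",
    "IntegrationActivityRuns",
    "IntegrationTriggerRuns",
}

def enabled_categories(settings_json: str) -> set[str]:
    """Extract enabled log categories from a diagnostic-settings JSON document."""
    doc = json.loads(settings_json)
    return {log["category"] for log in doc.get("logs", []) if log.get("enabled")}

sample = '{"logs": [{"category": "SynapseRbacOperations", "enabled": true}]}'
print(sorted(REQUIRED_CATEGORIES - enabled_categories(sample)))  # the five missing ones
```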
Spark Pool Specific Logging
resource "azurerm_monitor_diagnostic_setting" "spark" {
  name                       = "spark-diagnostics"
  target_resource_id         = azurerm_synapse_spark_pool.this.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.this.id

  enabled_log {
    category = "BigDataPoolAppsEnded"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}
Custom Logging from Spark
In your notebooks/jobs:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
log4j = spark._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("MyApp")
# These logs go to the Spark driver logs
logger.info("Processing started")
logger.warn("Missing data in partition")
logger.error("Failed to connect to source")
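Plain-text messages like these are awkward to filter once they land in Log Analytics. One option is to emit each message as a JSON string and parse it on the query side with `parse_json()`; the field names here are our own convention, not anything Synapse mandates:

```python
import json
from datetime import datetime, timezone

def structured_message(event: str, **fields) -> str:
    """Render one log line as JSON so KQL can parse it
    instead of relying on string matching."""
    payload = {"event": event,
               "ts": datetime.now(timezone.utc).isoformat(),
               **fields}
    return json.dumps(payload, sort_keys=True)

# logger.info(structured_message("partition_processed", partition="p-2024-01", rows=1042))
```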
Query in Log Analytics:
SynapseSparkPoolLogs
| where SparkPoolName_s == "sparkpool"
| where LogLevel_s == "ERROR"
| project TimeGenerated, Message_s, ApplicationId_s
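If you'd rather run that query programmatically (for alerting or CI checks), the `azure-monitor-query` SDK can execute the same KQL. A sketch, assuming that package plus `azure-identity` are installed and the credential has read access to the workspace; the pool name and workspace ID are placeholders:

```python
# Build the KQL used above, parameterised by pool name and log level.
def spark_error_query(pool_name: str, level: str = "ERROR") -> str:
    return (
        "SynapseSparkPoolLogs\n"
        f'| where SparkPoolName_s == "{pool_name}"\n'
        f'| where LogLevel_s == "{level}"\n'
        "| project TimeGenerated, Message_s, ApplicationId_s"
    )

# Running it against a workspace (requires network access and credentials):
# from datetime import timedelta
# from azure.identity import DefaultAzureCredential
# from azure.monitor.query import LogsQueryClient
#
# client = LogsQueryClient(DefaultAzureCredential())
# result = client.query_workspace("<workspace-id>", spark_error_query("sparkpool"),
#                                 timespan=timedelta(hours=24))
```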
Troubleshooting
Logs Not Appearing
- Check private endpoint is approved
- Verify DNS resolution from a notebook:

  import socket
  # Should return a private IP, not a public one
  socket.gethostbyname("workspace-id.ods.opinsights.azure.com")

- Check AMPLS includes your workspace
Connection Timeouts
- Verify managed private endpoint to Log Analytics exists
- Check AMPLS mode is "Open" not "PrivateOnly"
Partial Logs
- Ensure all required DNS zones are linked
- Check diagnostic setting includes all categories
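The DNS checks above can be scripted: resolve each private-link hostname and flag anything that still answers with a public address. Run it from the Spark pool itself so you exercise the managed VNET's resolver. Standard library only; the hostname in the usage comment is the same placeholder as in the bullet above:

```python
import ipaddress
import socket

def label(ip: str) -> str:
    """'private' for RFC 1918 / loopback addresses, 'PUBLIC' otherwise."""
    return "private" if ipaddress.ip_address(ip).is_private else "PUBLIC"

def check_resolution(hostnames):
    """Resolve each hostname and report whether the answer is a private IP."""
    report = {}
    for host in hostnames:
        try:
            report[host] = label(socket.gethostbyname(host))
        except socket.gaierror:
            report[host] = "unresolvable"
    return report

# check_resolution(["workspace-id.ods.opinsights.azure.com"])
```

Anything labelled PUBLIC means the zone link for that hostname is missing or the private endpoint's DNS records never landed.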
Need help with Synapse logging and monitoring? Get in touch - we help organisations build observable data platforms.