Fabric Monitoring and Troubleshooting: Monitoring Hub, Audit Logs, Error Resolution for Pipelines, Notebooks, Dataflows, Eventstreams, Shortcuts, and Deployment Errors

The Monitoring Hub
Monitoring Pipeline Runs
Monitoring Notebook Runs
Monitoring Dataflow Gen2 Runs
Monitoring Semantic Model Refresh
Monitoring Eventstream and Eventhouse
Fabric Audit Logs
Error Resolution by Item Type
Deployment Pipeline Errors
Setting Up Proactive Monitoring
Common Mistakes
Interview Questions
Wrapping Up

Building data pipelines is half the job. The other half is MONITORING them — knowing when they fail, WHY they fail, and HOW to fix them. This post is your troubleshooting manual for every Fabric item type.

The Monitoring Hub

The Monitoring Hub is Fabric’s centralized monitoring dashboard:

Click Monitor in the left sidebar
See runs for ALL items you have access to, across workspaces
Filter by: item type, status (Success/Failed/InProgress), date range

Monitoring Hub shows:
  Item Name          | Type         | Status  | Duration | Start Time
  PL_Daily_ETL       | Pipeline     | Failed  | 12m 30s  | 2026-06-05 06:00
  NB_Clean_Customers | Notebook     | Success | 3m 15s   | 2026-06-05 06:12
  DF_Transform_Orders| Dataflow Gen2| Success | 5m 42s   | 2026-06-05 06:16
  Sales_Model        | Sem. Model   | Success | 1m 08s   | 2026-06-05 06:22

Click any row to drill into run details, activity durations, and error messages.
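
The same filtering can be applied to run history you have exported or logged yourself. The sketch below is only an illustration: the `Run` record and the `failed_runs` helper are hypothetical names that mirror the columns in the listing above, not a Fabric API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    """One row of run history (hypothetical shape mirroring the Monitoring Hub columns)."""
    item_name: str
    item_type: str
    status: str
    duration_seconds: int
    start_time: datetime


def failed_runs(runs, item_type=None):
    """Return failed runs, newest first, optionally restricted to one item type."""
    selected = [
        r for r in runs
        if r.status == "Failed" and (item_type is None or r.item_type == item_type)
    ]
    return sorted(selected, key=lambda r: r.start_time, reverse=True)


# Sample data copied from the listing above (durations converted to seconds).
runs = [
    Run("PL_Daily_ETL", "Pipeline", "Failed", 750, datetime(2026, 6, 5, 6, 0)),
    Run("NB_Clean_Customers", "Notebook", "Success", 195, datetime(2026, 6, 5, 6, 12)),
    Run("DF_Transform_Orders", "Dataflow Gen2", "Success", 342, datetime(2026, 6, 5, 6, 16)),
    Run("Sales_Model", "Semantic Model", "Success", 68, datetime(2026, 6, 5, 6, 22)),
]

for run in failed_runs(runs):
    print(f"{run.item_name} ({run.item_type}) failed after {run.duration_seconds}s")
```

Sorting newest first puts the most recent failure at the top, which is usually the one to investigate first.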

Monitoring Pipeline Runs

Pipeline run details show:
  ┌──────────────────────────────────────────┐
  │ Copy_Customers ──► DF_Clean ──► NB_Gold  │
  │      ✅              ✅           ❌      │
  │    45 sec          2m 10s       FAILED    │
  └──────────────────────────────────────────┘

Click the failed activity (NB_Gold):
  Error: "SparkException: Table gold.dim_customer does not exist"
  → Fix: Create the table first, or check the lakehouse attachment

Key Pipeline Monitoring Metrics

Rows read / rows written (for Copy activities)
Duration per activity
Error message and error code
Pipeline run ID (for support tickets)
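
If you capture these metrics per activity, two checks cover most triage: which activity failed first, and whether any Copy activity wrote a different number of rows than it read. A minimal sketch, assuming hypothetical activity records shaped like the run shown above (the row counts are made-up sample values, and the field names are not a Fabric schema):

```python
def first_failure(activities):
    """Return the first activity (in execution order) whose status is 'Failed', or None."""
    for activity in activities:
        if activity["status"] == "Failed":
            return activity
    return None


def row_mismatches(activities):
    """Return names of Copy activities where rows written differ from rows read."""
    return [
        a["name"] for a in activities
        if a.get("type") == "Copy" and a.get("rows_read") != a.get("rows_written")
    ]


# Hypothetical activity records modeled on the run shown above.
activities = [
    {"name": "Copy_Customers", "type": "Copy", "status": "Succeeded",
     "duration_s": 45, "rows_read": 1000, "rows_written": 1000},  # made-up row counts
    {"name": "DF_Clean", "type": "Dataflow", "status": "Succeeded", "duration_s": 130},
    {"name": "NB_Gold", "type": "Notebook", "status": "Failed", "duration_s": None,
     "error": "SparkException: Table gold.dim_customer does not exist"},
]

failure = first_failure(activities)
if failure is not None:
    print(f"{failure['name']} failed: {failure.get('error', 'no error message')}")
print("Row count mismatches:", row_mismatches(activities))
```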

Monitoring Notebook Runs

In the Monitoring Hub, click a notebook run to see:

Cell-by-cell execution status
Duration per cell
Spark UI link (for performance analysis)
Error traceback (Python/Scala stack trace)

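# Add monitoring WITHIN your notebook
from datetime import datetime

start = datetime.now()

# ... your transformation logic, producing a Spark DataFrame named df ...

# Spark is lazy: count() is the action that actually runs the work,
# so take the row count BEFORE computing the duration.
row_count = df.count()
duration = (datetime.now() - start).total_seconds()
print(f"Completed in {duration:.0f} seconds. Rows: {row_count}")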
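
For notebooks with several steps, the same idea can be wrapped in a small helper so each step records its own duration and status. The sketch below uses only the standard library; `timed_step` and `step_log` are hypothetical names, and the list stands in for whatever log table you write to.

```python
import time
from contextlib import contextmanager

step_log = []  # one dict per step; could later be written to a log table


@contextmanager
def timed_step(name):
    """Time a block of code and record its outcome, re-raising any error."""
    started = time.perf_counter()
    status = "SUCCESS"
    try:
        yield
    except Exception:
        status = "FAILED"
        raise
    finally:
        step_log.append({
            "step": name,
            "status": status,
            "seconds": round(time.perf_counter() - started, 3),
        })


with timed_step("load"):
    rows = list(range(1000))  # stand-in for real transformation logic

try:
    with timed_step("validate"):
        raise ValueError("null customer_id found")  # simulated failure
except ValueError:
    pass  # in a real notebook you would let this propagate and fail the run

for entry in step_log:
    print(entry)
```

Because the entry is appended in `finally`, a failed step is still logged before the error propagates.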

Monitoring Dataflow Gen2 Runs

Right-click Dataflow Gen2 → Refresh history
See: status, duration, start/end time, error details
Common metrics: rows processed, data destination writes

Monitoring Semantic Model Refresh

Workspace → Semantic Model → Refresh history
See: refresh type, storage mode (Direct Lake/Import), duration, status
For Direct Lake: check if fallback to DirectQuery occurred

Monitoring Eventstream and Eventhouse

Streaming workloads need continuous monitoring because they run 24/7 — unlike pipelines that run once and finish:

EVENTSTREAM monitoring:
  Open Eventstream item → see the visual canvas with live metrics:
    - Events ingested per second (source throughput)
    - Events delivered per second (destination throughput)
    - Latency (time from source to destination)
    - Error count (deserialization failures, write failures)

  Healthy indicators:
    ✅ Ingested ≈ Delivered (no data loss)
    ✅ Latency < 5 seconds (near real-time)
    ✅ Error count = 0

  Problem indicators:
    ⚠️ Ingested >> Delivered (backlog growing — destination cannot keep up)
    ⚠️ Latency > 30 seconds (falling behind)
    🔴 Error count rising (schema mismatch or destination full)

EVENTHOUSE monitoring:
  Open KQL Database → Monitoring:
    - Ingestion success/failure rate
    - Storage size and growth rate
    - Cache hit ratio (hot vs cold reads)
    - Materialized view health (lag, last refresh)

  Key KQL monitoring queries:
    .show ingestion failures                    // Recent ingestion errors
    .show materialized-views                    // All views with health status
    .show table sensor_readings extents          // Number of data shards
    .show capacity                               // Current resource usage

In the Monitoring Hub, Eventstream runs appear alongside pipeline and notebook runs. For long-running streams, check the Eventstream canvas directly for real-time metrics rather than relying solely on the Monitoring Hub.
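
The Eventstream health indicators above can be turned into a simple classifier for metric snapshots you collect yourself. This is a sketch under stated assumptions: the 5 s and 30 s latency thresholds come from the indicators above, while the 90% delivery ratio, the treatment of any errors as critical, and the function name are choices made here for illustration.

```python
def eventstream_health(ingested_per_s, delivered_per_s, latency_s, error_count,
                       min_delivery_ratio=0.9):
    """Classify an Eventstream snapshot as 'healthy', 'warning', or 'critical'.

    Latency thresholds (5 s and 30 s) come from the indicators above.
    min_delivery_ratio is an assumed cutoff for "Ingested >> Delivered".
    error_count is taken to mean errors since the previous snapshot (assumption).
    """
    if error_count > 0:
        return "critical"
    backlog_growing = (
        ingested_per_s > 0 and delivered_per_s / ingested_per_s < min_delivery_ratio
    )
    if backlog_growing or latency_s > 30:
        return "warning"
    if latency_s < 5:
        return "healthy"
    # Between 5 and 30 seconds the indicators give no verdict; treated as a warning here.
    return "warning"


print(eventstream_health(1200, 1195, 2.1, 0))   # throughput matched, low latency
print(eventstream_health(1200, 600, 12.0, 0))   # destination falling behind
print(eventstream_health(1200, 1190, 3.0, 4))   # errors present
```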

Fabric Audit Logs

Audit logs track WHO did WHAT and WHEN across your entire Fabric tenant:

Enabling and Accessing Audit Logs

Admin Portal → Audit logs (or use Microsoft Purview compliance portal)
Audit logs are enabled by default for Fabric
Access via: Purview compliance portal → Audit → Search

Key Audit Events

Event	What It Tracks
CreateWorkspace	Who created a workspace
DeleteWorkspace	Who deleted a workspace
UpdateWorkspaceAccess	Who changed workspace permissions
ViewReport	Who viewed a Power BI report
ExportReport	Who exported data from a report
RunPipeline	Who triggered a pipeline
UpdateDataset	Who modified a semantic model
ShareItem	Who shared an item externally

Audit log entry:
  Activity: UpdateWorkspaceAccess
  User: admin@company.com
  Target: DataEng_Prod workspace
  Detail: Added analyst@company.com as Viewer
  Timestamp: 2026-06-05 14:30:00
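
Once audit records are exported, answering "who changed workspace permissions?" is a filter over the activity name and timestamp. A minimal sketch, assuming records exported as dictionaries with the field names from the entry above; the second record and the `find_events` helper are made up for illustration, and a real export will have a different schema.

```python
from datetime import datetime

# Hypothetical exported audit records; field names follow the entry shown above.
audit_records = [
    {"Activity": "UpdateWorkspaceAccess", "User": "admin@company.com",
     "Target": "DataEng_Prod workspace",
     "Detail": "Added analyst@company.com as Viewer",
     "Timestamp": "2026-06-05 14:30:00"},
    {"Activity": "ViewReport", "User": "analyst@company.com",  # made-up sample record
     "Target": "Sales report", "Detail": "", "Timestamp": "2026-06-05 15:02:00"},
]


def find_events(records, activity, since=None):
    """Return records for one activity, optionally only those at or after `since`."""
    matches = []
    for record in records:
        when = datetime.strptime(record["Timestamp"], "%Y-%m-%d %H:%M:%S")
        if record["Activity"] == activity and (since is None or when >= since):
            matches.append(record)
    return matches


for event in find_events(audit_records, "UpdateWorkspaceAccess"):
    print(f'{event["Timestamp"]}: {event["User"]} -> {event["Detail"]}')
```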

Error Resolution by Item Type

Pipeline Errors and Fixes

Error	Cause	Fix
Connection failed	Credentials expired or source down	Refresh connection credentials
Copy activity timeout	Large data + slow source	Increase timeout, add parallelism
Mapping error	Source schema changed (new/dropped column)	Update column mapping
Activity dependency failed	Previous activity failed	Fix upstream activity first
Parameter error	Missing or wrong parameter type	Verify parameter names and types
Insufficient capacity	CU exhausted	Scale up or stagger pipeline schedule
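
A table like this can double as a first-pass triage helper in your alerting code. The sketch below maps keywords to the fixes listed above; the keywords are assumptions about what appears in the error text, and `suggest_fix` is a hypothetical helper, so treat a match as a starting point rather than a diagnosis.

```python
# Keyword -> suggested first step, taken from the pipeline error table above.
# The keywords themselves are assumptions about what appears in the error text.
PIPELINE_FIXES = [
    ("connection failed", "Refresh connection credentials"),
    ("timeout", "Increase timeout, add parallelism"),
    ("mapping", "Update column mapping"),
    ("dependency", "Fix upstream activity first"),
    ("parameter", "Verify parameter names and types"),
    ("capacity", "Scale up or stagger pipeline schedule"),
]


def suggest_fix(error_message):
    """Return the first matching suggestion, or a fallback when nothing matches."""
    text = error_message.lower()
    for keyword, fix in PIPELINE_FIXES:
        if keyword in text:
            return fix
    return "No known pattern: read the full error details and stack trace"


print(suggest_fix("Copy activity timeout"))
print(suggest_fix("Insufficient capacity: CU exhausted"))
```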

Notebook Errors and Fixes

Error	Cause	Fix
Table not found	Wrong lakehouse attached or table missing	Check default lakehouse, verify table exists
OutOfMemoryError	Data too large for driver/executors	Increase memory, repartition into more (smaller) partitions, filter earlier
ModuleNotFoundError	Library not installed	Add to Environment or use %pip install
Permission denied	User lacks access to source data	Check workspace role and OneLake permissions
Session timeout	Idle session expired	Re-run the notebook, increase timeout setting
Schema mismatch on write	Target table schema differs from DataFrame	Use the mergeSchema option (append) or overwriteSchema option (overwrite), or ALTER the table

Dataflow Gen2 Errors and Fixes

Error	Cause	Fix
Source connection failed	Credentials or endpoint changed	Update connection in workspace settings
Type conversion error	Data contains invalid values for target type	Add error handling (Replace Errors) before destination
Destination write failed	Schema mismatch or permission issue	Check column mapping, verify write permissions
Timeout	Too much data for Power Query engine	Filter at source (query folding), reduce data volume
Expression error	Bad M formula in custom column	Check M syntax, use try…otherwise

Eventstream Errors and Fixes

Error	Cause	Fix
Ingestion lag	Source producing faster than consuming	Scale destination, add more partitions
Deserialization error	Event format mismatch (expected JSON, got binary)	Fix source format or update schema in Eventstream
Destination write failed	Eventhouse table schema mismatch	Update table schema to match events
Connection lost	Event Hub namespace down or key expired	Check Event Hub health, rotate keys

Eventhouse/KQL Errors and Fixes

Error	Cause	Fix
KQL query timeout	No time filter, so the entire table is scanned	Add `| where timestamp > ago(1h)` early in the query
Storage limit reached	No retention policy, so data grows forever	Set retention, e.g. `.alter-merge table T policy retention softdelete = 30d`
Materialized view stale	View refresh failed or was disabled	Check: `.show materialized-view V` (IsEnabled, IsHealthy) and `.show materialized-view V failures`; re-enable with `.enable materialized-view V` if disabled
Ingestion failure	Schema mismatch between source events and table columns	Check: `.show ingestion failures`, update table schema or fix source
Slow queries on old data	Querying data outside the hot cache	Increase hot cache: `.alter table T policy caching hot = 30d`
Accelerated shortcut stale	Shortcut acceleration refresh failed	Check source connectivity, restart acceleration
Function not found	Stored function was dropped or renamed	Check: `.show functions`, recreate if missing

OneLake Shortcut Errors and Fixes

Error	Cause	Fix
Shortcut not accessible	Source storage credentials expired	Update connection credentials
Data not showing	Shortcut path incorrect	Verify the exact container/folder path
Permission denied	Missing Fabric Read permission on containing item	Grant Read permission to the user
Cross-cloud timeout	S3/GCS egress slow	Enable shortcut caching
Stale cached data	Cache not refreshing	Check cache settings, force refresh

T-SQL Errors and Fixes

Error	Cause	Fix
Invalid object name ‘schema.table’	Table does not exist or wrong schema prefix	Check: `SELECT * FROM INFORMATION_SCHEMA.TABLES`, verify schema.table name
Cannot insert into table (read-only)	Trying to INSERT into Lakehouse SQL endpoint (read-only)	Use Spark notebook or pipeline for writes to Lakehouse. Only Warehouse supports T-SQL writes
Function not supported	Using a T-SQL function not available in Fabric Warehouse	Check Fabric Warehouse T-SQL surface area — some SQL Server functions are not supported
Cross-database query failed	Referencing a table in another lakehouse/warehouse without proper syntax	Use three-part name: `other_lakehouse.dbo.table_name`
Permission denied on SELECT	User lacks SELECT on the schema or table	GRANT SELECT on the schema: `GRANT SELECT ON SCHEMA::gold TO [user]`
Stored procedure failed	SP references objects that do not exist in this environment	Verify all referenced tables exist, check deployment rules for environment differences
Query timeout (Warehouse)	Complex query on large table without optimization	Add WHERE filters, check table statistics, run OPTIMIZE on source Delta tables

Deployment Pipeline Errors

What Can and Cannot Be Deployed

Item	Deployable?
Notebooks	✅ Yes
Pipelines	✅ Yes
Dataflow Gen2	✅ Yes
Semantic Models	✅ Yes
Reports	✅ Yes
Lakehouse (metadata)	✅ Yes
Warehouse (metadata)	✅ Yes
Spark Environments	✅ Yes
Lakehouse/Warehouse DATA	❌ No (only structure)
Mirrored Databases	❌ No
Eventstreams	❌ No
KQL Databases	❌ No
Connections/Gateways	❌ No (must be created per environment)
Workspace roles	❌ No (must be set per workspace)

Common Deployment Failures

Error	Cause	Fix
Item already exists	Name conflict in target workspace	Rename or delete the conflicting item
Deployment rule missing	Connection not swapped for target environment	Add deployment rule for the data source
Permission denied	User lacks deploy permission	Ensure Admin/Member role on target workspace
Dependent item missing	Item references something not in the pipeline	Add the dependency to the deployment pipeline
Unsupported item type	Trying to deploy a non-deployable item	Remove from deployment (Eventstream, Mirrored DB)
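
Two of these failures (unsupported item types and missing deployment rules) can be caught before you press Deploy. The sketch below is a hypothetical pre-deployment check, not part of Fabric: the item records, the rule set, and `predeployment_issues` are illustrative, and the unsupported-type list is taken from the table in this post, so verify it against current Fabric documentation.

```python
# Item types the table above lists as not deployable.
NOT_DEPLOYABLE = {"Eventstream", "KQL Database", "Mirrored Database"}


def predeployment_issues(items, rules):
    """Return human-readable problems found before a deployment.

    items: list of dicts with 'name', 'type', and optional 'data_source'.
    rules: set of data source names that already have a deployment rule.
    """
    issues = []
    for item in items:
        if item["type"] in NOT_DEPLOYABLE:
            issues.append(f"{item['name']}: unsupported item type {item['type']}")
        source = item.get("data_source")
        if source is not None and source not in rules:
            issues.append(f"{item['name']}: no deployment rule for {source}")
    return issues


# Hypothetical stage contents.
items = [
    {"name": "PL_Daily_ETL", "type": "Pipeline", "data_source": "sql-dev"},
    {"name": "ES_Sensors", "type": "Eventstream"},
    {"name": "Sales_Model", "type": "Semantic Model", "data_source": "lakehouse-dev"},
]
rules = {"lakehouse-dev"}

for issue in predeployment_issues(items, rules):
    print(issue)
```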

Setting Up Proactive Monitoring

Reactive: Check Monitoring Hub when something seems wrong
Proactive: Get notified BEFORE anyone complains

Pipeline: Add Teams/Outlook activity on red (failure) path
Notebook: Return exit value with status → pipeline checks and alerts
Data Activator: Monitor etl_log table → alert on status='FAILED'
Capacity Metrics: Dashboard shows CU usage → alert before throttling
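
The notebook exit-value pattern works best when the value is structured, so the pipeline can branch on it instead of parsing free text. A minimal sketch of both sides, assuming a JSON payload with hypothetical field names; in a Fabric notebook the string would be passed to the notebook exit call (assumed here to be `notebookutils.notebook.exit`) rather than printed.

```python
import json


def build_exit_value(status, rows_written, message=""):
    """Serialize a run summary so a calling pipeline can parse and branch on it."""
    if status not in {"SUCCEEDED", "FAILED"}:
        raise ValueError(f"unknown status: {status}")
    return json.dumps({"status": status, "rows_written": rows_written, "message": message})


def should_alert(exit_value):
    """Pipeline-side check: alert when the notebook reported a failure."""
    return json.loads(exit_value)["status"] == "FAILED"


exit_value = build_exit_value("FAILED", 0, "Table gold.dim_customer does not exist")
print(exit_value)
print("Send alert:", should_alert(exit_value))
# In a Fabric notebook this string would be handed to the notebook exit call
# (assumed here to be notebookutils.notebook.exit) instead of being printed.
```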

Common Mistakes

Checking monitoring only when users complain — set up proactive alerts (pipeline failure → Teams notification).
Not reading the full error message — the first line is generic, the details (stack trace, error code) tell you the actual cause.
Not using audit logs for compliance — auditors will ask “who accessed this data?” Audit logs answer that.
Deploying without deployment rules — Dev connections in Prod = reading Dev data in Production. Always set rules.
Assuming data deploys with items — deployment pipelines deploy DEFINITIONS, not data. Pipelines must run in each environment to populate data.

Interview Questions

Q: How do you monitor Fabric items? A: Through the Monitoring Hub (centralized view of all runs across workspaces), item-specific refresh history, Spark UI for notebooks, Capacity Metrics app for CU usage, and audit logs for governance. Proactive monitoring uses pipeline failure activities (Teams/Outlook) and Data Activator alerts.

Q: What can and cannot be deployed via deployment pipelines? A: Deployable: notebooks, pipelines, dataflows, semantic models, reports, lakehouse/warehouse metadata, Spark environments. NOT deployable: actual data, mirrored databases, eventstreams, KQL databases, connections, workspace roles. Data must be loaded via pipelines in each environment separately.

Wrapping Up

Monitoring is not optional — it is the difference between finding problems at 6:01 AM and finding them at 9 AM when the CEO asks why the dashboard is empty. Monitor proactively, read error messages fully, use audit logs for compliance, and always test deployments with rules.

Related posts: Fabric Data Factory, Git Integration & CI/CD, Data Activator, Administration & Cost

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.
