DP-700 Certification Study Guide: Every Exam Objective Mapped to DriveDataScience Posts, Study Plan, and Tips to Pass the Microsoft Fabric Data Engineer Associate Exam

DP-700 Certification Study Guide: Every Exam Objective Mapped to DriveDataScience Posts, Study Plan, and Tips to Pass the Microsoft Fabric Data Engineer Associate Exam

The DP-700 (Implementing Data Engineering Solutions Using Microsoft Fabric) is Microsoft’s certification for Fabric Data Engineers. It validates your ability to design, implement, secure, monitor, and optimize data engineering solutions on Microsoft Fabric.

This post is your study companion — every exam objective mapped to the DriveDataScience post that covers it, a recommended study plan, exam-day tips, and the key concepts you MUST know for each domain.

This guide is based on the official skills measured as of April 20, 2026. The exam has three equally weighted domains (~33% each). You need a score of 700 or higher to pass.

Table of Contents

  • Exam Overview
  • The Three Domains
  • Domain 1: Implement and Manage an Analytics Solution (30-35%)
  • Domain 2: Ingest and Transform Data (30-35%)
  • Domain 3: Monitor and Optimize an Analytics Solution (30-35%)
  • The 8-Week Study Plan
  • Key Concepts You MUST Know (Quick Reference)
  • Exam-Day Tips
  • Practice Questions (Sample Scenarios)
  • Resources
  • Wrapping Up

Exam Overview

Detail Info
Exam code DP-700
Full name Implementing Data Engineering Solutions Using Microsoft Fabric
Certification Microsoft Certified: Fabric Data Engineer Associate
Passing score 700 / 1000
Duration 120 minutes
Question types Multiple choice, drag-and-drop, case studies, yes/no scenarios
Languages English, Chinese, French, German, Japanese, Portuguese, Spanish
Prerequisite None (DP-900 knowledge recommended)
Cost $165 USD
Renewal Annual (free online assessment)
In-exam resources Microsoft Learn accessible during exam (split screen)

The Three Domains

┌─────────────────────────────────────────────────────┐
│  DP-700: Implementing Data Engineering Solutions     │
│  Using Microsoft Fabric                              │
│                                                      │
│  ┌──────────────────────────────────────────┐        │
│  │ Domain 1: Implement & Manage (30-35%)    │        │
│  │  Workspace config, CI/CD, security,      │        │
│  │  orchestration                            │        │
│  ├──────────────────────────────────────────┤        │
│  │ Domain 2: Ingest & Transform (30-35%)    │        │
│  │  Loading patterns, batch, streaming,      │        │
│  │  shortcuts, mirroring, transformations    │        │
│  ├──────────────────────────────────────────┤        │
│  │ Domain 3: Monitor & Optimize (30-35%)    │        │
│  │  Monitoring, error resolution,            │        │
│  │  performance optimization                 │        │
│  └──────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────┘

Domain 1: Implement and Manage an Analytics Solution (30-35%)

Configure Microsoft Fabric Workspace Settings

Exam Objective Read This Post Key Concepts
Configure Spark workspace settings Apache Spark in Fabric Shuffle partitions, AQE, V-Order, starter pools vs custom environments
Configure Spark workspace settings Fabric Notebooks Spark environments, library management, runtime versions
Configure domain workspace settings Capacity, Workspaces & Items Domains, workspace creation, capacity assignment
Configure OneLake workspace settings OneLake Deep Dive OneLake settings, cache, shortcuts config
Configure Dataflows Gen2 workspace settings Dataflow Gen2: Production Query folding, staging, performance settings
Configure Dataflows Gen2 workspace settings M Language Complete Guide M functions, custom columns, error handling, query folding

What to memorize: Spark shuffle partitions default (200), V-Order default (enabled), AQE default (enabled), starter pool startup time (~10 seconds), runtime versions.

Implement Lifecycle Management in Fabric

Exam Objective Read This Post Key Concepts
Configure version control Git Integration & CI/CD Azure DevOps, GitHub, commit/sync, branching
Implement database projects Git Integration & CI/CD What is tracked in Git (definitions, not data)
Create and configure deployment pipelines Git Integration & CI/CD Dev → Test → Prod stages, deployment rules, selective deployment
Deployment errors Monitoring & Troubleshooting What can/cannot be deployed, common deployment failures

What to memorize: Git tracks item DEFINITIONS not DATA. Deployable: notebooks, pipelines, dataflows, semantic models, reports. NOT deployable: data, mirrored databases, eventstreams, KQL databases, connections. Deployment rules swap connections per environment.

Configure Security and Governance

Exam Objective Read This Post Key Concepts
Workspace-level access controls Security & Governance Admin, Member, Contributor, Viewer roles
Item-level access controls Security & Governance Sharing items, Read/Build/Reshare permissions
RLS, CLS, object-level, file-level Security & Governance RLS filter functions, GRANT on columns, OneLake data access roles
RLS, CLS in Warehouse Warehouse Practical Guide T-SQL security policy, GRANT/DENY
RLS, CLS in Warehouse Warehouse Advanced SSMS connectivity, granting access
Dynamic data masking Security & Governance email(), partial(), default() masking functions
Sensitivity labels Security & Governance Confidential, Internal, Public — labels flow downstream
Endorse items Security & Governance Promoted vs Certified endorsement
Audit logs Monitoring & Troubleshooting Key audit events, Purview compliance portal
OneLake security OneLake Deep Dive OneLake data access roles, workspace roles

What to memorize: 7 security layers (workspace → item → OneLake data access → RLS → CLS → masking → sensitivity labels). RLS uses filter predicate functions + security policies. CLS uses GRANT SELECT on specific columns. Dynamic masking functions: email(), partial(), default(). Sensitivity labels inherit downstream.

Orchestrate Processes

Exam Objective Read This Post Key Concepts
Choose between Dataflow Gen2, pipeline, notebook Fabric Data Factory Decision matrix: DFG2 for simple cleaning, notebook for complex, pipeline for orchestration
Choose between Dataflow Gen2, pipeline, notebook Dataflow Gen2: Production The complete decision guide (20-scenario table)
Schedules and event-based triggers Triggers & Scheduling Schedule, event, tumbling window triggers
Orchestration with notebooks and pipelines Fabric Notebooks Parameters, mssparkutils.notebook.run, %run, exit values
Orchestration patterns Triggers & Scheduling Master-child, conditional, fan-out fan-in, retry
Dynamic expressions Triggers & Scheduling @utcNow(), @formatDateTime(), @pipeline().parameters

What to memorize: Dataflow Gen2 = no-code simple transforms. Notebook = code, complex logic, Delta MERGE. Pipeline = orchestration. Parameters: notebook reads with mssparkutils.widgets.get(), returns with mssparkutils.notebook.exit(). Pipeline expressions: @utcNow(), @formatDateTime(), @activity().output.


Domain 2: Ingest and Transform Data (30-35%)

Design and Implement Loading Patterns

Exam Objective Read This Post Key Concepts
Full and incremental data loads Dataflow Gen2: Production Full (Replace mode) vs incremental (Append + date filter/watermark)
Full and incremental data loads Medallion Architecture Bronze (raw) → Silver (clean) → Gold (enriched)
Prepare data for dimensional model Normalization & Star Schema Fact tables, dimension tables, surrogate keys, SCD
Prepare data for dimensional model SCD Types SCD Type 0, 1, 2, 3, 6
Loading pattern for streaming data Spark Structured Streaming Checkpoints, output modes, watermarks, foreachBatch
Loading pattern for streaming data Real-Time Intelligence Eventstream → Eventhouse, dual-path architecture

Ingest and Transform Batch Data

Exam Objective Read This Post Key Concepts
Choose appropriate data store Lakehouse vs Warehouse Lakehouse (PySpark, unstructured) vs Warehouse (T-SQL, structured)
Choose between DFG2, notebooks, KQL, T-SQL Fabric Data Factory When to use which tool
Create and manage OneLake shortcuts OneLake Shortcuts Internal vs external, R/W/D behavior, caching, security
Implement mirroring Mirrored Databases Sources, setup, read-only, free compute, limitations
Configure connections and gateways Connections & Gateways Connection types, on-prem gateway, VNet gateway, Service Principal
Ingest data by using pipelines Fabric Data Factory Copy Activity, pipeline parameters, expressions
Transform by PySpark PySpark Transformations filter, select, withColumn, groupBy, join, window functions
Transform by PySpark PySpark Window Functions ROW_NUMBER, RANK, LAG, LEAD, running totals
Transform by SQL SQL Functions COALESCE, NULLIF, TRIM, DATEDIFF, CAST
Transform by KQL RTI Deep Dive KQL date, string, aggregation functions
Transform by M Language (Dataflow Gen2) M Language Complete Guide Text, date, table functions, custom functions, query folding
Denormalize data Normalization & Star Schema Star schema, denormalization trade-offs
Handle duplicate, missing, late-arriving data Data Quality Framework Dedup, null handling, quarantine pattern

Ingest and Transform Streaming Data

Exam Objective Read This Post Key Concepts
Choose appropriate streaming engine Real-Time Intelligence Eventstream vs Spark Structured Streaming
Native tables vs OneLake shortcuts in RTI RTI Deep Dive When to use each in Eventhouse
Query acceleration vs standard shortcuts RTI Deep Dive Accelerated (cached, fast) vs standard (live, slower)
Process data by Eventstreams Real-Time Intelligence Sources, destinations, in-flight transformations
Process data by Spark structured streaming Spark Structured Streaming readStream, writeStream, checkpoints, foreachBatch
Process data by KQL RTI Deep Dive KQL queries, materialized views
Create windowing functions Spark Structured Streaming Tumbling, sliding, session windows in Spark
Create windowing functions RTI Deep Dive Tumbling, hopping, sliding, session, snapshot in KQL

What to memorize: Output modes: Append (new rows only), Complete (full rewrite), Update (changed rows). Watermarks handle late data. Checkpoint = crash recovery. foreachBatch = MERGE with streaming. Window types: tumbling (non-overlapping), hopping (overlapping), session (activity-based), snapshot (point-in-time).


Domain 3: Monitor and Optimize an Analytics Solution (30-35%)

Monitor Fabric Items

Exam Objective Read This Post Key Concepts
Monitor data ingestion Monitoring & Troubleshooting Monitoring Hub, pipeline run details
Monitor data transformation Monitoring & Troubleshooting Notebook monitoring, Spark UI
Monitor semantic model refresh Power BI Direct Lake Direct Lake refresh, fallback detection
Configure alerts Data Activator Triggers, actions, cooldowns

Identify and Resolve Errors

Exam Objective Read This Post Key Concepts
Pipeline errors Monitoring & Troubleshooting Connection failed, timeout, mapping errors
Dataflow Gen2 errors Monitoring & Troubleshooting Type conversion, destination write, timeout
Notebook errors Monitoring & Troubleshooting Table not found, OOM, module not found
Eventhouse errors Monitoring & Troubleshooting Ingestion lag, query timeout, disk full
Eventstream errors Monitoring & Troubleshooting Deserialization, connection lost
T-SQL errors SQL Transactions & ACID TRY/CATCH, deadlocks
OneLake shortcut errors Monitoring & Troubleshooting Credential expired, path incorrect, permission denied

Optimize Performance

Exam Objective Read This Post Key Concepts
Optimize a Lakehouse table Optimization Guide OPTIMIZE, VACUUM, Z-ORDER, V-Order, partitioning
Optimize a Lakehouse table Delta Lake Table Properties TBLPROPERTIES: retention, CDF, autoOptimize, column mapping
Optimize a pipeline Optimization Guide Parallel ForEach, staggering, retry
Optimize a data warehouse Optimization Guide Statistics, result set caching, SELECT columns
Optimize a data warehouse Warehouse Advanced COPY INTO, CTAS, DMVs, Query Insights
Optimize Eventstreams and Eventhouses Optimization Guide Retention, caching, materialized views
Optimize Spark performance Apache Spark in Fabric Shuffle partitions, AQE, broadcast joins
Optimize query performance Optimization Guide The complete optimization checklist

What to memorize: OPTIMIZE compacts small files. VACUUM removes old versions. Z-ORDER co-locates by frequently filtered column. V-Order sorts within Parquet for Direct Lake. Statistics improve Warehouse query plans. Result set caching returns cached query results. AQE auto-optimizes Spark. Broadcast joins eliminate shuffles for small tables.


The 8-Week Study Plan

Week 1-2: Foundations (Domain 1 — Implement & Manage)

Day 1-2: Read Fabric overview posts
  → What is Fabric, Capacity/Workspaces, OneLake Deep Dive
Day 3-4: Lakehouse vs Warehouse
  → Lakehouse Practical Guide, Warehouse Practical Guide
Day 5-7: Security
  → Security & Governance, practice RLS/CLS/masking in Warehouse
Day 8-10: CI/CD
  → Git Integration & Deployment Pipelines
Day 11-14: Orchestration
  → Data Factory, Triggers & Scheduling, Notebooks (mssparkutils)

Week 3-4: Batch Ingestion (Domain 2 — Ingest & Transform)

Day 15-17: Data loading
  → Shortcuts, Mirroring, Connections & Gateways
Day 18-20: Dataflow Gen2
  → DFG2 three-part series (Introduction, Advanced, Production)
  → M Language Complete Guide (Power Query reference)
Day 21-23: PySpark transformations
  → PySpark Transformations, Joins, Window Functions
Day 24-26: Dimensional modeling
  → Star Schema, SCD Types
Day 27-28: Data quality
  → Data Quality Framework, Medallion Architecture

Week 5-6: Streaming + Advanced (Domain 2 continued + Domain 3)

Day 29-31: Streaming
  → Spark Structured Streaming (output modes, checkpoints, windows)
Day 32-34: Real-Time Intelligence
  → RTI Overview + RTI Deep Dive (window types, KQL, materialized views)
Day 35-37: Advanced Warehouse
  → COPY INTO, CTAS, DMVs, Visual Query, SSMS
Day 38-40: MLVs
  → Materialized Lake Views
Day 41-42: Power BI
  → Direct Lake, semantic models, fallback behavior

Week 7-8: Optimize + Review

Day 43-45: Optimization
  → Optimization Guide (all workloads), Apache Spark tuning
Day 46-47: Monitoring
  → Monitoring & Troubleshooting, Data Activator
Day 48-49: Error resolution
  → Review error tables for all item types
Day 50-52: Practice
  → Microsoft free practice assessment
  → Review all interview questions from our posts
Day 53-56: Final review
  → Re-read this study guide
  → Focus on weak areas identified in practice assessment

Key Concepts You MUST Know (Quick Reference)

Lakehouse vs Warehouse Decision

Need PySpark/Spark?              → Lakehouse
Need T-SQL write (INSERT/MERGE)? → Warehouse
Need stored procedures?          → Warehouse
Need semi-structured data?       → Lakehouse
Need file storage (CSV/JSON)?    → Lakehouse
SQL analytics endpoint?          → Lakehouse (auto-generated, READ-ONLY)

Shortcut Behavior

ADLS Gen2 shortcut:  Read ✅  Write ✅  Delete files ✅ (deletes at source!)
S3 shortcut:         Read ✅  Write ❌  Delete ❌
GCS shortcut:        Read ✅  Write ❌  Delete ❌
OneLake shortcut:    Read ✅  Write ✅  Delete files ✅ (deletes at target!)
Dataverse shortcut:  Read ✅  Write ❌  Delete ❌
Delete THE shortcut: Only removes pointer. Source data UNTOUCHED. Always safe.

Output Modes (Streaming)

Append:   New rows only (stateless transforms — filter, map)
Complete: Full result rewrite (aggregations — GROUP BY)
Update:   Changed rows only (aggregations — delta changes)

Deployment Pipeline Rules

Deployable: Notebooks, Pipelines, Dataflows, Semantic Models, Reports, Lakehouse/Warehouse metadata
NOT deployable: DATA, Mirrored DBs, Eventstreams, KQL DBs, Connections, Workspace roles

Security Layers

1. Workspace roles (Admin/Member/Contributor/Viewer)
2. Item permissions (Read/Build/Reshare)
3. OneLake data access roles (table/folder level)
4. Row-level security (filter predicate functions)
5. Column-level security (GRANT on specific columns)
6. Dynamic data masking (email(), partial(), default())
7. Sensitivity labels (Confidential/Internal/Public)

Optimization Quick Reference

Lakehouse:   OPTIMIZE + VACUUM + Z-ORDER + V-Order + optimizeWrite + TBLPROPERTIES (retention, CDF, autoCompact)
Warehouse:   Statistics + Result Set Caching + SELECT specific columns
Spark:       Shuffle partitions (reduce for small data) + AQE + Broadcast joins
Pipeline:    Parallel ForEach + Stagger schedules + Retry
Eventhouse:  Retention + Caching + Materialized views + Partitioning

Exam-Day Tips

  1. Read every word carefully — “MOST efficient” vs “CORRECT” can lead to different answers. The exam tests the BEST answer, not just a working answer.

  2. Look for keywords — “near real-time” → Eventstream/Mirroring. “No-code” → Dataflow Gen2. “T-SQL” → Warehouse. “PySpark” → Lakehouse notebook. “Continuously” → Streaming. “Scheduled” → Pipeline trigger.

  3. Eliminate obviously wrong answers — if the question asks about writing data and an option mentions “SQL analytics endpoint,” eliminate it immediately (read-only).

  4. Case studies are weighted heavily — read the entire case study before answering. The scenario context often eliminates 2 of 4 options.

  5. Use Microsoft Learn during the exam — you have split-screen access to Microsoft Learn. Use it for syntax you cannot remember (KQL functions, M language), but do not spend too much time searching.

  6. Time management — 120 minutes for ~50 questions ≈ 2.4 minutes per question. Flag difficult questions and return to them.

  7. The exam is scenario-based — it does not ask “What is V-Order?” It asks “A team reports slow Direct Lake performance on a Lakehouse table with 500 small files. What should you do?” (Answer: OPTIMIZE with V-Order).

Practice Questions (Sample Scenarios)

Q1: A data engineer needs to load data from an on-premises SQL Server into a Fabric Lakehouse. The data center has strict firewall rules that do not allow inbound connections. What should they configure?

A) VNet data gateway B) On-premises data gateway C) Direct JDBC connection D) Azure ExpressRoute

Answer: B — On-premises data gateway creates an outbound HTTPS connection from the corporate network. No inbound ports needed.

Q2: A Spark notebook processes 50,000 rows but creates 200 output files after a GROUP BY operation. What setting should be adjusted?

A) spark.sql.shuffle.partitions → reduce to 10-20 B) spark.databricks.delta.optimizeWrite → true C) spark.sql.adaptive.enabled → false D) spark.executor.memory → increase

Answer: A — 200 shuffle partitions for 50K rows creates 200 tiny files. Reducing to 10-20 creates appropriately sized files. (B also helps but A is the primary fix.)

Q3: A team needs to implement row-level security so that regional managers see only their region’s data in Power BI reports. The data is in a Fabric Warehouse. What should they create?

A) A filter function and security policy on the Warehouse table B) A DAX filter in the Power BI report C) An OneLake data access role D) A separate Warehouse per region

Answer: A — RLS in Warehouse uses T-SQL filter predicate functions with CREATE SECURITY POLICY. This enforces security at the data level, affecting all consumers including Power BI.

Q4: A streaming workload ingests 1 million events per hour from Azure Event Hubs. The team needs sub-second query performance on the last 24 hours of data and also needs to keep 90 days of history for batch analysis. What architecture should they implement?

A) Eventstream → Eventhouse only B) Eventstream → Lakehouse only C) Eventstream → both Eventhouse (real-time) and Lakehouse (historical) D) Pipeline with hourly schedule

Answer: C — The dual-path architecture: Eventstream routes to Eventhouse for real-time queries (sub-second KQL) AND Lakehouse for 90-day historical batch analysis.

Q5: A data engineer creates a Dataflow Gen2 that filters rows and adds a custom column. The dataflow processes 10 million rows but takes 45 minutes. Checking the native query for the filter step shows it is grayed out. What is the most likely cause?

A) The source does not support query folding B) The custom column step broke query folding C) The filter condition is too complex D) The data is too large for Dataflow Gen2

Answer: B — Custom columns use M formulas that cannot translate to SQL, breaking query folding. All 10M rows are downloaded before filtering. Fix: move the filter BEFORE the custom column.

Resources

Resource Link
Official exam page DP-700 on Microsoft Learn
Official study guide Skills measured (April 2026)
Free practice assessment Microsoft practice assessment
Microsoft Learn collection DP-700 learning path
DriveDataScience Fabric posts All 32 Fabric posts
Exam sandbox Try the exam interface

Wrapping Up

The DP-700 is a scenario-based exam — it tests your ability to make the RIGHT decision, not just the CORRECT one. Knowing that OPTIMIZE compacts files is not enough. You need to know WHEN to use OPTIMIZE vs Z-ORDER vs VACUUM, and WHY each matters for specific scenarios (Direct Lake performance, storage cost, query speed).

Our 32 Fabric posts cover every exam objective. The 8-week study plan gives you structure. The key concepts quick reference gives you exam-day recall. And the practice questions show you the exam’s style.

Good luck on your DP-700!

The complete DriveDataScience Fabric learning path:Start Here — All Learning PathsFabric Learning Path (32 posts)


Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link