DP-700 Certification Study Guide: Every Exam Objective Mapped to DriveDataScience Posts, Study Plan, and Tips to Pass the Microsoft Fabric Data Engineer Associate Exam
The DP-700 (Implementing Data Engineering Solutions Using Microsoft Fabric) is Microsoft’s certification for Fabric Data Engineers. It validates your ability to design, implement, secure, monitor, and optimize data engineering solutions on Microsoft Fabric.
This post is your study companion — every exam objective mapped to the DriveDataScience post that covers it, a recommended study plan, exam-day tips, and the key concepts you MUST know for each domain.
This guide is based on the official skills measured as of April 20, 2026. The exam has three equally weighted domains (~33% each). You need a score of 700 or higher to pass.
Table of Contents
- Exam Overview
- The Three Domains
- Domain 1: Implement and Manage an Analytics Solution (30-35%)
- Domain 2: Ingest and Transform Data (30-35%)
- Domain 3: Monitor and Optimize an Analytics Solution (30-35%)
- The 8-Week Study Plan
- Key Concepts You MUST Know (Quick Reference)
- Exam-Day Tips
- Practice Questions (Sample Scenarios)
- Resources
- Wrapping Up
Exam Overview
| Detail | Info |
|---|---|
| Exam code | DP-700 |
| Full name | Implementing Data Engineering Solutions Using Microsoft Fabric |
| Certification | Microsoft Certified: Fabric Data Engineer Associate |
| Passing score | 700 / 1000 |
| Duration | 120 minutes |
| Question types | Multiple choice, drag-and-drop, case studies, yes/no scenarios |
| Languages | English, Chinese, French, German, Japanese, Portuguese, Spanish |
| Prerequisite | None (DP-900 knowledge recommended) |
| Cost | $165 USD |
| Renewal | Annual (free online assessment) |
| In-exam resources | Microsoft Learn accessible during exam (split screen) |
The Three Domains
┌─────────────────────────────────────────────────────┐
│ DP-700: Implementing Data Engineering Solutions │
│ Using Microsoft Fabric │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ Domain 1: Implement & Manage (30-35%) │ │
│ │ Workspace config, CI/CD, security, │ │
│ │ orchestration │ │
│ ├──────────────────────────────────────────┤ │
│ │ Domain 2: Ingest & Transform (30-35%) │ │
│ │ Loading patterns, batch, streaming, │ │
│ │ shortcuts, mirroring, transformations │ │
│ ├──────────────────────────────────────────┤ │
│ │ Domain 3: Monitor & Optimize (30-35%) │ │
│ │ Monitoring, error resolution, │ │
│ │ performance optimization │ │
│ └──────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Domain 1: Implement and Manage an Analytics Solution (30-35%)
Configure Microsoft Fabric Workspace Settings
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Configure Spark workspace settings | Apache Spark in Fabric | Shuffle partitions, AQE, V-Order, starter pools vs custom environments |
| Configure Spark workspace settings | Fabric Notebooks | Spark environments, library management, runtime versions |
| Configure domain workspace settings | Capacity, Workspaces & Items | Domains, workspace creation, capacity assignment |
| Configure OneLake workspace settings | OneLake Deep Dive | OneLake settings, cache, shortcuts config |
| Configure Dataflows Gen2 workspace settings | Dataflow Gen2: Production | Query folding, staging, performance settings |
| Configure Dataflows Gen2 workspace settings | M Language Complete Guide | M functions, custom columns, error handling, query folding |
What to memorize: Spark shuffle partitions default (200), V-Order default (enabled), AQE default (enabled), starter pool startup time (~10 seconds), runtime versions.
Implement Lifecycle Management in Fabric
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Configure version control | Git Integration & CI/CD | Azure DevOps, GitHub, commit/sync, branching |
| Implement database projects | Git Integration & CI/CD | What is tracked in Git (definitions, not data) |
| Create and configure deployment pipelines | Git Integration & CI/CD | Dev → Test → Prod stages, deployment rules, selective deployment |
| Deployment errors | Monitoring & Troubleshooting | What can/cannot be deployed, common deployment failures |
What to memorize: Git tracks item DEFINITIONS not DATA. Deployable: notebooks, pipelines, dataflows, semantic models, reports. NOT deployable: data, mirrored databases, eventstreams, KQL databases, connections. Deployment rules swap connections per environment.
Configure Security and Governance
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Workspace-level access controls | Security & Governance | Admin, Member, Contributor, Viewer roles |
| Item-level access controls | Security & Governance | Sharing items, Read/Build/Reshare permissions |
| RLS, CLS, object-level, file-level | Security & Governance | RLS filter functions, GRANT on columns, OneLake data access roles |
| RLS, CLS in Warehouse | Warehouse Practical Guide | T-SQL security policy, GRANT/DENY |
| RLS, CLS in Warehouse | Warehouse Advanced | SSMS connectivity, granting access |
| Dynamic data masking | Security & Governance | email(), partial(), default() masking functions |
| Sensitivity labels | Security & Governance | Confidential, Internal, Public — labels flow downstream |
| Endorse items | Security & Governance | Promoted vs Certified endorsement |
| Audit logs | Monitoring & Troubleshooting | Key audit events, Purview compliance portal |
| OneLake security | OneLake Deep Dive | OneLake data access roles, workspace roles |
What to memorize: 7 security layers (workspace → item → OneLake data access → RLS → CLS → masking → sensitivity labels). RLS uses filter predicate functions + security policies. CLS uses GRANT SELECT on specific columns. Dynamic masking functions: email(), partial(), default(). Sensitivity labels inherit downstream.
Orchestrate Processes
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Choose between Dataflow Gen2, pipeline, notebook | Fabric Data Factory | Decision matrix: DFG2 for simple cleaning, notebook for complex, pipeline for orchestration |
| Choose between Dataflow Gen2, pipeline, notebook | Dataflow Gen2: Production | The complete decision guide (20-scenario table) |
| Schedules and event-based triggers | Triggers & Scheduling | Schedule, event, tumbling window triggers |
| Orchestration with notebooks and pipelines | Fabric Notebooks | Parameters, mssparkutils.notebook.run, %run, exit values |
| Orchestration patterns | Triggers & Scheduling | Master-child, conditional, fan-out fan-in, retry |
| Dynamic expressions | Triggers & Scheduling | @utcNow(), @formatDateTime(), @pipeline().parameters |
What to memorize: Dataflow Gen2 = no-code simple transforms. Notebook = code, complex logic, Delta MERGE. Pipeline = orchestration. Parameters: notebook reads with mssparkutils.widgets.get(), returns with mssparkutils.notebook.exit(). Pipeline expressions: @utcNow(), @formatDateTime(), @activity().output.
Domain 2: Ingest and Transform Data (30-35%)
Design and Implement Loading Patterns
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Full and incremental data loads | Dataflow Gen2: Production | Full (Replace mode) vs incremental (Append + date filter/watermark) |
| Full and incremental data loads | Medallion Architecture | Bronze (raw) → Silver (clean) → Gold (enriched) |
| Prepare data for dimensional model | Normalization & Star Schema | Fact tables, dimension tables, surrogate keys, SCD |
| Prepare data for dimensional model | SCD Types | SCD Type 0, 1, 2, 3, 6 |
| Loading pattern for streaming data | Spark Structured Streaming | Checkpoints, output modes, watermarks, foreachBatch |
| Loading pattern for streaming data | Real-Time Intelligence | Eventstream → Eventhouse, dual-path architecture |
Ingest and Transform Batch Data
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Choose appropriate data store | Lakehouse vs Warehouse | Lakehouse (PySpark, unstructured) vs Warehouse (T-SQL, structured) |
| Choose between DFG2, notebooks, KQL, T-SQL | Fabric Data Factory | When to use which tool |
| Create and manage OneLake shortcuts | OneLake Shortcuts | Internal vs external, R/W/D behavior, caching, security |
| Implement mirroring | Mirrored Databases | Sources, setup, read-only, free compute, limitations |
| Configure connections and gateways | Connections & Gateways | Connection types, on-prem gateway, VNet gateway, Service Principal |
| Ingest data by using pipelines | Fabric Data Factory | Copy Activity, pipeline parameters, expressions |
| Transform by PySpark | PySpark Transformations | filter, select, withColumn, groupBy, join, window functions |
| Transform by PySpark | PySpark Window Functions | ROW_NUMBER, RANK, LAG, LEAD, running totals |
| Transform by SQL | SQL Functions | COALESCE, NULLIF, TRIM, DATEDIFF, CAST |
| Transform by KQL | RTI Deep Dive | KQL date, string, aggregation functions |
| Transform by M Language (Dataflow Gen2) | M Language Complete Guide | Text, date, table functions, custom functions, query folding |
| Denormalize data | Normalization & Star Schema | Star schema, denormalization trade-offs |
| Handle duplicate, missing, late-arriving data | Data Quality Framework | Dedup, null handling, quarantine pattern |
Ingest and Transform Streaming Data
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Choose appropriate streaming engine | Real-Time Intelligence | Eventstream vs Spark Structured Streaming |
| Native tables vs OneLake shortcuts in RTI | RTI Deep Dive | When to use each in Eventhouse |
| Query acceleration vs standard shortcuts | RTI Deep Dive | Accelerated (cached, fast) vs standard (live, slower) |
| Process data by Eventstreams | Real-Time Intelligence | Sources, destinations, in-flight transformations |
| Process data by Spark structured streaming | Spark Structured Streaming | readStream, writeStream, checkpoints, foreachBatch |
| Process data by KQL | RTI Deep Dive | KQL queries, materialized views |
| Create windowing functions | Spark Structured Streaming | Tumbling, sliding, session windows in Spark |
| Create windowing functions | RTI Deep Dive | Tumbling, hopping, sliding, session, snapshot in KQL |
What to memorize: Output modes: Append (new rows only), Complete (full rewrite), Update (changed rows). Watermarks handle late data. Checkpoint = crash recovery. foreachBatch = MERGE with streaming. Window types: tumbling (non-overlapping), hopping (overlapping), session (activity-based), snapshot (point-in-time).
Domain 3: Monitor and Optimize an Analytics Solution (30-35%)
Monitor Fabric Items
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Monitor data ingestion | Monitoring & Troubleshooting | Monitoring Hub, pipeline run details |
| Monitor data transformation | Monitoring & Troubleshooting | Notebook monitoring, Spark UI |
| Monitor semantic model refresh | Power BI Direct Lake | Direct Lake refresh, fallback detection |
| Configure alerts | Data Activator | Triggers, actions, cooldowns |
Identify and Resolve Errors
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Pipeline errors | Monitoring & Troubleshooting | Connection failed, timeout, mapping errors |
| Dataflow Gen2 errors | Monitoring & Troubleshooting | Type conversion, destination write, timeout |
| Notebook errors | Monitoring & Troubleshooting | Table not found, OOM, module not found |
| Eventhouse errors | Monitoring & Troubleshooting | Ingestion lag, query timeout, disk full |
| Eventstream errors | Monitoring & Troubleshooting | Deserialization, connection lost |
| T-SQL errors | SQL Transactions & ACID | TRY/CATCH, deadlocks |
| OneLake shortcut errors | Monitoring & Troubleshooting | Credential expired, path incorrect, permission denied |
Optimize Performance
| Exam Objective | Read This Post | Key Concepts |
|---|---|---|
| Optimize a Lakehouse table | Optimization Guide | OPTIMIZE, VACUUM, Z-ORDER, V-Order, partitioning |
| Optimize a Lakehouse table | Delta Lake Table Properties | TBLPROPERTIES: retention, CDF, autoOptimize, column mapping |
| Optimize a pipeline | Optimization Guide | Parallel ForEach, staggering, retry |
| Optimize a data warehouse | Optimization Guide | Statistics, result set caching, SELECT columns |
| Optimize a data warehouse | Warehouse Advanced | COPY INTO, CTAS, DMVs, Query Insights |
| Optimize Eventstreams and Eventhouses | Optimization Guide | Retention, caching, materialized views |
| Optimize Spark performance | Apache Spark in Fabric | Shuffle partitions, AQE, broadcast joins |
| Optimize query performance | Optimization Guide | The complete optimization checklist |
What to memorize: OPTIMIZE compacts small files. VACUUM removes old versions. Z-ORDER co-locates by frequently filtered column. V-Order sorts within Parquet for Direct Lake. Statistics improve Warehouse query plans. Result set caching returns cached query results. AQE auto-optimizes Spark. Broadcast joins eliminate shuffles for small tables.
The 8-Week Study Plan
Week 1-2: Foundations (Domain 1 — Implement & Manage)
Day 1-2: Read Fabric overview posts
→ What is Fabric, Capacity/Workspaces, OneLake Deep Dive
Day 3-4: Lakehouse vs Warehouse
→ Lakehouse Practical Guide, Warehouse Practical Guide
Day 5-7: Security
→ Security & Governance, practice RLS/CLS/masking in Warehouse
Day 8-10: CI/CD
→ Git Integration & Deployment Pipelines
Day 11-14: Orchestration
→ Data Factory, Triggers & Scheduling, Notebooks (mssparkutils)
Week 3-4: Batch Ingestion (Domain 2 — Ingest & Transform)
Day 15-17: Data loading
→ Shortcuts, Mirroring, Connections & Gateways
Day 18-20: Dataflow Gen2
→ DFG2 three-part series (Introduction, Advanced, Production)
→ M Language Complete Guide (Power Query reference)
Day 21-23: PySpark transformations
→ PySpark Transformations, Joins, Window Functions
Day 24-26: Dimensional modeling
→ Star Schema, SCD Types
Day 27-28: Data quality
→ Data Quality Framework, Medallion Architecture
Week 5-6: Streaming + Advanced (Domain 2 continued + Domain 3)
Day 29-31: Streaming
→ Spark Structured Streaming (output modes, checkpoints, windows)
Day 32-34: Real-Time Intelligence
→ RTI Overview + RTI Deep Dive (window types, KQL, materialized views)
Day 35-37: Advanced Warehouse
→ COPY INTO, CTAS, DMVs, Visual Query, SSMS
Day 38-40: MLVs
→ Materialized Lake Views
Day 41-42: Power BI
→ Direct Lake, semantic models, fallback behavior
Week 7-8: Optimize + Review
Day 43-45: Optimization
→ Optimization Guide (all workloads), Apache Spark tuning
Day 46-47: Monitoring
→ Monitoring & Troubleshooting, Data Activator
Day 48-49: Error resolution
→ Review error tables for all item types
Day 50-52: Practice
→ Microsoft free practice assessment
→ Review all interview questions from our posts
Day 53-56: Final review
→ Re-read this study guide
→ Focus on weak areas identified in practice assessment
Key Concepts You MUST Know (Quick Reference)
Lakehouse vs Warehouse Decision
Need PySpark/Spark? → Lakehouse
Need T-SQL write (INSERT/MERGE)? → Warehouse
Need stored procedures? → Warehouse
Need semi-structured data? → Lakehouse
Need file storage (CSV/JSON)? → Lakehouse
SQL analytics endpoint? → Lakehouse (auto-generated, READ-ONLY)
Shortcut Behavior
ADLS Gen2 shortcut: Read ✅ Write ✅ Delete files ✅ (deletes at source!)
S3 shortcut: Read ✅ Write ❌ Delete ❌
GCS shortcut: Read ✅ Write ❌ Delete ❌
OneLake shortcut: Read ✅ Write ✅ Delete files ✅ (deletes at target!)
Dataverse shortcut: Read ✅ Write ❌ Delete ❌
Delete THE shortcut: Only removes pointer. Source data UNTOUCHED. Always safe.
Output Modes (Streaming)
Append: New rows only (stateless transforms — filter, map)
Complete: Full result rewrite (aggregations — GROUP BY)
Update: Changed rows only (aggregations — delta changes)
Deployment Pipeline Rules
Deployable: Notebooks, Pipelines, Dataflows, Semantic Models, Reports, Lakehouse/Warehouse metadata
NOT deployable: DATA, Mirrored DBs, Eventstreams, KQL DBs, Connections, Workspace roles
Security Layers
1. Workspace roles (Admin/Member/Contributor/Viewer)
2. Item permissions (Read/Build/Reshare)
3. OneLake data access roles (table/folder level)
4. Row-level security (filter predicate functions)
5. Column-level security (GRANT on specific columns)
6. Dynamic data masking (email(), partial(), default())
7. Sensitivity labels (Confidential/Internal/Public)
Optimization Quick Reference
Lakehouse: OPTIMIZE + VACUUM + Z-ORDER + V-Order + optimizeWrite + TBLPROPERTIES (retention, CDF, autoCompact)
Warehouse: Statistics + Result Set Caching + SELECT specific columns
Spark: Shuffle partitions (reduce for small data) + AQE + Broadcast joins
Pipeline: Parallel ForEach + Stagger schedules + Retry
Eventhouse: Retention + Caching + Materialized views + Partitioning
Exam-Day Tips
-
Read every word carefully — “MOST efficient” vs “CORRECT” can lead to different answers. The exam tests the BEST answer, not just a working answer.
-
Look for keywords — “near real-time” → Eventstream/Mirroring. “No-code” → Dataflow Gen2. “T-SQL” → Warehouse. “PySpark” → Lakehouse notebook. “Continuously” → Streaming. “Scheduled” → Pipeline trigger.
-
Eliminate obviously wrong answers — if the question asks about writing data and an option mentions “SQL analytics endpoint,” eliminate it immediately (read-only).
-
Case studies are weighted heavily — read the entire case study before answering. The scenario context often eliminates 2 of 4 options.
-
Use Microsoft Learn during the exam — you have split-screen access to Microsoft Learn. Use it for syntax you cannot remember (KQL functions, M language), but do not spend too much time searching.
-
Time management — 120 minutes for ~50 questions ≈ 2.4 minutes per question. Flag difficult questions and return to them.
-
The exam is scenario-based — it does not ask “What is V-Order?” It asks “A team reports slow Direct Lake performance on a Lakehouse table with 500 small files. What should you do?” (Answer: OPTIMIZE with V-Order).
Practice Questions (Sample Scenarios)
Q1: A data engineer needs to load data from an on-premises SQL Server into a Fabric Lakehouse. The data center has strict firewall rules that do not allow inbound connections. What should they configure?
A) VNet data gateway B) On-premises data gateway C) Direct JDBC connection D) Azure ExpressRoute
Answer: B — On-premises data gateway creates an outbound HTTPS connection from the corporate network. No inbound ports needed.
Q2: A Spark notebook processes 50,000 rows but creates 200 output files after a GROUP BY operation. What setting should be adjusted?
A) spark.sql.shuffle.partitions → reduce to 10-20 B) spark.databricks.delta.optimizeWrite → true C) spark.sql.adaptive.enabled → false D) spark.executor.memory → increase
Answer: A — 200 shuffle partitions for 50K rows creates 200 tiny files. Reducing to 10-20 creates appropriately sized files. (B also helps but A is the primary fix.)
Q3: A team needs to implement row-level security so that regional managers see only their region’s data in Power BI reports. The data is in a Fabric Warehouse. What should they create?
A) A filter function and security policy on the Warehouse table B) A DAX filter in the Power BI report C) An OneLake data access role D) A separate Warehouse per region
Answer: A — RLS in Warehouse uses T-SQL filter predicate functions with CREATE SECURITY POLICY. This enforces security at the data level, affecting all consumers including Power BI.
Q4: A streaming workload ingests 1 million events per hour from Azure Event Hubs. The team needs sub-second query performance on the last 24 hours of data and also needs to keep 90 days of history for batch analysis. What architecture should they implement?
A) Eventstream → Eventhouse only B) Eventstream → Lakehouse only C) Eventstream → both Eventhouse (real-time) and Lakehouse (historical) D) Pipeline with hourly schedule
Answer: C — The dual-path architecture: Eventstream routes to Eventhouse for real-time queries (sub-second KQL) AND Lakehouse for 90-day historical batch analysis.
Q5: A data engineer creates a Dataflow Gen2 that filters rows and adds a custom column. The dataflow processes 10 million rows but takes 45 minutes. Checking the native query for the filter step shows it is grayed out. What is the most likely cause?
A) The source does not support query folding B) The custom column step broke query folding C) The filter condition is too complex D) The data is too large for Dataflow Gen2
Answer: B — Custom columns use M formulas that cannot translate to SQL, breaking query folding. All 10M rows are downloaded before filtering. Fix: move the filter BEFORE the custom column.
Resources
| Resource | Link |
|---|---|
| Official exam page | DP-700 on Microsoft Learn |
| Official study guide | Skills measured (April 2026) |
| Free practice assessment | Microsoft practice assessment |
| Microsoft Learn collection | DP-700 learning path |
| DriveDataScience Fabric posts | All 32 Fabric posts |
| Exam sandbox | Try the exam interface |
Wrapping Up
The DP-700 is a scenario-based exam — it tests your ability to make the RIGHT decision, not just the CORRECT one. Knowing that OPTIMIZE compacts files is not enough. You need to know WHEN to use OPTIMIZE vs Z-ORDER vs VACUUM, and WHY each matters for specific scenarios (Direct Lake performance, storage cost, query speed).
Our 32 Fabric posts cover every exam objective. The 8-week study plan gives you structure. The key concepts quick reference gives you exam-day recall. And the practice questions show you the exam’s style.
Good luck on your DP-700!
The complete DriveDataScience Fabric learning path: – Start Here — All Learning Paths – Fabric Learning Path (32 posts)
Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.