DP-700 Certification Study Guide: Every Exam Objective Mapped to DriveDataScience Posts, Study Plan, and Tips to Pass the Microsoft Fabric Data Engineer Associate Exam

The DP-700 (Implementing Data Engineering Solutions Using Microsoft Fabric) is Microsoft’s certification for Fabric Data Engineers. It validates your ability to design, implement, secure, monitor, and optimize data engineering solutions on Microsoft Fabric.

This post is your study companion — every exam objective mapped to the DriveDataScience post that covers it, a recommended study plan, exam-day tips, and the key concepts you MUST know for each domain.

This guide is based on the official skills measured as of April 20, 2026. The exam has three equally weighted domains (~33% each). You need a score of 700 or higher to pass.

Exam Overview
The Three Domains
Domain 1: Implement and Manage an Analytics Solution (30-35%)
Domain 2: Ingest and Transform Data (30-35%)
Domain 3: Monitor and Optimize an Analytics Solution (30-35%)
The 8-Week Study Plan
Key Concepts You MUST Know (Quick Reference)
Exam-Day Tips
Practice Questions (Sample Scenarios)
Resources
Wrapping Up

Exam Overview

Detail	Info
Exam code	DP-700
Full name	Implementing Data Engineering Solutions Using Microsoft Fabric
Certification	Microsoft Certified: Fabric Data Engineer Associate
Passing score	700 / 1000
Duration	120 minutes
Question types	Multiple choice, drag-and-drop, case studies, yes/no scenarios
Languages	English, Chinese, French, German, Japanese, Portuguese, Spanish
Prerequisite	None (DP-900 knowledge recommended)
Cost	$165 USD
Renewal	Annual (free online assessment)
In-exam resources	Microsoft Learn accessible during exam (split screen)

The Three Domains

┌─────────────────────────────────────────────────────┐
│  DP-700: Implementing Data Engineering Solutions     │
│  Using Microsoft Fabric                              │
│                                                      │
│  ┌──────────────────────────────────────────┐        │
│  │ Domain 1: Implement & Manage (30-35%)    │        │
│  │  Workspace config, CI/CD, security,      │        │
│  │  orchestration                            │        │
│  ├──────────────────────────────────────────┤        │
│  │ Domain 2: Ingest & Transform (30-35%)    │        │
│  │  Loading patterns, batch, streaming,      │        │
│  │  shortcuts, mirroring, transformations    │        │
│  ├──────────────────────────────────────────┤        │
│  │ Domain 3: Monitor & Optimize (30-35%)    │        │
│  │  Monitoring, error resolution,            │        │
│  │  performance optimization                 │        │
│  └──────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────┘

Domain 1: Implement and Manage an Analytics Solution (30-35%)

Configure Microsoft Fabric Workspace Settings

Exam Objective	Read This Post	Key Concepts
Configure Spark workspace settings	Apache Spark in Fabric	Shuffle partitions, AQE, V-Order, starter pools vs custom environments
Configure Spark workspace settings	Fabric Notebooks	Spark environments, library management, runtime versions
Configure domain workspace settings	Capacity, Workspaces & Items	Domains, workspace creation, capacity assignment
Configure OneLake workspace settings	OneLake Deep Dive	OneLake settings, cache, shortcuts config
Configure Dataflows Gen2 workspace settings	Dataflow Gen2: Production	Query folding, staging, performance settings
Configure Dataflows Gen2 workspace settings	M Language Complete Guide	M functions, custom columns, error handling, query folding

What to memorize: Spark shuffle partitions default (200), V-Order default (enabled), AQE default (enabled), starter pool startup time (~10 seconds), runtime versions.

Implement Lifecycle Management in Fabric

Exam Objective	Read This Post	Key Concepts
Configure version control	Git Integration & CI/CD	Azure DevOps, GitHub, commit/sync, branching
Implement database projects	Git Integration & CI/CD	What is tracked in Git (definitions, not data)
Create and configure deployment pipelines	Git Integration & CI/CD	Dev → Test → Prod stages, deployment rules, selective deployment
Deployment errors	Monitoring & Troubleshooting	What can/cannot be deployed, common deployment failures

What to memorize: Git tracks item DEFINITIONS not DATA. Deployable: notebooks, pipelines, dataflows, semantic models, reports. NOT deployable: data, mirrored databases, eventstreams, KQL databases, connections. Deployment rules swap connections per environment.

Configure Security and Governance

Exam Objective	Read This Post	Key Concepts
Workspace-level access controls	Security & Governance	Admin, Member, Contributor, Viewer roles
Item-level access controls	Security & Governance	Sharing items, Read/Build/Reshare permissions
RLS, CLS, object-level, file-level	Security & Governance	RLS filter functions, GRANT on columns, OneLake data access roles
RLS, CLS in Warehouse	Warehouse Practical Guide	T-SQL security policy, GRANT/DENY
RLS, CLS in Warehouse	Warehouse Advanced	SSMS connectivity, granting access
Dynamic data masking	Security & Governance	email(), partial(), default() masking functions
Sensitivity labels	Security & Governance	Confidential, Internal, Public — labels flow downstream
Endorse items	Security & Governance	Promoted vs Certified endorsement
Audit logs	Monitoring & Troubleshooting	Key audit events, Purview compliance portal
OneLake security	OneLake Deep Dive	OneLake data access roles, workspace roles

What to memorize: 7 security layers (workspace → item → OneLake data access → RLS → CLS → masking → sensitivity labels). RLS uses filter predicate functions + security policies. CLS uses GRANT SELECT on specific columns. Dynamic masking functions: email(), partial(), default(). Sensitivity labels inherit downstream.

Orchestrate Processes

Exam Objective	Read This Post	Key Concepts
Choose between Dataflow Gen2, pipeline, notebook	Fabric Data Factory	Decision matrix: DFG2 for simple cleaning, notebook for complex, pipeline for orchestration
Choose between Dataflow Gen2, pipeline, notebook	Dataflow Gen2: Production	The complete decision guide (20-scenario table)
Schedules and event-based triggers	Triggers & Scheduling	Schedule, event, tumbling window triggers
Orchestration with notebooks and pipelines	Fabric Notebooks	Parameters, mssparkutils.notebook.run, %run, exit values
Orchestration patterns	Triggers & Scheduling	Master-child, conditional, fan-out fan-in, retry
Dynamic expressions	Triggers & Scheduling	@utcNow(), @formatDateTime(), @pipeline().parameters

What to memorize: Dataflow Gen2 = no-code simple transforms. Notebook = code, complex logic, Delta MERGE. Pipeline = orchestration. Parameters: notebook reads with mssparkutils.widgets.get(), returns with mssparkutils.notebook.exit(). Pipeline expressions: @utcNow(), @formatDateTime(), @activity().output.

Domain 2: Ingest and Transform Data (30-35%)

Design and Implement Loading Patterns

Exam Objective	Read This Post	Key Concepts
Full and incremental data loads	Dataflow Gen2: Production	Full (Replace mode) vs incremental (Append + date filter/watermark)
Full and incremental data loads	Medallion Architecture	Bronze (raw) → Silver (clean) → Gold (enriched)
Prepare data for dimensional model	Normalization & Star Schema	Fact tables, dimension tables, surrogate keys, SCD
Prepare data for dimensional model	SCD Types	SCD Type 0, 1, 2, 3, 6
Loading pattern for streaming data	Spark Structured Streaming	Checkpoints, output modes, watermarks, foreachBatch
Loading pattern for streaming data	Real-Time Intelligence	Eventstream → Eventhouse, dual-path architecture

Ingest and Transform Batch Data

Exam Objective	Read This Post	Key Concepts
Choose appropriate data store	Lakehouse vs Warehouse	Lakehouse (PySpark, unstructured) vs Warehouse (T-SQL, structured)
Choose between DFG2, notebooks, KQL, T-SQL	Fabric Data Factory	When to use which tool
Create and manage OneLake shortcuts	OneLake Shortcuts	Internal vs external, R/W/D behavior, caching, security
Implement mirroring	Mirrored Databases	Sources, setup, read-only, free compute, limitations
Configure connections and gateways	Connections & Gateways	Connection types, on-prem gateway, VNet gateway, Service Principal
Ingest data by using pipelines	Fabric Data Factory	Copy Activity, pipeline parameters, expressions
Transform by PySpark	PySpark Transformations	filter, select, withColumn, groupBy, join, window functions
Transform by PySpark	PySpark Window Functions	ROW_NUMBER, RANK, LAG, LEAD, running totals
Transform by SQL	SQL Functions	COALESCE, NULLIF, TRIM, DATEDIFF, CAST
Transform by KQL	RTI Deep Dive	KQL date, string, aggregation functions
Transform by M Language (Dataflow Gen2)	M Language Complete Guide	Text, date, table functions, custom functions, query folding
Denormalize data	Normalization & Star Schema	Star schema, denormalization trade-offs
Handle duplicate, missing, late-arriving data	Data Quality Framework	Dedup, null handling, quarantine pattern

Ingest and Transform Streaming Data

Exam Objective	Read This Post	Key Concepts
Choose appropriate streaming engine	Real-Time Intelligence	Eventstream vs Spark Structured Streaming
Native tables vs OneLake shortcuts in RTI	RTI Deep Dive	When to use each in Eventhouse
Query acceleration vs standard shortcuts	RTI Deep Dive	Accelerated (cached, fast) vs standard (live, slower)
Process data by Eventstreams	Real-Time Intelligence	Sources, destinations, in-flight transformations
Process data by Spark structured streaming	Spark Structured Streaming	readStream, writeStream, checkpoints, foreachBatch
Process data by KQL	RTI Deep Dive	KQL queries, materialized views
Create windowing functions	Spark Structured Streaming	Tumbling, sliding, session windows in Spark
Create windowing functions	RTI Deep Dive	Tumbling, hopping, sliding, session, snapshot in KQL

What to memorize: Output modes: Append (new rows only), Complete (full rewrite), Update (changed rows). Watermarks handle late data. Checkpoint = crash recovery. foreachBatch = MERGE with streaming. Window types: tumbling (non-overlapping), hopping (overlapping), session (activity-based), snapshot (point-in-time).

Domain 3: Monitor and Optimize an Analytics Solution (30-35%)

Monitor Fabric Items

Exam Objective	Read This Post	Key Concepts
Monitor data ingestion	Monitoring & Troubleshooting	Monitoring Hub, pipeline run details
Monitor data transformation	Monitoring & Troubleshooting	Notebook monitoring, Spark UI
Monitor semantic model refresh	Power BI Direct Lake	Direct Lake refresh, fallback detection
Configure alerts	Data Activator	Triggers, actions, cooldowns

Identify and Resolve Errors

Exam Objective	Read This Post	Key Concepts
Pipeline errors	Monitoring & Troubleshooting	Connection failed, timeout, mapping errors
Dataflow Gen2 errors	Monitoring & Troubleshooting	Type conversion, destination write, timeout
Notebook errors	Monitoring & Troubleshooting	Table not found, OOM, module not found
Eventhouse errors	Monitoring & Troubleshooting	Ingestion lag, query timeout, disk full
Eventstream errors	Monitoring & Troubleshooting	Deserialization, connection lost
T-SQL errors	SQL Transactions & ACID	TRY/CATCH, deadlocks
OneLake shortcut errors	Monitoring & Troubleshooting	Credential expired, path incorrect, permission denied

Optimize Performance

Exam Objective	Read This Post	Key Concepts
Optimize a Lakehouse table	Optimization Guide	OPTIMIZE, VACUUM, Z-ORDER, V-Order, partitioning
Optimize a Lakehouse table	Delta Lake Table Properties	TBLPROPERTIES: retention, CDF, autoOptimize, column mapping
Optimize a pipeline	Optimization Guide	Parallel ForEach, staggering, retry
Optimize a data warehouse	Optimization Guide	Statistics, result set caching, SELECT columns
Optimize a data warehouse	Warehouse Advanced	COPY INTO, CTAS, DMVs, Query Insights
Optimize Eventstreams and Eventhouses	Optimization Guide	Retention, caching, materialized views
Optimize Spark performance	Apache Spark in Fabric	Shuffle partitions, AQE, broadcast joins
Optimize query performance	Optimization Guide	The complete optimization checklist

What to memorize: OPTIMIZE compacts small files. VACUUM removes old versions. Z-ORDER co-locates by frequently filtered column. V-Order sorts within Parquet for Direct Lake. Statistics improve Warehouse query plans. Result set caching returns cached query results. AQE auto-optimizes Spark. Broadcast joins eliminate shuffles for small tables.

The 8-Week Study Plan

Week 1-2: Foundations (Domain 1 — Implement & Manage)

Day 1-2: Read Fabric overview posts
  → What is Fabric, Capacity/Workspaces, OneLake Deep Dive
Day 3-4: Lakehouse vs Warehouse
  → Lakehouse Practical Guide, Warehouse Practical Guide
Day 5-7: Security
  → Security & Governance, practice RLS/CLS/masking in Warehouse
Day 8-10: CI/CD
  → Git Integration & Deployment Pipelines
Day 11-14: Orchestration
  → Data Factory, Triggers & Scheduling, Notebooks (mssparkutils)

Week 3-4: Batch Ingestion (Domain 2 — Ingest & Transform)

Day 15-17: Data loading
  → Shortcuts, Mirroring, Connections & Gateways
Day 18-20: Dataflow Gen2
  → DFG2 three-part series (Introduction, Advanced, Production)
  → M Language Complete Guide (Power Query reference)
Day 21-23: PySpark transformations
  → PySpark Transformations, Joins, Window Functions
Day 24-26: Dimensional modeling
  → Star Schema, SCD Types
Day 27-28: Data quality
  → Data Quality Framework, Medallion Architecture

Week 5-6: Streaming + Advanced (Domain 2 continued + Domain 3)

Day 29-31: Streaming
  → Spark Structured Streaming (output modes, checkpoints, windows)
Day 32-34: Real-Time Intelligence
  → RTI Overview + RTI Deep Dive (window types, KQL, materialized views)
Day 35-37: Advanced Warehouse
  → COPY INTO, CTAS, DMVs, Visual Query, SSMS
Day 38-40: MLVs
  → Materialized Lake Views
Day 41-42: Power BI
  → Direct Lake, semantic models, fallback behavior

Week 7-8: Optimize + Review

Day 43-45: Optimization
  → Optimization Guide (all workloads), Apache Spark tuning
Day 46-47: Monitoring
  → Monitoring & Troubleshooting, Data Activator
Day 48-49: Error resolution
  → Review error tables for all item types
Day 50-52: Practice
  → Microsoft free practice assessment
  → Review all interview questions from our posts
Day 53-56: Final review
  → Re-read this study guide
  → Focus on weak areas identified in practice assessment

Key Concepts You MUST Know (Quick Reference)

Lakehouse vs Warehouse Decision

Need PySpark/Spark?              → Lakehouse
Need T-SQL write (INSERT/MERGE)? → Warehouse
Need stored procedures?          → Warehouse
Need semi-structured data?       → Lakehouse
Need file storage (CSV/JSON)?    → Lakehouse
SQL analytics endpoint?          → Lakehouse (auto-generated, READ-ONLY)

Shortcut Behavior

ADLS Gen2 shortcut:  Read ✅  Write ✅  Delete files ✅ (deletes at source!)
S3 shortcut:         Read ✅  Write ❌  Delete ❌
GCS shortcut:        Read ✅  Write ❌  Delete ❌
OneLake shortcut:    Read ✅  Write ✅  Delete files ✅ (deletes at target!)
Dataverse shortcut:  Read ✅  Write ❌  Delete ❌
Delete THE shortcut: Only removes pointer. Source data UNTOUCHED. Always safe.

Output Modes (Streaming)

Append:   New rows only (stateless transforms — filter, map)
Complete: Full result rewrite (aggregations — GROUP BY)
Update:   Changed rows only (aggregations — delta changes)

Deployment Pipeline Rules

Deployable: Notebooks, Pipelines, Dataflows, Semantic Models, Reports, Lakehouse/Warehouse metadata
NOT deployable: DATA, Mirrored DBs, Eventstreams, KQL DBs, Connections, Workspace roles

Security Layers

1. Workspace roles (Admin/Member/Contributor/Viewer)
2. Item permissions (Read/Build/Reshare)
3. OneLake data access roles (table/folder level)
4. Row-level security (filter predicate functions)
5. Column-level security (GRANT on specific columns)
6. Dynamic data masking (email(), partial(), default())
7. Sensitivity labels (Confidential/Internal/Public)

Optimization Quick Reference

Lakehouse:   OPTIMIZE + VACUUM + Z-ORDER + V-Order + optimizeWrite + TBLPROPERTIES (retention, CDF, autoCompact)
Warehouse:   Statistics + Result Set Caching + SELECT specific columns
Spark:       Shuffle partitions (reduce for small data) + AQE + Broadcast joins
Pipeline:    Parallel ForEach + Stagger schedules + Retry
Eventhouse:  Retention + Caching + Materialized views + Partitioning

Exam-Day Tips

Read every word carefully — “MOST efficient” vs “CORRECT” can lead to different answers. The exam tests the BEST answer, not just a working answer.
Look for keywords — “near real-time” → Eventstream/Mirroring. “No-code” → Dataflow Gen2. “T-SQL” → Warehouse. “PySpark” → Lakehouse notebook. “Continuously” → Streaming. “Scheduled” → Pipeline trigger.
Eliminate obviously wrong answers — if the question asks about writing data and an option mentions “SQL analytics endpoint,” eliminate it immediately (read-only).
Case studies are weighted heavily — read the entire case study before answering. The scenario context often eliminates 2 of 4 options.
Use Microsoft Learn during the exam — you have split-screen access to Microsoft Learn. Use it for syntax you cannot remember (KQL functions, M language), but do not spend too much time searching.
Time management — 120 minutes for ~50 questions ≈ 2.4 minutes per question. Flag difficult questions and return to them.
The exam is scenario-based — it does not ask “What is V-Order?” It asks “A team reports slow Direct Lake performance on a Lakehouse table with 500 small files. What should you do?” (Answer: OPTIMIZE with V-Order).

Practice Questions (Sample Scenarios)

Q1: A data engineer needs to load data from an on-premises SQL Server into a Fabric Lakehouse. The data center has strict firewall rules that do not allow inbound connections. What should they configure?

A) VNet data gateway B) On-premises data gateway C) Direct JDBC connection D) Azure ExpressRoute

Answer: B — On-premises data gateway creates an outbound HTTPS connection from the corporate network. No inbound ports needed.

Q2: A Spark notebook processes 50,000 rows but creates 200 output files after a GROUP BY operation. What setting should be adjusted?

A) spark.sql.shuffle.partitions → reduce to 10-20 B) spark.databricks.delta.optimizeWrite → true C) spark.sql.adaptive.enabled → false D) spark.executor.memory → increase

Answer: A — 200 shuffle partitions for 50K rows creates 200 tiny files. Reducing to 10-20 creates appropriately sized files. (B also helps but A is the primary fix.)

Q3: A team needs to implement row-level security so that regional managers see only their region’s data in Power BI reports. The data is in a Fabric Warehouse. What should they create?

A) A filter function and security policy on the Warehouse table B) A DAX filter in the Power BI report C) An OneLake data access role D) A separate Warehouse per region

Answer: A — RLS in Warehouse uses T-SQL filter predicate functions with CREATE SECURITY POLICY. This enforces security at the data level, affecting all consumers including Power BI.

Q4: A streaming workload ingests 1 million events per hour from Azure Event Hubs. The team needs sub-second query performance on the last 24 hours of data and also needs to keep 90 days of history for batch analysis. What architecture should they implement?

A) Eventstream → Eventhouse only B) Eventstream → Lakehouse only C) Eventstream → both Eventhouse (real-time) and Lakehouse (historical) D) Pipeline with hourly schedule

Answer: C — The dual-path architecture: Eventstream routes to Eventhouse for real-time queries (sub-second KQL) AND Lakehouse for 90-day historical batch analysis.

Q5: A data engineer creates a Dataflow Gen2 that filters rows and adds a custom column. The dataflow processes 10 million rows but takes 45 minutes. Checking the native query for the filter step shows it is grayed out. What is the most likely cause?

A) The source does not support query folding B) The custom column step broke query folding C) The filter condition is too complex D) The data is too large for Dataflow Gen2

Answer: B — Custom columns use M formulas that cannot translate to SQL, breaking query folding. All 10M rows are downloaded before filtering. Fix: move the filter BEFORE the custom column.

Resources

Resource	Link
Official exam page	DP-700 on Microsoft Learn
Official study guide	Skills measured (April 2026)
Free practice assessment	Microsoft practice assessment
Microsoft Learn collection	DP-700 learning path
DriveDataScience Fabric posts	All 32 Fabric posts
Exam sandbox	Try the exam interface

Wrapping Up

The DP-700 is a scenario-based exam — it tests your ability to make the RIGHT decision, not just the CORRECT one. Knowing that OPTIMIZE compacts files is not enough. You need to know WHEN to use OPTIMIZE vs Z-ORDER vs VACUUM, and WHY each matters for specific scenarios (Direct Lake performance, storage cost, query speed).

Our 32 Fabric posts cover every exam objective. The 8-week study plan gives you structure. The key concepts quick reference gives you exam-day recall. And the practice questions show you the exam’s style.

Good luck on your DP-700!

The complete DriveDataScience Fabric learning path: – Start Here — All Learning Paths – Fabric Learning Path (32 posts)

Naveen Vuppula is a Senior Data Engineering Consultant and app developer based in Ontario, Canada. He writes about Python, SQL, AWS, Azure, and everything data engineering at DriveDataScience.com.

DP-700 Certification Study Guide: Every Exam Objective Mapped to DriveDataScience Posts, Study Plan, and Tips to Pass the Microsoft Fabric Data Engineer Associate Exam

DP-700 Certification Study Guide: Every Exam Objective Mapped to DriveDataScience Posts, Study Plan, and Tips to Pass the Microsoft Fabric Data Engineer Associate Exam

Table of Contents

Exam Overview

The Three Domains

Domain 1: Implement and Manage an Analytics Solution (30-35%)

Configure Microsoft Fabric Workspace Settings

Implement Lifecycle Management in Fabric

Configure Security and Governance

Orchestrate Processes

Domain 2: Ingest and Transform Data (30-35%)

Design and Implement Loading Patterns

Ingest and Transform Batch Data

Ingest and Transform Streaming Data

Domain 3: Monitor and Optimize an Analytics Solution (30-35%)

Monitor Fabric Items

Identify and Resolve Errors

Optimize Performance

The 8-Week Study Plan

Week 1-2: Foundations (Domain 1 — Implement & Manage)

Week 3-4: Batch Ingestion (Domain 2 — Ingest & Transform)

Week 5-6: Streaming + Advanced (Domain 2 continued + Domain 3)

Week 7-8: Optimize + Review

Key Concepts You MUST Know (Quick Reference)

Lakehouse vs Warehouse Decision

Shortcut Behavior

Output Modes (Streaming)

Deployment Pipeline Rules

Security Layers

Optimization Quick Reference

Exam-Day Tips

Practice Questions (Sample Scenarios)

Resources

Wrapping Up

Leave a Comment Cancel Reply

DP-700 Certification Study Guide: Every Exam Objective Mapped to DriveDataScience Posts, Study Plan, and Tips to Pass the Microsoft Fabric Data Engineer Associate Exam

Table of Contents

Exam Overview

The Three Domains

Domain 1: Implement and Manage an Analytics Solution (30-35%)

Configure Microsoft Fabric Workspace Settings

Implement Lifecycle Management in Fabric

Configure Security and Governance

Orchestrate Processes

Domain 2: Ingest and Transform Data (30-35%)

Design and Implement Loading Patterns

Ingest and Transform Batch Data

Ingest and Transform Streaming Data

Domain 3: Monitor and Optimize an Analytics Solution (30-35%)

Monitor Fabric Items

Identify and Resolve Errors

Optimize Performance

The 8-Week Study Plan

Week 1-2: Foundations (Domain 1 — Implement & Manage)

Week 3-4: Batch Ingestion (Domain 2 — Ingest & Transform)

Week 5-6: Streaming + Advanced (Domain 2 continued + Domain 3)

Week 7-8: Optimize + Review

Key Concepts You MUST Know (Quick Reference)

Lakehouse vs Warehouse Decision

Shortcut Behavior

Output Modes (Streaming)

Deployment Pipeline Rules

Security Layers

Optimization Quick Reference

Exam-Day Tips

Practice Questions (Sample Scenarios)

Resources

Wrapping Up

Related Posts

Leave a Comment Cancel Reply