Azure - DriveDataScience

Real-Time Intelligence in Microsoft Fabric: Eventstream, Eventhouse, KQL Database, Real-Time Dashboards, and Processing Millions of Events Per Second

Azure, Data Engineering

Master Fabric Real-Time Intelligence. Eventstream for no-code streaming ingestion with 12 supported sources and in-flight transformations. Eventhouse for petabyte-scale time-series storage. KQL vs SQL comparison with essential queries. Real-Time Dashboards with auto-refresh. Data Activator for alerts. Four real-world scenarios (IoT factory, clickstream, financial transactions, app logs). The dual-path batch plus real-time architecture.

Fabric Notebooks: The Complete Guide to Spark Environments, Library Management, mssparkutils, Multi-Language Cells, and Production Notebook Patterns

Azure, Data Engineering

The complete Fabric Notebooks guide. Four languages in one notebook (PySpark, SQL, Scala, R) with temp view bridging. Spark Environments for persistent library management. Installing PyPI, wheel, and jar packages. mssparkutils deep dive: fs (file operations), notebook (run, runMultiple, exit), credentials (Key Vault, tokens), env (workspace info). Notebook chaining: %run vs mssparkutils.notebook.run vs runMultiple for parallel execution. The Config-Functions-Main production pattern. Parameters from pipelines with widgets. Session management, Spark configuration for performance, and error handling patterns.

Fabric Git Integration and Deployment Pipelines: Version Control, CI/CD, and Promoting Changes from Dev to UAT to Production

Azure, Data Engineering

Complete Fabric CI/CD guide. Git Integration for source control: connecting to Azure DevOps and GitHub, commit and sync workflow, branching strategies with feature branch workflow, conflict resolution. Deployment Pipelines for release: creating stages (Dev, Test, Prod), assigning workspaces, one-click deployment, deployment rules for environment-specific connections, selective deployment. Three real-world scenarios (new pipeline development, hotfix, five-person team collaboration). API automation, recommended production setup, and the complete flow from feature branch to production.

Mirrored Databases in Microsoft Fabric: Real-Time Replication from SQL Server, Cosmos DB, Snowflake, and PostgreSQL Without Building a Single Pipeline

Azure, Data Engineering

Master Fabric Mirrored Databases. What mirroring is and how it eliminates ingestion pipelines, how it works under the hood (transaction log, initial snapshot, continuous CDC). Setup guides for all sources: Azure SQL Database, SQL Server 2016-2025, Cosmos DB (JSON to Delta conversion), Snowflake, and PostgreSQL. Querying mirrored data via SQL endpoint, Spark notebooks, and Power BI Direct Lake. Mirroring plus shortcuts for multi-cloud joins. Four real-world scenarios (e-commerce, banking, IoT, multi-cloud). Mirroring vs pipelines comparison and when to use both. Free compute, limitations, and security considerations.

Microsoft Fabric Warehouse: The Complete Practical Guide — T-SQL, Tables, Views, Stored Procedures, Security, and Building Your Gold Layer

Azure, Data Engineering

The hands-on Fabric Warehouse guide. Creating tables with T-SQL, loading data three ways (cross-database from Lakehouse, pipeline Copy, T-SQL MERGE), SCD Type 1 and Type 2 with MERGE, views for Power BI (monthly revenue, customer 360), stored procedures with TRY/CATCH transactions (dimension loading, full ETL), schemas (staging/gold/reports), table cloning, complete security implementation (object-level GRANT/DENY, row-level security with filter functions, column-level security, dynamic data masking), cross-database Warehouse+Lakehouse queries, and a complete star schema build script.

Microsoft Fabric Lakehouse: The Complete Practical Guide — Tables, Files, Notebooks, SQL Endpoint, Delta Lake, and Building Your First Data Lake

Azure, Data Engineering

The hands-on Fabric Lakehouse guide. Tables vs Files sections explained, three upload methods (UI drag-and-drop, notebook, pipeline), reading CSV/JSON/Parquet/Excel in notebooks, creating Delta tables with PySpark and SparkSQL, managed vs unmanaged tables, schema management (bronze/silver/gold), essential notebook operations (read, write, append, overwrite, Delta MERGE, OPTIMIZE, VACUUM, time travel), SQL analytics endpoint in practice (querying, creating views, read-only limitations), shortcuts, Medallion Architecture setup, and end-to-end CSV-to-dashboard example.

Dataflow Gen2 in Production: Pipeline Integration, Parameterization, Incremental Refresh, Performance Optimization, and the Complete Decision Guide

Azure, Data Engineering

Take Dataflow Gen2 to production. Pipeline integration patterns (Copy then Dataflow then Notebook then Refresh), parameterization (create, use in filters, pass from pipeline), incremental refresh with date filters and watermark tables, query folding explained with foldable vs non-foldable steps table, performance optimization (reduce at source, column selection, buffering), monitoring and debugging, the complete Dataflow Gen2 vs Notebook decision matrix with 20 scenarios, Medallion Architecture mapping, and three real-world production examples.

Dataflow Gen2 Advanced Transformations: Merge Queries, Append, Pivot, Group By, Custom Columns, and Error Handling

Azure, Data Engineering

Master Dataflow Gen2 advanced transformations. Merge Queries with all 6 join types and fuzzy matching, Append Queries for UNION ALL, Group By with multiple aggregations, Pivot and Unpivot (with Unpivot Other Columns best practice), Conditional Columns as no-code CASE WHEN, Custom Columns with 25+ M formula examples (string, date, null handling, conditional), Replace Errors and try-otherwise pattern, Data Profiling (column quality, distribution, profile), complete 9-step Bronze-to-Silver example, and when Dataflow Gen2 reaches its limits.

Dataflow Gen2 in Microsoft Fabric: Introduction, Power Query Basics, Connecting to Sources, and Your First No-Code ETL

Azure, Data Engineering

The complete Dataflow Gen2 introduction. What it is vs ADF Mapping Data Flows vs Spark Notebooks, Power Query engine and M language explained, the UI walkthrough with three panels, connecting to all source types (Lakehouse, SQL, CSV, SharePoint), every basic transformation step-by-step (Choose Columns, Filter, Rename, Change Type, Replace Values, Add Column from Examples, Trim, Split, Fill Down, Remove Duplicates, Sort), writing to Lakehouse and Warehouse destinations with Replace vs Append update methods, and monitoring runs.

Lakehouse vs Warehouse in Microsoft Fabric: When to Use Which, What Languages Work Where, and Real-World Scenario Guide

Azure, Data Engineering

The definitive Lakehouse vs Warehouse guide for Microsoft Fabric. Side-by-side comparison across 17 features, the SQL analytics endpoint explained (why read-only), languages and interfaces matrix (PySpark, SparkSQL, T-SQL — what works where), read vs write capabilities table, security model differences, five real-world scenarios (e-commerce ETL, financial reporting, IoT, Customer 360 with ML, self-service analytics), the recommended Medallion pattern (Lakehouse for Bronze/Silver, Warehouse for Gold), cross-database queries, and migration guide from Synapse/Databricks.
