Paul DeSalvo's blog

Category: Data Engineering

Creativity Loves Constraints: Lessons from the Data Engineering Trenches

We love to dream big in data. Real-time everything. Auto-scaling infrastructure. Infinite flexibility. A tool for every use case. But back in the real world? You’re dealing with budget approvals, half-documented APIs, slow dashboards, and a team of two trying to wrangle fifteen tools. And that’s not a failure; that’s the job. In fact, that’s…

April 16, 2025
Microsoft Fabric: Finally a Way to Get Sh*t Done in Data Without Fighting the Stack

I recently joined an organization that runs entirely on the Microsoft stack—a shift for me, coming from AWS environments where I relied on third-party tools for data integration and orchestration. Frankly, I knew this was going to be a challenge. In the past, working with native Microsoft cloud tools meant stitching together brittle pipelines, jumping…

March 31, 2025
Do You Really Need Data Modeling? A Practical Look

For years, data modeling has been the foundation of structured reporting, ensuring performance, consistency, and efficiency. But today, the landscape has changed. With cheap storage, powerful processing, and modern BI tools that enable flexible, real-time analysis, is data modeling still necessary, or has it become just one of many options? Many organizations, especially startups, are…

February 5, 2025
Insights, Not Infrastructure: The True Goal of Data Engineering

“No one wants to use software. They just want to catch Pokémon.” This quote from The Staff Engineer’s Path nails a key truth: people don’t care about the tools, just the results. In data engineering, this couldn’t be more relevant. Business teams don’t want to wrestle with raw data or learn SQL; they want clear,…

January 17, 2025
Demystifying Real-Time Reporting

Real-time reporting is about making decisions based on data the moment it’s created. As businesses strive for faster insights, BI teams are often tasked with handling these requests, particularly in lean tech startups where developer resources are stretched thin. However, assigning these requests to BI teams often results in frustration and inefficiency. To deliver effective…

December 23, 2024
Streamline Your API Workflows with DuckDB

DuckDB outperforms Pandas for API integrations by addressing key pain points: it enforces schema consistency, prevents data type mismatches, and handles deduplication efficiently with built-in database operations. Unlike Pandas, DuckDB offers persistent local storage, enabling you to work beyond memory constraints and handle large datasets seamlessly. It also supports downstream SQL transformations and exports to…

November 27, 2024
Revolutionizing Data Engineering: The Zero ETL Movement

Imagine you’re a chef running a bustling restaurant. In the traditional world of data (or in this case, food), you’d order ingredients from various suppliers, wait for deliveries, sort through shipments, and prep everything before you can even start cooking. It’s time-consuming, prone to errors, and by the time the dish reaches your customers, those…

September 24, 2024
The Modern Data Stack: Still Too Complicated

In the quest to make data-driven decisions, what seems like a straightforward process of moving data from source systems to a central analytical workspace often explodes in complexity and overhead. This post explores why the modern data stack remains too complicated and how various tools and services attempt to address these challenges today.

August 30, 2024
Simplify your Data Engineering Process with Datastream for BigQuery

Datastream for BigQuery simplifies and automates the tedious aspects of traditional data engineering. This serverless change data capture (CDC) replication service seamlessly replicates your application database to BigQuery, particularly for supported databases with moderate data volumes.

May 15, 2024
The Problems with Data Warehousing for Modern Analytics

Cloud data warehouses have become the cornerstone of modern data analytics stacks, providing a centralized repository for storing and efficiently querying data from multiple sources. They offer a rich ecosystem of integrated data apps, enabling seamless team collaboration. However, as data analytics has evolved, cloud data warehouses have become expensive and slow. In this post,…

April 9, 2024

© 2025 Paul DeSalvo’s Blog. All Rights Reserved.