<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Andrew Ferreira]]></title><description><![CDATA[Helping churches see the one in the crowd—through data, AI, and a whole lot of prayer.]]></description><link>https://www.andrewferreira.io</link><image><url>https://substackcdn.com/image/fetch/$s_!Wp48!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d5ce1a-29bc-4710-8827-01c5a2caab93_1024x1024.png</url><title>Andrew Ferreira</title><link>https://www.andrewferreira.io</link></image><generator>Substack</generator><lastBuildDate>Sat, 02 May 2026 11:37:49 GMT</lastBuildDate><atom:link href="https://www.andrewferreira.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Andrew Gessinger Ferreira]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aprendiporai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aprendiporai@substack.com]]></itunes:email><itunes:name><![CDATA[Andrew Ferreira]]></itunes:name></itunes:owner><itunes:author><![CDATA[Andrew Ferreira]]></itunes:author><googleplay:owner><![CDATA[aprendiporai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aprendiporai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Andrew Ferreira]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Breaking the Data Loop: Our Journey to 100% Data Quality Coverage]]></title><description><![CDATA[BLUF (Bottom Line Up Front): We've increased our data testing coverage from 7% to 60% by implementing systematic quality checks and leveraging AI 
assistance.]]></description><link>https://www.andrewferreira.io/p/breaking-the-data-loop-our-journey</link><guid isPermaLink="false">https://www.andrewferreira.io/p/breaking-the-data-loop-our-journey</guid><dc:creator><![CDATA[Andrew Ferreira]]></dc:creator><pubDate>Tue, 20 May 2025 13:12:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8CHS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>BLUF (Bottom Line Up Front):</strong> We've increased our data testing coverage from 7% to 60% by implementing systematic quality checks and leveraging AI assistance. This approach ensures data accuracy, reduces the "data loop," and enables truly self-service analytics for ministry leaders to make timely, data-driven decisions.</p><div><hr></div><h2>The Data Loop Challenge</h2><p>We are on a journey together, right? How to break the data loop? I've been talking about this so much that I decided to create a diagram to illustrate the loop:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.andrewferreira.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8CHS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8CHS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png 424w, https://substackcdn.com/image/fetch/$s_!8CHS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png 848w, https://substackcdn.com/image/fetch/$s_!8CHS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png 1272w, https://substackcdn.com/image/fetch/$s_!8CHS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8CHS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png" width="1456" height="356" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96599,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.andrewferreira.io/i/164003002?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8CHS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png 424w, https://substackcdn.com/image/fetch/$s_!8CHS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png 848w, https://substackcdn.com/image/fetch/$s_!8CHS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png 1272w, https://substackcdn.com/image/fetch/$s_!8CHS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c7de526-3b1a-43e7-9bd0-d408c8a23343_1892x462.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Here is the mermaid code in case you want to adjust it a bit:</p><pre><code>flowchart LR
    %% Define node styles
    classDef leader fill:#D4F1F9,stroke:#05445E,stroke-width:2px,color:#05445E,font-weight:bold
    classDef analyst fill:#FFE6CC,stroke:#D79B00,stroke-width:2px,color:#783F04,font-weight:bold
    classDef engineer fill:#D5E8D4,stroke:#82B366,stroke-width:2px,color:#3D6143,font-weight:bold
    classDef decision fill:#F8CECC,stroke:#B85450,stroke-width:2px,color:#6E0000,font-weight:bold,border-radius:10px
    
    %% Nodes with improved descriptions
    A[/"Business Leader&lt;br/&gt;asks a question"/]
    B["Data Analyst&lt;br/&gt;searches for data"]
    C{"Is the data&lt;br/&gt;available?"}
    D["Data Analyst&lt;br/&gt;models data"]
    E["Data Engineer&lt;br/&gt;creates new data point"]
    F["Data Analyst&lt;br/&gt;updates model"]
    G["Data Analyst&lt;br/&gt;answers question"]
    H[/"Business Leader&lt;br/&gt;has more questions"/]

    %% Applying styles
    class A,H leader
    class B,D,F,G analyst
    class E engineer
    class C decision
    
    %% Connections with labels and styles
    A ==&gt; B
    B ==&gt; D
    D ==&gt; C
    
    %% Data missing branch
    C -- "No" --&gt; E
    E -- "New table/view&lt;br/&gt;created" --&gt; F
    F --&gt; D
    
    %% Data available branch
    C -- "Yes" --&gt; G
    G ==&gt; H
    
    %% Close the loop with styled edge
    H -. "New question" .-&gt; A
    
    %% Add a title
    subgraph "The Data Loop Challenge"
    end</code></pre><p>To shorten this loop, we are trying to empower analysts, and even business leaders, to ask questions without having to go through the entire cycle. But to get there, we concluded that we need to clean up our data assets (as discussed last week) and ensure data accuracy. Here's the deal: if we want to empower non-technical people to interact with data without supervision, we need to be sure that our data is reliable enough that no one needs to "double-check" it.</p><h2>From 7% to 60%: Our Quality Testing Journey</h2><p>You might be shocked to know that out of our entire data warehouse, only 7% of our assets had some sort of quality check to validate the data. Our process was simple: we developed new queries with the data analyst, validated the data at that point, and, if everything passed that initial validation, deployed the new asset to our warehouse.</p><p>But surprise, surprise, data changes. An asset can start reporting bad data, and people can make decisions with it without noticing that a subtle change in the backend is pushing a metric in the wrong direction.</p><p>For that reason, we set a goal to raise our test coverage from 7% to 100%. We're working toward that goal right now: we are currently at 60% coverage and climbing.</p><h2>How Dataform Assertions Work</h2><p>What does that mean in practice?</p><p>At our church, we use <a href="https://cloud.google.com/dataform/docs/overview">Dataform</a> to transform our data. It's a very helpful tool, similar to dbt, with the advantage of living inside our BigQuery environment.</p><p>The way Dataform works, you define data models in a SQLx file, where a <a href="https://cloud.google.com/dataform/docs/overview#sqlx_file_config_block">config header is required</a>. 
This config header holds many settings, including what are called "assertions," which are quality tests that run right after a SQLx file is executed.</p><p>There are a few useful assertions that we can apply to our data. Some are very simple, such as defining what a unique row looks like in a model (e.g., an ID or a combination of columns) and which columns should never be null. I know this is database 101, but when you use a data warehouse such as BigQuery that is very flexible with database structure, violations of rules like these can go unnoticed, causing your data to not behave as expected.</p><p>Assertions also let you add row condition tests. While the body of your SQLx defines the business logic with SQL-like code, a row condition assertion tests the rows that logic produces. And I'll be honest, it sometimes feels redundant, because you're already excluding bad data with proper joins and WHERE clauses. Many times I thought to myself, "OK, this assertion is not needed," but guess what? It was needed. It caught things in the data that I wasn't even aware existed.</p><h2>Surprising Findings from Quality Testing</h2><p>Another common row condition assertion we're putting in place catches bad data entry. Our church started in 1996, and we're in 2025. How is it possible that someone got baptized in the year 0225? I know, I know, this type of bad entry should have been caught further left, in the system that allowed the entry, but the data still made it into our data warehouse, causing our reports to be inaccurate. This extended the data loop because business leaders were questioning the integrity of the data. Without an assertion looking out for us, it would have been painful to find those eight bad records claiming someone was baptized in 225 A.D.</p><p>Another lesson: we assumed that if we had already tested all the underlying tables of a view, there was no need to test the view itself, right? 
We were surprisingly wrong about that assumption. An assertion at the view level requires the view to actually run, and we found a few views that had good tables behind them but, thanks to bad syntax, weren't actually running. Again, these views were published and live in the data warehouse, so a dashboard could have been connected to one of them. If a business leader had tried to open that dashboard, it would have triggered the execution of the view, which would have failed, leading the business leader to come back to us with questions and triggering the whole data loop again.</p><p>This rigorous testing approach directly supports our goal by ensuring that ministry leaders can trust the data they're using for decision-making.</p><h2>Alfred: Our AI Assistant for Test Coverage</h2><p>Now, let me be honest with you: adding assertions to assets is not fun. We need to understand the data model and think of ways to test it, and having to fix the issues the assertions find is even worse. So we had to come up with a way to speed up this process.</p><p>That's why we created Alfred. Alfred is not an acronym for something smarter; it's just a reference to the Batman movies, where Alfred was a major helper. Alfred is a Claude Agent that we created to be another data engineer on our team. We have provided Alfred with all the context we have documented (we'll talk about documentation next week) and our standards for quality testing. Now, every time we need to create assertions, we can start by copying and pasting the SQLx into Alfred so we can work together on ways to test whether the business logic is implemented correctly.</p><p>Alfred has definitely sped up our process. 
We were at 7% coverage two months ago and have already hit the 60% mark thanks to Alfred.</p><h2>The Interconnected Warehouse: Unexpected Benefits</h2><p>One side effect of adding assertions to every model in our data warehouse is that every time we try to publish a new model, all assertions run to make sure we're not breaking anything while deploying something new. And I don't know how we made it this far without the assertions, because we're finding out just how interconnected our warehouse is. Merge requests are taking longer to be approved because all these assertions are thankfully catching issues before they become problems.</p><p>That's how we're making progress toward our vision of an AI-ready and self-service data warehouse: <a href="https://www.andrewferreira.io/p/building-an-ai-ready-data-warehouse">reducing obsolete data assets</a> and making sure the ones that remain are well-tested. This approach directly supports our goal by ensuring accurate data that truly represents our ministry impact.</p><p>Next step? We need to talk about documentation. Stay tuned for next week's post!</p>]]></content:encoded></item><item><title><![CDATA[Building an AI-Ready Data Warehouse: The Cleanup Journey]]></title><description><![CDATA[BLUF (Bottom Line Up Front): To create an AI-ready data warehouse that supports self-service analytics, we've implemented a systematic cleanup process that removed over 3,000 unused data assets.]]></description><link>https://www.andrewferreira.io/p/building-an-ai-ready-data-warehouse</link><guid isPermaLink="false">https://www.andrewferreira.io/p/building-an-ai-ready-data-warehouse</guid><dc:creator><![CDATA[Andrew Ferreira]]></dc:creator><pubDate>Tue, 13 May 2025 12:48:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Wp48!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d5ce1a-29bc-4710-8827-01c5a2caab93_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>BLUF (Bottom Line Up Front):</strong>  To create an AI-ready data warehouse that supports self-service analytics, we've implemented a systematic cleanup process that removed over 3,000 unused data assets. This approach helps us deliver faster insights to ministry leaders and reduces the risk of AI hallucinations by eliminating outdated information.</p><div><hr></div><h2><strong>The Data Loop Problem</strong></h2><p>Last week I mentioned something that has been stuck in my mind for days: the data loop. 
The long back-and-forth process that runs from leaders wanting to be data-driven, to analysts, to data engineers, back to analysts, and finally back to leaders, days or sometimes weeks after a simple question was asked.</p><p>This is simply not acceptable anymore. AI has arrived, and everyone around us expects this process to be faster and better, and they are right.</p><p>We want our church to reach more people, and we believe data is a tool to help us get there, so data insights must be ready when leaders ask questions.</p><p>For that reason we are committed to creating a data warehouse that is AI-ready and suitable for self-service analytics. But getting there is not easy. Here are a few elements of our strategy.</p><h2>Step 1: Clean-up</h2><p>We have been working with data for over 10 years at our church, which means many, many data assets have been created along the way. Many of these assets have become obsolete, but they are still available in our data warehouse.</p><p>We are on a mission to eliminate unused data assets and merge the ones that can be combined.</p><h3>The Materialized Views Challenge</h3><p>The clean-up task was particularly challenging. 
We use BigQuery to build our data warehouse and decided to implement our golden assets (the final assets consumed by clients such as the app, Tableau, etc.) as materialized views instead of tables. This approach allows us to change the underlying structure of a view without any of the destinations noticing that the backend was restructured.</p><p>This was a great decision that gave the engineering team enough flexibility to make bold moves, but on BigQuery we have no way to use the logs to find out which views are being consumed. If view <code>vw_my_view</code> references <code>table_a</code> and <code>table_b</code> in a simple query such as:</p><pre><code><code>select * from table_a, table_b</code></code></pre><p>Google Cloud Platform (GCP) will provide logs showing that <code>table_a</code> and <code>table_b</code> are being used, but not <code>vw_my_view</code>. And this is a problem, because both tables might still be important for other useful views, while <code>vw_my_view</code> itself may be obsolete, unused for months.</p><h3>Our Technical Solution</h3><p>Still, GCP does provide the query text, and with the help of AI that was enough:</p><pre><code><code>INSERT INTO 
`project_id.dataset_id.views_usage`
WITH
  -- Ask Gemini to list every table or view referenced in each logged query
  model_results AS (
    SELECT
      ml_generate_text_llm_result,
      queried_at,
      user_email
    FROM
      ML.GENERATE_TEXT(
        MODEL `dataset_id.gemini`,
        (
          -- Build one prompt per logged job: fixed instructions plus the raw query text
          SELECT
            CONCAT('For the given query, generate the following metadata for each referenced table or view and return ONLY a valid JSON array with objects containing these exact fields: project_id, dataset_id, table_id. Query: ', query) AS prompt,
            queried_at,
            user_email
          FROM
            `dataset_id.jobs`
        ),
        -- flatten_json_output returns the generated text in the ml_generate_text_llm_result column
        STRUCT(0.2 AS temperature, TRUE AS flatten_json_output)
      )
  )
-- Strip the Markdown code fence from the response, then emit one row per referenced table or view
SELECT
  JSON_EXTRACT_SCALAR(table_json, "$.project_id") AS project_id,
  JSON_EXTRACT_SCALAR(table_json, "$.dataset_id") AS dataset_id,
  JSON_EXTRACT_SCALAR(table_json, "$.table_id") AS table_id,
  queried_at,
  user_email
FROM
  model_results,
UNNEST(JSON_EXTRACT_ARRAY(REGEXP_REPLACE(ml_generate_text_llm_result, r'```json\n|\n```', ''))) AS table_json</code></code></pre><p>The query above takes the entire query text found in the logs and combines it with pre-made instructions for Gemini that specify what I want from the query text and what format I want the results in. Gemini then takes the entire prompt, including the query text (unstructured data), and returns the IDs of all the tables and views referenced in the query. Finally, the last step of the query makes sure that each table or view identified by Gemini gets its own row in the results. Voil&#224;. That's how we found a way to catch materialized views that have not been used by any client.</p><h3>Results and Risk Management</h3><p>Thanks to this view catcher, we now have a scheduled job that searches daily for unused views and alerts the data engineers to start the process of decommissioning them.</p><p><strong>Over the past few weeks, we were able to delete over 3,000 assets.</strong></p><p>Think about this number and how it could lead to outdated answers for leaders and contribute to AI hallucination. These are over 3,000 tables or views that haven't been used in a while. All these data assets might carry old and obsolete metadata, such as descriptions or even metrics that no one uses anymore. If I provided all these assets to an AI, it would not be able to discern what's good or bad, so if a pastor used an AI solution to ask about a data point we might have in store, the AI might answer with one of the 3,000 bad data points that no one uses anymore. That's why this is a reason to celebrate, and that's why, if you are also pursuing an AI-ready database, you need to find ways to track the lifecycle of the assets in your data warehouse as well.</p><p>We definitely broke some things in the process, so don&#8217;t be surprised if you break some things on your end as well. 
Every time we're about to delete an object from our data warehouse, we first move it to what we call a "dumpster dataset" within BigQuery. This dumpster has an expiration period of 30 days. Yup, all tables or views moved to this dataset auto-delete in 30 days, but this is enough time for the "scream test" to notify us (if we move an asset and it breaks something important, someone or something will scream, hence "scream test").</p><h3>Next Steps</h3><p>This is definitely a lot of work, but it is taking us much closer to an AI-ready data warehouse where only relevant data is stored, so that our users, whether pastors or data analysts, can ask questions about our data without all the noise from unused or outdated data.</p><p>This is just step one of our strategy for building an AI-ready and self-service-ready data warehouse. Our next steps are expanding test coverage and following the "document-first" approach when building data assets. Let's unpack these other steps in the coming weeks.</p>]]></content:encoded></item><item><title><![CDATA[Tableau Conference 2025]]></title><description><![CDATA[An engineer's perspective]]></description><link>https://www.andrewferreira.io/p/tableau-conference-2025</link><guid isPermaLink="false">https://www.andrewferreira.io/p/tableau-conference-2025</guid><dc:creator><![CDATA[Andrew Ferreira]]></dc:creator><pubDate>Sun, 04 May 2025 22:49:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rkeu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>BLUF (Bottom Line Up Front):</strong> The data visualization industry is transforming through AI integration. 
To stay ahead, we should: 1) Move business logic from visualization tools to the data layer, 2) Invest heavily in metadata and semantic layers, and 3) Strengthen our data governance framework.</p><div><hr></div><p>Tableau is a major player when it comes to data visualization, and they just hosted a conference in San Diego, California, to launch new features and explain their agentic era: how a data visualization company is positioning itself for the AI revolution. They also offered a word of advice to data analysts on how to adapt during this season of transformation in the industry.</p><p>Here are my main takeaways as an engineer attending a conference that <em>used</em> to be focused primarily on data analysis:</p><h2><strong>The "move to the left" movement</strong></h2><p>The usual diagram describing the flow of data starts with all the data sources on the far left. Multiple arrows go from there, passing through many boxes in the middle where the ETL (Extract, Transform, and Load) processes happen (or, in our case, ELT), and the final step is the data destinations, such as Tableau, where all the data is visualized for impactful ministry decisions.</p><p>One mistake we (and many others in the industry) 
have made over the years is building business logic into the visualization tools. It is usually hidden in the form of calculated fields within published data sources in Tableau or, worse yet, inside isolated Tableau workbooks and views that feed the final dashboards leaders use to make executive and ministry decisions.</p><p>The issue here is that these calculated fields are gold: precious business logic locked in a data visualization tool. If done well, that logic lives in a published data source so that other workbooks and views can benefit from it. If not, it is hidden in a single workbook, so that another data analyst who needs it has to rebuild it, creating what all of us data professionals hate the most: same metric, different results.</p><p><strong>The advice here is to move all the business logic to the left,</strong> closer to the data. Rebuild those precious and rich calculated fields packed with CASE and IF statements and move them away from the visualization layer, back to the data layer.</p><p>The main immediate benefits here are:</p><ul><li><p>It unlocks the business logic from the visualization layer and makes it available to all data destinations that might benefit from knowing the key metrics our organization cares about</p></li><li><p>It eliminates or minimizes the need to recreate the same metric, a practice that significantly increases the risk of the same metric reporting different numbers</p></li><li><p>It allows agents (GPTs) to read our entire diagram as code, with all the declared data sources, SQL transformations, and finally all the business logic with proper descriptions, so that they can quickly answer leader questions like "what is this metric?" and lineage questions such as "where did you get this data?" 
and it helps other data engineers build new queries from existing components found in the diagram.</p></li></ul><p>Now, let's address the elephant in the room. If you are a data analyst, even the idea of eliminating calculated fields probably makes you cringe a bit. I get it. I will confess that my immediate reaction when I heard this advice was "That's it! Let's all stop the calculated fields for good." But a less nuclear approach proposed at the conference was to keep giving analysts the freedom to do what they do best: create with leaders all the business logic they need, using the tools and language they are used to. We, the engineers, simply need to watch whether those new dashboards take off and gain high visibility. If so, that's the signal for us to go ahead and move that rich business logic back to the data layer so that other analysts, data destinations, and GPTs can benefit from it.</p><p><strong>Action steps for us:</strong></p><ul><li><p>Explain to leaders and other data analysts the benefits of this approach</p></li><li><p>Find the most popular dashboards and start transitioning those calculated fields to the data layer</p></li><li><p>Implement a process to continually look for new and popular dashboards and migrate any calculated fields to the data layer</p></li></ul><h2><strong>Metadata, Metadata, Metadata</strong></h2><p>I am sure you have been there before: you have a problem to solve and have spent hours finding the solution. Now it's time to double-check with a know-it-all GPT to ensure you are not going crazy with this novel and perfect solution you came up with. But you quickly realize that whatever GPT you are using knows nothing about you or the problem. 
It has access to your solution, but it has no idea what other data assets you have available, so it cannot give you the most efficient answer or find a better and more concise solution to your problem.</p><p>And this is not an issue just for data professionals. If you are a leader or a pastor trying to understand what data points are available to help you make a huge impact on your organization, and you go to a GPT, you are in a much worse place, because you might not have any tools to provide the GPT with the context it needs.</p><p>In this AI revolution you often hear that data is gold. Well, if that's the case, then metadata must be the harvesting tool we need.</p><p>The advice here is to ensure that all the data assets you have are packed with metadata, because all this metadata will later become part of your prompt to a GPT. It gives the GPT all the context (data source, ETL process, business logic, and data destination) so that when you come with a problem to solve, or a simple question about what data points are available, it has what it needs to help you find the answer quickly and effectively.</p><p>According to the conference and my own research, something like dbt or Dataform provides the tools you need to attach metadata to everything, but dbt and Cube Dev go one step beyond by offering a way to build a semantic layer: a universal layer that lets you take your business logic and definitions everywhere you go, while providing a standard framework for writing documentation as code where all the metrics, entities, and relationships of your data are declared. That is exactly what a GPT needs to shine at its job.</p><p>This is an image I saw often at the conference. It explains well that the whole standard ETL/ELT process is still needed, and that the semantic layer comes on top of it to allow what they called "data portability," where you "define once, use everywhere" on Tableau or any other data destinations 
including LLMs to answer simple questions and help engineers build new data models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rkeu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rkeu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png 424w, https://substackcdn.com/image/fetch/$s_!rkeu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png 848w, https://substackcdn.com/image/fetch/$s_!rkeu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png 1272w, https://substackcdn.com/image/fetch/$s_!rkeu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rkeu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png" width="1382" height="896" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1382,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:337889,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aprendiporai.substack.com/i/162847800?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rkeu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png 424w, https://substackcdn.com/image/fetch/$s_!rkeu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png 848w, https://substackcdn.com/image/fetch/$s_!rkeu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png 1272w, https://substackcdn.com/image/fetch/$s_!rkeu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdac3377d-0c67-460c-aa44-aa87672958e9_1382x896.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Another benefit of having a semantic layer is that it breaks what I call the data loop. Someone asks for an analysis. The data analyst looks for a dashboard or a warehouse table that answers the question, but nothing does, so they contact the engineer. The engineer finds something in the raw tables that has to be transformed, tested and deployed to the warehouse before the analyst can take it, analyze it and finally answer the leader, who by now has four new questions.</p><p>The semantic layer helps us reduce or even eliminate the data loop because it empowers analysts to join data safely, it empowers GPTs to assist engineers in developing new queries, and it equips non-technical people to browse a comprehensive data dictionary and even, in a future not too far away, to safely query our 
data.</p><p><strong>Action Steps for us:</strong></p><ul><li><p>The "garbage in, garbage out" rule is still true, so we need to get ready for the semantic layer by simplifying our data models and adding all the metadata to them</p></li><li><p>Find and implement a semantic layer solution</p></li></ul><p></p><h2><strong>It's time to govern like never before</strong></h2><p>If you thought that data governance was important, it just became a lot more critical. The main idea of Tableau is to democratize data, and the "move to the left" and metadata approach definitely invites more people to work with data alongside us, which also raises the risk of bad analyses and bad queries.</p><p>That's why we will have to focus more and more on data governance, so that GPTs and others can get their answers safely.</p><p>Tableau provided this framework of the core components of data governance that we should pursue:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uqdx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uqdx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png 424w, https://substackcdn.com/image/fetch/$s_!uqdx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png 848w, 
https://substackcdn.com/image/fetch/$s_!uqdx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png 1272w, https://substackcdn.com/image/fetch/$s_!uqdx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uqdx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png" width="1456" height="897" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:897,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1198039,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aprendiporai.substack.com/i/162847800?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uqdx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png 424w, 
https://substackcdn.com/image/fetch/$s_!uqdx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png 848w, https://substackcdn.com/image/fetch/$s_!uqdx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png 1272w, https://substackcdn.com/image/fetch/$s_!uqdx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4918db-4c69-4e4a-92c4-3238b8ed3c6e_2516x1550.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As I reflected on these components and attended multiple sessions, I heard and saw enough, sprinkled here and there, to build a more solid picture of what this looks like in real life.</p><p>The main ones that caught my attention were:</p><p><strong>Data Lifecycle Management</strong>: We all love to create new things, but archiving obsolete data that no one uses anymore should be part of our routine, and it should be as exciting as launching a new and shiny thing.</p><p><strong>Data Quality and Master Management</strong>: Test coverage was my takeaway here. The recommendation is to break long queries into small queries (they call them data models), which allows us to reuse these small pieces of logic in more places and test them individually to ensure that data is moving through the pipeline, from sources to the dashboard, as expected.</p><p><strong>Data Discovery and Understanding</strong>: This goes back to the semantic layer and the metadata concept. It reinforces the idea of documenting everything, so that we can search for data points using natural language and keep all business logic in one place: "define once, use everywhere."</p><p><strong>Action Steps:</strong></p><ul><li><p>Add metadata to all data assets</p></li><li><p>Expand test coverage to 100%</p></li><li><p>Define and implement data lifecycle</p></li><li><p>Launch a data dictionary</p></li></ul>]]></content:encoded></item></channel></rss>