Transitioning from dbt to Agile Data Engine with Gen AI

Dec 16, 2024 12:12:08 PM

Adapting to resilient DataOps using Gen AI 

We’ve all been there. The data is piling up, the team is working overtime, and it becomes pretty clear: we might’ve outgrown our current data tools. Whether we’re talking dbt, Matillion, Microsoft SSIS, or something else, these tools have their perks but also their limits, especially when it comes to handling more data, more people, and complex governance. 

We start running into issues: scaling our data development, keeping our governance tight, making sure the team can work together smoothly, and managing changes without wanting to pull our hair out. And let’s not even talk about getting new team members up to speed with our setup! 

Moving to Agile Data Engine Without the Headache 

The idea of moving to something like Agile Data Engine seems tempting but also like a huge task, right? How do we move everything without losing data, wasting time, or having to press pause on our important data work?

What if we could use generative AI to make this move a bit easier? 

 

Combining smart tech with data architect expertise

Using generative AI doesn’t mean we just press a button and sit back. We use its capabilities to handle the highly detailed, tedious work while our expertise ensures the migration fits what the company needs. It’s about combining smart tech with years of cultivated knowledge to make the shift to Agile Data Engine smooth, efficient, and sensible for you. 

A manageable, strategic move to Agile Data Engine, with generative AI doing the tedious groundwork, offers a path to better scalability, governance, and team collaboration for your data function and environments.

 

Using GenAI: a step-by-step guide to migrating from dbt Models to Agile Data Engine Packages 

So, let’s look at dbt as an example to picture how this might work. Imagine if generative AI could take a look at how we’ve set things up in dbt, understand all our data models and transformations, and then do the heavy lifting in moving this setup over to Agile Data Engine. 

Generative AI can translate our transformations from the dbt setup into something that works in Agile Data Engine, keeping our data consistent and intact. Plus, imagine not having to set up governance, collaboration settings, and dependency management from scratch in the new system because generative AI has got it covered. 

dbt Models 

With dbt, data stored in e.g. BigQuery, Databricks, Redshift, or Snowflake is transformed by writing SQL select statements — called “models” in dbt — which dbt materializes as tables or views in the data warehouse. 

We’ll start with a simple dbt project taken from the dbt quickstart documentation for dbt Cloud and Snowflake. It contains two staging models, stg_customers and stg_orders, and a customers model built from the two staging models. 

stg_customers.sql contents: 

select 
     id as customer_id, 
     first_name, 
     last_name 
 
from raw.jaffle_shop.customers 

stg_orders.sql contents: 

select 
     id as order_id, 
     user_id as customer_id, 
     order_date, 
     status 
 
from raw.jaffle_shop.orders 

customers.sql contents: 

with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final
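Before handing the models to an LLM, it can help to extract each model’s output columns mechanically, giving us a checklist for verifying the generated attributes later. A minimal, stdlib-only sketch — this is naive parsing that assumes simple `select ... from ...` models like the staging models above, not a general SQL parser:

```python
import re

def select_columns(sql: str) -> list[str]:
    """Extract output column names from a simple 'select ... from ...' query.

    Naive approach: take the text between SELECT and FROM, split on commas,
    and keep the alias after 'as' (or the bare column name) for each item.
    """
    body = re.search(r"select(.*?)\bfrom\b", sql, re.IGNORECASE | re.DOTALL)
    if not body:
        return []
    columns = []
    for item in body.group(1).split(","):
        # 'id as customer_id' -> 'customer_id'; plain 'first_name' stays as-is
        alias = re.split(r"\s+as\s+", item.strip(), flags=re.IGNORECASE)[-1]
        columns.append(alias.strip())
    return columns

stg_customers_sql = """
select
    id as customer_id,
    first_name,
    last_name
from raw.jaffle_shop.customers
"""

print(select_columns(stg_customers_sql))
# ['customer_id', 'first_name', 'last_name']
```

For models with CTEs, joins, or `select *`, a real SQL parser would be needed, but for flat staging models this is enough to cross-check the LLM output.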

 

Packages in Agile Data Engine

Now, suppose we want to import the previous dbt project into Agile Data Engine. Tables and views are called entities in Agile Data Engine, and they are grouped into packages, which can be imported using a JSON format: 

{
  "packageName": "string",
  "templateVersion": "2.11",
  "dataPackage": {
    "packageId": "<uuid>",
    "packageName": "dbt_to_ade_demo"
  },
  "entities": [
    {
      "entityId": "<uuid>",
      "packageId": "<uuid>",
      "packageName": "dbt_to_ade_demo",
      "entityLogicalName": "string",
      "entityType": "GENERIC",
      "physicalType": "TABLE" | "VIEW"
    }
  ],
  "attributes": [
    {
      "attributeId": "<uuid>",
      "entityId": "<uuid>",
      "attributeName": "string",
      "position": number,      -- positions start from 1
      "datatype": "INTEGER8" | "VARCHAR" | "DATE",
      "dataLength": number,    -- required for VARCHAR (default 255)
      "isPersistent": true
    }
  ]
}
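Whether a package is written by hand or generated by an LLM, it pays to sanity-check it against this structure before import. A hedged sketch of such a check — the field names come from the template above, but the validation rules here are illustrative, not an official ADE schema validator:

```python
def validate_package(pkg: dict) -> list[str]:
    """Collect basic consistency problems in an ADE-style package dict."""
    problems = []
    package_id = pkg.get("dataPackage", {}).get("packageId")
    entity_ids = set()

    for entity in pkg.get("entities", []):
        entity_ids.add(entity.get("entityId"))
        name = entity.get("entityLogicalName")
        if entity.get("packageId") != package_id:
            problems.append(f"entity {name}: packageId mismatch")
        if entity.get("physicalType") not in ("TABLE", "VIEW"):
            problems.append(f"entity {name}: bad physicalType")

    for attr in pkg.get("attributes", []):
        name = attr.get("attributeName")
        # Every attribute must point at an entity declared in this package.
        if attr.get("entityId") not in entity_ids:
            problems.append(f"attribute {name}: unknown entityId")
        if attr.get("datatype") == "VARCHAR" and "dataLength" not in attr:
            problems.append(f"attribute {name}: VARCHAR missing dataLength")
        if attr.get("position", 0) < 1:
            problems.append(f"attribute {name}: positions start from 1")

    return problems

example = {
    "dataPackage": {"packageId": "p1"},
    "entities": [{
        "entityId": "e1", "packageId": "p1",
        "entityLogicalName": "customers", "entityType": "GENERIC",
        "physicalType": "TABLE",
    }],
    "attributes": [{
        "attributeId": "a1", "entityId": "e1",
        "attributeName": "first_name", "position": 1,
        "datatype": "VARCHAR", "isPersistent": True,
    }],
}
print(validate_package(example))
# ['attribute first_name: VARCHAR missing dataLength']
```

Running a check like this on LLM output catches the most common generation slips — dangling entityId references and missing dataLength fields — before the package ever reaches the import step.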

In a JSON-formatted Agile Data Engine package, the 'customers' entity could look something like:  

{ 
   "entityId": "<uuid>", 
   "packageId": "<uuid>", 
   "packageName": "dbt_to_ade_demo", 
   "entityLogicalName": "customers", 
   "entityType": "GENERIC", 
   "physicalType": "TABLE" 
} 

and the attributes could be declared separately as: 

[ 
   { 
     "attributeId": "<uuid>", 
     "entityId": "<uuid>", 
     "attributeName": "customer_id", 
     "position": 1, 
     "datatype": "INTEGER8", 
     "isPersistent": true 
   }, 
   { 
     "attributeId": "<uuid>", 
     "entityId": "<uuid>", 
     "attributeName": "first_name", 
     "position": 2, 
     "datatype": "VARCHAR", 
     "dataLength": 32, 
     "isPersistent": true 
   }, 
   { 
     "attributeId": "<uuid>", 
     "entityId": "<uuid>", 
     "attributeName": "last_name", 
     "position": 3, 
     "datatype": "VARCHAR", 
     "dataLength": 32, 
     "isPersistent": true 
   }, 

... 
] 

Moving from dbt to Agile Data Engine 

Now, we're ready to leverage generative AI, such as ChatGPT, to generate an ADE package from the dbt models. 

To transform dbt models into a valid JSON-formatted ADE package, we need to supply the dbt models, a template for the JSON-formatted ADE package, and specific instructions to the generative AI application. Here’s an example of how the instructions might look: 

Each of the SQL files defines a single dbt model through a select query. Convert the models into entities (one entity per dbt model) in the JSON-formatted ADE package. 

Use the filenames as the entity logical names. Use placeholders for the UUIDs that allow replacing them with actual UUIDs while retaining the relations based on the ids, e.g. "stg_customers__customer_id__uuid_placeholder" for the attribute id of the stg_customers model's customer_id attribute. 

Given this input, the LLM produces a valid JSON-formatted ADE package as follows: 
 
{ 
   "packageName": "dbt_to_ade_demo", 
   "templateVersion": "2.11", 
   "dataPackage": { 
     "packageId": "<dataPackage-uuid>", 
     "packageName": "dbt_to_ade_demo" 
   }, 
   "entities": [ 
     { 
       "entityId": "<stg_customers-uuid>", 
       "packageId": "<dataPackage-uuid>", 
       "packageName": "dbt_to_ade_demo", 
       "entityLogicalName": "stg_customers", 
       "entityType": "GENERIC", 
       "physicalType": "TABLE" 
     }, 
     { 
       "entityId": "<stg_orders-uuid>", 
       "packageId": "<dataPackage-uuid>", 
       "packageName": "dbt_to_ade_demo", 
       "entityLogicalName": "stg_orders", 
       "entityType": "GENERIC", 
       "physicalType": "TABLE" 
     }, 
     { 
       "entityId": "<customers-uuid>", 
       "packageId": "<dataPackage-uuid>", 
       "packageName": "dbt_to_ade_demo", 
       "entityLogicalName": "customers", 
       "entityType": "GENERIC", 
       "physicalType": "VIEW" 
     } 
   ], 
   "attributes": [ 
     { 
       "attributeId": "<stg_customers__customer_id__uuid_placeholder>", 
       "entityId": "<stg_customers-uuid>", 
       "attributeName": "customer_id", 
       "position": 1, 
       "datatype": "INTEGER8", 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<stg_customers__first_name__uuid_placeholder>", 
       "entityId": "<stg_customers-uuid>", 
       "attributeName": "first_name", 
       "position": 2, 
       "datatype": "VARCHAR", 
       "dataLength": 255, 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<stg_customers__last_name__uuid_placeholder>", 
       "entityId": "<stg_customers-uuid>", 
       "attributeName": "last_name", 
       "position": 3, 
       "datatype": "VARCHAR", 
       "dataLength": 255, 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<stg_orders__order_id__uuid_placeholder>", 
       "entityId": "<stg_orders-uuid>", 
       "attributeName": "order_id", 
       "position": 1, 
       "datatype": "INTEGER8", 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<stg_orders__customer_id__uuid_placeholder>", 
       "entityId": "<stg_orders-uuid>", 
       "attributeName": "customer_id", 
       "position": 2, 
       "datatype": "INTEGER8", 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<stg_orders__order_date__uuid_placeholder>", 
       "entityId": "<stg_orders-uuid>", 
       "attributeName": "order_date", 
       "position": 3, 
       "datatype": "DATE", 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<stg_orders__status__uuid_placeholder>", 
       "entityId": "<stg_orders-uuid>", 
       "attributeName": "status", 
       "position": 4, 
       "datatype": "VARCHAR", 
       "dataLength": 255, 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<customers__customer_id__uuid_placeholder>", 
       "entityId": "<customers-uuid>", 
       "attributeName": "customer_id", 
       "position": 1, 
       "datatype": "INTEGER8", 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<customers__first_name__uuid_placeholder>", 
       "entityId": "<customers-uuid>", 
       "attributeName": "first_name", 
       "position": 2, 
       "datatype": "VARCHAR", 
       "dataLength": 255, 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<customers__last_name__uuid_placeholder>", 
       "entityId": "<customers-uuid>", 
       "attributeName": "last_name", 
       "position": 3, 
       "datatype": "VARCHAR", 
       "dataLength": 255, 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<customers__first_order_date__uuid_placeholder>", 
       "entityId": "<customers-uuid>", 
       "attributeName": "first_order_date", 
       "position": 4, 
       "datatype": "DATE", 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<customers__most_recent_order_date__uuid_placeholder>", 
       "entityId": "<customers-uuid>", 
       "attributeName": "most_recent_order_date", 
       "position": 5, 
       "datatype": "DATE", 
       "isPersistent": true 
     }, 
     { 
       "attributeId": "<customers__number_of_orders__uuid_placeholder>", 
       "entityId": "<customers-uuid>", 
       "attributeName": "number_of_orders", 
       "position": 6, 
       "datatype": "INTEGER8", 
       "isPersistent": true 
     } 
   ] 
} 
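The placeholders still need to be swapped for real UUIDs before import. Because the same placeholder string appears everywhere a given id is referenced, a single pass that assigns one fresh UUID per distinct placeholder preserves the entity–attribute relations, exactly as the prompt intended. A stdlib-only sketch (the helper name is ours, not part of ADE or dbt):

```python
import json
import re
import uuid

def replace_placeholders(package_json: str) -> str:
    """Replace each distinct <...> placeholder with one consistent UUID."""
    mapping: dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        placeholder = match.group(0)
        # Reuse the same UUID wherever the same placeholder occurs,
        # so entityId references stay consistent across attributes.
        if placeholder not in mapping:
            mapping[placeholder] = str(uuid.uuid4())
        return mapping[placeholder]

    return re.sub(r"<[^<>]+>", substitute, package_json)

raw = '{"entityId": "<customers-uuid>", "packageId": "<dataPackage-uuid>"}'
resolved = json.loads(replace_placeholders(raw))
```

After this step the package contains only real UUIDs and is ready for a final validation pass and import.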

 

The future is now: Redefining DataOps with GenAI 

Adding generative AI to our data work is becoming the standard, rather than a mere passing phase. As managing data gets more complicated and the amount of data grows, combining Agile Data Engine with generative AI doesn't just solve the problems we are used to facing today; it reframes how we tackle them and prepares companies to adjust quickly to the data challenges ahead, keeping us at the forefront of innovation in technology and operations. 

That said, if there's something more you want to know about how easy it can be to switch to Agile Data Engine (such as best practices for using GenAI in migrations to Agile Data Engine from other tools), reach out and let us know! 😊 We also provide demos that show how Agile Data Engine helps improve your data quality and ways of working, restoring trust in data in your organization and turning data into a strong business driver.