Volver al curso

Data Engineering con Python: Pipelines y Big Data

leccion
5 / 12
advanced
32 horas
ETL/ELT: Diseno y Construccion de Pipelines

dbt: Transformaciones de Datos como Codigo

Lectura
50 min~2 min lectura

dbt: El Estandar para Transformaciones de Datos

dbt ha revolucionado como los Data Engineers construyen transformaciones. Permite escribir modelos en SQL puro con versionado en Git, tests automatizados y documentacion generada. En 2025, dbt es requisito en el 60% de ofertas de Data Engineering en LATAM.

# Instalar y crear proyecto
pip install dbt-bigquery
dbt init mi_warehouse

# Estructura de proyecto dbt
mi_warehouse/
  dbt_project.yml
  models/
    staging/
      stg_orders.sql
      stg_customers.sql
    intermediate/
      int_daily_revenue.sql
    marts/
      dim_customers.sql
      fct_orders.sql
  tests/
  macros/
-- models/staging/stg_orders.sql
WITH source AS (
    SELECT * FROM {{ source('raw', 'orders') }}
),
renamed AS (
    SELECT
        order_id::VARCHAR AS order_id,
        user_id::VARCHAR AS customer_id,
        LOWER(TRIM(status)) AS order_status,
        total_amount::DECIMAL(12,2) AS order_total,
        currency,
        CASE country_code
            WHEN 'AR' THEN 'Argentina'
            WHEN 'MX' THEN 'Mexico'
            WHEN 'CO' THEN 'Colombia'
            WHEN 'BR' THEN 'Brasil'
        END AS country_name,
        created_at::TIMESTAMP AS ordered_at
    FROM source
    WHERE order_id IS NOT NULL AND total_amount > 0
)
SELECT * FROM renamed
-- models/marts/fct_orders.sql (incremental)
{{ config(materialized='incremental', unique_key='order_id') }}

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
    {% if is_incremental() %}
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
    {% endif %}
),
customers AS (SELECT * FROM {{ ref('dim_customers') }})

SELECT
    o.order_id, o.customer_id, c.customer_segment,
    o.order_total, o.currency, o.country_name, o.ordered_at,
    ROW_NUMBER() OVER (PARTITION BY o.customer_id ORDER BY o.ordered_at) AS order_sequence
FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id

🚀 dbt en Produccion

Comandos esenciales: dbt run (ejecutar), dbt test (tests), dbt docs generate (documentacion). En CI/CD: dbt build --select state:modified+ solo ejecuta lo que cambio.

dbt conecta el mundo del SQL con las mejores practicas de software engineering.