Getting started with iomete & DBT

Namig Aliyev
Namig Aliyev
August 1, 2022
June 26, 2022
5 min read
Getting started with iomete & DBT

In this post we will discuss using dbt with the iomete Data Platform, and show you how to get started by connecting dbt to iomete lakehouse to bootstrap your dbt journey.

What is DBT?

DBT (data build tool) is a command line tool that enables data analysts and engineers to transform data in their warehouses more effectively. It’s extremely good at transforming data that’s already loaded into your warehouse.

dbt in iomete platform.

iomete & DBT

iomete is a managed lakehouse platform built on two powerful engines, Apache Iceberg and Apache Spark.

We built a new DBT adapter to enable our customers to leverage DBT as their transformation layer,.

Here are the links to the dbt-iomete github repository and iomete-dbt profile.

Set up and connect iomete

Prerequisites

To use DBT on the iomete platform one needs:

  • An iomete account. If you do not have an account yet, reach out via intercom chat and we'll help you on your way.
  • A running lakehouse cluster.

If the above requirements are met, let's see where to find the DBT connection parameters on the iomete platform.

Connection parameters

Username & password
Use your username and password that you used to log into the iomete platform.

Cluster name
Select your Lakehouse where you want DBT to run.

Lakehouses
NOTE: lakehouse name used as cluster_name in DBT profile setup.

Host
Enter the selected lakehouse and show connection details, and then Copy JDBC URL.

Lakehouse connection details


JDBC: jdbc:hive2://dwh-910848238944.iomete.com/;transportMode=http;ssl=true;httpPath=reporting/cliservice
 
 

Extract the host part from the JDBC URL. Hostname is a combination of  account_number with the prefix dwh- and the suffix .iomete.com.


host: dwh-910848238944.iomete.com
 
 

Prepare Data

💡IDEA: Suppose we have two tables and we’ll use these tables to generate a new table or view using DBT.

Let's prepare the two tables to be used by the DBT project:

1. Create external tables for mysql tutorial database. Which will be used to get some aggregation:


CREATE TABLE IF NOT EXISTS employees
USING org.apache.spark.sql.jdbc
OPTIONS (  
	url "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees",  
	dbtable "employees.employees",  
	driver 'com.mysql.cj.jdbc.Driver',  
	user 'tutorial_user',  
	password '9tVDVEKp'
);

CREATE TABLE IF NOT EXISTS salaries
USING org.apache.spark.sql.jdbc
OPTIONS (  
	url "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees",  
	dbtable "employees.salaries",  
  driver 'com.mysql.cj.jdbc.Driver',  
  user 'tutorial_user',  
  password '9tVDVEKp'
);
 
 

2. Check data from the SQL editor:

Our data is ready to use in DBT models to create new tables and views. See below to the DBT model section.

Build your first project

A project is needed to run our DBT activities. We can easily create our new project by following the commands below.

  1. Create a dbt-samples directory.

cd Documents/mkdir dbt-samplescd dbt-samples
 
 

2. Install dbt-iomete in virtual python environment.


virtualenv .envsource .env/bin/activatepip install dbt-iomete
 
 

3. Initiate the dbt_project project using the init command:


dbt init dbt_project
 
 

4. Show the new added profile configuration:


cat ~/.dbt/profiles.yml
 
 

5. Navigate into your project's directory:


cd dbt_project
 
 

6. Run the debug command from your project to confirm that you can successfully connect:


dbt debug
 
 
A successful dbt debug command

Build your first models

A model is a select statement. Models are defined in .sql files typically in your models directory.

  • Each .sql file contains one model / select statement.
  • The name of the file is used as the model name.

💡REMEMBER: We have two tables employees and salaries. And we want to drive a new table which is going to be join these tables.

  1. Open your project in your favorite code editor.
  2. Create a new SQL file in the models directory, named models/employee_salaries.sql
  3. Paste the following query into the models/employee_salaries.sql file.

SELECT e.emp_no, e.first_name, e.last_name, s.salaryFROM dbt_database.employees eJOIN dbt_database.salaries s ON e.emp_no = s.emp_no

Note: dbt_project’s default materialization is view and above model going to be create a employee_salaries view.

4. Run the below command from the project's home directory.

dbt run

A successful run with the dbt CLI

5. See the changes on iomete.

employee_salaries view


If you want to create table instead of view, you can change materialization type to table globally from dbt_profile.yml file in dbt_project’s home directory.

models:  dbt_project:    example:      +materialized: table  #<-- here

Or if you only want to make model-specific configuration, you must specify it in the first lines of the model.

{{ config(materialized='table') }}SELECT e.emp_no, e.first_name, e.last_name, s.salaryFROM dbt_database.employees eJOIN dbt_database.salaries s ON e.emp_no = s.emp_no

Enter the run command again to execute the new model.

dbt run

A successful run with the dbt CLI

See the changes on iomete:

employee_salaries table

Conclusion

We covered a quick intro to DBT, and worked through setting up our environment to get DBT connected to the iomete lakehouse.

If you have any questions or feedback related to DBT on the iomete lakehouse platform, please reach out to us via our intercom chat.

Related Articles