How to use Dataflows Gen2
Traditionally, data engineers spend significant time extracting, transforming, and loading data into a consumable format for downstream analytics. The goal of Dataflows Gen2 is to provide an easy, reusable way to perform ETL tasks using Power Query Online.
If you only choose to use a data pipeline, you copy data, then use your preferred coding language to extract, transform, and load the data. Alternatively, you can create a Dataflow Gen2 first to extract and transform the data. You can also load the data into a lakehouse, and other destinations. Now the business can easily consume the curated semantic model.
Adding a data destination to your dataflow is optional, and the dataflow preserves all transformation steps. To perform other tasks or load data to a different destination after transformation, create a data pipeline and add the Dataflow Gen2 activity to your orchestration.
Another option might be to use a data pipeline and Dataflow Gen2 for ELT (Extract, Load, Transform) process. For this order, you'd use a Pipeline to extract and load the data into your preferred destination, such as the lakehouse. Then you'd create a Dataflow Gen2 to connect to Lakehouse data to cleanse and transform data. In this case, you'd offer the Dataflow as a curated semantic model for data analysts to develop reports.
Dataflows can be horizontally partitioned as well. Once you create a global dataflow, data analysts can use dataflows to create specialized semantic models for specific needs.
Dataflows allow you to promote reusable ETL logic that prevents the need to create more connections to your data source. Dataflows offer a wide variety of transformations, and can be run manually, on a refresh schedule, or as part of a data pipeline orchestration.
Comments
Post a Comment