Ingest data into a lakehouse

Ingesting data into your lakehouse is the first step in your ETL process. Use any of the following methods to bring data into your lakehouse.

  • Upload: Upload local files.
  • Dataflows Gen2: Import and transform data using Power Query.
  • Notebooks: Use Apache Spark to ingest, transform, and load data (a sketch follows this list).
  • Data Factory pipelines: Use the Copy data activity.
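
For the notebook option, here is a minimal PySpark sketch, assuming a CSV file has already been uploaded to the lakehouse Files area. The path Files/raw/sales/orders.csv and the table name orders_raw are hypothetical examples; in a Fabric notebook a Spark session is already provided, so the getOrCreate call simply reuses it.

```python
# Minimal sketch: read a raw CSV from the lakehouse Files area, apply a light
# transformation, and load it into a managed Delta table.
from pyspark.sql import SparkSession, functions as F

# In a notebook, `spark` already exists; getOrCreate reuses that session.
spark = SparkSession.builder.getOrCreate()

raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/sales/orders.csv")  # hypothetical upload location
)

# Normalize column names and stamp each row with the load date.
clean_df = (
    raw_df
    .toDF(*[c.strip().lower().replace(" ", "_") for c in raw_df.columns])
    .withColumn("load_date", F.current_date())
)

# Save the result as a Delta table in the lakehouse Tables area.
clean_df.write.mode("overwrite").format("delta").saveAsTable("orders_raw")
```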

Ingested data can be loaded directly into either files or tables. Consider your data loading pattern before you ingest: decide whether to land all raw data as files before processing, or to route new records through staging tables first (a sketch of the staging-table approach follows).
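As an illustration of the staging-table pattern, the following hedged sketch assumes new rows have already been ingested into a staging Delta table and upserts them into a curated target table with a Delta MERGE. The table names orders_staging and orders, and the key column order_id, are hypothetical.

```python
# Hypothetical staging-table pattern: upsert rows from a staging table into
# the curated target table using a Delta MERGE.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

staging_df = spark.read.table("orders_staging")   # hypothetical staging table
target = DeltaTable.forName(spark, "orders")      # hypothetical target table

(
    target.alias("t")
    .merge(staging_df.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()      # update rows that already exist
    .whenNotMatchedInsertAll()   # insert rows that are new
    .execute()
)
```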

Spark job definitions can also be used to submit batch or streaming jobs to a Spark cluster. By uploading the compiled output of different languages (for example, a .jar file from Java), you can apply different transformation logic to the data hosted in a lakehouse. Beyond the main definition file, you can further customize the job's behavior by uploading additional libraries and supplying command-line arguments.
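While the example above mentions compiled .jar files, a Python script can play the same role as the main definition file. The following hedged sketch shows one way such a script might be structured: it creates its own Spark session (no notebook-provided session here) and reads the source path and target table name from command-line arguments configured on the job. The argument values shown in the comment are hypothetical.

```python
# Hypothetical main definition file for a Spark job definition. The source
# path and target table come in as command-line arguments set on the job.
import sys
from pyspark.sql import SparkSession


def main(source_path: str, target_table: str) -> None:
    spark = SparkSession.builder.appName("lakehouse-batch-ingest").getOrCreate()

    # Read the raw file and append it to the target Delta table.
    df = spark.read.option("header", "true").csv(source_path)
    df.write.mode("append").format("delta").saveAsTable(target_table)

    spark.stop()


if __name__ == "__main__":
    # Example arguments (hypothetical): Files/raw/orders.csv orders_raw
    main(sys.argv[1], sys.argv[2])
```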

