Spark pools in Microsoft Fabric

Microsoft Fabric provides a starter pool in each workspace, enabling Spark jobs to be started and run quickly with minimal setup and configuration. You can configure the starter pool to optimize the nodes it contains for your specific workload needs or cost constraints.

Additionally, you can create custom Spark pools with specific node configurations that support your particular data processing needs.

You can manage settings for the starter pool and create new Spark pools in the workspace settings: open the Admin portal section, then go to Capacity settings and select Data Engineering/Science Settings.

Specific configuration settings for Spark pools include:

  • Node Family: The type of virtual machines used for the Spark cluster nodes. In most cases, memory-optimized nodes provide optimal performance.
  • Autoscale: Whether to automatically provision nodes as needed, and if so, the initial and maximum number of nodes to be allocated to the pool.
  • Dynamic allocation: Whether to dynamically allocate executor processes on the worker nodes based on data volumes.
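The autoscale and dynamic allocation settings above correspond to standard Apache Spark dynamic-allocation properties. As a rough sketch (the property names below are standard open-source Spark settings, not Fabric-specific ones; Fabric's pool UI manages these for you, and the `spark_submit_args` helper is a hypothetical illustration), an equivalent hand-written configuration might look like this:

```python
# Sketch: standard Apache Spark properties that roughly correspond to the
# pool settings described above. In Fabric these are normally set through
# the pool UI, not written by hand; shown here purely for illustration.
pool_config = {
    # Dynamic allocation: let Spark add or remove executors as workload varies
    "spark.dynamicAllocation.enabled": "true",
    # Autoscale-style bounds: minimum and maximum executor counts
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
}

def spark_submit_args(config: dict) -> list[str]:
    """Render the settings as --conf arguments for spark-submit."""
    return [arg for key, value in config.items()
            for arg in ("--conf", f"{key}={value}")]
```

For example, `spark_submit_args(pool_config)` yields a flat list beginning with `["--conf", "spark.dynamicAllocation.enabled=true", ...]`, which could be passed to a `spark-submit` invocation outside Fabric.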

If you create one or more custom Spark pools in a workspace, you can set one of them (or the starter pool) as the default pool to be used if a specific pool is not specified for a given Spark job.

