Power BI Archives - Tiger Analytics

A Comprehensive Guide to Pricing and Licensing on Microsoft Fabric
https://www.tigeranalytics.com/perspectives/blog/a-comprehensive-guide-to-pricing-and-licensing-on-microsoft-fabric/ | Mon, 01 Jul 2024

This comprehensive guide explores Microsoft Fabric's pricing strategies, including capacity tiers, SKUs, and tenant hierarchy, helping organizations optimize their data management costs. It breaks down the differences between reserved and pay-as-you-go models, explaining Capacity Units (CUs) and providing detailed pricing information. By understanding these pricing intricacies, businesses can make informed decisions to fully leverage their data across various functions, leading to more efficient operations and better customer experiences.

Organizations often face challenges in effectively leveraging data to streamline operations and enhance customer satisfaction. Siloed data, complexities associated with ingesting, processing, and storing data at scale, and limited collaboration across departments can hinder a company’s ability to make informed, data-driven decisions. This can result in missed opportunities, inefficiencies, and suboptimal customer experiences.

Here’s where Microsoft’s new SaaS platform “Microsoft Fabric” can give organizations a much-needed boost. By integrating data across various functions, including data science (DS), data engineering (DE), data analytics (DA), and business intelligence (BI), Microsoft Fabric enables companies to harness the full potential of their data. The goal is to enable seamless sharing of data across the organization while simplifying all the key functions of Data Engineering, Data Science, and Data Analytics to facilitate quicker and better-informed decision-making at scale.

For enterprises looking to utilize Microsoft Fabric’s full capabilities, understanding the platform’s pricing and licensing intricacies is crucial, impacting several key financial aspects of the organization:

1. Reserved vs Pay-as-you-go: Understanding pay-as-you-go versus reserved pricing helps in precise budgeting and can affect both initial and long-term operational costs.
2. Capacity Tiers: Clear knowledge of capacity tiers allows for predictable scaling of operations, facilitating smooth expansions without unexpected costs.
3. Fabric Tenant Hierarchy: It is important to understand the tenant hierarchy, as it determines how and where an organization needs to buy capacity for its unique needs.
4. Existing Power BI Licenses: For customers with existing Power BI licenses, it is important to understand how to utilize them (Free/Pro/Premium) and how they tie in with Fabric SKUs.

At Tiger Analytics, our team of seasoned SMEs has helped clients navigate the intricacies of licensing and pricing models for robust platforms like Microsoft Fabric, based on their specific needs.

In this blog, we will provide insights into Microsoft Fabric’s pricing strategies to help organizations make more informed decisions when considering this platform.

Overview of Microsoft Fabric:

Microsoft Fabric offers a unified and simplified cloud SaaS platform designed around the following ‘Experiences’:

  • Data Ingestion – Data Factory
  • Data Engineering – Synapse DE
  • Data Science – Synapse DS
  • Data Warehousing – Synapse DW
  • Real-Time Analytics – Synapse RTA
  • Business Intelligence – Power BI
  • Unified storage – OneLake

A Simplified Pricing Structure

Unlike Azure, where each tool has separate pricing, Microsoft Fabric simplifies this by focusing on two primary cost factors:

1. Compute Capacity: A single compute capacity supports all functionalities concurrently and can be shared across multiple projects and users, with no limit on the number of workspaces utilizing it. You do not need to select capacities individually for Data Factory, Synapse Data Warehousing, and other Fabric experiences.

2. Storage: Storage costs are separate yet simplified, making choices easier for the end customer.

Microsoft Fabric

Understanding Fabric’s Capacity Structure

To effectively navigate the pricing and licensing of Microsoft Fabric, it is crucial to understand how a Fabric capacity is associated with tenants and workspaces. Together, these three organize the resources within an organization and help manage costs and operational efficiency.

1. Tenant: This represents the highest organizational level within Microsoft Fabric, and is associated with a single Microsoft Entra ID. An organization could also have multiple tenants.

2. Capacity: Under each tenant, there are one or more capacities. These represent pools of compute and storage resources that power the various Microsoft Fabric services and provide the capabilities for workload execution. Capacity is analogous to the horsepower of a car engine: the more capacity you provision, the more workloads you can run, and the faster they complete.

3. Workspace: Workspaces are environments where specific projects and workflows are executed. Each workspace is assigned to a capacity, which determines the compute resources it can utilize. Multiple workspaces can share the resources of a single capacity, making it a flexible way to manage different projects or departmental needs without having to allocate additional resources for each new project or department.

Microsoft Fabric
The figure above portrays the Tenant hierarchy in Fabric and how different organizations can provision capacities based on their requirements.

Understanding Capacity Levels, SKUs, and Pricing in Microsoft Fabric

Microsoft Fabric capacities are defined by a Stock Keeping Unit (SKU) that corresponds to a specific amount of compute power, measured in Capacity Units (CUs). A CU is a unit that quantifies the amount of compute power available.

Capacity Units (CUs) = Compute Power

As shown in the table below, each SKU (Fn) corresponds to a number of CUs. For example, F4 has double the capacity of F2 but half that of F8.

The breakdown below lists the SKUs available for the West Europe region, with both Pay As You Go and Reserved (1-year) pricing options:

Microsoft Fabric

Comparative table showing Fabric SKUs, CUs, associated PBI SKU, Pay-as-you-Go and Reserved pricing for a region.
1 CU pay-as-you-go price in the West Europe region = $0.22/hour
1 CU pay-as-you-go monthly rate: $0.22 × 730 hours = $160.60; F2 (2 CUs) = $160.60 × 2 = $321.20
1 CU reserved (1-year) monthly rate: $0.22 × (1 - 0.405) × 730 ≈ $95.56; F2 reserved ≈ $95.56 × 2 ≈ $191.11
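
For readers who prefer to script these estimates, here is a minimal C# sketch of the same arithmetic. The $0.22/CU-hour rate, the ~40.5% reservation discount, and the 730-hour month are taken from the West Europe example above and are illustrative inputs only; actual rates vary by region and change over time, so check the official pricing page before budgeting.

using System;

class FabricCostEstimator
{
    // Illustrative inputs from the West Europe example above.
    const double PayGoRatePerCuHour = 0.22;   // $ per CU per hour (pay-as-you-go)
    const double ReservedDiscount   = 0.405;  // ~40.5% saving for a 1-year reservation
    const double HoursPerMonth      = 730;    // Azure's standard monthly hour count

    static void Main()
    {
        // Fabric SKUs double in CUs at each step: F2, F4, F8, ..., F2048.
        foreach (int cus in new[] { 2, 4, 8, 64 })
        {
            double payGoMonthly    = cus * PayGoRatePerCuHour * HoursPerMonth;
            double reservedMonthly = payGoMonthly * (1 - ReservedDiscount);

            Console.WriteLine($"F{cus}: pay-as-you-go ~${payGoMonthly:F2}/month, reserved (1-year) ~${reservedMonthly:F2}/month");
        }
        // e.g. F2: ~$321.20/month pay-as-you-go, ~$191.11/month reserved.
    }
}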

Pricing Models Explained:

Pay As You Go: This flexible model allows you to pay monthly based on the SKU you select, making it ideal if your workload demands are uncertain. You can purchase more capacity or upgrade/downgrade your existing capacity, and you can pause capacities to save costs.

Reserved (1 year): In this option, you commit to a 1-year reservation and pay a discounted price monthly, which can save around 40% compared to pay-as-you-go. There is no option to pause, and you are billed monthly regardless of capacity usage.

Storage Costs in Microsoft Fabric (OneLake)

In Microsoft Fabric, compute capacity does not include data storage costs. This means that businesses need to budget separately for storage expenses.

  • Storage costs need to be paid for separately.
  • Storage costs in Fabric (OneLake) are similar to ADLS (Azure Data Lake Storage).
  • BCDR (Business Continuity and Disaster Recovery) charges also apply. These come into play when a workspace is deleted but some of its data still needs to be retrieved.
  • Beyond this, there are costs for cache storage (for KQL DB).
  • There are also costs for transferring data between regions, known as bandwidth pricing. More details are available in Microsoft's bandwidth pricing documentation.

Optimizing Resource Use in Microsoft Fabric: Understanding Bursting and Smoothing Techniques

Even after you purchase a capacity, your workloads may occasionally demand more resources than it provides.

To handle this, Fabric offers two mechanisms: one to speed up execution (bursting) and one to flatten usage over time (smoothing) so that costs stay optimal.

  • Bursting: Bursting enables the use of additional compute resources beyond your existing capacity to accelerate workload execution. For instance, if a task normally takes 60 seconds using 64 CUs, bursting can allocate 256 CUs to complete the same task in just 15 seconds. The total compute consumed stays the same (64 CUs × 60 s = 256 CUs × 15 s = 3,840 CU-seconds); it is simply consumed faster.
  • Smoothing: Smoothing is applied automatically in Fabric across all capacities to manage brief spikes in resource usage. This method distributes the compute demand more evenly over time, which helps in avoiding extra costs that could occur with sudden increases in resource use.

Understanding Consumption: Where do your Capacity Units (CUs) go?

Microsoft Fabric
Image credit: Microsoft

The following components in Fabric consume CUs (Capacity Units):

  • Data Factory Pipelines
  • Data Flow Gen2
  • Synapse Warehouse
  • Spark Compute
  • Event Stream
  • KQL Database
  • OneLake
  • Copilot
  • VNet Data Gateway
  • Data Activator (Reflex)
  • PowerBI

CU consumption depends on how the solution is implemented. Here's an example for better understanding:

Business Requirement: Ingest data from an on-prem data source and use it for Power BI reporting.

Solution Implemented: Data Factory pipelines with Notebooks to perform DQ checks on the ingested data. Power BI reports were created pointing to the data in OneLake.

How are CU’s consumed:

CUs would be consumed every time the data factory pipeline executes and further invokes the Notebook (Spark Compute) to perform data quality checks.

Further, CU’s would get consumed whenever the data refreshes on the dashboard.

Microsoft Fabric Pricing Calculator:

Microsoft has streamlined the pricing calculation with its online calculator. By selecting your region, currency, and billing frequency (hourly or monthly), you can quickly view the pay-as-you-go rates for all SKUs. This gives you an immediate estimate of the monthly compute and storage costs for your chosen region. Additionally, links for reserved pricing and bandwidth charges are also available.

For more detailed and specific pricing analysis, Microsoft offers an advanced Fabric SKU Calculator tool through partner organizations.

Understanding Fabric Licensing: Types and Strategic Considerations

Licensing in Microsoft Fabric is essential because it legally permits and enables the use of its services within your organizational framework, ensuring compliance and tailored access to various functionalities. Licensing is distinct from pricing, as licensing outlines the terms and types of access granted, whereas pricing involves the costs associated with these licenses.

There are two types of licensing in Fabric:

  • Capacity-Based Licensing: This licensing model is required for operating Fabric’s services, where Capacity Units (CUs) define the extent of compute resources available to your organization. Different Stock Keeping Units (SKUs) are designed to accommodate varying workload demands, ranging from F2 to F2048. This flexibility allows businesses to scale their operations up or down based on their specific needs.
  • Per-User Licensing: User-based licensing was used in Power BI, and this has not changed in Fabric (for compatibility). The User accounts include:
    • Free
    • Pro
    • Premium Per User (PPU)

Each is tailored to a specific set of capabilities, as seen in the table below:

Microsoft Fabric
Image Credit: Microsoft (https://learn.microsoft.com/en-us/fabric/enterprise/licenses)

Understanding Licensing Scenarios

To optimally select the right Fabric licensing options and understand how they can be applied in real-world scenarios, it’s helpful to look at specific use cases within an organization. These scenarios highlight the practical benefits of choosing the right license type based on individual and organizational needs.

Scenario 1: When do you merely require a Power BI Pro License?

Consider the case of Sarah, a data analyst whose role involves creating and managing Power BI dashboards used organization-wide. These dashboards are critical for providing the leadership with the data needed to make informed decisions. In such a scenario, a Pro License is best because it allows Sarah to:

  • Create and manage Power BI dashboards within a dedicated workspace.
  • Set sharing permissions to control who can access the dashboards.
  • Enable colleagues to build their visualizations and reports from her Power BI datasets, fostering a collaborative work environment.

In the above scenario, a Pro license would suffice, based on the requirements listed.

Scenario 2: What are the Licensing Options for Small Businesses?

Consider a small business with about 60 users that wants to leverage premium Power BI features (please refer to the comparison table above, which shows the capabilities for Free, Pro, and PPU) to enhance its data analysis capabilities. The company has two primary licensing options within Microsoft Fabric to accommodate its needs, each with different cost implications and service access levels.

Option 1: Premium Per User (PPU) Licensing

  • This option involves purchasing a Premium Per User license for each of the 60 users.
  • Cost Calculation: 60 users x $20 per month = $1,200 per month.
  • Note: This option does not include any Fabric services or capacities; it only covers the Power BI Premium features.

Option 2: Combining F4 Capacity with Power BI Pro Licenses

  • Alternatively, the company can opt for a combination of an F4 Fabric capacity and 60 Power BI Pro licenses.
  • Cost Calculation: F4 capacity at $525 per month + (60 Power BI Pro licenses x $10 = $600) = $1,125 per month. Additional storage and other service costs may apply.
  • Benefits: This option is not only more cost-effective compared to Option 1, but it also provides access to broader Fabric services beyond just Power BI, enhancing the organization’s overall data management capabilities.

Option 2 offers a more economical and service-inclusive approach. Furthermore, it opens up opportunities to scale up using higher Fabric capacities with reserved (1-year) pricing for even greater efficiency and cost savings in the long run.
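
To make this comparison reusable for other headcounts, here is a short C# sketch of the same arithmetic. The $20 PPU, $10 Pro, and $525 F4 monthly prices are the illustrative figures used in this scenario (storage and other service costs excluded), so treat the break-even point as a rough guide rather than an official threshold.

using System;

class LicensingComparison
{
    static void Main()
    {
        // Illustrative monthly list prices from the scenario above.
        const double ppuPerUser = 20;   // Premium Per User
        const double proPerUser = 10;   // Power BI Pro
        const double f4Capacity = 525;  // F4 pay-as-you-go capacity

        int users = 60;

        double option1 = users * ppuPerUser;               // PPU licenses only
        double option2 = f4Capacity + users * proPerUser;  // F4 capacity + Pro licenses

        Console.WriteLine($"Option 1 (PPU x {users}):      ${option1}/month");  // $1,200
        Console.WriteLine($"Option 2 (F4 + Pro x {users}): ${option2}/month");  // $1,125

        // Option 2 becomes cheaper once f4Capacity + proPerUser * u < ppuPerUser * u,
        // i.e. above ~53 users with these illustrative prices.
        double breakEven = f4Capacity / (ppuPerUser - proPerUser);
        Console.WriteLine($"Break-even headcount: ~{Math.Ceiling(breakEven)} users");
    }
}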

Microsoft Fabric
Table: Fabric SKU and Power BI SKUs for reference calculations and comparisons

Scenario 3: A Medium business organization is looking to implement analytics solutions using Fabric services and reporting using Power BI. They are also looking to share Power BI content for collaborative decision-making. What are the licensing options in Fabric?

Considerations:

1. Since the organization is looking to share Power BI content, it will need Power BI Premium or equivalent Fabric capacities (F64 and above).
2. Microsoft is transitioning/enabling Power BI Premium capacities to automatically become Fabric capacities, which brings more flexibility for organizations while keeping costs the same (when compared with PPU licenses).
3. It would be wise to start with F64 Pay-As-You-Go, check performance and other factors such as bursting in the monthly bills, and then settle on a Fabric capacity with reserved pricing to realize savings of up to 40%.

Scenario 4: An organization is looking to use Copilot extensively across the platform. What Fabric capacity can they start with?

Considerations: A minimum of an F64 SKU is required to use Copilot.

The table above provides a reference for understanding how different SKUs align with specific user needs and organizational goals, helping to further clarify the most effective licensing strategies for various roles within a company.

Key Considerations for selecting the right Fabric SKU and License

Now that we have seen some practical scenarios related to making licensing decisions, let us list out the key considerations for selecting the optimal Fabric SKU and license:

  • Organization Size & Usage Patterns:
    • A large organization with diverse data needs will likely require a higher-capacity SKU and more user licenses. Consider a mix of per-user and capacity licenses – analyze which teams work heavily in Fabric vs. those who are light consumers.
    • If your organization already uses Power BI extensively, or it’s central to your use of Fabric, having at least one Pro or PPU license is essential.
  • Workload Types and Frequency:
    • Batch vs. real-time processing: One-time bulk data migrations might benefit from short-term bursts, while continuous streaming needs consistent capacity.
    • Complexity of transformations: Resource-intensive tasks like complex data modeling, machine learning, or large-scale Spark jobs will consume more CUs than simple data movement.
    • Frequency of Power BI Use: Frequent dataset refreshes and report queries in Power BI significantly increase compute resource consumption.
    • Content Sharing/Copilot usage: To share Power BI content freely across the organization, or to use Copilot, you must be on an F64 or higher SKU.
  • Operational Time:
    • Pay-as-you-go vs. Reserved (1-year) pricing: Reserved capacity locks in a price for consistent usage, while pay-as-you-go is better for sporadic workloads. Reserved licensing provides roughly 40% savings over Pay-as-you-Go.
    • Pausing: You can pause your existing pay-as-you-go license when the capacity is not in use, resulting in cost savings.
    • Development vs. production: Dev environments can often use lower tiers or be paused when idle to reduce costs.
  • Region:
    • Costs vary by Azure region. Align your Fabric deployment with your primary user location to minimize data egress charges.
  • Power BI Premium: While Power BI licenses have not changed in Fabric, it is important to consider that the Power BI premium license would be merged with Fabric (F) licenses. The Free and Pro licenses would not be impacted.
  • Mixed Use: You may need to consider purchasing both Fabric (capacity) and Power BI licenses for sharing content across the organization.

How to Bring These Factors into Your Planning

Before beginning the Fabric deployment, consider these steps to ensure you choose the right SKU and licensing options:

  • Start with Baselining: Before scaling up, run pilot workloads to understand your capacity unit (CU) consumption patterns. This helps in accurately predicting resource needs and avoiding unexpected costs.
  • Estimate Growth: Project future data volumes, user counts, and evolving analytics needs. This foresight ensures that your chosen capacity can handle future demands without frequent upgrades.
  • Right-size, Don’t Overprovision: Initially, select an SKU that slightly exceeds your current needs. Microsoft Fabric’s flexibility allows you to scale up as necessary, preventing unnecessary spending on excess capacity.
  • Regularly Monitor Usage: Utilize the Capacity Metrics App to track resource usage and identify trends. This ongoing monitoring allows for timely adjustments and optimization of your resource allocation, ensuring cost-effectiveness.

Power BI Capacity Metrics App: Your Cost Control Center in Fabric

The Power BI Capacity Metrics App is an essential tool for understanding how different Microsoft Fabric components consume resources. It:

  • Provides detailed reports and visualizations on compute and storage usage.
  • Empowers you to identify cost trends, potential overages, and optimization opportunities.
  • Helps you stay within your budget.

Microsoft Fabric

Microsoft Fabric has streamlined licensing and pricing options, offering significant benefits at both capacity and storage levels:

Capacity Benefits
Microsoft Fabric
Image credits: Microsoft

Storage Benefits
Microsoft Fabric

In this blog, we’ve explored the intricacies of Microsoft Fabric’s pricing and licensing, along with practical considerations for making informed purchase decisions. If you want to integrate Fabric into your business, you can purchase the capacities and licenses from the Azure Portal, or reach out to us to discuss your use case.

Advanced Data Strategies in Power BI: A Guide to Integrating Custom Partitions with Incremental Refresh
https://www.tigeranalytics.com/perspectives/blog/advanced-data-strategies-in-power-bi-a-guide-to-integrating-custom-partitions-with-incremental-refresh/ | Fri, 03 May 2024

Explore advanced data management strategies in Power BI through a detailed examination of integrating Custom Partitions with Incremental Refresh to efficiently handle large datasets. Key benefits such as improved query performance, more efficient data refresh, and better data organization are outlined, along with a practical guide on implementing these strategies in Power BI environments.

D, a data engineer with a knack for solving complex problems, recently faced a challenging task. A client needed a smart way to manage their data in Power BI, especially after acquiring new companies. This meant separating newly acquired third-party data from their existing internal data, while also ensuring that historical data remained intact and accessible. The challenge? This process involved refreshing large data sets, sometimes as many as 25 million rows for a single year, just to incorporate a few thousand new entries. This task was not just time-consuming but would also put a strain on computational resources.

At first glance, Power BI’s Custom Partitions seemed like a promising solution. It would allow D to organize data neatly, separating third-party data from internal data as the client wanted. However, Power BI typically partitions data by date, not by the source or type of data, which made combining Custom Partitions with Incremental Refresh—a method that updates only recent changes rather than the entire dataset—a bit of a puzzle.

Limitations of Custom Partition and Incremental Refresh in Power BI

Custom Partitions offer the advantage of dividing the table into different parts based on the conditions defined, enabling selective loading of partitions during refreshes. However, Power BI’s built-in Incremental Refresh feature, while automated and convenient, has limitations in terms of customization. It primarily works on date columns, making it challenging to partition the table based on non-date columns like ‘business region’.

Partition

Incremental Refresh Pros:

  • Creation of partitions is automated, and partitions are updated based on date automatically; no manual intervention is needed.

Incremental Refresh Cons:

  • Cannot have two separate partitioning logics defined for the data based on a flag column.
  • Does not support the movement of data using the Power BI Pipeline feature.

Custom Partitions Pros:

  • We can create partitions based on our own logic.
  • Supports the movement of data using the Power BI Pipeline feature.

Custom Partitions Cons:

  • All the processes must be handled manually.

To tackle these challenges, D came up with another solution. By using custom C# scripts and Azure Functions, D found a way to integrate Custom Partitions with Incremental Refresh in the Power BI model. This solution not only allowed for efficient management of third-party and internal data but also streamlined the refresh process. Additionally, D utilized Azure Data Factory to automate the refresh process based on specific policies, ensuring that data remained up-to-date without unnecessary manual effort.

This is how we at Tiger Analytics solved our client’s problem and separated the third-party data. In this blog, we’ll explore the benefits of combining Custom Partitions with Incremental Refresh and, based on our experience, how this combination can enhance data management in Power BI and provide a more efficient and streamlined approach to data processing.

Benefits of combining Incremental Refresh with Custom Partitions in Power BI

Merging the capabilities of Incremental Refresh with Custom Partitions in Power BI offers a powerful solution to overcome the inherent limitations of each approach individually. This fusion enables businesses to fine-tune their data management processes, ensuring more efficient use of resources and a tailored fit to their specific data scenarios.

Leveraging tools like Azure Function Apps, the Table Object Model (TOM) library, and Power BI’s XMLA endpoints, automating the creation and management of Custom Partitions becomes feasible. This automation grants the flexibility to design data partitions that meet precise business needs while enjoying the streamlined management and automated updates provided by Power BI.

Fact Sale

Optimizing Query Performance:

  • Custom Partitions improve query performance by dividing data into logical segments based on specific criteria, such as a flag column.
  • When combined with an Incremental Refresh, only the partitioned data that has been modified or updated needs to be processed during queries.
  • This combined approach reduces the amount of data accessed, leading to faster query response times and improved overall performance.

Efficient Data Refresh:

  • Incremental Refresh allows Power BI to refresh only the recently modified or added data, reducing the time and resources required for data refreshes.
  • When paired with Custom Partitions, the refresh process can be targeted to specific partitions, rather than refreshing the entire dataset.
  • This targeted approach ensures that only the necessary partitions are refreshed, minimizing processing time and optimizing resource utilization.

Enhanced Data Organization and Analysis:

  • Custom Partitions provide a logical division of data, improving data organization and making it easier to navigate and analyze within the data model.
  • With Incremental Refresh, analysts can focus on the most recent data changes, allowing for more accurate and up-to-date analysis.
  • The combination of Custom Partitions and Incremental Refresh enables more efficient data exploration and enhances the overall data analysis process.

Scalability for Large Datasets:

  • Large datasets can benefit significantly from combining Custom Partitions and Incremental Refresh.
  • Custom Partitions allow for efficient querying of specific data segments, reducing the strain on system resources when dealing with large volumes of data.
  • Incremental Refresh enables faster and more manageable updates to large datasets by focusing on the incremental changes, rather than refreshing the entire dataset.

Implementation Considerations:

  • Combining Custom Partitions and Incremental Refresh may require a workaround, such as using calculated tables and parameters.
  • Careful planning is necessary to establish relationships between the partition table, data tables, and Incremental Refresh configuration.
  • Proper documentation and communication of the combined approach are essential to ensure understanding and maintainability of the solution.

How to implement Incremental Refresh and Custom Partitions: A step-by-step guide

Prerequisites:

Power BI Premium Capacity or PPU License: The use of XMLA endpoints, which are necessary for managing Custom Partitions, is limited to Power BI Premium capacities. Alternatively, you can utilize Power BI premium-per-user (PPU) licensing to access these capabilities.
PPU: https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-per-user-faq
XMLA Reference: https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-connect-tools

Dataset Published to Premium Workspace: The dataset for implementing Custom Partitions and Incremental Refresh should be published to a Power BI Premium workspace.

Permissions for Azure Functions and Power BI Admin Portal: To automate the creation and management of Custom Partitions, you need the appropriate permissions. This includes the ability to create and manage Azure Functions and the necessary rights to modify settings in Power BI’s Admin portal.

  • In the Function App, navigate to Settings -> Identity and turn on the system-assigned managed identity.
  • Next, create a security group in Azure and add the Function App as a member.
  • Go to Power BI, navigate to the Admin portal, and add the security group to the Admin API setting that allows service principals to use Power BI APIs.
  • Go to the workspace, open its access settings, and add the Function App as a member of the workspace.

Check Incremental Refresh Policy: The incremental refresh policy needs to be disabled (set to false) on the table in order to create partitions through code.

Refresh Policy

Fulfilling these prerequisites will enable effective utilization of Custom Partitions and Incremental Refresh in Power BI.

Implementation at a glance:

Create an Azure Function with .NET as the Runtime Stack: Begin by adding the necessary DLL files for Power BI model creation and modification to the Azure Function console.

Connect to the Power BI Server Using C# Code: Establish a connection by passing the required connection parameters, such as the connection string and the table name where partitions need to be implemented. (C# code and additional details are available in the GitHub link provided in the note section).

Develop Code for Creating Partitions: Utilize the inbuilt functions from the imported DLL files to create partitions within the Power BI server.

Implement Code for Multiple Partitions: Use a combination of for-loop and if-conditions to develop code capable of handling multiple partitions.
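
To make these steps concrete, here is a minimal, hedged C# sketch of what the core of such an Azure Function might look like, using the TOM library (Microsoft.AnalysisServices.Tabular) over the XMLA endpoint. The connection string, table name, partition name, and M expression are illustrative placeholders, and the helper name EnsurePartition is ours; the full working code is in the GitHub repository linked in the note at the end of this post.

using System;
using Microsoft.AnalysisServices.Tabular; // TOM - one of the DLLs added to the Function App

class PartitionManager
{
    // Connects to the dataset over the XMLA endpoint and creates a partition if it is missing.
    static void EnsurePartition(string workspaceConnection, string datasetName,
                                string tableName, string partitionName, string mExpression)
    {
        using (var server = new Server())
        {
            // e.g. "powerbi://api.powerbi.com/v1.0/myorg/MyWorkspace" plus service-principal credentials
            server.Connect(workspaceConnection);

            Model model = server.Databases.GetByName(datasetName).Model;
            Table table = model.Tables.Find(tableName)
                          ?? throw new ArgumentException($"Table '{tableName}' not found.");

            // Prerequisite from above: the incremental refresh policy must be removed
            // before code-defined partitions can be created on the table.
            if (table.RefreshPolicy != null)
                table.RefreshPolicy = null;

            // Create the partition only if it does not already exist, e.g. the flag-based
            // "ABC" partition, or a yearly/monthly partition such as "2022" or "202403".
            if (table.Partitions.Find(partitionName) == null)
            {
                table.Partitions.Add(new Partition
                {
                    Name = partitionName,
                    Mode = ModeType.Import,
                    Source = new MPartitionSource { Expression = mExpression } // Power Query (M) filter for this slice
                });
            }

            model.SaveChanges(); // push the metadata changes back to the Power BI service
        }
    }
}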

There are two types of data partitions to consider based on the Flag value:

  • Flag Value ‘Y’ will be stored in a single partition, referred to as the ABC Partition.
  • Flag Value ‘N’ will be partitioned based on the date column, adhering to the incremental logic implemented. (Examples of partition naming include 2020, 2021, 2022, 202301, 202302, 202303, etc., up to 202312, 202401, 202402).

Check and Create ABC Partition if it does not exist: The initial step in the logic involves verifying the existence of the ABC Partition. If it does not exist, the system should create it.

Implement Logic Within the Looping Block:

  • The first action is to verify the existence of yearly partitions for the last three years. If any are missing, they should be created.
  • Next, combine all partitions from the previous year into a single-year partition.
  • Subsequently, create new partitions for the upcoming year until April.
  • Any partitions outside the required date range should be deleted. (A sketch of this date logic is shown right after this list.)
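
The date logic in this loop is straightforward to express in code. The hedged C# sketch below computes the set of partition names the table should contain on a given run date, following the yyyy / yyyyMM naming convention shown earlier (three prior yearly partitions, monthly partitions for the current year, and monthly partitions through April of the next year); comparing this set against the table's existing partitions then drives the create, merge, and delete calls.

using System;
using System.Collections.Generic;

class PartitionPlan
{
    // Returns the partition names the table should contain on the given run date.
    static List<string> RequiredPartitions(DateTime runDate)
    {
        var names = new List<string>();

        // Yearly partitions for the three previous years, e.g. 2020, 2021, 2022.
        for (int year = runDate.Year - 3; year < runDate.Year; year++)
            names.Add(year.ToString());

        // Monthly partitions for the current year, e.g. 202301 .. 202312.
        for (int month = 1; month <= 12; month++)
            names.Add(new DateTime(runDate.Year, month, 1).ToString("yyyyMM"));

        // Monthly partitions for the upcoming year, up to April, e.g. 202401 .. 202404.
        for (int month = 1; month <= 4; month++)
            names.Add(new DateTime(runDate.Year + 1, month, 1).ToString("yyyyMM"));

        return names;
    }

    static void Main()
    {
        // Running on 1 April 2023 yields: 2020, 2021, 2022, 202301..202312, 202401..202404.
        Console.WriteLine(string.Join(", ", RequiredPartitions(new DateTime(2023, 4, 1))));
    }
}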

Automate Partition Refresh with a Pipeline: Establish a pipeline designed to trigger the Azure function on the 1st of April of every year, aligning with the business logic.

Partition Logical flow:

Partition Logical flow

Step-by-Step implementation:

  • From the home dashboard, search for and select the Function App service. Enter all necessary details, review the configuration, and click ‘Create’.

    Create Function App

  • Configure the function’s runtime settings.

    Create Function App

  • Check the required DLLs

    Required dll's

  • Navigate to Development Tools -> Advanced Tools -> Go

    Advanced Tools
    Kudu Plus

  • Go to CMD -> site -> wwwroot -> new function name -> bin and paste all the DLLs

    dll's

  • The primary coding work, including the creation of partitions, is done in the run.csx file. This is where you’ll write the C# code.

    pbi-dataset-parttition

  • The Partitions should be as below:

    Fact Scale
    pbi-dataset-parttition
    Input body to the function:

    {
        "connectionstring":"Connection details",
        "datasource": "datasource name",
        "workspace": "workspace name",
        "dataset": "dataset name",
        "table": "Table name",
        "partition": "Y",
        "sourceschema": "Schema name",
        "sourceobject": "Source object Name table or view name",
        "partitionstatement": "Year",
        "history": "2"
    }
    

Refresh the selected partition using Azure Pipeline:

Azure Pipeline

  • Create an Azure pipeline that uses a Web activity to call the REST API refresh method on the Power BI model.
  • The first step in using the pipeline is to have an app registered with access to the Power BI workspace and model.
  • Then, using the app, get an AAD token for authentication.
  • With the AAD token, use the built-in refresh POST method of the REST API to refresh the required table and partition.
  • To make the pipeline wait until the refresh is complete, use the REST API's refreshes GET method; polling the refresh status within the pipeline ensures the process completes successfully.
  • The pipeline is built in a modular way, with the workspace ID, dataset ID, table name, and partition name passed as parameters.
  • The pipeline can refresh any model, as long as the app used by the pipeline has access to that model and workspace.
    Pipeline Flow

    What each activity in the pipeline does:
    • Get Secret from AKV: This block accepts the Key Vault URL and the name of the secret that holds the credential for the app used to access Power BI. Its output is the secret value.
    • Get AAD Token: This block takes the tenant ID, the app ID, and the output of Get Secret from AKV, and outputs a token that grants access to the Power BI model.
    • Get Dataset Refresh: This block takes the workspace ID, dataset ID, request body, and the token from the previous block, and triggers the refresh of the tables and partitions specified in the body. This block follows the POST method. (A sketch of these HTTP calls is shown after this list.)

    Until Refresh Complete:

    • Wait: To ensure the refresh completes, this block checks every 10 seconds.
    • Get Dataset: This involves inputting the workspace ID, dataset ID, and request body, following the GET method. The output is a list of ongoing refreshes on the model.
    • Set Dataset: Assigns the output of the previous block to a variable.
      The loop runs until the variable is no longer equal to 'Unknown'.
    • If Condition: This step checks if the refresh process has failed. If so, the pipeline’s execution is considered unsuccessful.
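
For reference, the two calls at the heart of this flow (acquiring the AAD token and posting an enhanced refresh for specific partitions) look roughly like the C# sketch below. The endpoints follow the public Microsoft identity platform and Power BI REST APIs, but the tenant, app, workspace, and dataset IDs and the table/partition names are placeholders; in the pipeline itself, the Web activities issue the equivalent HTTP requests declaratively.

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class PowerBiRefreshClient
{
    static readonly HttpClient http = new HttpClient();

    // Equivalent of the "Get AAD Token" activity: client-credentials token for the registered app.
    static async Task<string> GetAadTokenAsync(string tenantId, string appId, string appSecret)
    {
        var body = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["grant_type"] = "client_credentials",
            ["client_id"] = appId,
            ["client_secret"] = appSecret,
            ["scope"] = "https://analysis.windows.net/powerbi/api/.default"
        });

        var resp = await http.PostAsync($"https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token", body);
        resp.EnsureSuccessStatusCode();

        // Minimal parsing for the sketch; a real function would deserialize the JSON response properly.
        string json = await resp.Content.ReadAsStringAsync();
        return json.Split("\"access_token\":\"")[1].Split('"')[0];
    }

    // Equivalent of the "Get Dataset Refresh" activity: enhanced refresh of selected tables/partitions.
    static async Task TriggerRefreshAsync(string token, string workspaceId, string datasetId)
    {
        string url = $"https://api.powerbi.com/v1.0/myorg/groups/{workspaceId}/datasets/{datasetId}/refreshes";

        // Table and partition names are placeholders for the values passed to the pipeline.
        string payload = "{ \"type\": \"full\", \"objects\": [ { \"table\": \"FactSale\", \"partition\": \"202402\" } ] }";

        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(payload, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);

        var resp = await http.SendAsync(request);
        resp.EnsureSuccessStatusCode();
        // The Until loop above then polls GET {url} until the latest refresh status is no longer "Unknown".
    }
}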

Refresh the selected partition using Azure Function:

  • Follow steps 1–6 above to create the Azure Function for refresh.
  • In the Code + Test pane, add the C# code shared in the GitHub repository. (A sketch of its core refresh logic follows the sample input below.)

Model Table Refresh

Input body to the function:

{
    "workspace": "Workspace Name",
    "dataset": "Semantic Model Name",
    "tables": [
        {
            "name": "Table1RefreshallPartitions",
            "refreshPartitions": false
        },
        {
            "name": "Table2Refreshselectedpartitions",
            "refreshPartitions": true,
            "partitions": [
                "202402",
                "202403",
                "202404"
            ]
        }
    ]
}
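
The core of this refresh function can also be expressed with TOM rather than raw REST calls: request a refresh on whole tables or on individual partitions, then save the changes. The sketch below is a simplified, hedged outline of that logic; the actual function (see the GitHub repository in the note below) parses the JSON body shown above to decide which tables and partitions to refresh.

using System;
using Microsoft.AnalysisServices.Tabular;

class SelectiveRefresh
{
    // Requests a refresh for a whole table or for a specific list of its partitions,
    // mirroring the "refreshPartitions" flag in the function's JSON input above.
    static void RefreshTable(Model model, string tableName, bool refreshPartitions, string[] partitions)
    {
        Table table = model.Tables.Find(tableName)
                      ?? throw new ArgumentException($"Table '{tableName}' not found.");

        if (!refreshPartitions)
        {
            table.RequestRefresh(RefreshType.Full);            // refresh every partition in the table
        }
        else
        {
            foreach (string name in partitions)                // refresh only the named partitions
                table.Partitions[name].RequestRefresh(RefreshType.Full);
        }

        model.SaveChanges(); // submitting the changes starts the requested refresh operations
    }
}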

Both Incremental Refresh and Custom Partitions in Power BI are essential for efficiently managing data susceptible to change within a large fact table. They allow you to optimize resource utilization, reduce unnecessary processing, and maintain control over partition design to align with your business needs. By combining these features, you can overcome the limitations of each approach and ensure a streamlined and effective data management solution.

References:

https://www.tackytech.blog/how-to-automate-the-management-of-custom-partitions-for-power-bi-datasets/

Note: The following GitHub repository contains the Azure Function code, the request bodies passed to the functions, and the pipeline JSON files. Copy the JSON file inside the pipeline folder and paste it into an ADF pipeline, renaming the pipeline as mentioned in the file, to recreate the pipeline.

https://github.com/TirumalaBabu2000/Incremental_Refresh_and_Custom_Partition_Pipeline.git

Why Self-Service BI is Important in an Agile Business Scenario
https://www.tigeranalytics.com/perspectives/blog/why-self-service-bi-is-important-in-an-agile-business-scenario-3/ | Thu, 02 Feb 2023

Self-service BI empowers business users to independently query, visualize, and analyze data, fostering a data-driven culture through accessible learning resources. Read how organizations can harness it towards informed decision-making and collaboration among multiple stakeholders, ultimately enhancing communication channels and data utilization.

In organizations where data-driven solutions are valued, data democratization is becoming a necessity.

Data virtualization and data federation software act as enablers of data democratization by eliminating an organization’s data silos and making the data accessible through virtual storage mediums.

Access to data at the right time and in the right manner is crucial for making data-driven decisions. However, classifying the data and granting relevant access to varied users has always been a challenge.

With traditional BI implementations, the responsibility for report development rests primarily with IT teams. As the number of stakeholders and the demand for accessible data increase, Self-Service BI can be regarded as a form of data democratization, equipping more business users to work on their own through monitored access profiles and data accessibility, and easing the burden on central teams.

Self-Service BI enables business users to access and explore data by running their own queries, creating their own data visualizations, dashboards and reports, filtering, sorting, and analyzing data. The availability of online learning materials, self-paced learning, and access to resources has made it possible for users to feel comfortable using different data touchpoints and dashboards to make insightful decisions. Data democratization has ushered in a data-driven culture which means that access to data can now be shared by multiple stakeholders. All in all, we’re seeing improved communication channels and better collaboration.

Setting up a Self-Service BI

At Tiger Analytics, we partnered with a large-scale silicon component manufacturing company based out of the US, to implement a Self-Service BI solution.

The higher-order curated datasets built for self-service enabled tech-savvy business users to conduct ad-hoc analyses. This saved much of the time and effort it would have taken them to build a traditional BI report or dashboard. Users no longer had to wait in a queue for their respective requirements, which meant they had access to the information they needed much earlier, enabling quicker analysis and faster report generation.

The key advantage of this self-service analysis was that the business user could conduct ad-hoc analyses from time to time, focusing on their high-value priorities, and was now able to get faster results, on the go, rather than going through an entire report development life cycle.

self service enablement

Building a modern data platform that can handle data sources of variety and volume and that can support scalability and concurrency to manage business dynamics is of utmost importance.

While implementing a Self-Service BI within an organization, here are a few of our best practices that you can replicate:

  • Provide proper business names for tables, columns, and measures
  • Create a report catalog page for the users to find the reports and datasets
  • Add a data dictionary page to include definitions of data elements in the report
  • Build templates to create consistent report layouts by users
  • Display only the relevant tables – Hide other config or supporting tables
  • Build proper relationships between the tables since users can simply drag and drop fields across different tables.
  • Add the description to tables and fields for better understanding
  • Add relevant synonyms for the fields; this helps users who use Q&A
  • Establish a governance committee to enable self-service for required end users
  • Create end-user training modules for effective use of self-service
  • Self-service should be limited to only a specific set of users
  • IT needs to monitor the self-service usage to avoid concurrency and performance issues
  • Make tools available to the end users for self-service enablement
  • Restrict publishing of ad-hoc reports in common workspaces
  • IT to ensure infrastructure is scalable and efficient for Self-service and Reporting needs

We’ve extracted and created shared datasets, created semantic layers on top of the data layer, and defined key data quality metrics, data management, access, and usage policies. The diagram below depicts various stages of the Self-Service BI life cycle.

self service bi lifecycle

The requirement gathering is done at the ‘line of business’ level, unlike traditional BI, where it’s at the ‘report’ level. This enables multiple user personas to use the same shared datasets.

Self-Service User hierarchy

Once we have the baseline requirements, it is imperative to group users into multiple user personas based on their skills and requirements. This will help in creating different roles and defining access for each group.

While working with our client, here’s how we segregated the users into four user personas.

1. Power User

The Power user is a technical user. They have access to the base tables in the database. Power users can create their own views by applying the filters on the tables/views in the database, and they know how to combine data from external files with the tables and create reports.

2. Ad-hoc query user

The Ad-hoc query user knows how to use Power BI. They can connect to curated datasets and create custom visualizations. They are also capable of creating custom calculations in Power BI and can share reports within their line of business.

3. Excel analyst

The Excel analyst can connect to shared datasets in Excel, create their own custom calculations, and build pivot tables and charts in Excel.

4. End User

The End User has access to the Reports and Dashboards, can slice and dice data, filter data, and share bookmarks within the team.

self service user hierarchy

Self-Service User journey and process flow

Here’s how we’ve mapped the available features and the users, as shown below:

self service bi table

Once the users are defined and the datasets are ready, a process flow needs to be defined to document the data access flow for the user. A report catalog page is created to organize all the available reports under various headers.

  • A data dictionary page is created in each report for the users to comprehend the data elements in the report.
  • User training sessions are set up for the Business users to train on report usability.
  • The user is assigned a developer license upon an access request.
  • Users are allowed to review the catalog of shared reports and datasets.
  • If the required dataset is available, the user can create a new report on top of it, publish it to the department portal, and share it with their department.
  • If the required dataset is not available, the user requests the development and deployment of new datasets/reports for their use.
  • The user can either share a report at the department level or can create a report to be used at the organizational level and share it in the Enterprise Report portal.

There can be follow-ups to review the progress and share tips and tricks, BI best practices, and templates.

self service bi best practices

One of the most common challenges we’ve seen with Self-Service BI is the lack of adoption by business users. Users might have difficulty understanding how to use a report or a shared dataset available to them, or might create reports with inaccurate analytical results. To ease these issues, it’s a good practice to institute Self-Service Office Hours as a forum where the BI team can help users understand what data is available to them and how they can access it.

Members of the BI team make themselves available to help and support business users on an ongoing basis and handle centralized monitoring.

With this, users get their questions answered, and that helps bridge the data literacy gap. This effort also enables collaboration among different teams within the organization. The users can then hop in and hop out of the session as required.

The Road ahead…

Regardless of the size of the organization, data availability is not enough. That data needs to be accessible so that the leadership can use that information to derive useful insights and craft meaningful strategic interventions.

Self-Service BI implementation empowers employees by giving them access to data. And even with our clients, this has considerably reduced the cost of report development, fast-tracked data-driven decisions, and improved collaboration within the organization.

As organizations and their needs continue to evolve, so does their self-service journey, making data-driven insights the new normal.

Read our other articles on BI.

Achieving IPL Fantasy Glory with Data-Backed Insights and Strategies
https://www.tigeranalytics.com/perspectives/blog/ipl-fantasy-leaderboard-data-analysis/ | Tue, 21 Sep 2021

A cricket enthusiast shares insights on building a winning IPL fantasy team. From data analysis tools such as Kaggle and Howstat to tips on player selection, venue analysis, and strategic gameplay, this guide emphasizes the role of statistics in making informed choices, ultimately highlighting the unpredictability of the sport.

When you’re a die-hard cricket fan who watches almost every game, choosing the right players in IPL fantasy may seem as easy as knowing exactly what dish to order in a restaurant that has a massive menu. However, this wasn’t true in my case. In fact, this blog isn’t about me giving gyaan on the subject, but rather a reflection of my trials and errors over the last few years in putting together a team that could make a mark on the leaderboard – even if the slightest.

Of late though, I have been making conscious efforts to better my game, and seem to be doing fairly well now. This, however, was no easy task. My path became clearer and my efforts more fruitful when I was able to take data analytics along with me into this fantasy world.

So, from one enthusiast to the other, here are my two cents on what can help you create the right team based on numbers, the power of observation, and a little bit of luck.

Consistency is Key

The first and foremost point to keep in mind is finding consistent performers, and there are some tools that can help you determine who is on top of their game that season. Here are some of my obvious picks:

Suryakumar Yadav: My top pick will always be Yadav of the Mumbai Indians. In 2020, he collected a total of 400 runs in just 16 matches. He was consistent in 2018 and 2019 as well by amassing 400+ runs. Yadav made a name for himself from the very beginning of his cricket career, which can be further proved by his consistent performance for Team India as well.

Bhuvaneshwar Kumar: Kumar is a sure-shot player, and, as Michael Vaughn pointed out, he is also possibly the “smartest player” his generation has to offer. Even in ODI and T20 matches in the pre-IPL era, he was always able to out-smart his opponents. From the 151 matches he has played in the IPL, he has maintained an average of 24, and one can always expect wickets from him. Economically too, Kumar is a man to watch out for, as he is in the top five.

David Warner: No matter how the Sunrisers Hyderabad perform, Warner remains consistent, and has managed to remain the highest run-scorer for his team.

KL Rahul: Rahul, with an average of 44, is a consistent player who is also great at stitching together partnerships.

Playing by Numbers

After some years of bungling up, I realized that variables such as economy, average, past performance, etc. can be best understood using data analysis. I have found the platforms Kaggle and Howstat to be useful resources.

Kaggle, which serves as a home for a lot of challenging data science competitions, has an IPL dataset that has ball-by-ball details of all the IPL matches till 2020. That’s 12 years of granular data you can aggregate to get the metrics of your choice. You can get the dataset at this link: https://www.kaggle.com/patrickb1912/ipl-complete-dataset-20082020.

Howstat, on the other hand, has the most frequently used metrics laid out beautifully. Thanks to the folks with whom I play cricket, I came to know about this wonderful website.

Let’s talk about venues specifically, which, as you may know by now, play a critical role in the IPL fantasy world too, and can directly impact the kind of players you pick. In the early days (pre-2010), when Dada would walk onto the pitch at Eden Gardens, the crowd would roar because it was given that he would put up a good show on his home turf. But why take the risk and rely on gut and cheers where numbers can lead you to assured results? Especially when the stakes are so high and the competition so fierce.

Here is where I would look to data analysis to help me make a more informed decision. For example, if one were to refer to KL Rahul’s scores on Howstat, you can see that despite having played the most matches at the Chinnaswamy Stadium (15 in total), Rahul’s average at the IS Bindra Stadium is much better: 49.78 at IS Bindra versus 38.22 at Chinnaswamy.

(Pro-tip: One player who comes to mind not just for his batting skills but also for his ability to perform well across pitches is Shikhar Dhawan. I would definitely include him in my team. He also secures tons of catches which adds to the points tally).

Now some of you may not have the time to sort through the many IPL datasets available on a platform such as Kaggle, which is understandable as even the best of us can be intimidated by numbers. One tip I have for you folk is to merely look at what the numbers point to on your app of choice. By looking at the percentage of people choosing a particular player on the Dream11 app, for example, you can understand which players are on top of their game that season.

This is best determined somewhere around the middle of the season, after around five-six matches, as that is when you will know who is at his peak and whom you can skip from your team.

The Non-Data Way

If you are struggling to make your way through all the numbers, I have some non-statistical tips too, which I learned to include in my game only after my many trials and tribulations.

1. It’s human nature to compare ourselves to others – you know how it goes, the grass is always greener and all that jazz. This leads to mimicry, and while at times it helps to follow in the footsteps of those you aspire to be (on the leaderboard in this case), unfortunately, in 20-20 fantasy, this doesn’t work. The best route is to work out your own strategy and make your own mistakes.

2. Make sure to use boosts and player-transfer options wisely in the initial stages. It’s only normal to want a good leaderboard score while starting out, but this could lead you to exhaust your transfer list very early on, leaving you with the same players through the season. This can also significantly bring down your score. Using sufficient transfers and boosts towards the business end of things (post 20 matches or so) can go a long way.

3. Using the uncapped player-transfer option is also worth exploring. This can reveal a whole range of players and talent from different countries, who haven’t played for Team India, but who are extremely skilled.

4. Coming to all-rounders – my tip would be to have three in your team. This is especially important while selecting your captain and vice-captain. For example, Chris Woakes is one all-rounder who has worked well for me this season before he left.

Use your gut, use your mind

What I can say for certain through this blog, is that nothing is certain in IPL fantasy cricket. Yet, while this may seem like the most unsatisfactory take-away, I can vouch for one thing – data analysis can definitely change your game for the better.

Of course, certain factors are out of our control. Injuries, fouls, poor weather, etc. are an inevitable part of any sport and could significantly change the outcome of a game. But if one dataset or one number-crunch can change how you view a match and give you better insight, wouldn’t that be something worth exploring? In Dhoni’s own words, ‘Bas dimaag laga ke khel’ – just use your head and play!
