Data Security Archives - Tiger Analytics Thu, 07 Mar 2024 09:18:07 +0000 en-US hourly 1 https://wordpress.org/?v=6.8.1 https://www.tigeranalytics.com/wp-content/uploads/2023/09/favicon-Tiger-Analytics_-150x150.png Data Security Archives - Tiger Analytics 32 32 Invisible Threats, Visible Solutions: Integrating AWS Macie and Tiger Data Fabric for Ultimate Security https://www.tigeranalytics.com/perspectives/blog/invisible-threats-visible-solutions-integrating-aws-macie-and-tiger-data-fabric-for-ultimate-security/ Thu, 07 Mar 2024 07:03:07 +0000 https://www.tigeranalytics.com/?post_type=blog&p=20726 Data defenses are now fortified against potential breaches with the Tiger Data Fabric-AWS Macie integration, automating sensitive data discovery, evaluation, and protection in the data pipeline for enhanced security. Explore how to integrate AWS Macie into a data fabric.

The post Invisible Threats, Visible Solutions: Integrating AWS Macie and Tiger Data Fabric for Ultimate Security appeared first on Tiger Analytics.

]]>
Discovering and handling sensitive data in the data lake or analytics environment can be challenging. It involves overcoming technical complexities in data processing and dealing with the associated costs of resources and computing.  Identifying sensitive information at the entry point of the data pipeline, probably during data ingestion, can help overcome these challenges to some extent. This proactive approach allows organizations to fortify their defenses against potential breaches and unauthorized access.

According to AWS, Amazon Macie is “a data security service that uses machine learning (ML) and pattern matching to discover and help protect sensitive data”, such as personally identifiable information (PII), payment card data, and Amazon Web Services . At Tiger Analytics we’ve integrated these features into our pipelines within our proprietary Data Fabric solution called Tiger Data Fabric.

The Tiger Data Fabric is a self-service, low/no-code data management platform that facilitates seamless data integration, efficient data ingestion, robust data quality checks, data standardization, and effective data provisioning. Its user-centric, UI-driven approach demystifies data handling, enabling professionals with diverse technical proficiencies to interact with and manage their data resources effortlessly.

Leveraging Salient Features for Enhanced Security

The Tiger Data Fabric-AWS Macie integration offers a robust solution to enhance data security measures, including:

  • Data Discovery: The solution, with the help of Macie, discovers and locates sensitive data within the active data pipeline.
  • Data Protection: The design pattern isolates the sensitive data in a secure location with restricted access.
  • Customized Actions: The solution gives flexibility to design (customize) the actions to be taken when sensitive data is identified. For instance, the discovered sensitive data can be encrypted, redacted, pseudonymized, or even dropped from the pipeline with necessary approvals from the data owners.
  • Alerts and Notification: Data owners receive alerts when any sensitive data is detected, allowing them to take the required actions in response.

Tiger Data Fabric has many data engineering capabilities and has been enhanced recently to include sensitive data scans at the data ingestion step of the pipeline. Source data present on the S3 landing zone path is scanned for sensitive information and results are captured and stored at another path in the S3 bucket.

By integrating AWS Macie with the Tiger Data Fabric, we’re able to:

  • Automate the discovery of sensitive data.
  • Discover a variety of sensitive data types.
  • Evaluate and monitor data for security and access control.
  • Review and analyze findings.

For data engineers looking to integrate “sensitive data management” into their data pipelines , here’s a walkthrough of how we, at Tiger Analytics, implement these components for maximum value:

  • S3 Buckets store data in various stages of processing. A raw databucket for uploading objects for the data pipeline, a scanning bucket where objects are scanned for sensitive data, a manual review bucket which harbors objects where sensitive data was discovered, and a scanned data bucket for starting the next ingestion step of the data pipeline.
  • Lambda and Step Functions execute the critical tasks of running sensitive data scans and managing workflows. Step Functions coordinate Lambda functions to manage business logic and execute the steps mentioned below:
    • triggerMacieJob: This Lambda function creates a Macie-sensitive data discovery job on the designated S3 bucket during the scan stage..
    • pollWait: This Step Function waits for a specific state to be reached, ensuring the job runs smoothly.
    • checkJobStatus: This Lambda function checks the status of the Macie scan job.
    • isJobComplete: This Step function uses a Choice state to determine if the job has finished. If it has, it triggers additional steps to be executed.
    • waitForJobToComplete: This Step function employs a Choice state to wait for the job to complete and prevent the next action from running before the scan is finished.
    • UpdateCatalog: This Lambda function updates the catalog table in the backend Data Fabric database, and ensures that all job statuses are accurately reflected in the database.
  • A Macie scan job scans the specified S3 bucket for sensitive data. The process of creating the Macie job involves multiple steps, allowing us to choose data identifiers, either through custom configurations or standard options:
    • We create a one-time Macie job through the triggerMacieJob Lambda function.
    • We provide the complete S3 bucket path for sensitive data buckets to filter out the scan and avoid unnecessary scanning on other buckets.
    • While creating the job, Macie provides a provision to select data identifiers for sensitive data. In the AWS Data Fabric, we have automated the selection of custom identifiers for the scan, including CREDIT_CARD_NUMBER, DRIVERS_LICENSE, PHONE_NUMBER, USA_PASSPORT_NUMBER, and USA_SOCIAL_SECURITY_NUMBER.

      The findings can be seen on the AWS console and filtered based on S3 Buckets.  We employed Glue jobs to parse the results and route the data to the manual review bucket and raw buckets. The Macie job execution time is around 4-5 minutes. After scanning, if there are less than 1000 sensitive records, they are moved to the quarantine bucket.

  • The parsing of Macie results is handled by a Glue job, implemented as a Python script. This script is responsible for extracting and organizing information from the Macie scanned results bucket.
    • In the parser job, we retrieve the severity level (High, Medium, or Low) assigned by AWS Macie during the one-time job scan.
    • In the Macie scanning bucket, we created separate folders for each source system and data asset, registered through Tiger Data Fabric UI.

      For example: zdf-fmwrk-macie-scan-zn-us-east-2/data/src_sys_id=100/data_asset_id=100000/20231026115848

      The parser job checks for severity and the report in the specified path. If sensitive data is detected, it is moved to the quarantine bucket. We format this data into parquet and process it using Spark data frames.

    • If we peruse the parquet file, found below, sensitive data can be clearly seen as SSN and phone number columns.
    • In the quarantine bucket, the same file is being moved after finding the sensitive data.

      If there are no sensitive records, move the data to the raw zone from where data is further sent to the data lake.
  • Airflow operators come in handy for orchestrating the entire pipeline, whether we integrate native AWS security services with Amazon MWAA or implement custom airflow on EC2 or EKS.
    • GlueJobOperator: Executes all the Glue jobs pre and post-Macie scan.
    • StepFunctionStartExecutionOperator: Starts the execution of the Step Function.
    • StepFunctionExecutionSensor: Waits for the Step Function execution to be completed.
    • StepFunctionGetExecutionOutput Operator: Gets the output from the Step function.
  • IAM Policies grant the necessary permissions for the AWS Lambda functions to access AWS resources that are part of the application. Also, access to the Macie review bucket is managed using standard IAM policies and best practices.

Things to Keep in Mind for Effective Implementation

  • Based on our experience integrating AWS Macie with the Tiger Data Fabric, here are some points to keep in mind for an effective integration of AWS Macie. Macie’s primary objective is sensitive data discovery. It acts as a background process that keeps scanning the S3 buckets/objects. It generates reports that can be consumed by various users and accordingly, actions can be taken. But if the requirement is to string it with a pipeline and automate the action, based on the reports, then a custom process must be created.
  • Macie stops reporting the location of sensitive data after 1000 occurrences of the same detection type. However, this quota can be increased by requesting AWS. It is important to keep in mind that in our use case, where Macie scans are integrated into the pipeline, each job is dynamically created to scan the dataset. If the sensitive data occurrences per detection type exceed 1000, we move the entire file to the quarantine zone.
  • For certain data elements that Macie doesn’t consider sensitive data, custom data identifiers help a lot. It can be defined via regular expressions and its sensitivity can also be customized. Organizations with data that are deemed sensitive internally by their data governance team can use this feature.
  • Macie also provides an allow list—this helps in ignoring some of the data elements which by default Macie tag as sensitive data.’

The AWS Macie – Tiger Data Fabric integration seamlessly enhances automated data pipelines, addressing the challenges associated with unintended exposure of sensitive information in data lakes. By incorporating customizations such as employing regular expressions for data sensitivity and establishing suppression rules within the data fabrics they are working on, data engineers gain enhanced control and capabilities over managing and safeguarding sensitive data.

Armed with the provided insights, they can easily adapt the use cases and explanations to align with their unique workflows and specific requirements.

The post Invisible Threats, Visible Solutions: Integrating AWS Macie and Tiger Data Fabric for Ultimate Security appeared first on Tiger Analytics.

]]>
Why India-Targeted AI Matters: Exploring Opportunities and Challenges https://www.tigeranalytics.com/perspectives/blog/need-india-centric-ai/ Wed, 11 May 2022 13:42:19 +0000 https://www.tigeranalytics.com/?p=7604 The scope for AI-focused innovation is tremendous, given India’s status as one of the fastest-growing economies with the second-largest population globally. Explore the challenges and opportunities for AI in India.

The post Why India-Targeted AI Matters: Exploring Opportunities and Challenges appeared first on Tiger Analytics.

]]>
To understand the likely impact of India-centric AI, one needs to appreciate the country’s linguistic, cultural, and political diversity. Historically, India’s DNA has been so heterogeneous that extracting clear perspectives and actionable insights to address past issues, current challenges, and moving towards our vision as a country would be impossible without harnessing the power of AI.

The scope for AI-focused innovation is tremendous, given India’s status as one of the fastest-growing economies with the second-largest population globally. India’s digitization journey and the introduction of the Aadhaar system in 2010 – the largest biometric identity project in the world – has opened up new venues for AI and data analytics. The interlinking of Aadhaar with banking systems, the PDS, and several other transaction systems allows greater visibility, insights, and metrics that can be used to bring about improvements. Besides using these to raise the quality of lives of citizens while alleviating disparities, AI can support more proactive planning and formulation of policies and roadmaps. Industry experts concur a trigger and economic growth spurt, opining that “AI can help create almost 20 million jobs in India by 2025 and add up to $957 billion to the Indian economy by 2035.”

The current state of AI in India

The Indian government, having recently announced the “AI for All” strategy, is more driven than ever to nurture core AI skills to future-proof the workforce. This self-learning program looks to raise awareness levels about AI for every Indian citizen, be it a school student or a senior citizen. It targets meeting the demands of a rapidly emerging job market and presenting opportunities to reimagine how industries like farming, healthcare, banking, education, etc., can use technology. A few years prior, in 2018, the government had also increased its funding towards research, training, and skilling in emerging technologies by 100% as compared to 2017.

The booming interest has been reflected in the mushrooming of boutique start-ups across the country, as well. With a combined value of $555 million, it is more than double the previous year’s figure of $215 million. Interestingly, analytics-driven products and services contribute a little over 64% of this market -clocking over $355 million. In parallel, the larger enterprises are taking quantum leaps to deliver AI solutions too. Understandably, a large number of them use AI solutions to improve efficiency, scalability, and security across their existing products and services.

Current challenges of making India-centric AI

There is no doubt that AI is a catalyst for societal progress through digital inclusion. And in a country as diverse as India, this can set the country on an accelerated journey toward socio-economic progress. However, the socio, linguistic and political diversity that is India also means more complex data models that can be gainfully deployed within this landscape. For example, NLP models would have to adapt to text/language changes within just a span of a few miles! And this is just the tip of the iceberg as far as the challenges are concerned.

Let’s look at a few of them:

  • The deployment and usage of AI have been (and continues to be) severely fragmented without a transparent roadmap or clear KPIs to measure success. One of the reasons is the lack of a governing body or a panel of experts to regulate, oversee and track the implementation of socio-economic AI projects at a national level. But there’s no avoiding this challenge, considering that the implications of AI policy-making on Indian societies may be irreversible.
  • The demand-supply divide in India for AI skills is huge. The government initiatives such as Startup India as well as the boom in AI-focused startups have only contributed to extending this divide. The pace of getting a trained workforce to cater to the needs of the industry is accelerating but unable to keep up with the growth trajectory that the industry finds itself in. Large, traditionally run institutions are also embracing AI-driven practices having witnessed the competitive advantage it brings to the businesses. This has added to the scarcity that one faces in finding good quality talent to serve today’s demand.
  • The lack of data maturity is a serious roadblock on the path to establishing India-centric AI initiatives – especially with quite a few region-focused datasets being currently unavailable. There is also a parity issue with quite a few industry giants having access to large amounts of data as compared to the government, let alone start-ups. There is also the added challenge of data quality and a single source of truth that one can use for AI model development
  • Even the fiercest AI advocates would admit that its security challenges are nowhere close to being resolved. There is a need for security and compliance governance protocols to be region-specific so that unique requirements are met and yet there is a generalisability that is required to rationalize these models at the national level.
  • There is also a lot of ongoing debate at a global level on defining the boundaries that ethical AI practices will need to lean on. Given India’s diversity, this is a challenge that is magnified many times over

Niche areas where AI is making an impact

Farming

The role of AI in modern agricultural practices has been transformational – this is significant given that more than half the population of India depends on farming to earn a living. In 2019-2020 alone, over $1 billion was raised to fuel agriculture-food tech start-ups in India. It has helped farmers generate steadier income by managing healthier crops, reducing the damage caused by pests, tracking soil and crop conditions, improving the supply chain, eliminating unsafe or repetitive manual labor, and more.

Healthcare

Indian healthcare systems come with their own set of challenges – from accessibility and availability to quality and poor awareness levels. But each one represents a window of opportunity for AI to be a harbinger of change. For instance, AI-enabled platforms can extend healthcare services to low-income or rural areas, train doctors and nurses, address communication gaps between patients and clinicians, etc. Government-funded projects like NITI Aayog and the National Digital Health Blueprint have also highlighted the need for digital transformation in the healthcare system.

BFSI

The pandemic has accelerated the impact of AI on the BFSI industry in India, with several key processes undergoing digital transformation. The mandatory push for contactless remote banking experience has infused a new culture of innovation in mission-critical back-end and front-end operations. A recent PwC-FICCI survey showed that the banking industry has the country’s highest AI maturity index – leading to the deployment of the top AI use cases. The survey also predicted that Indian banks would see “potential cost savings up to $447 billion by 2023.”

E-commerce

The Indian e-commerce industry has already witnessed big numbers thanks to AI-based strategies, particularly marketing. For retail brands, capturing market share is among the toughest worldwide – with customer behavior being driven by a diverse set of values and expectations. By using AI and ML technologies – backed by data science – it would be easier to tap into multiple demographics without losing the context of messaging.

Manufacturing

Traditionally, the manufacturing industry has been running with expensive and time-consuming manually driven processes. Slowly, more companies realize the impact of AI-powered automation on manufacturing use cases like assembly line production, inventory management, testing and quality assurance, etc. While still at a nascent stage, AR and VR technologies are also seeing adoption in this sector in use cases like prototyping and troubleshooting.

3 crucial data milestones to achieve in India’s AI journey

1) Unbiased data distribution

Forming India-centric datasets starts with a unified framework across the country so that no region is left uncovered. This framework needs to integrate with other systems/data repositories in a secure and seamless manner. Even private companies can share relevant datasets with government institutions to facilitate strategy and policy-making.

2) Localized data ownership

In today’s high-risk data landscape, transferring ownership of India-centric information to companies in other countries can lead to compliance and regulatory problems. Especially when dealing with industries with healthcare or public administration, it is highly advised to maintain data control within the country’s borders.

3) Data ethics and privacy

Data-centric solutions that work towards improving human lives require a thorough understanding of personal and non-personal data, matters of privacy, and infringement among others. The responsible aspect to manage this information takes the challenges beyond the realms of deployment of a mathematical solution. Building an AI mindset that raises difficult questions about ethics, policy, and law, and ensures sustainable solutions with minimized risks and negative impact is key. Plus, data privacy should continue to be a hot button topic, with an uncompromising stance on safeguarding the personal information of Indian citizens.

Final thoughts

India faces a catch-22 situation with one side of the country still holding to its age-old traditions and practices. The other side embraces technology change, be it using UPI transfers, QR codes, or even the Aarogya Setu app. But sheer size and diversity of languages, cultures, and politics dictate that AI will neither fail to find areas to cause a profound impact nor face fewer challenges while implementing it.

As mentioned earlier, the thriving startup growth adds a lot of fuel to AI’s momentum. From just 10 unicorns in India in 2018, we have grown to 38. This number is expected to increase to 62 by 2025. In 2020, AI-based Indian startups received over $835 million in funding and are propelling growth few countries can compete with. AI is a key vehicle to ring in the dawn of a new era for India-centric AI– an India which despite the diversity and complex landscape, leads the way in the effective adoption of AI.

This article was first published in Analytics India Magazine.

The post Why India-Targeted AI Matters: Exploring Opportunities and Challenges appeared first on Tiger Analytics.

]]>
When Opportunity Calls: Unlocking the Power of Analytics in the BPO Industry https://www.tigeranalytics.com/perspectives/blog/role-data-analytics-bpo-industry/ https://www.tigeranalytics.com/perspectives/blog/role-data-analytics-bpo-industry/#comments Thu, 27 Jan 2022 10:26:37 +0000 https://www.tigeranalytics.com/?p=6933 The BPO industry has embraced analytics to optimize profitability, efficiency, and customer satisfaction. This blog delves into the specifics of data utilization, unique challenges, and key business areas where analytics can make a difference.

The post When Opportunity Calls: Unlocking the Power of Analytics in the BPO Industry appeared first on Tiger Analytics.

]]>
Around 1981, the term outsourcing entered our lexicons. Two decades later, we had the BPO boom in India, China, and the Philippines with every street corner magically sprouting call centers. Now, in 2022, the industry is transitioning into an era of analytics, aiming to harness its sea of data for profitability, efficiency, and improved customer experience.

In this blog, we delve into details of what this data is, the unique challenges it poses, and the key business areas that can benefit from the use of analytics. We also share our experiences in developing these tools and how they have helped our clients in the BPO industry.

The Information Ocean

The interaction between BPO agents and customers generates huge volumes of both structured and unstructured (text, audio) data. On the one hand, you have the call data that measures metrics such as the number of incoming calls, time taken to address issues, service levels, and the ratio of handled vs abandoned calls. On the other hand, you have customer data measuring satisfaction levels and sentiment.

Insights from this data can help deliver significant value for your business whether it’s around more call resolution, reduced call time & volume, agent & customer satisfaction, operational cost reduction, growth opportunities through cross-selling & upselling, or increased customer delight.
The trick is to find the balance between demand (customer calls) and supply (agents). An imbalance can often lead to revenue losses and inefficient costs and this is a dynamic that needs to be facilitated by processes and technology.

Challenges of Handling Data

When you are handling such sheer volumes of data, the challenges too can be myriad.
Our clients wage a daily battle with managing these vast volumes, harmonizing internal and external data, and driving value through them. For those that have already embarked on their analytical journey, the primary goals are finding the relevance of what they built, driving scalability, and leveraging new-age predictive tools to drive ROI.

Delivering Business Value

Based on our experience, the business value delivered from advanced Analytics in the BPO industry is unquestionable, exhaustive and primarily influences these key aspects:

1) Call Management

Planning agent resources based on demand (peak and off-peak) and skillsets accounting for how long they take to resolve issues can impact business costs. AI can help automate the process to help optimize costs We have built an automated and real-time scheduling and resource optimization tool that has led one of our BPO clients to a cost reduction of 15%.

2) Customer Experience

Call center analytics give agents access to critical data and insights to work faster and smarter, improve customer relationships and drive growth. Analytics can help understand the past behavior of a customer/similar customers and recommend products or services that will be most relevant, instead of generic offers. It can also predict which customers are likely to need proactive management. Our real-time cross-selling analytics has led to a 20% increase in revenue.

3) Issue Resolution

First-call resolution refers to the percentage of cases that are resolved during the first call between the customer and the call center. Analytics can help automate the categorization process of contact center data by building a predictive model. This can help with a better customer servicing model achieved by appropriately capturing the nuances of customer chats with contact centers. This metric is extremely important as it helps in reducing the customer churn rate.

4) Agent Performance

Analytics on call-center agents can assist in segmenting those who had a low-resolution rate or were spending too much time on minor issues, compared with top-performing agents. This helps the call center resolve gaps or systemic issues, identify agents with leadership potential, and create a developmental plan to reduce attrition and increase productivity.

5) Call Routing

Analytics-based call routing is based on the premise that records of a customer’s call history or demographic profile can provide insight into which call center agent(s) has the right personality, conversational style, or combination of other soft skills to best meet their needs.

6) Speech Analytics

Detecting trends in customer interactions and analyzing audio patterns to read emotions and stress in a speaker’s voice can help reduce customer churn, boost contact center productivity, improve agent performance and reduce costs by 25%. Our tools have clients in predicting member dissatisfaction to achieve a 10% reduction in first complaints and 20% reduction in repeat complaints.

7) Chatbots and Automation

Thanks to the wonders of automation, we can now enhance the user experience to provide personalized attention to customers available 24/7/365. Reduced average call duration and wage costs improve profitability. Self-service channels such as the help center, FAQ page, and customer portals empower customers to resolve simple issues on their own while deflecting more cases for the company. Our AI-enabled chatbots helped in strengthening engagement and quicker resolutions of 80% of user queries.

Lessons from The Philippines

Recently, in collaboration with Microsoft, we conducted a six-week Data & Analytics Assessment for a technology-enabled outsourcing firm in the Philippines. The client was encumbered by complex ETL processes, resource bottlenecks on legacy servers, and a lack of UI for troubleshooting leading to delays in resolution and latency issues. They engaged Tiger Analytics to assess their data landscape.

We recommended an Enterprise Data Warehouse modernization approach to deliver improved scalability & elasticity, strengthened data governance & security, and improved operational efficiency.

We did an in-depth assessment to understand the client’s ecosystem, key challenges faced, data sources, and their current state architecture. Through interactions with IT and business stakeholders, we built a roadmap for a future state data infrastructure that would enable efficiency, scalability, and modernization. We also built a strategic roadmap of 20+ analytics use cases with potential ROI across HR and contact center functions.

The New Era

Today, the Philippines has been recognized as the BPO capital of the world. The competition will toughen both from new players and existing ones. A digital transformation is underway in the BPO industry. Success in this competitive space lies with companies that will harness the huge volume of data they have into meaningful and actionable change.

The post When Opportunity Calls: Unlocking the Power of Analytics in the BPO Industry appeared first on Tiger Analytics.

]]>
https://www.tigeranalytics.com/perspectives/blog/role-data-analytics-bpo-industry/feed/ 187