Data Privacy Archives - Tiger Analytics

In Digital, We Trust: A Deep Dive into Modern Data Privacy Practices

TA@2023 — Wed, 12 Jul 2023 14:43:16 +0000

How can leaders make better, more trustworthy decisions regarding technology? According to the World Economic Forum, that’s where Digital Trust can help steer both companies and customers toward a win-win outcome.

“Privacy serves as a requirement to respect individuals’ rights regarding their personal information and a check on organizational momentum towards processing personal data autonomously and without restriction. A focus on this goal ensures that organizations can unlock the benefits and value of data while protecting individuals from the harms of privacy loss. It effectuates inclusive, ethical, and responsible data use – or digital dignity – by ensuring that personal data is collected and processed for a legitimate purpose(s) (e.g., consent, contractual necessity, public interest, etc.)”

The challenge with today’s technology is gathering insights that support better decisions while still complying with privacy regulations. With regulations enforcing principles like ‘Right to be Forgotten,’ ‘Privacy by Design,’ and ‘Data Portability,’ a checkbox approach to Data Privacy is not sustainable. A paradigm shift in privacy mindset is necessary to mitigate the risks arising from the use of disruptive technologies.

Organizations need to keep in mind the following aspects of the Data Privacy approach while building privacy considerations into their software or services:

Data Classification – Categorizing data as confidential, sensitive, sensitive but needed for re-use, sensitive but not for re-use, etc., based on a data classification methodology.

Anonymization – This is a process of data de-identification that produces data where individual records cannot be linked back to an original as they do not include the required translation variables to do so. This is an irreversible process.

Re-Identification – The process of restoring de-identified data to its original values using the referential values, by reversing the method that was used for De-Identification.

Format Preserving Encryption – The process of transforming data in such a way that the output is in the same format as the input using cipher keys and algorithms.

De-Identification – The process of removing or obscuring any personally identifiable information from individual records in a way that minimizes the risk of unintended disclosure of the identity of individuals and information about them. Unlike anonymization, de-identification can be reversible: the data may retain the translation variables required to link it back to the original data, using different mechanisms based on the table below.

Encryption – The process of transforming data using cipher keys and algorithms into cipher text that is unreadable to anyone except those possessing a key. Restoring the data requires both the algorithm and the cipher keys.

The Data Privacy Approach – From Theory to Implementation

At Tiger Analytics, we built a globally scaled Data and Analytics platform for a US-based life sciences organization, with data discovery and classification as a core component. This was done with the help of Amazon Macie, which classifies data based on content, regular expressions, file extensions, and PII classifiers.
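To make the idea of classifying data by content, regular expression, and file extension concrete, here is a minimal Python sketch. It does not call Amazon Macie and is not how Macie works internally; the patterns, labels, extension list, and the `classify` function are all hypothetical and chosen only for illustration.

```python
import re
from pathlib import PurePath

# Hypothetical patterns, for illustration only; real classifiers use far
# richer rules than these three regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

# Hypothetical list of file extensions treated as sensitive regardless of content.
SENSITIVE_EXTENSIONS = {".pem", ".key", ".pfx"}


def classify(filename: str, text: str) -> dict:
    """Label a file using content patterns and its file extension."""
    counts = {name: len(p.findall(text)) for name, p in PII_PATTERNS.items()}
    found = sorted(name for name, n in counts.items() if n > 0)
    flagged_extension = PurePath(filename).suffix.lower() in SENSITIVE_EXTENSIONS
    label = "sensitive" if found or flagged_extension else "non-sensitive"
    return {
        "label": label,
        "pii_types": found,
        "flagged_extension": flagged_extension,
    }
```

For example, `classify("notes.txt", "Contact jane@example.com or 555-123-4567")` labels the file as sensitive because both an email address and a phone number pattern are found.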

Once data is classified, sensitive values can be protected by:

Masking sensitive data by partially or fully replacing characters with symbols, such as an asterisk (*) or hash (#).
Replacing each instance of sensitive data with a token, or surrogate, string.
Encrypting and replacing sensitive data using a randomly generated or pre-determined key.
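The first two techniques, masking and tokenization, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the `tok_` prefix, and the use of HMAC-SHA256 to derive the surrogate are assumptions, not the solution used on the platform described here.

```python
import hashlib
import hmac


def mask(value: str, visible: int = 4, symbol: str = "*") -> str:
    """Replace all but the last `visible` letters/digits with `symbol`.

    Separators such as '-' are left in place so the shape stays readable.
    """
    total = sum(c.isalnum() for c in value)
    to_hide = max(total - visible, 0)
    out = []
    for c in value:
        if c.isalnum() and to_hide > 0:
            out.append(symbol)
            to_hide -= 1
        else:
            out.append(c)
    return "".join(out)


def tokenize(value: str, secret: bytes, length: int = 16) -> str:
    """Derive a deterministic surrogate string from the value and a secret.

    The same value and secret always give the same token, so joins across
    data sets still line up. The token cannot be turned back into the value
    without a separate lookup table.
    """
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:length]


print(mask("555-123-4567"))  # ***-***-4567
```

A deterministic token is a design choice: it keeps referential integrity across tables, at the cost of revealing when two records share the same underlying value.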

By leveraging best-in-class encryption and masking solutions, we were able to protect the sensitive data elements of our clients and their customers on hyperscaler and cloud-native platforms.

Deep diving into De-Identification

Different De-identification methods are described in the table below with an example:

Understanding the Encryption Approach

Data Privacy Encryption is achieved through Format Preserving Encryption (FPE), a technique that preserves the format of sensitive data fields while providing Advanced Encryption Standard (AES)-level encryption strength.
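To show what "preserving the format" means in practice, here is a toy Python sketch that encrypts only the digits of a value and leaves separators in place, so a phone number still looks like a phone number. It is a simplified Feistel-style construction written for illustration. It is not the NIST FF1 or FF3-1 algorithm, it is not the third-party product referred to in this post, and it must not be used to protect real data. The round count, key, and function names are assumptions.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # assumption for this toy; not taken from any standard


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed pseudo-random number derived from the round number and one half."""
    msg = f"{round_no}:{half}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _feistel(digits: str, key: bytes, decrypt: bool) -> str:
    """Alternately update each half with a keyed value of the other half."""
    if len(digits) < 2:
        raise ValueError("need at least two digits")
    mid = len(digits) // 2
    left, right = digits[:mid], digits[mid:]
    rounds = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    sign = -1 if decrypt else 1
    for r in rounds:
        if r % 2 == 0:
            m = 10 ** len(left)
            shifted = (int(left) + sign * _round_value(key, r, right, m)) % m
            left = str(shifted).zfill(len(left))
        else:
            m = 10 ** len(right)
            shifted = (int(right) + sign * _round_value(key, r, left, m)) % m
            right = str(shifted).zfill(len(right))
    return left + right


def fpe_transform(value: str, key: bytes, decrypt: bool = False) -> str:
    """Encrypt or decrypt the digits of `value`, keeping every other character."""
    digits = "".join(c for c in value if c in DIGITS)
    out = iter(_feistel(digits, key, decrypt))
    return "".join(next(out) if c in DIGITS else c for c in value)
```

Calling `fpe_transform("555-123-4567", key)` returns another string of the form `ddd-ddd-dddd`, and calling it again on that result with `decrypt=True` and the same key returns the original value. Because the output keeps the input's length and character classes, downstream schemas and validations continue to work on the encrypted field.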

In a typical scenario, data from the source systems will land on the landing zone over a secure channel, and data encryption, masking, and other security measures will be applied, depending on data structure and other library integrations.

Data encryption involves installing and configuring a third-party encryption and key management solution on the cloud (AWS/GCP) platform. Encryption keys are stored and managed within the Key Management Server, and only authorized users/resources are granted access to them.

In the case of structured data, sensitive data (specific PII/PHI attributes) will be encrypted without altering the original format. This will be done using the third-party Format Preserving Encryption (FPE) solution to preserve business value and referential integrity across distributed data sets when the data moves from the untrusted to the trusted zone at the landing layer. In the case of unstructured data, either the entire file or specific PII/PHI data will be encrypted during the transition from the untrusted to the trusted zone on a case-by-case basis, based on the requirements.

Users requiring access to sensitive data (PII/PHI) will be made part of the relevant IAM roles, user credentials will be validated against the IAM, and the data will be transparently decrypted using the keys. Validated users will be able to access the sensitive data in the clear, whereas users who do not have the necessary privileges will see the data only in an encrypted format.

Data Encryption on AWS and GCP – How they Differ

Whenever data is written to the storage platform, AWS applies encryption to it, and conversely, when the data is read from storage, decryption happens transparently. In addition to encryption of specific PII/PHI data, AWS native transparent encryption and KMS features shall be leveraged to protect the data in the AWS cloud. These features provide protection to data stored in any potential storage mechanism, such as S3, Kinesis, Redshift, DynamoDB, etc.
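As one concrete illustration of transparent encryption on AWS, an S3 upload can ask for server-side encryption with a KMS key. The Python sketch below only builds the keyword arguments for boto3's S3 `put_object` call; the helper name, bucket, object key, and KMS key alias are hypothetical, and the actual call is shown as a comment because it needs boto3 and AWS credentials.

```python
def sse_kms_put_args(bucket: str, key: str, body: bytes, kms_key_id: str) -> dict:
    """Keyword arguments for S3 `put_object` that request SSE-KMS encryption."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # Ask S3 to encrypt the object at rest with a KMS-managed key.
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_id,
    }


# Usage (not executed here; names below are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_object(**sse_kms_put_args(
#       "example-landing-zone", "raw/customers.csv", b"...", "alias/example-key"))
```

Reading the object back with `get_object` returns the clear data as long as the caller is permitted to use the KMS key, which is the transparent decryption described above.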

Google has added differential privacy to GoogleSQL for BigQuery, building on the open-source differential privacy library that is used by Ads Data Hub and the COVID-19 Community Mobility Reports. Differential privacy is an anonymization technique that limits the personal information that is revealed by an output. It is commonly used to allow inferences and to share data while preventing someone from learning information about an entity in that dataset.
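The core idea behind differential privacy can be illustrated with the Laplace mechanism applied to a simple count. The Python sketch below is a textbook illustration under stated assumptions, not BigQuery's implementation, which also handles contribution bounding and other details. The function names and the choice of epsilon are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the values that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Example: how many customers are aged 40 or over, released with epsilon = 1.
ages = list(range(18, 68))
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0, rng=random.Random())
```

A smaller epsilon adds more noise and gives stronger privacy; a larger epsilon gives answers closer to the true count. Any single released number is noisy, but averages over many queries or large groups remain useful for analysis.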

With BigQuery differential privacy, we can:

Anonymize results with individual-record privacy.
Anonymize results without copying or moving your data, including data from AWS and Azure with BigQuery Omni.
Anonymize results that are sent to Dataform pipelines so that they can be consumed by other applications.
Anonymize results that are sent to Apache Spark stored procedures.

Let’s take the example of a person opening a bank account on a web portal. They have to fill in their age, telephone number, and country. Let’s look at how their Data Privacy can be protected while gathering the necessary information.

Points to note:

The age field is sensitive, and its actual value is usually not required for any analysis/processing by downstream systems.
The telephone number is sensitive in nature, and its value in the same format (not actual) is required by the data analytics platform for further analysis. The actual value of the telephone number is required only on-premise.
Country value is not classified as sensitive; however, its value is encrypted while sending the data to the cloud.
Data is transmitted over HTTPS in every direction: between on-premise systems, between cloud systems, and between on-premise and cloud.
BU-specific encryption keys are used for encryption while moving data to the cloud. Data gets decrypted on-premise using the same BU-specific keys.
Cloud data at rest is encrypted with the cloud provider’s key, using transparent DB encryption, volume encryption, or disk encryption.
Required governance controls covering process (for example, approvals for access), people (for example, training and background checks), and technical tools (for example, authentication and access control) are created both on-premise and on the cloud.

It is assumed that age is not required for further processing, and telephone number is required for analysis and processing by the Data Analytics platform.
Based on the data classification, the age value gets anonymized. This is a one-way process.
The Telephone number gets De-identified using the de-identification method or algorithm by preserving the referential value.
The country value is encrypted using a format-preserving encryption algorithm and a BU-specific encryption key.
Then age (anonymized), telephone number (de-identified), and country (encrypted) are sent to the Data Analytics Platform on the cloud over HTTPS.
The data then gets stored in the cloud platform in an encrypted format using the cloud provider’s server-side encryption keys.
The data at rest on cloud is always in an encrypted format by using the cloud provider’s features like Transparent DB encryption, Volume Encryption, and/or Disk encryption techniques.
If the data needs to be processed by the Analytics platform, it is first decrypted using the cloud provider’s key, and processing is then completed.
Once processing is completed, the data again gets encrypted and stored on Cloud.
The decryption and re-identification techniques are applied on-premise to retrieve the original values to be consumed by other applications such as call center, ESB, etc.
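The flow above can be sketched in Python as follows. This is a minimal sketch under several assumptions: age is anonymized by generalizing it into a ten-year band, the telephone number is de-identified with a random same-format token held in an on-premise lookup (the hypothetical `PhoneVault`), and country encryption is passed in as a callable standing in for the BU-keyed format-preserving encryption. None of these names or choices come from the platform described here.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable, Dict

DIGITS = "0123456789"


def anonymize_age(age: int) -> str:
    """One-way step: keep only the ten-year band, not the exact age."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


@dataclass
class PhoneVault:
    """On-premise lookup that maps phone numbers to same-format tokens."""

    _to_token: Dict[str, str] = field(default_factory=dict)
    _to_phone: Dict[str, str] = field(default_factory=dict)

    def deidentify(self, phone: str) -> str:
        """Return a stable token with the same layout as the phone number."""
        if phone in self._to_token:
            return self._to_token[phone]
        while True:
            token = "".join(
                secrets.choice(DIGITS) if c in DIGITS else c for c in phone
            )
            if token not in self._to_phone:  # avoid reusing a token
                break
        self._to_token[phone] = token
        self._to_phone[token] = phone
        return token

    def reidentify(self, token: str) -> str:
        """Recover the original number; only possible where the vault lives."""
        return self._to_phone[token]


def prepare_for_cloud(
    record: dict, vault: PhoneVault, encrypt: Callable[[str], str]
) -> dict:
    """Apply the three protections before the record leaves on-premise."""
    return {
        "age": anonymize_age(record["age"]),
        "telephone": vault.deidentify(record["telephone"]),
        "country": encrypt(record["country"]),
    }
```

With a record such as `{"age": 34, "telephone": "555-123-4567", "country": "IN"}`, the cloud receives the band `30-39`, a token shaped like the original phone number, and whatever the supplied `encrypt` callable returns for the country. Only the on-premise side, which holds the vault and the BU-specific key, can call `reidentify` or decrypt the country value.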

Final thoughts

“Digital trust is individuals’ expectation that digital technologies and services – and the organizations providing them – will protect all stakeholders’ interests and uphold societal expectations and values.” And ensuring the right privacy considerations, transparent communication, and intent will go a long way in building a mutually trustworthy exchange between organizations and individuals.

Sources:

The Digital Trust report: https://initiatives.weforum.org/digital-trust/about

https://fpf.org/blog/a-visual-guide-to-practical-data-de-identification/

https://docs.aws.amazon.com/whitepapers/latest/logical-separation/encrypting-data-at-rest-and--in-transit.html

https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-differential-privacy-with-tumult-labs?utm_source=twitter&utm_medium=unpaidsoc&utm_campaign=fy23q2-googlecloudtech-blog-data-in_feed-no-brand-global&utm_content=-&utm_term=-

https://github.com/priyankavergadia/GCPSketchnote

The post In Digital, We Trust: A Deep Dive into Modern Data Privacy Practices appeared first on Tiger Analytics.

Why India-Targeted AI Matters: Exploring Opportunities and Challenges

onemg — Wed, 11 May 2022 13:42:19 +0000

To understand the likely impact of India-centric AI, one needs to appreciate the country’s linguistic, cultural, and political diversity. Historically, India’s DNA has been so heterogeneous that extracting clear perspectives and actionable insights to address past issues, current challenges, and moving towards our vision as a country would be impossible without harnessing the power of AI.

The scope for AI-focused innovation is tremendous, given India’s status as one of the fastest-growing economies with the second-largest population globally. India’s digitization journey and the introduction of the Aadhaar system in 2010 – the largest biometric identity project in the world – have opened up new avenues for AI and data analytics. The interlinking of Aadhaar with banking systems, the PDS, and several other transaction systems allows greater visibility, insights, and metrics that can be used to bring about improvements. Besides using these to raise the quality of life of citizens while alleviating disparities, AI can support more proactive planning and formulation of policies and roadmaps. Industry experts concur that AI will trigger an economic growth spurt, opining that “AI can help create almost 20 million jobs in India by 2025 and add up to $957 billion to the Indian economy by 2035.”

The current state of AI in India

The Indian government, having recently announced the “AI for All” strategy, is more driven than ever to nurture core AI skills to future-proof the workforce. This self-learning program looks to raise awareness levels about AI for every Indian citizen, be it a school student or a senior citizen. It targets meeting the demands of a rapidly emerging job market and presenting opportunities to reimagine how industries like farming, healthcare, banking, education, etc., can use technology. A few years prior, in 2018, the government had also increased its funding towards research, training, and skilling in emerging technologies by 100% as compared to 2017.

The booming interest has been reflected in the mushrooming of boutique start-ups across the country as well. Their combined value of $555 million is more than double the previous year’s figure of $215 million. Interestingly, analytics-driven products and services contribute a little over 64% of this market, clocking over $355 million. In parallel, larger enterprises are taking quantum leaps to deliver AI solutions too. Understandably, a large number of them use AI solutions to improve efficiency, scalability, and security across their existing products and services.

Current challenges of making India-centric AI

There is no doubt that AI is a catalyst for societal progress through digital inclusion. And in a country as diverse as India, this can set the country on an accelerated journey toward socio-economic progress. However, India’s social, linguistic, and political diversity also means that more complex data models are needed before AI can be gainfully deployed within this landscape. For example, NLP models would have to adapt to text/language changes within just a span of a few miles! And this is just the tip of the iceberg as far as the challenges are concerned.

Let’s look at a few of them:

The deployment and usage of AI has been (and continues to be) severely fragmented, without a transparent roadmap or clear KPIs to measure success. One of the reasons is the lack of a governing body or a panel of experts to regulate, oversee, and track the implementation of socio-economic AI projects at a national level. But there’s no avoiding this challenge, considering that the implications of AI policy-making on Indian societies may be irreversible.
The demand-supply divide in India for AI skills is huge. Government initiatives such as Startup India, as well as the boom in AI-focused startups, have only widened this divide. The pace at which a trained workforce is being produced is accelerating, but it is unable to keep up with the industry’s growth trajectory. Large, traditionally run institutions are also embracing AI-driven practices, having witnessed the competitive advantage they bring to businesses. This has added to the scarcity of good-quality talent available to serve today’s demand.
The lack of data maturity is a serious roadblock on the path to establishing India-centric AI initiatives, especially with quite a few region-focused datasets being currently unavailable. There is also a disparity in access, with quite a few industry giants holding far larger amounts of data than the government, let alone start-ups. There is also the added challenge of data quality and of finding a single source of truth that one can use for AI model development.
Even the fiercest AI advocates would admit that its security challenges are nowhere close to being resolved. Security and compliance governance protocols need to be region-specific so that unique requirements are met, and yet a degree of generalizability is required to rationalize these models at the national level.
There is also a lot of ongoing debate at a global level on defining the boundaries that ethical AI practices will need to lean on. Given India’s diversity, this is a challenge that is magnified many times over.

Niche areas where AI is making an impact

Farming

The role of AI in modern agricultural practices has been transformational – this is significant given that more than half the population of India depends on farming to earn a living. In 2019-2020 alone, over $1 billion was raised to fuel agriculture-food tech start-ups in India. It has helped farmers generate steadier income by managing healthier crops, reducing the damage caused by pests, tracking soil and crop conditions, improving the supply chain, eliminating unsafe or repetitive manual labor, and more.

Healthcare

Indian healthcare systems come with their own set of challenges – from accessibility and availability to quality and poor awareness levels. But each one represents a window of opportunity for AI to be a harbinger of change. For instance, AI-enabled platforms can extend healthcare services to low-income or rural areas, train doctors and nurses, address communication gaps between patients and clinicians, etc. Government bodies and initiatives such as NITI Aayog and the National Digital Health Blueprint have also highlighted the need for digital transformation in the healthcare system.

BFSI

The pandemic has accelerated the impact of AI on the BFSI industry in India, with several key processes undergoing digital transformation. The mandatory push for contactless remote banking experience has infused a new culture of innovation in mission-critical back-end and front-end operations. A recent PwC-FICCI survey showed that the banking industry has the country’s highest AI maturity index – leading to the deployment of the top AI use cases. The survey also predicted that Indian banks would see “potential cost savings up to $447 billion by 2023.”

E-commerce

The Indian e-commerce industry has already witnessed big numbers thanks to AI-based strategies, particularly in marketing. For retail brands, capturing market share in India is among the toughest challenges worldwide, with customer behavior being driven by a diverse set of values and expectations. By using AI and ML technologies, backed by data science, it becomes easier to tap into multiple demographics without losing the context of messaging.

Manufacturing

Traditionally, the manufacturing industry has run on expensive and time-consuming manual processes. Slowly, more companies are realizing the impact of AI-powered automation on manufacturing use cases like assembly line production, inventory management, testing and quality assurance, etc. While still at a nascent stage, AR and VR technologies are also seeing adoption in this sector in use cases like prototyping and troubleshooting.

3 crucial data milestones to achieve in India’s AI journey

1) Unbiased data distribution

Forming India-centric datasets starts with a unified framework across the country so that no region is left uncovered. This framework needs to integrate with other systems/data repositories in a secure and seamless manner. Even private companies can share relevant datasets with government institutions to facilitate strategy and policy-making.

2) Localized data ownership

In today’s high-risk data landscape, transferring ownership of India-centric information to companies in other countries can lead to compliance and regulatory problems. Especially when dealing with industries such as healthcare or public administration, it is highly advisable to maintain data control within the country’s borders.

3) Data ethics and privacy

Data-centric solutions that work towards improving human lives require a thorough understanding of personal and non-personal data, matters of privacy, and infringement, among others. Managing this information responsibly takes the challenge beyond the realm of deploying a mathematical solution. The key is building an AI mindset that raises difficult questions about ethics, policy, and law, and ensures sustainable solutions with minimized risks and negative impact. Plus, data privacy should continue to be a hot-button topic, with an uncompromising stance on safeguarding the personal information of Indian citizens.

Final thoughts

India faces a catch-22 situation, with one side of the country still holding to its age-old traditions and practices. The other side embraces technology change, be it using UPI transfers, QR codes, or even the Aarogya Setu app. But the sheer size and diversity of languages, cultures, and politics dictate that AI will find no shortage of areas in which to make a profound impact, and no shortage of challenges in implementing it.

As mentioned earlier, the thriving startup growth adds a lot of fuel to AI’s momentum. From just 10 unicorns in India in 2018, we have grown to 38. This number is expected to increase to 62 by 2025. In 2020, AI-based Indian startups received over $835 million in funding and are propelling growth few countries can compete with. AI is a key vehicle to ring in the dawn of a new era: an India which, despite its diversity and complex landscape, leads the way in the effective adoption of AI.

This article was first published in Analytics India Magazine.

The post Why India-Targeted AI Matters: Exploring Opportunities and Challenges appeared first on Tiger Analytics.

Ensuring Data Security Commitment: Tiger’s Security Milestones

onemg — Tue, 22 Mar 2022 12:18:11 +0000

Tiger Analytics is among the most trusted AI and analytics partners of numerous Fortune 500 companies. This can be directly attributed to our commitment to security and privacy, along with delivering superlative work. Our certifications and attestations stand to demonstrate the fact that securely handling the data shared by our clients and stakeholders is our top priority.

At the beginning of our journey to achieve our security goals, we focused on building a robust information security framework. To achieve this, we adopted ISO/IEC 27001:2013, the industry’s revered information security standard from the International Organization for Standardization and the International Electrotechnical Commission. After we implemented the necessary framework, requirements, and controls of the standard, we were certified by TÜV SÜD in 2017.

In the following years, we sustained our rigorous procedures whilst simultaneously improving our information security posture. Effectively managing Personally Identifiable Information was an important goal. In order to achieve this objective, we implemented and obtained the attestation for GDPR and HIPAA in 2019. The most recent addition to this list is the international standard for privacy information management ISO 27701:2019, which is an extension of ISO 27001:2013. The ISO 27701 has helped us define a strong foundation to manage all Personally Identifiable Information under our custody as Controller and Processor.

We undergo an annual evaluation of ISO 27001 and ISO 27701, which demonstrates our continued commitment to information security. We also undergo the SOC 2 Type II assessment, which is a rigorous inspection of information security controls from the standpoint of objective, effectiveness, and compliance.

ISO/IEC 27001:2013

Specifies the requirements for establishing, implementing, maintaining, and continually improving an information security management system within the context of the organization. It also includes requirements for the assessment and treatment of information security risks tailored to the needs of the organization.

ISO/IEC 27701:2019

It is a data privacy extension to ISO 27001. This provides guidance for organizations looking to put in place systems to support compliance with GDPR and other data privacy requirements.

General Data Protection Regulation (EU) 2016/679 (GDPR)

It is a regulation in EU law on data protection and privacy in the European Union (EU) and the European Economic Area (EEA).

Health Insurance Portability and Accountability Act (HIPAA)

This sets the standard for sensitive patient data protection. Companies that deal with Protected Health Information (PHI) must have physical, network, and process security measures in place and follow them to ensure HIPAA Compliance.

SSAE 18 SOC 2 Type II

This report focuses on the American Institute of Certified Public Accountants’ (AICPA) trust service principles. It examines a service provider’s internal controls and systems related to security, availability, processing integrity, confidentiality, and privacy of data.

All of these credentials make us a trusted partner to protect the security, confidentiality, integrity, and privacy of data handled in the due course of our business. We also run various awareness initiatives to keep our employees updated on information security and privacy objectives and processes. Together with the credentials above, these initiatives help keep information security an integral part of Tiger’s unique culture.

The post Ensuring Data Security Commitment: Tiger’s Security Milestones appeared first on Tiger Analytics.