Infosys experts share their views on how digital is significantly impacting enterprises and consumers by redefining experiences, simplifying processes and pushing collaborative innovation to new levels

June 6, 2020

Is Homomorphic Encryption a game changer in the Data Privacy Space?

Homomorphic Encryption (HE):

Homomorphic encryption is a form of encryption that allows computations to be executed on ciphertexts. The encrypted output, once decrypted, is the same as the result of performing those computations on the plaintext.

In other words, we can encrypt data and then perform supported operations directly on the encrypted data. Decrypting the result gives the same value we would have obtained by performing those operations on the plaintext.

In a public-key homomorphic encryption scheme, data is encrypted with a public key and can only be decrypted with the corresponding private key. Whoever holds the private key has access to the decrypted data.
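To make the idea concrete, here is a minimal sketch using the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The function names, the tiny primes and the sample values are assumptions chosen for illustration only. Keys of this size offer no security, and the sketch assumes Python 3.8 or later for the modular inverse.

```python
import math
import random

def keygen(p, q):
    """Build a toy Paillier key pair from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                      # common simple choice of generator
    mu = pow(lam, -1, n)           # modular inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)       # (public key), (private key)

def encrypt(public_key, m):
    """Encrypt integer m (0 <= m < n) with the public key."""
    n, g = public_key
    n_sq = n * n
    while True:                    # pick random r that is coprime to n
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(public_key, private_key, c):
    """Recover the plaintext; only the private key holder can do this."""
    n, _ = public_key
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(public_key, c1, c2):
    """Homomorphic addition: multiply ciphertexts modulo n squared."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

public_key, private_key = keygen(1009, 1013)
c1 = encrypt(public_key, 1200)     # e.g. done by the data owner
c2 = encrypt(public_key, 345)
c_sum = add_encrypted(public_key, c1, c2)   # e.g. done by an untrusted party
total = decrypt(public_key, private_key, c_sum)
print(total)                       # 1545, the same as 1200 + 345 on plaintext
```

The party that calls `add_encrypted` needs only the public key and the ciphertexts, so it never sees 1200, 345 or their sum.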

 

Why do we need Homomorphic Encryption?

 

Most major industries now rely on cloud service providers for data storage and for services such as computation and analytics on that data. Data held in the cloud may contain sensitive or personally identifiable information (PII) that must be kept private, and there is always a risk of leakage or breach, which depends on how well the cloud provider secures its environment.

To mitigate this risk, data in the cloud is usually stored in encrypted form. If an arithmetic or logical operation has to be applied to that data, the data must first be fetched back, decrypted, and only then computed on.

This creates a further risk of leakage during computation or while the data is in transit back to the cloud. If PII or other sensitive data is leaked, an organization bound by strict regulations such as GDPR, CCPA and HIPAA must bear the blame and the associated penalties. This is where homomorphic encryption comes into the picture.

 

HE is a type of encryption that does not require data to be decrypted before it is used. This ensures that data confidentiality and privacy are preserved while the data can still be processed.

In other words, HE enables individuals or third parties to work with encrypted data without having access to, or knowledge of, its actual contents.

 

Homomorphic Encryption (HE) types

 

The main difference between the variants of HE is which mathematical or analytical operations they support on encrypted data and how many times those operations can be applied:

Partially Homomorphic Encryption (PHE)

                PHE allows only one type of mathematical operation (for example, only addition or only multiplication) to be performed on encrypted data, while keeping the sensitive data confidential.

Somewhat Homomorphic Encryption (SHE)

                SHE supports more than one type of operation (addition and multiplication, for example), but only up to a certain level of complexity and only a limited number of times.

Fully Homomorphic Encryption (FHE)

                FHE places no limit on either the type of computation or the number of times it can be applied to the encrypted data.
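As an illustration of the partially homomorphic case, unpadded "textbook" RSA supports exactly one operation on ciphertexts: multiplication. The sketch below is not a recommendation. Textbook RSA is insecure in practice, and the function names and the small primes are assumptions chosen so the arithmetic is easy to follow (Python 3.8 or later is assumed for the modular inverse).

```python
def rsa_keygen(p, q, e):
    """Toy textbook-RSA key pair from two small primes (insecure sizes)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # private exponent (Python 3.8+)
    return (n, e), (n, d)          # (public key), (private key)

def rsa_encrypt(public_key, m):
    n, e = public_key
    return pow(m, e, n)

def rsa_decrypt(private_key, c):
    n, d = private_key
    return pow(c, d, n)

def multiply_encrypted(public_key, c1, c2):
    """The single supported homomorphic operation: ciphertext multiplication."""
    n, _ = public_key
    return (c1 * c2) % n

public_key, private_key = rsa_keygen(61, 53, 17)
c1 = rsa_encrypt(public_key, 7)
c2 = rsa_encrypt(public_key, 6)
c_product = multiply_encrypted(public_key, c1, c2)
product = rsa_decrypt(private_key, c_product)
print(product)                     # 42, the same as 7 * 6 on plaintext
```

There is no corresponding way to add two RSA ciphertexts and obtain an encryption of the sum, which is what makes the scheme partially rather than fully homomorphic. The product must also stay below the modulus `n` for the result to be exact.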

Pros

1.       Security of data stored in the cloud: With HE, data stored in the cloud stays encrypted while calculations and searches are performed on the ciphertext. The results can later be decrypted, and the original data is left unaltered.

 

2.       Data analytics: Because the data is encrypted with HE, it can be outsourced to third parties for statistical analysis, research and data sharing while the privacy of the data subjects is protected.

Cons

One of the major drawbacks of HE is that it comes at the expense of speed. Computing on homomorphically encrypted data carries a higher overhead than performing the same computation on plaintext.

Conclusion

1.       HE is highly relevant in today's environment, where data privacy is one of the main concerns of every industry.

2.       HE lets us compute on data while ensuring that the data remains secure.

3.       It is a good idea to leverage HE algorithms to keep your data secure in the cloud. You can then take advantage of lower-cost cloud storage and still make that data available whenever it is needed.


Author: Mustafa Saeed

June 5, 2020

Contextual Data Generation for Secure Quality Assurance

Why do you need Contextual Data Generation in your testing process?

 

In today's world, quality assurance is an integral part of the IT delivery process, ensuring that the final product is ready to be shipped to the customer. Testing in production-like test environments is an essential part of quality assurance.

While production data is the best data to test the application, many organizations are not allowed to use production data for testing purposes due to privacy concerns and key global regulations such as GDPR and CCPA. The alternatives are to use anonymized data or synthetically generated data.

In today's post-pandemic world, the key to a successful testing exercise is contextual test data, which enables the organization to simulate production-like use cases without PII (Personally Identifiable Information) or SI (Sensitive Information), so that no data privacy rules or regulations are breached.

 

Contextual Test Data as a Pivot of Data Privacy in Application Development and Testing

 

Test data can be generated through one of the following methods:

  1. Manual generation of test data,
  2. Mass copy of data from production to the testing environment,
  3. Mass copy of test data from legacy client systems, and
  4. Automated test data generation using tools.

 

Synthetic data generation falls under the last method. Here we can leverage recent technologies such as machine learning to train models that identify the kind of data each field holds by reading through the schema details supplied with the requirement. Once a field is categorized, we can select an algorithm designed specifically for generating data of that category and produce production-like data for the field. The same procedure can be followed for every field and table in the schema to generate the mock data required for testing.
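The categorize-then-generate flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the schema, the column names and the generator functions are hypothetical, and a simple keyword rule stands in for the trained model that would classify fields in a real tool.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical schema: column name -> declared type (only names are used here).
SCHEMA = {
    "customer_name": "varchar",
    "email": "varchar",
    "phone_number": "varchar",
    "date_of_birth": "date",
    "account_balance": "decimal",
}

FIRST_NAMES = ["Asha", "Liam", "Mei", "Omar", "Sara"]
LAST_NAMES = ["Khan", "Lopez", "Nair", "Smith", "Tanaka"]

def categorize(column_name):
    """Keyword-based stand-in for a trained field classifier."""
    name = column_name.lower()
    if "email" in name:
        return "email"
    if "phone" in name:
        return "phone"
    if "date" in name:
        return "date"
    if "balance" in name or "amount" in name:
        return "amount"
    if "name" in name:
        return "person_name"
    return "text"

# One generation routine per category; all values are obviously fictitious.
GENERATORS = {
    "email": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)) + "@example.com",
    "phone": lambda rng: f"555-01{rng.randrange(100):02d}",
    "date": lambda rng: (date(1950, 1, 1) + timedelta(days=rng.randrange(20000))).isoformat(),
    "amount": lambda rng: round(rng.uniform(0, 10000), 2),
    "person_name": lambda rng: f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
    "text": lambda rng: "".join(rng.choices(string.ascii_letters, k=12)),
}

def generate_rows(schema, count, seed=0):
    """Categorize every column once, then generate `count` mock records."""
    rng = random.Random(seed)      # seeded so test runs are repeatable
    plan = {col: GENERATORS[categorize(col)] for col in schema}
    return [{col: gen(rng) for col, gen in plan.items()} for _ in range(count)]

rows = generate_rows(SCHEMA, count=3, seed=42)
for row in rows:
    print(row)
```

The only inputs are the schema and the number of records, which mirrors the minimal-input style of generation; swapping the keyword rule for a learned classifier would not change the rest of the flow.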

 

We can also train the model to follow references, reusing the data already generated for a parent field when generating a field that references it, which avoids referential integrity errors between tables in the schema. When generating PII/SI fields, we can follow the expected notation conventions while producing purely dummy values, which helps comply with the regulations in place.
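A minimal sketch of the referential integrity idea, assuming a hypothetical parent table `customers` and child table `orders`: the child generator draws its foreign keys only from keys that were already generated for the parent, so every reference resolves. Table names, column names and value formats are assumptions for illustration.

```python
import random

def generate_customers(count):
    """Parent table: primary keys plus clearly fictitious names."""
    return [
        {"customer_id": i, "name": f"Customer {i:04d}"}
        for i in range(1, count + 1)
    ]

def generate_orders(count, customers, rng):
    """Child table: each order reuses a customer_id that already exists."""
    parent_ids = [c["customer_id"] for c in customers]
    return [
        {
            "order_id": i,
            "customer_id": rng.choice(parent_ids),   # foreign key from parent
            "amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(1, count + 1)
    ]

rng = random.Random(7)             # seeded so the mock data is repeatable
customers = generate_customers(5)
orders = generate_orders(20, customers, rng)

# Every foreign key in the child table resolves to a parent row.
known_ids = {c["customer_id"] for c in customers}
orphans = [o for o in orders if o["customer_id"] not in known_ids]
print(len(orphans))                # 0
```

Generating tables in dependency order (parents before children) is the design choice that makes this work; with deeper schemas the same approach applies level by level.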

 

How do we use contextual data generation for our testing activities?

 

A range of data generation products is available in the market: Mockaroo (supports generation in SQL, delimited files, JSON and Excel), SQL Data Generator by Redgate (used, as the name suggests, with SQL Server Management Studio), Test database generator by IBM for DB2, and Generate Data (MySQL 4 or higher). One of the key products for contextual data generation is from Infosys: Infosys Enterprise Data Privacy Suite, or iEDPS. iEDPS is an intelligent data generation product that caters to a wide range of requirements, making life easier for users who need to generate test data. The required inputs can be as minimal as the schema details of the requirement and the number of records needed. It contains more than 35 algorithms designed specifically for data generation.

 

iEDPS is an easy-to-use, high-performance, scalable and cost-effective data privacy and protection solution that automates data protection and privacy across an enterprise. It comes with deterministic, selective, dynamic and static masking tools along with the data generation tool. Best of all, it can be deployed on any platform, on-premise or cloud, for organization-wide use, and it supports all major databases and file systems. Before choosing a data generation tool, we should consider factors such as the data generation methods provided and the support for different data types, databases and operating systems, among many others. iEDPS checks most, if not all, of the boxes. Here's a video explaining iEDPS. More details about iEDPS and its product suite are available at the iEDPS Microsite.

Author: Pranay Sharma R