Infosys experts share their views on how digital is significantly impacting enterprises and consumers by redefining experiences, simplifying processes and pushing collaborative innovation to new levels


July 31, 2020

Differential Privacy - A milestone in data privacy

Confidentiality and integrity are two of the prime data privacy goals today. Although cryptographic mechanisms are continuously enhanced, security attacks, both 'active' and 'passive', have grown in proportion. For a statistical database, such as a big data store, protecting sensitive information, which may or may not be specific to an individual, without revealing the actual PII (personally identifiable information) is a matter of great concern. This is where differential privacy comes to the rescue.

The need for Differential Privacy

In 2006, Netflix released a data set of users' movie ratings for a competition, after anonymizing the PII of the users. Even so, researchers cleverly linked the anonymized Netflix training database with auxiliary data from the IMDb database to partly de-anonymize it.

The world is going through a tough time battling the COVID-19 pandemic. Governments across the world are trying to trace the sources and route maps of affected individuals to keep track of the outbreak. Governments are also releasing statistical data about COVID patients to the public. At the same time, they must make sure that the patients' PII is protected. One of the traditional ways to do so is to anonymize the PII. But as the Netflix case shows, anonymizing the PII is not enough. Auxiliary information and other sources available in the public domain can be combined with the statistical data to reverse engineer the actual PII. This may lead to a privacy breach. Hence the need for Differential Privacy.

What is Differential Privacy (DP)?

Differential Privacy redefines "privacy" for statistical databases. It is a mathematical framework that provides privacy for such databases. A statistical database in this sense is any database that provides large-scale information about a population without revealing individual-specific information. The sensitive data in the statistical database is protected against potential privacy attacks by third parties. In other words, differentially private data is difficult to reverse engineer. Differential Privacy is already being used by several organizations, among them Apple, Uber, the US Census Bureau and Microsoft.

Goals of Differential Privacy

1.      To make sure that the data is not compromised while maximizing the data accuracy.

2.      To eliminate potential methods that may single out an individual within a large set of data.

3.      To ensure the protection of an individual's PII under any circumstance.

 

The Mechanism behind Differential Privacy

The conventional way of preserving data privacy is to anonymize the data sets. Differential privacy instead shields the data set by introducing carefully tuned random noise whenever the data is used in an analysis. The amount of noise to be added is controlled by a privacy parameter called the privacy loss parameter, represented by the Greek letter ɛ. ɛ bounds the effect that each individual record can have on the output of an analysis, and so it determines the overall privacy provided by a differentially private study. A smaller value of ɛ means more noise and better protection, with lower privacy risk. Conversely, a larger value means less noise and weaker protection, with higher privacy risk. A value of ɛ = 0 gives complete data privacy, but the usability of the output will be zero. The privacy guarantee set by ɛ does not depend on the contents or the size of the database. The larger the database, however, the greater the accuracy of a differentially private algorithm, because the noise becomes small relative to the aggregate being computed.
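As an illustration of noise tuned by ɛ, here is a minimal sketch of a noisy count query. It assumes the noise is drawn from a Laplace distribution, which is one common choice and not necessarily what any particular product uses. The helper names `laplace_noise` and `private_count` and the sample ages are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on 0."""
    # The difference of two independent Exp(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_count(records, predicate, epsilon, rng):
    """Count the records matching `predicate`, then add Laplace noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(2020)
ages = [23, 35, 41, 29, 62, 57, 33, 48]  # hypothetical patient ages
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda age: age >= 40, epsilon, rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.2f} (true count is 4)")
```

With a small ɛ the reported count can be far from 4, while with a large ɛ it stays close to it.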

 

The privacy loss parameter is closely tied to the "privacy budget". Each analysis performed on the data set consumes part of this budget, and one can decide how much of it to spend on a given analysis. This means one can define exactly how much of the privacy budget may be used before the data can no longer be considered anonymous.

[Image: how differential privacy works]

The above picture shows how differential privacy works.

Suppose a data expert queries two databases that differ in a single data entry. The chance of seeing any particular result is almost the same for both databases, so the variation in that one entry is barely reflected in the output. The two probabilities can differ by at most a multiplicative factor of e^ɛ. This means that, when differential privacy is applied, the expert cannot tell one database from the other on the basis of the output.
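The multiplicative bound can be checked numerically. The sketch below assumes a count query protected with Laplace noise of scale 1/ɛ (an assumption made for illustration) and compares how likely each output is when the true count is 100 versus 101. The function names and the counts are hypothetical.

```python
import math

def laplace_pdf(x, mean, scale):
    """Density of a Laplace distribution at x."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)

def max_output_ratio(count_a, count_b, epsilon, outputs):
    """Largest ratio between the output densities of two neighbouring databases."""
    scale = 1.0 / epsilon
    worst = 0.0
    for x in outputs:
        p_a = laplace_pdf(x, count_a, scale)
        p_b = laplace_pdf(x, count_b, scale)
        worst = max(worst, p_a / p_b, p_b / p_a)
    return worst

epsilon = 0.5
outputs = [90 + i / 10.0 for i in range(201)]  # candidate outputs from 90 to 110
ratio = max_output_ratio(100, 101, epsilon, outputs)
print(f"largest ratio = {ratio:.4f}, bound e^epsilon = {math.exp(epsilon):.4f}")
```

No output is more than e^ɛ times likelier under one database than under the other, which is why the observer cannot reliably tell them apart.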

Conclusion

Differential privacy is one of the most anticipated research topics in the field of data privacy today. Its adoption is still in the early stages. There is no doubt that differential privacy provides a guarantee of privacy and security for one's data. The limitation is that if we have high dimensional data and need to give it more privacy, we might end up adding a lot of noise, which may make the data unusable. Even so, compared with traditional data privacy techniques, it is a far better approach to protecting high dimensional personal data against privacy breaches. We at the iEDPS Product Team have been analyzing the market, where our customers require data sets that are reconstruction resistant. This is one area where we feel the iEDPS Differential Privacy Module will add a lot of value in solving these business use cases.

Author: Jessmol.paul

Differential Privacy: The Privacy Guarantee

We are living in times where "data" has become the driving force of our lives. Organizations, governments, and individuals are continuously collecting information to draw insights and provide the best user experience possible. As the world embraced intelligence derived from data as the primary tool for making decisions, concerns grew about the privacy of individuals. To keep things in check, a host of regulations, laws and standards have come into effect (GDPR, CCPA, ISO 27001) to ensure responsible usage of the data. In this scenario, where both data privacy and data analytics are required, differential privacy enables data access for analysis without the risk of privacy violations.

The traditional approaches to data privacy predominantly involve the removal of PII (personally identifiable information) using techniques like anonymization, pseudonymization, data obfuscation, etc. Although effective to an extent, these techniques have certain limitations. An analyst performing computational analysis on a dataset, in which the date of birth was anonymized, would get misleading results as the anonymized data doesn't give a guarantee of retaining the original statistical value of the dataset. Sometimes, only a few fields that are deemed as PII are protected/anonymized and other fields are left as is to preserve the statistical values of the data set. In this scenario, there is a threat of revealing the user's identity using linkage attacks by combining this data with data available in another dataset.

 

What is Differential Privacy 

Differential Privacy of data leverages a statistical framework for provable privacy protection against potential privacy attacks.

A differentially private algorithm is one wherein, by analyzing the output, we cannot ascertain if a data subject was part of the analysis or not. Thus, a differentially private algorithm provides a guarantee that its behavior remains unaffected when a data subject is included or removed from the analysis. This ensures that the output obtained by performing a differentially private analysis on a dataset containing a particular data subject will give similar results to an analysis performed on the same dataset after excluding that particular data subject. This gives a formal guarantee that individual-level information about participants in the database is not leaked. This is in alignment with the generally perceived concept of privacy by individuals, i.e., their data should not be able to be singled out by specific queries. 


How Differential Privacy Works

Traditional data protection techniques work on the notion that privacy is a characteristic of the result of an analysis. However, it must be considered an attribute of the analysis itself.

Differential Privacy safeguards an individual's privacy by introducing random noise into the dataset while performing the analysis. With the noise added, it is not possible to identify an individual from the outcome of any analysis. However, because of the noise, the output of the analysis is an approximation and not the exact result that would have been obtained from the actual data set. It is also very likely that a differentially private analysis performed multiple times will produce a different outcome each time, due to the randomness of the noise.

The privacy loss parameter, Ɛ (epsilon), determines the amount of noise to be introduced. The noise itself is commonly drawn from a probability distribution called the Laplace distribution, with its spread set by Ɛ. Ɛ determines how much the computation can deviate if one of the data subjects is removed from the data set. Smaller values of Ɛ result in smaller deviations in the computation when a user's data is excluded from the data set. Hence, a smaller value of Ɛ gives stronger data protection, but the computational results are less accurate. No optimal value of Ɛ has yet been identified that guarantees the required level of both protection and accuracy. We are still in the early stages of adoption of differential privacy. It is a trade-off between privacy and accuracy that users must make.
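To make the trade-off concrete, the sketch below measures the average size of the Laplace noise for a few values of Ɛ. It assumes a query whose result changes by at most 1 when one data subject is removed (sensitivity 1). The function names and the chosen values of Ɛ are illustrative only.

```python
import random
import statistics

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on 0."""
    # The difference of two independent Exp(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def mean_abs_error(epsilon, sensitivity, trials, rng):
    """Average absolute noise added to a query with the given sensitivity."""
    scale = sensitivity / epsilon
    return statistics.fmean(abs(laplace_noise(scale, rng)) for _ in range(trials))

rng = random.Random(31)
for epsilon in (0.1, 1.0, 10.0):
    error = mean_abs_error(epsilon, sensitivity=1.0, trials=20000, rng=rng)
    print(f"epsilon={epsilon}: average error is about {error:.3f}")
```

The average error shrinks roughly in proportion to 1/Ɛ, so stronger protection (smaller Ɛ) is paid for with less accurate results.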

 

Use Cases and Implementation

Differential Privacy techniques can be used to perform a wide range of statistical or computational analyses. Below are some of the broad categories of computation, which can leverage differential privacy:

·         Count queries

·         Histograms

·         Cumulative distribution functions

·         Linear regression

·         Statistical and Machine Learning techniques that involve clustering and classification

·         Synthetic data generation and other statistical disclosure limitation techniques
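As a sketch of one of these categories, a differentially private histogram can be built by adding independent noise to every bin. The example assumes Laplace noise, a fixed list of bins chosen in advance, and that each person falls into exactly one bin. The names and the sample data are hypothetical.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on 0."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_histogram(values, bins, epsilon, rng):
    """Return a noisy count for every bin in `bins`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(values)
    # One person affects exactly one bin by 1, so each bin gets
    # Laplace noise of scale 1 / epsilon. Empty bins get noise too,
    # otherwise their emptiness would leak information.
    return {b: counts.get(b, 0) + laplace_noise(1.0 / epsilon, rng) for b in bins}

rng = random.Random(11)
blood_groups = ["A", "B", "O", "O", "A", "AB", "O", "B", "O", "A"]  # hypothetical
noisy = private_histogram(blood_groups, ["A", "B", "AB", "O"], epsilon=1.0, rng=rng)
for group, count in noisy.items():
    print(f"{group}: {count:.2f}")
```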

 

Different approaches to differential privacy have been implemented by various organizations. Some of the approaches taken are below:

o   Interactive Mechanism: Users can perform analysis like custom linear regressions on a dataset and get differentially private results.

o   Non-Interactive Mechanism: Providing data that is differentially private, such as synthetic data, which can be used for performing analysis.

o   Curator Based: Assigning a database administrator to provide datasets that are differentially private.

o   Local Model: Consider the example of a survey conducted in a differentially private manner. In this method, users do not provide their personal information to a trusted third-party, but instead provide responses to questions involving their own personal information in a differentially private manner. The individual differentially private answers are not useful by themselves, but an aggregation of these responses can be leveraged to perform meaningful statistical analysis.
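The local model can be sketched with the classic coin-flip technique known as randomized response. This is one well-known way to run such a survey, shown here as an assumption for illustration rather than the approach of any organization mentioned above. Each respondent answers truthfully only half of the time, yet the overall rate can still be estimated from the aggregate.

```python
import random

def randomized_response(truth, rng):
    """Return a noisy yes/no answer for one respondent."""
    if rng.random() < 0.5:
        return truth            # first coin: answer truthfully
    return rng.random() < 0.5   # otherwise: answer with a second coin flip

def estimate_true_rate(reports):
    """Recover the population rate from the noisy reports."""
    observed = sum(reports) / len(reports)
    # observed = 0.5 * true_rate + 0.25, solved for true_rate.
    return 2.0 * observed - 0.5

rng = random.Random(42)
truths = [i < 30000 for i in range(100000)]  # hypothetical: 30% would say "yes"
reports = [randomized_response(t, rng) for t in truths]
print(f"estimated rate: {estimate_true_rate(reports):.3f}")
```

A single report says little about the respondent, because a "yes" may simply come from the second coin, but the aggregate estimate lands close to the real rate.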

 

We at iEDPS (Enterprise Data Privacy Suite from Infosys) are building an interactive mechanism for differential privacy. We will be leveraging differentially private algorithms to perform computations and enable users to query data without the risk of leaking personal information.

 

Food for Thought

It has been proven that the risk of privacy loss increases as the data is analyzed more often. The privacy loss parameter can be implemented as a "privacy budget" to be consumed by multiple analyses of individuals' data. If only a single analysis is to be performed on a given dataset, the entire privacy budget can be allocated to it. Usually, however, multiple analyses are run on a given dataset, and hence the cumulative usage of the privacy budget by all the analyses must be computed.

Differential Privacy provides robust data protection which is not usually possible with traditional data privacy techniques. However, mechanisms must be developed that ensure differential privacy meets the legal requirements and that can identify a suitable privacy loss parameter Ɛ based on such regulations. There should be synergy between data providers and legal entities when choosing differential privacy tools, to ensure that the data privacy implementations adhere to the mandated data privacy regulations.
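A minimal sketch of such budget tracking follows. It assumes the simplest accounting rule, in which the Ɛ values of successive analyses on the same data add up. The `PrivacyBudget` class is hypothetical and does not describe any particular product.

```python
class PrivacyBudget:
    """Track the cumulative epsilon spent on one dataset."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Reserve `epsilon` for one analysis and return what remains."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating point rounding does not reject
        # a request that exactly uses up the remaining budget.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent

budget = PrivacyBudget(total_epsilon=1.0)
print(budget.spend(0.4))  # first analysis
print(budget.spend(0.4))  # second analysis
try:
    budget.spend(0.4)     # third analysis would exceed the budget
except RuntimeError as error:
    print(error)
```

Once the budget is used up, further queries are refused instead of silently eroding the privacy guarantee.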

 




what exactly is quantum key management

 

QUANTUM KEY MANAGEMENT

All of us are familiar with the word encryption. Encryption is the technique of encoding a message into a different pattern using a key. Usually, the key is generated by some complex mathematical combination of numbers. The greater the complexity of the key generation method, the lower the probability that someone tapping the channel can decrypt the message. In earlier times, it would take an eavesdropper years to crack a key.

In the present scenario, we see the world evolving tremendously in science and technology. This is the era of quantum computing. As computers go through this revolutionary change, they become much faster and could break a highly complex key within days or maybe hours. So there is only one way left to fight this, and the motto is "Fight quantum with quantum".

Cryptography needs three things:

  1. Encryption key

  2. Sharing the key between the source and target i.e. Key Exchange

  3. A very strong encryption algorithm

Consider the widely used encryption method called RSA. It was invented in 1977, and breaking one of its early challenge keys was estimated to take 40 quadrillion years. But in 1994, that code was broken. As computers evolve, breaking such codes has become much easier. Today we are used to 2048-bit or even 4096-bit keys.

Since quantum computing was introduced some 10 to 15 years ago, it has been expected to take significantly less time to crack a code of high mathematical complexity. Hence, quantum computing could make our castle of security collapse like a house of cards. Research has grown significantly in the past few years on using the behavior of quantum particles and their effects to make encryption stronger. The interesting fact is that there has been a breakthrough in this area.
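Why key size matters can be seen with a toy RSA key. The sketch below uses deliberately tiny textbook primes (61 and 53), which are an assumption for illustration only. Real keys use primes hundreds of digits long, for which this brute-force factoring is hopeless on classical computers.

```python
def factor_semiprime(n):
    """Find the two prime factors of n by trial division."""
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 1
    raise ValueError("n has no non-trivial factors")

# Key generation with toy primes.
p, q = 61, 53
n = p * q                  # public modulus
e = 17                     # public exponent
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)        # private exponent (needs Python 3.8+)

message = 65
ciphertext = pow(message, e, n)

# The attacker sees only (n, e) and the ciphertext, but n is small
# enough to factor, which reveals the private exponent.
found_p, found_q = factor_semiprime(n)
cracked_d = pow(e, -1, (found_p - 1) * (found_q - 1))
recovered = pow(ciphertext, cracked_d, n)
print(recovered == message)
```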

The random numbers that we use in cryptographic methods are not completely random. We generate keys from a sequence of what are called pseudo-random numbers. Numbers generated by a mathematical method contain a more or less subtle pattern. The less entropy (a measure of randomness) a key contains, the easier it is to break.

Some casinos were attacked recently. The output of a machine was recorded continuously over a significant period of time and analyzed. Engineers reverse engineered the pseudo-random numbers and were able to predict the spin of the wheels. This resulted in enormous financial gains. This is one example among many. When it comes to businesses, administrations and the like, the severity of the issue becomes far greater.

Researchers were looking for ways to generate numbers that are truly random. For most of the generators they tried, they found that the numbers were not produced fast enough, were not truly random, or were not repeatable.

According to Heisenberg's uncertainty principle, it is impossible to measure all the properties of a quantum particle exactly, and hence the quantum world is truly random. So the researchers concentrated on taking advantage of this intrinsic randomness. They developed machines containing optical fibers that generate truly random numbers. The early machines were large. Later, as optical fiber technology evolved, the random number generator shrank to the size of your palm. From here developed security based on quantum particles, such as applying quantum effects to a laser to carry the codes.
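The pattern in pseudo-random numbers is easy to demonstrate. In the sketch below, a key built from Python's general-purpose generator is fully determined by its seed, so anyone who learns or guesses the seed can rebuild the key. The `secrets` module, which draws on the operating system's entropy source, is shown for contrast. It is still not a quantum source, and the function name and seed are hypothetical.

```python
import random
import secrets

def pseudo_random_key(seed, num_bytes=16):
    """Build a key from a seeded, non-cryptographic generator."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(num_bytes))

key_a = pseudo_random_key(seed=2020)
key_b = pseudo_random_key(seed=2020)
print(key_a == key_b)   # the same seed always reproduces the same key

# Keys drawn from the operating system's entropy pool are not
# reproducible from a seed.
os_key = secrets.token_bytes(16)
print(os_key.hex())
```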
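How recorded outputs can give away the next number is sketched below with a linear congruential generator, a deliberately weak generator chosen for illustration. The generators actually involved in such attacks are not described here, so the algorithm, the modulus and the "secret" constants are all assumptions.

```python
M = 2**31 - 1  # prime modulus, assumed to be known to the attacker

def lcg(seed, a, c, m=M):
    """Yield an endless pseudo-random sequence: x -> (a * x + c) mod m."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield state

def recover_parameters(x0, x1, x2, m=M):
    """Recover the multiplier and increment from three consecutive outputs."""
    # x2 - x1 = a * (x1 - x0) mod m, so divide by (x1 - x0) modulo m.
    a = (x2 - x1) * pow((x1 - x0) % m, -1, m) % m
    c = (x1 - a * x0) % m
    return a, c

# The machine runs with secret parameters and a secret seed.
machine = lcg(seed=42, a=48271, c=12345)
observed = [next(machine) for _ in range(3)]

# The attacker sees only the three outputs, then predicts the fourth.
a, c = recover_parameters(*observed)
prediction = (a * observed[-1] + c) % M
print(prediction == next(machine))
```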

Now we shall look into what exactly the scenario of quantum key management is:

Take the scenario in which Alice sends a message and Bob receives it. Eve is the eavesdropper who listens in and tries to get the key.

Alice sends photons to Bob in 4 different polarizations: horizontal, vertical, left-diagonal or right-diagonal. Two of these polarizations are chosen to represent a 1 and the other two a 0. Bob then measures the direction in which each photon is polarized. He switches back and forth between 2 differently oriented filters, measuring one photon at a time. When his filter matches the way Alice polarized the photon, he reads the bit she sent. When it does not match, his reading is random.

Later, Alice calls Bob to compare filters. Instead of telling him the key, Alice only says right or wrong for each filter Bob used. After this public check of the order of the filters used, they discard the bits that Bob measured with the wrong filter. The bits that remain form the secret encryption key.

If Eve listens to the channel, she has to guess which filters to use for detection. Her wrong guesses change the photons sent by Alice, and these changes can then be detected when Alice and Bob publicly compare a sample of their bits. For Eve it is nearly impossible to guess the correct filter order, and hence security is improved against eavesdropping.

Studies are under way to facilitate this quantum-based, or photon-based, key exchange. In recent times, the length of the optical fiber used has grown to 150 km from the few kilometers used earlier. Research is also growing fast in exploring different aspects of quantum key management, all of which contributes to burgeoning security.

We, as part of the iEDPS Product Team, are working on a pilot to leverage Quantum Key Distribution for Next Gen Data Protection for your organization.
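The exchange between Alice and Bob can be simulated classically. The sketch below models the scheme usually called BB84, with "+" standing for the horizontal/vertical filter and "x" for the diagonal one. The function names are hypothetical, and Eve is assumed to measure every photon and re-send it, which is only one possible attack. The random module stands in for real photon physics.

```python
import random

def measure(bit, photon_basis, filter_basis, rng):
    """A matching filter reads the bit; a mismatched one gives a random bit."""
    return bit if photon_basis == filter_basis else rng.randrange(2)

def bb84(num_photons, eavesdrop, rng):
    """Return the sifted keys held by Alice and Bob."""
    alice_key, bob_key = [], []
    for _ in range(num_photons):
        bit = rng.randrange(2)
        alice_basis = rng.choice("+x")
        bob_basis = rng.choice("+x")
        photon_bit, photon_basis = bit, alice_basis
        if eavesdrop:
            # Eve guesses a filter, measures, and re-sends what she saw.
            eve_basis = rng.choice("+x")
            photon_bit = measure(photon_bit, photon_basis, eve_basis, rng)
            photon_basis = eve_basis
        bob_bit = measure(photon_bit, photon_basis, bob_basis, rng)
        # Public check: keep only positions where the filters matched.
        if alice_basis == bob_basis:
            alice_key.append(bit)
            bob_key.append(bob_bit)
    return alice_key, bob_key

def error_rate(key_a, key_b):
    """Fraction of positions where the two sifted keys disagree."""
    return sum(x != y for x, y in zip(key_a, key_b)) / len(key_a)

rng = random.Random(1984)
print("without Eve:", error_rate(*bb84(4000, eavesdrop=False, rng=rng)))
print("with Eve:   ", error_rate(*bb84(4000, eavesdrop=True, rng=rng)))
```

Without an eavesdropper the sifted keys agree. With Eve on the line, roughly a quarter of the sifted bits disagree, which is what Alice and Bob look for when they compare a sample.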

 

Author,

Aswathi Valsan 

IEDPS TEAM, ICETS

https://www.infosys.com/services/incubating-emerging-technologies/offerings/enterprise-data-privacy-suite.html 


July 28, 2020

Futuristic usage of Chatbots in Hotel, Insurance industries

What a start to the decade. The beginning was rather normal and there was hardly any clue of how things were going to be, with only China in the clutches of COVID-19, but as the days went by, everything went topsy-turvy. The whole world has been gasping for breath, and Nature has once again stamped its supremacy, driving each and every one of us to our own individual and collective realizations on how to go about living.

Somehow a very few countries came out as obedient students of life and are enjoying the benefits, with no lockdown and with freedom, much to the envy of the rest of the world's population. But the rest of the world is still grappling with the question of how to overcome it, and watching the tracker daily.

The current pandemic has taught us many things, from going back to the old culture of washing hands and legs after coming home, covering the face and reaching out to others to provide comfort, to forcing innovations to happen at all levels. Who would have thought that railway coaches would be converted into temporary wards, migrants forced to walk thousands of kilometres, police using drones to locate people brewing local alcohol on the outskirts, and neither rich nor poor spared, all equally susceptible!!!

Our lives have changed for sure, and in that context I would like to share my views on how Chatbots can be used in the Hotel and Insurance industries in the future.


Hotel Industry:

  • Through IoT (Internet of Things), the Customer is guided to car parking.
  • An on-premise Chatbot mobile app can be downloaded and installed only while on the premises of the Restaurant, and is automatically uninstalled as soon as the bill is settled.
  • The app guides guests to a suitable vacant table based on the number of people.
  • Both text and voice are supported for input commands and output.
  • Customers can browse the menu in the app and order online. Order confirmation and periodic updates on the order are shown in the app.
  • Chatting in the Customer's native language is supported.
  • Customers can enquire about ingredients and energy value.
  • Through IoT, the app integrates with a Health tracker app to record the food consumed.
  • For physically challenged people, touch-based ordering and voice control come to the rescue.
  • Drones keep the premises under surveillance to ensure that cleanliness is maintained and sanitizing is performed whenever a spillage happens or a table is vacated.
  • Even more sophisticated Restaurants can deploy Humanoid robots with an integrated Chatbot that can read out the menu, take orders and deliver them to the table. They can also socialize with the Customer by talking about the weather and general affairs, based on the configuration done in the Chatbot (Small Talk feature).


Insurance Industry:

An integrated mobile app consisting of Location Based Services (LBS) and a Chatbot. LBS sends the precise location (accident area) to the Police Station, Insurer and Family members, along with precise pictures of the accident and of the damaged area of the vehicle. The Chatbot answers queries on the steps to be taken post-accident.


Benefits for Customers:

  • Chatbot alerts the nearest Police station about the accident after taking consent from the Customer.
  • Alerts the Insurer about the accident after taking consent from the Customer.
  • Suggests the nearest Hospital / Clinic, either after the Customer answers a few queries (minor injuries sustained) or without any queries (emergency situation).
  • Suggests which pictures to take of the damaged vehicle.
  • Checks whether the Vehicle Insurance is still valid.
  • Tells how much Claim can be availed.
  • Alerts Family members based on the Emergency Contact setting.
  • Suggests the nearest Vehicle Service Center.


Benefits for Insurer:

  • Check Claim History
  • Gauge potential damage to Vehicle and arrive at Claim issuance amount
  • Processing of images digitally

 

 

 

 

 

How would I like my Banking Chatbot to answer!!!

Having worked in the Banking domain for more than a decade, I moved to Nia Chatbot's Product Marketing team last year. Learning how an Artificial Intelligence (AI) powered Chatbot works made me think about how I would really wish a Chatbot to address my Banking queries.

Take any Bank's website and there will be a host of Products, with details listed out in elaborate fashion. However, it takes time to move from one page to another, and it can be a mentally draining exercise too. The details are well laid out, but in this fast-paced life I would hardly like to browse a single page fully, let alone have the patience to go through different pages to figure out which section I should read. At any given point of time, I would like to interact with the website differently: sometimes for a Funds transfer, on another day to check the account's balance, and on a different day to look for the latest Credit card offers.

Until a few years back, Banks used to invest heavily in making their websites appealing to Customers, and the recent trend is to have a Chatbot integrated to answer in a basic and standard way. Once a Customer starts interacting with such a Chatbot, it is easy to figure out that it is equipped to answer only a standard set of queries, and the Customer will most likely hit the "Connect to Live Agent" button, or a similar one, to have a conversation with a Human agent. I often hit this button directly, knowing very well that it is easier to get my query answered by a Human Agent after waiting a few minutes than to chat with a machine in a meaningless fashion.

Why does this happen and what would I like to see as improvements in the couple of Chatbots that I have seen?

  •   First and foremost, they are hardly visually appealing, which makes the interaction dull for me (the Customer).
  •   They seem to be configured as rule-based bots and have hardly any learning capability to pick up cues from my previous interactions and answer me in the very first instance.
  •   Personalization is missing and the interactions are always standard (the same set of options shown at all times).

 

 Let's see how Nia Chatbot can address these challenges:

  • Through UI Customization, 'Avatars' can be created, with an option to choose the one I like for each session. The Infosys ETA team has come up with Lex bot, built on the Nia Chatbot platform, which has "Avatars": https://lex.infosysapps.com/page/home.
  • Know the reason I am logging into the website from my past history (depending on the date), whether it is to clear my Credit card due, an Insurance payment, etc., and show me the relevant options accordingly.
  • Look at the payment options I have used frequently in the last 6 months and remind me which one would be relevant this time to avail a discount on a bill payment.
  • Auto-suggest scheduling a fund transfer to a different account based on the date of login.
  • Approve, via a voice command, Payment requests that I initiated in a different solution.
  • Suggest a new Investment option after looking at the currently linked ones, and highlight how it could benefit me considering my spending pattern and investment portfolio.

I can keep adding to the list, and these are truly a wish list for me compared to the features I have seen so far. I am hopeful that I will soon see them become a reality, considering how fast AI based Chatbots are being developed and taking the World by storm.


As concluding remarks, I would like to touch upon a few capabilities of Nia Chatbot.

  • Helps Banks automate help desk operations, answer Product queries, raise Service Requests, and hand over the chat to a Live Agent in case it is unable to answer.
  • An FAQ (Frequently Asked Questions) based Chatbot can be deployed to answer chats revolving around a set of Queries and Answers, such as a Product's salient features or how to apply for a Product.
  • Nia Chatbot can be integrated with any front-end channel, namely website, mobile and social messaging platforms like WhatsApp, Twitter and Facebook, and can integrate with any back-end Enterprise Application via Representational State Transfer Application Program Interfaces (REST APIs).
  • Context is maintained, so the Chatbot is able to answer queries across the various web pages of the Bank that the Customer has logged into.
  • Multi-lingual Chatbots can be configured, so Customers can chat in regional languages too instead of only standard English.
  • It is built on Open Source Technologies and hence leverages the latest innovations. Instead of giving static responses, the Chatbot is able to learn from previous interactions and answer the same query properly the next time.
  • Visually appealing images can be added as responses, and even personalized videos can be rendered dynamically as output. Multiple widgets are available for response configuration.
  • Typos can also be corrected, based on the configuration done.
  • Bot Lifecycle management is possible, through which Chatbots can be managed from design through deployment, including updates in the Production environment.