Is big data just a buzzword?
Big data has been a popular buzzword in the banking industry for some time. Banks, always at the forefront of technological innovation, have long recognized the need to harness the information captured daily through hundreds of millions of customer transactions and interactions. As competition intensifies and customer engagement becomes the bedrock of sustainability, banks are increasingly looking to technology to extract maximum value from their core data assets.
Over the past decade, banks have closely observed the development and successful deployment of big data solutions by new-age enterprises like Google, Amazon, Facebook, and LinkedIn, enabling them to provide highly personalized and immersive user experiences. Banks have waited for this technology to mature and become commercially available before taking it to the next frontier of innovation in the financial industry. So is big data now ready to meet the expectations of the banking industry?
Can big data scale up to meet expectations from banks?
Let's look at key challenges faced by banks today.
1. More regulations mean banks need to store more data for longer periods. Banks struggle with the archival and timely retrieval of this data, which sometimes runs into terabytes. Big data provides a cost-efficient and scalable solution for storing these terabytes, or if needed even petabytes, of data in the Hadoop Distributed File System (HDFS), which distributes the data across clusters of commodity hardware. Hadoop-based storage is horizontally scalable, and many banks have already implemented it.
Industry news: Morgan Stanley, with assets worth US$300 billion, has started with a 15-node Hadoop cluster that it plans to grow.
2. Another problem faced by most banks is data silos. Even though most banks have enterprise data warehouses (EDWs), these are expensive and do not allow easy modification. One fast-emerging use of big data is the data lake, or logical data warehouse. The data lake acts as an enterprise repository for data of any format, schema, and type. It is inexpensive and massively scalable, making it well suited to enterprise data needs.
The data lake can support the following capabilities:
a) Capture and store high volumes of raw data from across the enterprise at fairly low cost
b) Store a variety of data types in the same repository
c) Support schema-on-read, enabling a generic structure for data storage
With information available in a single place, banks can apply association and predictive techniques to this data to generate insights into customer behavior, predict churn, and identify cross-selling opportunities.
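The schema-on-read idea above can be illustrated with a minimal, self-contained Python sketch (the field names and records below are hypothetical, not drawn from any bank's actual data lake): raw records are stored as-is, and each consumer applies its own schema only when reading.

```python
import json

# Raw events land in the lake as-is, with no schema enforced on write.
raw_records = [
    '{"cust_id": 101, "channel": "mobile", "amount": "250.00"}',
    '{"cust_id": 102, "channel": "branch"}',  # sparse record: no amount field
]

def read_with_schema(records, schema):
    """Apply a schema at read time: cast known fields, default missing ones."""
    for rec in records:
        data = json.loads(rec)
        yield {field: cast(data[field]) if field in data else default
               for field, (cast, default) in schema.items()}

# Each consumer defines its own view over the same raw data.
offers_schema = {"cust_id": (int, None),
                 "channel": (str, "unknown"),
                 "amount": (float, 0.0)}

rows = list(read_with_schema(raw_records, offers_schema))
```

A different consumer (say, a fraud team) could read the same `raw_records` with a different schema, which is the flexibility an EDW's fixed schema-on-write does not offer.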
To overcome the technical complexity of retrieving information from the data lake, the Hadoop ecosystem offers Pig and Hive. Hive provides an SQL-like interface to data stored in HDFS, while Pig provides a high-level platform for creating MapReduce programs that process that data.
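To see what Hive and Pig abstract away, here is a sketch of the MapReduce paradigm itself in plain Python, aggregating transaction totals per customer (the data is hypothetical; real MapReduce jobs run distributed across the cluster rather than in one process):

```python
from collections import defaultdict

# Transactions as (customer_id, amount) pairs, as they might sit in HDFS files.
transactions = [("c1", 120.0), ("c2", 75.5), ("c1", 30.0), ("c3", 9.9)]

# Map phase: emit a (key, value) pair per input record.
mapped = [(cust, amount) for cust, amount in transactions]

# Shuffle phase: group all values by key.
grouped = defaultdict(list)
for cust, amount in mapped:
    grouped[cust].append(amount)

# Reduce phase: aggregate each group independently.
totals = {cust: sum(amounts) for cust, amounts in grouped.items()}
```

In Hive, this entire map-shuffle-reduce pipeline collapses to a single SQL-like statement along the lines of `SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id`, which is why these tools lower the barrier to working with HDFS data.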
Industry news: HSBC implemented a Hadoop-based data lake platform to support their ongoing and future regulatory needs, thus eliminating restrictions related to data availability.
3. The techniques described so far process data in batches, but many banking functions require high-throughput access to data. Apache Cassandra, a fully distributed and horizontally scalable database system with high throughput, addresses this need. Many companies have benefitted from successful deployments of Cassandra, using it to identify fraudulent transactions or determine suitable offers for customers in real time.
Industry news: Real-time offers through online channels require a high-throughput database. Bank of America supports this high-volume, high-throughput workload with Cassandra.
4. Big data is associated with two important capabilities: storing high data volumes and generating insights. It is therefore important not only to store these petabytes of data but also to derive key business intelligence from them in real time.
Apache Mahout is a library of scalable machine-learning algorithms implemented on top of Apache Hadoop using the MapReduce paradigm. Banks can run Mahout over the customer information stored in HDFS to build a 360° view of each customer and make need-based offers.
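The kind of recommendation logic behind need-based offers can be sketched with item-based collaborative filtering, one of the algorithm families Mahout implements at scale. This is a toy, single-machine illustration with hypothetical customers and products, not Mahout's actual API:

```python
from collections import Counter
from itertools import combinations

# Products each customer already holds (hypothetical data).
holdings = {
    "alice": {"savings", "credit_card", "mortgage"},
    "bob":   {"savings", "credit_card"},
    "carol": {"savings", "mortgage"},
}

# Co-occurrence counts: how often two products are held together.
cooc = Counter()
for products in holdings.values():
    for a, b in combinations(sorted(products), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def recommend(customer):
    """Rank products the customer lacks by co-occurrence with products held."""
    owned = holdings[customer]
    scores = Counter()
    for held in owned:
        for (a, b), n in cooc.items():
            if a == held and b not in owned:
                scores[b] += n
    return [product for product, _ in scores.most_common()]
```

Here `recommend("bob")` suggests a mortgage, because customers holding savings and credit card products frequently also hold one; at bank scale the same co-occurrence counting is what MapReduce parallelizes across HDFS.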
Apache Spark provides similar functionality in real time, as it runs in memory across the cluster. Spark analyzes data as it arrives to generate time-sensitive business intelligence, for example, identifying fraud from outlier behavior patterns or providing location-based offers.
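The outlier-based fraud check mentioned above can be sketched as a simple z-score test: flag any transaction that deviates sharply from the customer's own history. This is a simplified, single-machine stand-in for the logic a Spark streaming job might apply in memory over micro-batches; the amounts and threshold are hypothetical.

```python
import statistics

def flag_outliers(history, incoming, threshold=3.0):
    """Flag transactions whose amount deviates sharply from past behavior.

    Computes the z-score of each incoming amount against the customer's
    historical mean and standard deviation.
    """
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # guard against zero variance
    return [amt for amt in incoming if abs(amt - mean) / stdev > threshold]

# A customer's recent transaction amounts, then a new micro-batch of arrivals.
history = [40.0, 55.0, 48.0, 52.0, 45.0]
batch = [50.0, 900.0, 47.0]

suspicious = flag_outliers(history, batch)
```

The 900.0 transaction is flagged while the others pass; a production system would maintain these running statistics per customer and per channel, which is exactly the stateful, in-memory computation Spark is built for.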
Industry news: Deutsche Bank has recently implemented Apache Spark to support its real-time data needs for fraud detection.
Can banks afford to ignore big data?
We are witnessing big data platforms mature rapidly to meet the demands of the financial industry. Tools are becoming less complex, shortening the learning curve and increasing the availability of skilled personnel.
As most of these tools become commercially available, this is an ideal time for banks to invest in big data and set up the right platforms. If not, they may have to play catch-up as other industries surge ahead with the knowledge and use of big data platforms.