Open source has revolutionized the IT sector by harnessing the intelligence and efforts of the community to develop and maintain software faster. Our experts discuss the latest happenings and share their viewpoints on how open source can be leveraged for your organization's transformation.


October 30, 2017

Query-driven data modeling methodology for Apache Cassandra

This blog explains how to use query-driven data modeling in Apache Cassandra. NoSQL data modeling is a process that identifies entities and the relationships between them. It can be used to determine data access patterns and the types of queries to be performed. In doing so, it reveals how data is organized and structured, and how database tables are designed and created. It is important to note that indexing the data can degrade query performance; hence, understanding indexing is essential in the data modeling process.

Data modeling in Cassandra follows a query-driven approach, whereby specific queries are the key to organizing the data. Let me first quickly define these terms: queries retrieve data from tables, and a schema defines how the data is arranged in a table. Thus, a query-driven database design facilitates faster reading and writing of data; in other words, the better the model design, the more rapidly data is written and read.
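As a minimal sketch of this idea (the table and query are hypothetical, invented here for illustration): the application query "show a user's videos, newest first" comes first, and the table is shaped to answer exactly that query.

```cql
-- Hypothetical example: the query drives the table design, not the
-- other way around. Partition by user, cluster by upload time.
CREATE TABLE videos_by_user (
    user_id    uuid,
    added_date timestamp,
    video_id   uuid,
    title      text,
    PRIMARY KEY ((user_id), added_date, video_id)
) WITH CLUSTERING ORDER BY (added_date DESC, video_id ASC);

-- The query this table was designed to answer:
SELECT video_id, title
FROM videos_by_user
WHERE user_id = 522b1fe2-2e36-4cef-a667-cd4237d08b89;
```

Because rows for one user live in a single partition, already sorted by `added_date`, this read touches one partition and returns data in query order.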

Now, first, we must create a conceptual data model that defines all known entities, relationships, attribute types, keys, cardinality, and other constraints. This data model should be created in collaboration with business stakeholders and analysts. For example, a conceptual data model could be presented as an ER diagram.

The next step is logical data modeling. Here, the conceptual data model is mapped to a logical data model based on queries that are defined in an application workflow. The logical data model corresponds to a keyspace where table schemas define columns as well as primary, partition and clustering keys. Thus, the query-driven approach provides a logical data model using data modeling principles, mapping rules, and mapping patterns.
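To sketch what this mapping produces (the keyspace and table names are illustrative assumptions, not from the original): a logical model corresponds to a keyspace, and each query in the workflow yields a table whose primary key encodes the partition and clustering keys.

```cql
-- Illustrative logical model: a keyspace, plus one table derived from
-- the query "fetch all comments for a given video, newest first".
CREATE KEYSPACE IF NOT EXISTS media
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE media.comments_by_video (
    video_id   uuid,       -- partition key: locates the partition
    comment_ts timeuuid,   -- clustering key: orders rows in the partition
    user_id    uuid,
    comment    text,
    PRIMARY KEY ((video_id), comment_ts)
) WITH CLUSTERING ORDER BY (comment_ts DESC);
```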

Here are some rules for query predicates that ensure stability and efficiency:

    Only primary key columns should be used in the query predicate

    All partition key columns in the query predicate must be restricted by equality to a single value

    Clustering columns may be omitted from the query predicate, but only contiguously from the end of the clustering key

    All partition key columns must be present in the predicate
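These rules can be illustrated against a hypothetical table (the names are mine, not from the original):

```cql
-- A table with a composite partition key and one clustering column.
CREATE TABLE readings_by_sensor (
    sensor_id text,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
);

-- Valid: both partition key columns restricted by equality;
-- the clustering column may use an inequality.
SELECT ts, value FROM readings_by_sensor
WHERE sensor_id = 's-17' AND day = '2017-10-30'
  AND ts >= '2017-10-30 06:00:00';

-- Invalid: 'day' (a partition key column) is missing, so Cassandra
-- cannot locate a single partition.
-- SELECT * FROM readings_by_sensor WHERE sensor_id = 's-17';

-- Invalid: 'value' is not a primary key column.
-- SELECT * FROM readings_by_sensor WHERE value > 100;
```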

Besides these query predicate rules, there are additional data modeling principles for mapping to logical data models. It is important to note that violating these principles and rules will affect the ability to support query requirements and may lead to data loss and performance degradation.

Here are the fundamental principles of logical data modeling:

1.    Know your data, particularly the entity and relationship keys that must be preserved and relied upon to organize the data properly

2.    Know your queries such that all columns are preserved at the logical level

3.    Enable data nesting to merge multiple entities together based on a known criterion

4.    Minimize data duplication to ensure space and time efficiency

5.    Use equality search attributes to map to the prefix columns of the primary key

6.    Use inequality search attributes to map to the table clustering key column

7.    Use ordering attributes to map to clustering key columns with ascending or descending clustering order

8.    Use key attribute types to map to primary key columns and uniquely identify table rows
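Principles 5 through 8 can be sketched in a single CQL table (a hypothetical example of mine, for a query such as "find rooms in a given city priced below a limit, cheapest first"):

```cql
-- Each attribute maps to a primary key position per principles 5-8:
CREATE TABLE rooms_by_city (
    city    text,     -- equality search attribute  -> partition key (prefix)
    price   decimal,  -- inequality + ordering attr -> clustering column
    room_id uuid,     -- key attribute              -> completes the primary key
    hotel   text,
    PRIMARY KEY ((city), price, room_id)
) WITH CLUSTERING ORDER BY (price ASC, room_id ASC);

SELECT hotel, price FROM rooms_by_city
WHERE city = 'Bangalore' AND price < 150;
```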

Finally, we must analyze and optimize the logical data model to create the physical data model. The above-mentioned modeling principles, mapping rules, and mapping patterns ensure a correct and efficient logical schema. However, efficiency can still be impacted by database engine constraints or finite cluster resources, such as typical table partition sizes and data duplication factors. Standard optimization techniques include partition splitting, inverted indexes, data aggregation, and concurrent data access optimization. These methods can be used to optimize the physical data model, although we will not cover them in detail in this blog entry.
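As a hedged sketch of one such technique, partition splitting (the bucketing scheme here is my assumption, not from the original): when a partition grows too large, a synthetic bucket column is added to the partition key to spread rows across several smaller partitions.

```cql
-- If a sensor's daily partition grows too large, split it with a
-- bucket column; the application computes the bucket on write,
-- e.g. hash(ts) % 4, and reads fan out across all four buckets.
CREATE TABLE readings_by_sensor_bucketed (
    sensor_id text,
    day       date,
    bucket    int,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day, bucket), ts)
);
```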

This is the way to go about enabling query-driven data modeling in Apache Cassandra.

October 27, 2017

The Rise of Open Source


"[Open source] gives customers control over the technologies they use instead of enabling the vendors to control their customers through restricting access to the code behind the technologies."

- Eric S. Raymond, The Cathedral and the Bazaar


Open source adoption began long ago, in the 1990s, with the introduction of the Linux kernel. At the time, only about 100 developers contributed code to Linux. Since then, the Linux community has exploded: today, over 8,000 developers and 800 companies contribute code to Linux. What has steered such rapid development and adoption of open source technologies? I believe there are three key reasons. Open source helps businesses:


  • Improve agility and accelerate business innovation
  • Gain greater flexibility and scalability
  • Realize higher cost savings


We are witnessing open source adoption across all the enterprise layers: experience, application, business, integration, database, and infrastructure. Adoption begins with the experience layer, driven primarily by the need to quickly enhance customer experience. The adoption of open source JavaScript frameworks like Angular and React has made it easier and faster to develop responsive web designs. While improving the customer experience, it has also helped lower maintenance needs. Open source has also provided freedom of choice and helped organizations move to a polyglot environment. It is no longer necessary to stick to a single technology vendor to realize the benefits of economies of scale. Easy access and reduced cost of usage have made it possible to leverage an ecosystem of technologies across the layers based on the use case and the application attributes.


Today, more than ever, open source adoption is critical for business success, and many organizations are realizing how open source helps them stay ahead of the curve. For instance, a global retail and wholesale giant replaced its Siebel CRM with a new microservices-based application developed on an open source stack. The adoption of open source helped them accelerate their transformation towards a more agile and adaptive environment. While the architectural principles were decided and enforced centrally, the organization gave individual LOBs the freedom to choose their technology stack as long as it was open source.


Despite its advantages, open source has its own challenges. Customers struggle to identify the right open source technology. Scaling adoption across the organization is difficult given the dearth and high cost of talent. Finally, large transformational programs mean managing a partner ecosystem to get a best-of-breed solution.


Infosys open source offerings help clients address these challenges for seamless open source adoption.




More importantly, our expertise in delivering such transformations has been gained from vast experience across organizations with varied maturity levels. For example, while some of our clients want ad-hoc open source implementations to prove a concept or a technology, others want it only occasionally for select lines of business. Then, there are those who consistently and strategically leverage open source across all lines of business to differentiate themselves and gain a clear advantage.


While achieving 100% open source adoption may not always be feasible, we will see a confluence of hosted, proprietary, and open source technologies. Open source will gain more traction and play a major part within enterprise architecture even as it co-exists with non-open source technologies. It is recommended to choose an enterprise-supported version of open source to ensure enterprise-grade security, reduce the risk of failure, and guarantee response times.


We are already seeing the adoption of open source across industries, including financial services organizations that were deemed the slowest to adopt it. I believe that, in the future, the demand for rapid digitization and cloud adoption will only accelerate the demand for and adoption of open source. Open source adoption is going to accelerate, and organizations should assess how and where they can leverage it for competitive advantage.

