« Configuring Azure development storage for team development scenarios | Main | Changing partition key in Windows Azure Table Storage »

Performance of range queries on a Windows Azure Table Vs a Normal Database Table

Most of the times in a normal database a full table scan occurs when an index is either not used or there is no index on the table(s) being used by the statement (SQL or Oracle statements). Full table scans usually return data much slower than when an index is used. The larger the table, the slower that data is returned when a full table scan is performed.

In case of a Windows Azure Table, every row (entity) will have a Partition Key and a Row Key (defined below) which will be defined during creation of the row (entity)


Partition Key Property

Tables are partitioned to support load balancing across storage nodes. A table's entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the Partition Key property. The partition key forms the first part of an entity's primary key.

Row Key Property

The second part of the primary key is the row key, specified by the Row Key property. The row key is a unique identifier for an entity within a given partition. Together the Partition Key and Row Key uniquely identify every entity within a table.

The Azure Table Service would use the Partition Key to identify which partition servers to send the query to and would then use the Row Key as an index to identify the data. Provided your range query identifies only a subset of the data this would not result in an entire table scan.

EXAMPLE:

Let's say my table stores blog entries. The entries have reversed ticks generated from the time the blog entry was posted as the row key. As for the partition key, all the blogs from 2007 have the letter 'A' as the partition key, those from 2008 have letter 'B' and so on...

Now depending upon the query we get different cases:

Case1: (Partition Key = C and Row Key = SomeTimeStampValue)
This provides the entire index which enables it return the single row

Case2: (Partition Key = B and Row Key > SomeTimeStampValue and Row Key< SomeTimeStampValue)

Looks up partition key B and returns rows where row key ranges from the two time stamp values 

 

Case3: (Partition Key between B and D)
Looks up partition key B through D and returns all rows. So it does scan the partitions ranging from B
through D but not the entire table

Case4: (Partition Key between B and D & Row Key > SomeTimeStampValue and Row Key< SomeTimeStampValue)

This results in scanning all rows (entities) for partitions between B and D.

Case5 (Worst case Scenario): (Row Key > SomeTimeStampValue and Row Key < SomeTimeStampValue)
This results in scanning all partitions too since the partition key has not been used.

For all the above cases in a normal database, it involves an entire table scan.

TrackBack

TrackBack URL for this entry:
http://www.infosysblogs.com/apps/mt-tb.cgi/4487

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

Please key in the two words you see in the box to validate your identity as an authentic user and reduce spam.

Subscribe to this blog's feed

Follow us on

Blogger Profiles

Infosys on Twitter