The commoditization of technology has reached its pinnacle with the advent of the recent paradigm of Cloud Computing. Infosys Cloud Computing blog is a platform to exchange thoughts, ideas and opinions with Infosys experts on Cloud Computing


S3 - Managing Object Versions

S3 has been one of the most appreciated services in the AWS environment. Launched in 2006, it provides 99.999999999% (eleven nines) of durability. It now handles over a million requests per second and stores trillions of documents, images, backups and other data.

Versioning is one of the S3 features that makes it even more useful. Once versioning is enabled, successive uploads or PUTs of a particular object create distinct, individually addressable versions of it. This is a great feature, as it provides safety against accidental deletion due to human or programmatic error. With versioning enabled, any version of an object stored in S3 can be preserved, retrieved or restored.

However, this comes at an additional cost: each new version adds to S3 usage, which is chargeable. This cost can multiply very quickly if versions that are no longer in use are managed improperly. So how can current as well as old versions be managed suitably?

This is easy; there are two options:
1)  Use of S3 Lifecycle Rules
2)  S3 Versions - Manual Delete

Use of S3 Lifecycle Rules

When versioning is enabled, a bucket will hold multiple versions of the same file, i.e. the current version and non-current ones.
Lifecycle rules can be applied to ensure object versions are stored efficiently by defining what action should be taken for non-current versions. Lifecycle rules can define transition and expiration actions.
The example below creates a lifecycle policy for a bucket which says that all non-current versions should be transitioned to Glacier after one day and permanently deleted after thirty days.
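A sketch of such a rule using the AWS CLI (the bucket name `my-bucket` is a placeholder; substitute your own):

```shell
# lifecycle.json - transition non-current versions to Glacier after 1 day,
# then permanently delete them after 30 days
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "ManageNoncurrentVersions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 1, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF

# Apply the rule to the bucket
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-bucket \
    --lifecycle-configuration file://lifecycle.json
```

An empty `Prefix` applies the rule to every object in the bucket; a non-empty prefix would scope it to a subset of keys.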


S3 Versions-Manual Delete
Deleting versions manually can be done simply from the console. Because all versions are visible and accessible from the console, a specific version of the object can be selected and deleted.


However, when using the command line interface, a simple delete-object command will not permanently delete the object named in the command; instead, S3 will insert a delete marker in the bucket. That delete marker becomes the current version of the object, with a new Id, and all subsequent GET object requests will return a 404 error.
So even though the object is not erased, it is no longer accessible, which can be confused with deletion. The object, with all its versions and the delete marker, still exists in the bucket and keeps consuming storage, which results in additional charges.
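This behavior can be seen directly with the AWS CLI (bucket and key names here are placeholders):

```shell
# A simple delete on a versioned bucket only inserts a delete marker;
# the response contains "DeleteMarker": true and the marker's new version Id
aws s3api delete-object --bucket my-bucket --key report.docx

# All previous versions, plus the delete marker, still exist in the bucket
aws s3api list-object-versions --bucket my-bucket --prefix report.docx
```

The `list-object-versions` output lists the surviving versions under `Versions` and the marker under `DeleteMarkers`, confirming that nothing has actually been freed.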

So what is a delete marker? When a delete command is executed for a versioned object, a delete marker gets inserted in the bucket as a placeholder for that object. Due to this delete marker, S3 behaves as if the object were erased. Like any object, a delete marker has a key name and Id; however, it differs from an object in that it has no data, which is why a GET request on it returns a 404 error.

The storage size of a delete marker is equal to the size of its key name, which adds one to four bytes of bucket storage for each character in the key name. That is not huge, so why should we be concerned about it? Because the size of the objects it blocks or hides can be huge and can pile up enormous bills.

Note that delete markers are also inserted in version-suspended buckets. So if versioning is enabled and then suspended (remember that versioning can never be disabled once enabled, only suspended), simple delete commands will still insert delete markers.

Removing delete markers is tricky. If a simple delete request is executed to erase a delete marker without specifying its version Id, the marker won't get erased; instead, another delete marker is inserted with a new unique version Id. Each subsequent delete request inserts an additional delete marker, so it is possible to have several delete markers for the same object in a bucket.

To permanently remove a delete marker, simply include its version Id in the DELETE Object versionId request.
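Sketched with the AWS CLI (bucket, key and version Id are placeholders):

```shell
# Find the delete marker's version Id
aws s3api list-object-versions --bucket my-bucket --prefix report.docx \
    --query 'DeleteMarkers[?IsLatest].[Key,VersionId]'

# Delete the marker itself by passing its version Id;
# the most recent real version becomes the current version again
aws s3api delete-object --bucket my-bucket --key report.docx \
    --version-id <delete-marker-version-id>
```

Because the version Id is specified, S3 removes the marker permanently instead of stacking yet another marker on top of it.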

Once the delete marker is removed, a simple GET request will retrieve the current version (e.g. 20002) of the object.

This solves the problem of unintended storage consumption. But how do we deal with the object in the first place, so that we don't have to go through this complication?
To get rid of an object version permanently, we need to use the specific "DELETE Object versionId" request. This will permanently delete that version, and no delete marker is inserted.
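For example, reusing the placeholder bucket and key from above, with the version Id 20002 mentioned earlier:

```shell
# Permanently delete one specific version; no delete marker is inserted
# and the version's storage is freed immediately
aws s3api delete-object --bucket my-bucket --key report.docx \
    --version-id 20002
```

Repeating this for every version Id returned by `list-object-versions` removes the object, and its charges, entirely.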


Conclusion: S3 provides virtually unlimited storage in the cloud, and versioning makes it even more secure by protecting objects from accidental deletion. However, this comes at a cost and should be managed cautiously. The above explains a common scenario in which a user deleted an S3 object but still struggled with its charges in the AWS bill.
