<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>ESP – Product Engineering – Platform and Software Products</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/" />
    <link rel="self" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software/atom.xml" />
   <id>tag:www.infosysblogs.com,2010:/engineering-software/1</id>
    <link rel="service.post" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1" title="ESP – Product Engineering – Platform and Software Products" />
    <updated>2010-02-18T05:45:31Z</updated>
    <subtitle>Infosys delivers concept-to-market software engineering services across the engineering value chain. Our blog will discuss the latest trends in software product engineering, outsourcing, technologies, and address business challenges. </subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.2ysb5-20051201</generator>
 
<entry>
    <title>True Integration with VSTS 2010</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2010/02/true_integration_with_vsts_201_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=30" title="True Integration with VSTS 2010" />
    <id>tag:www.infosysblogs.com,2010:/engineering-software//1.30</id>
    
    <published>2010-02-18T05:11:16Z</published>
    <updated>2010-02-18T05:45:31Z</updated>
    
    <summary><![CDATA[Microsoft (India) conducted the &quot;VSTS 2010 Launch&quot; session at the Infosys Bangalore campus this week to familiarize the software community with VSTS 2010 before the official launch (planned in April, 2010). The session included a keynote followed by an intensive...]]></summary>
    <author>
        <name>Suraj Nair</name>
        
    </author>
            <category term="General" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p>Microsoft (India) conducted the &quot;VSTS 2010 Launch&quot; session at the Infosys Bangalore campus this week to familiarize the software community with VSTS 2010 before the official launch (planned in April, 2010). The session included a keynote followed by an intensive hands-on demonstration of the various facilities that are on offer in the latest edition of Visual Studio Team System Suite. There were a number of features on demo, but what was of special importance was the support for the architect community in this release.</p>]]>
        <![CDATA[<p>The Visual Studio environment, in all its various avatars over the years, has always been known as an IDE (Integrated Development Environment) - but as a tool, the focus has been more on its use as a 'Development Environment' used primarily by the developer community. The term 'Integrated' seemed to be a misfit with regard to how Visual Studio was generally used. However, with the VSTS 2005 suite, Microsoft signaled its intention of developing Visual Studio into a complete Application Lifecycle Management solution providing tools for everyone involved in the lifecycle - be it the project manager, the architect, the developer or the tester. The Team Foundation Server, along with its assorted client-side solutions, helped achieve this to a large extent. Microsoft&rsquo;s new VSTS 2010 release builds on the same platform and provides a lot more.&nbsp; A lot of what VSTS 2010 promises has to do with aligning to current technology trends (APIs for multicore programming, cloud computing etc.) and business needs (cutting costs by introducing more avenues for automation). Though you may argue that this has been true for the various Visual Studio releases over the decade, with VSTS 2010 this seems to have been a foremost thought, considering that it was a &lsquo;work in progress&rsquo; during the year of the 'Great IT Depression' - 2009.</p><p>There are a number of invaluable features built into VSTS 2010, but what is heartening is the importance that has been accorded to the architect community in this release. Microsoft has realized the need to support UML within VSTS, acknowledging that UML is a hugely popular modeling language. 
As a result, one will no longer have to use detached tools like Microsoft&rsquo;s Visio or tools from Rational (Rational Software Architect, for example) to model architecture and class diagrams - though the support for DSL (Domain Specific Language) is still retained in VSTS 2010. Significantly, VSTS 2010 also provides facilities to generate source code from UML diagrams and vice versa - a feature that has been considered an important USP of other comparable tools. VSTS 2010 also allows a &lsquo;work item&rsquo; or task (from the&nbsp;project management&nbsp;perspective) to be automatically generated from, for example, a use case diagram and assigned to a particular developer &ndash; an example of the integration of an &lsquo;architect&rsquo; responsibility with a &lsquo;project management&rsquo; responsibility.</p><p>With VSTS 2010, a lot of importance has been accorded to supporting maintenance and reverse engineering. Other than the ability to generate UML diagrams (sequence diagrams etc.) from code, VSTS 2010 also provides a facility to generate dependency graphs based on assemblies, namespaces, classes etc. Dependency graphs help in understanding the dependencies between assemblies. In addition, they show details of the exact dependency in terms of the function call being made, the class object being instantiated across dependencies etc. This is of great importance today, with a number of customers wanting to re-engineer legacy applications to modern technologies but having absolutely no relevant documentation in place to guide the initial understanding. A recent project involved the need to re-engineer a native VB/C++ based application to .NET with nothing but the source code made available. In a world where time-to-market is all important, such projects do not involve a complete migration of the application. 
Rather, complex algorithmic engines are maintained &lsquo;as-is&rsquo; in the native code base, with the remaining parts of the application migrated to the latest technologies to provide a more modern experience to the end user. (Such native implementations are typically invoked using facilities available in the platform - JNI for Java, interop for .NET.) The availability of dependency graphs helps in understanding the various components and the interfaces to those components without spending time on code reading.</p><p>Another neat feature allows the architect to monitor whether the source code being developed is in line with the architectural decisions that have been written down.&nbsp; For example, in a 3-tiered architecture, the normal rule is for the presentation layer to access the data layer only through the business layer APIs, but during development it is quite common for such rules to be ignored under pressure or as a result of bug fixes, resulting in code which works but defies architectural decisions. Deviations which break architectural rules will show up as compile-time violations in the VSTS 2010 Error List, thus allowing corrections early in the development life cycle without the reviewer having to point them out. According to Somasegar in his blog <a href="http://blogs.msdn.com/somasegar/archive/2009/08/29/architecture-tools-in-vsts-2010.aspx">here</a>&nbsp;- <em>&quot;The Layer designer enables you to define the logical layers and valid communication paths between layers of your project.&nbsp; Once you have associated assemblies, namespaces, and classes with layers in the Layer diagram, you can validate existing or new code against the layering constraints.&quot;</em></p><p>While wholesome support for the Architect community is a novel addition, there are also a number of features supporting configuration management and the build process. 
In addition, Microsoft has focused deeply on testing by providing a suite of facilities to bridge gaps between the developer and the tester. A bug <em>&quot;that manifests on the tester's machine but is not reproducible on the developer's machine&quot;</em> is a common incident that repeats in projects across companies. Using VSTS 2010, a tester can now save the environment in which the bug manifested as a virtual machine instance (using virtualization facilities provided on the OS platform) for the developer to reproduce and investigate better. </p><p>To a critic, the sheer number of features available might seem to be overkill, besides, of course, the enormous demands on system RAM when installing and using the complete suite. Microsoft has been making videos of various features available online, in addition to the detailed documentation already available. To keep RAM demands in check, it is enough to install only those components of VSTS 2010 that one would really be using. </p><p>VSTS 2010 is truly meant to be an &quot;Integrated Development Environment&quot;. So, go ahead &ndash; irrespective of what role you play in the project team - and get your hands dirty using VSTS 2010!</p>]]>
    </content>
</entry>
<entry>
    <title>Globalization and the Japanese Software Industry</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2010/02/globalization_and_the_japanese_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=28" title="Globalization and the Japanese Software Industry" />
    <id>tag:www.infosysblogs.com,2010:/engineering-software//1.28</id>
    
    <published>2010-02-03T06:27:51Z</published>
    <updated>2010-02-03T08:29:09Z</updated>
    
    <summary>Japanese products in the manufacturing sector have more often than not withstood intense competition from competitors all around the world. Except for the recent glitch, Japanese cars have been the most sought after the world over for various engineering attributes...</summary>
    <author>
        <name>Suraj Nair</name>
        
    </author>
            <category term="Internationalization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[Japanese products in the manufacturing sector have more often than not withstood intense competition from competitors all around the world. Except for the recent <a href="http://edition.cnn.com/2010/WORLD/asiapcf/01/29/japan.toyota.apology/index.html" target="_blank">glitch</a>, Japanese cars have been the most sought after the world over for various engineering attributes like performance, reliability, design etc. Unfortunately, the same cannot be said about the Japanese software industry &ndash; though there are efforts currently underway to change the structure of this industry.]]>
        <![CDATA[<p>Tatsuo Tanaka in his <a href="http://www.glocom.ac.jp/e/publications/Tanaka_1.pdf" target="_blank">paper</a> &ldquo;The Competitiveness of Japan&rsquo;s Software Industry&rdquo; attempts to bring out the reasons why Japan has not been able to lead the pack in software in the manner it does in the manufacturing industry. Tanaka&rsquo;s research indicates that Japan excels in producing custom and embedded software, but lags when dealing with packaged business and online software.<br />Japan&rsquo;s area of strength has always been hardware, and since embedded software is incorporated in hardware, it is natural for its quality to match that of the hardware. Japan&rsquo;s focus on custom software can be attributed to the fact that most Japanese companies hand over software development to subsidiaries that build software for in-house use, and that software ends up being customized with specific extensions.&nbsp; Besides, the tendency to form long-term business partnerships with customers results in software being customized to that particular customer. If relationships were short-term, companies would try to generalize the products they developed so that the initial investment is not wasted and the product can be moulded for newer and newer customers, besides encouraging the use of standardized third-party software in the products that they develop. As the global market is more inclined towards business/general purpose software, companies in Japan which concentrate more on custom software are not in a position to compete in the software export market. </p><p>Software products offered by organizations in Japan are developed in bits and pieces (modules) by various subsidiaries, who may in turn request the services of first-tier contractors, who in turn may forward the work to other contractors. 
Despite an intense quality focus (as is the norm in Japan), the end product would contain software developed using varied technologies, processes and standards, with pieces involving very little generalization. <br />The need is to develop a process model which will help make Japanese products viable globally &ndash; the technical aspect of which involves the strategies for internationalization and subsequent localization. To quote from a communication with an Infosys employee who has spent a major part of his career interfacing with Japanese customers &ndash; &ldquo;The American and European markets have products which are widely used. The product companies in Japan too have their own products which are in the same space. However, they lack competitiveness in terms of features. The problem is that most software products are mangled to meet Japan specific requirements like specific encoding types (SJIS, EUC etc), specific hardware platforms, specific cultural expectations etc. What is required is a methodology which can be used to guide these Japan specific products to compete globally with established products and at the same time maintain their very Japanese characteristics - and this includes aspects like product positioning, differentiation etc in addition to internationalization and localization.&rdquo;</p><p>Considering the effort and investments that have already gone into making some of the products, and the fact that these products have done exceedingly well in the domestic market, Japanese customers typically expect the effort to &lsquo;globalize&rsquo; their software to be minute, and therein lies the challenge. Though such projects are often driven by a set of initial requirements, it is important to look beyond just what has been specified. 
The base requirement of &quot;internationalization&quot; calls for a lot of proactive contribution based on experience of how products work across continents, countries and cultures, and there is a possibility that the current product would need &quot;re-architecting&quot; to achieve this flexibility. Since the customer has years of experience in developing customized software for the domestic market, it is important that a good case study is prepared in a way that can convince the customer of the need to adapt to the change. One related challenge is to get customers to adopt new technology. In a product that we were expected to globalize a couple of years ago, it required a lot of effort to convince the customer of the need to use the latest technologies available with .NET. The customer insisted on using C++/MFC and Visual Basic simply because most of the company's resources worked on these technologies in supporting their other legacy customized products. Their C++ based products having been highly successful in the domestic market, it took a lot of convincing to explain the advantages of using .NET technologies in the context of this product and the global market.</p><p>There seems to be a wind of change blowing across the Japanese software industry and a realization that custom products for the domestic market will not provide a wide market share in the face of alternatives from companies outside Japan, whose products have modern features. Such companies are able to&nbsp;access a wider market across geographies because I18N is an inherent characteristic of their software,&nbsp;thus allowing them to&nbsp;build on&nbsp;their initial investment. In the face of a changing world, it has become inevitable that the Japanese software industry is now reflecting on how it has to position itself in the global market. 
We have often interacted with Japanese Fortune 500 companies who want us to help place their software in the global market, and such requests are increasing every year.&nbsp;&nbsp; <br /></p>]]>
    </content>
</entry>
<entry>
    <title>Internationalization and Performance considerations</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2010/01/internationalization_and_perfo.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=27" title="Internationalization and Performance considerations" />
    <id>tag:www.infosysblogs.com,2010:/engineering-software//1.27</id>
    
    <published>2010-01-28T06:54:44Z</published>
    <updated>2010-01-28T07:19:01Z</updated>
    
    <summary><![CDATA[Almost always, during the design discussion of any Internationalization project, one of the questions asked by the client is, &ldquo;So, will Internationalization have any impact on the performance of the application?&rdquo;. No matter what you think, there is no denying...]]></summary>
    <author>
        <name>Aviraj Singh</name>
        
    </author>
            <category term="Internationalization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p align="justify">Almost always, during the design discussion of any Internationalization project, one of the questions asked by the client is, &ldquo;<em>So, will Internationalization have any impact on the performance of the application</em>?&rdquo; No matter what you think, there is no denying the fact that Internationalization does have a performance impact on the application, whether it is big or small. There may be situations where the business benefits of Internationalization outweigh the performance criteria, and in such situations it makes sense to go ahead with Internationalization even at the cost of some performance degradation. However, a good design can help you avoid severe performance hits.</p>]]>
        <![CDATA[<p align="justify">In order to come up with a good design to minimize the performance impact, it is important to first understand the areas which contribute most towards performance degradation. Some of these areas are listed below:</p><ul><li><div align="justify"><strong>Character set conversions:</strong></div></li></ul><blockquote><p align="justify">During Internationalization, the application is designed to support a particular encoding, which may be <a title="UTF-8" href="http://en.wikipedia.org/wiki/UTF-8" target="_blank">UTF-8</a>, <a title="UTF-16" href="http://en.wikipedia.org/wiki/UTF-16/UCS-2" target="_blank">UTF-16</a> or any other commonly used encoding. However, this does not guarantee that the incoming data will be encoded using the same character set. In order to process the incoming data, the application first has to convert the data to its own encoding; otherwise string processing within the application will go haywire. In the case of a networking application where huge amounts of data might reach the application from various kinds of devices, encoding conversions at the application&rsquo;s interfaces will definitely degrade performance. This will also happen if the application is expected to write its output to files having a different encoding scheme or pass data to third party applications which support a different encoding. In general, character set conversions - whether done at the application&rsquo;s interfaces or within the application - contribute towards performance degradation.</p></blockquote><blockquote><p align="justify">Choosing an appropriate encoding for the application can minimize the impact of character set conversions. If the majority of the incoming data is in a particular encoding, it makes more sense for the application to support the same encoding in order to minimize the character set conversions at the interfaces. For example, 
if the incoming data is UTF-8 or <a title="US-ASCII" href="http://en.wikipedia.org/wiki/ASCII" target="_blank">US-ASCII</a> encoded, the application should internally support UTF-8 for better performance. If the majority of the input data is in a native encoding like <a title="Shift JIS" href="http://en.wikipedia.org/wiki/Shift_JIS" target="_blank">Shift-JIS</a> or <a title="EUC" href="http://en.wikipedia.org/wiki/Extended_Unix_Code" target="_blank">EUC</a>, it would probably be better to support the native encoding within the application, even though supporting Unicode might be the ideal choice. The choice of encoding also depends on many other factors, so a tradeoff is often required.</p></blockquote><ul><li><div align="justify"><strong>Memory requirements:</strong></div></li></ul><blockquote><p align="justify">The memory requirements of an application generally change after it is internationalized. This impacts both secondary and primary memory. For example, keeping all the message strings in separate properties files increases space usage on the hard disk as well as the access time for the messages. Also, since encodings differ in their memory requirements, the application may use more RAM because its data structures need more memory to store the internationalized data. For example, for English and Western European languages, choosing UTF-16 over UTF-8 roughly doubles the memory requirement, while choosing UTF-32 uses about four times as much memory. While UTF-16 and <a title="UTF-32" href="http://en.wikipedia.org/wiki/UTF-32/UCS-4" target="_blank">UTF-32</a> make string processing easier to some extent, it is also important to consider the performance hit due to the added memory requirements. The tradeoff is generally between memory and ease of processing.</p></blockquote><blockquote><p align="justify">The choice of an appropriate encoding depends on the locales which are to be supported. 
It is observed that UTF-8 generally takes about 50% less space than UTF-16 for English or Western European languages, but it might take 50% more space than UTF-16 for some Asian scripts like Chinese. So if the majority of the data is in a Western European language, choosing UTF-8 would be a better option. Moreover, since UTF-8 is backward compatible with ASCII, it is easier to make internationalization changes. Similarly, for Far Eastern locales, choosing UTF-16 would give better performance since it uses roughly a third less space than UTF-8 (two bytes per character instead of three for most CJK characters). By choosing the appropriate encoding, the memory requirements can be optimized, which leads to better performance.</p></blockquote><ul><li><div align="justify"><strong>Message catalogs:</strong> </div></li></ul><blockquote><p align="justify">Typically during the process of Internationalization, all user visible strings are moved into a message catalog or resource file. The application has to retrieve the required strings from the message catalog using the corresponding string IDs. This loading of strings can happen during application start, which increases the loading time of the application, or it might happen while the application is running, which increases its response time. In an application having a huge number of strings, this can contribute to performance degradation.</p></blockquote><blockquote><p align="justify">While creating message catalogs is an essential part of Internationalization, the application can be designed in such a way as to minimize the performance impact. Instead of loading the entire message catalog when the application starts, the application can be designed to load only the essential strings at load time and load the rest as and when the need arises. 
Along with this, a message caching mechanism can be implemented to enable faster access to frequently used message strings.</p></blockquote><ul><li><div align="justify"><strong>Sorting multilingual strings:</strong></div></li></ul><blockquote><p align="justify">Sorting of strings is a very common feature in applications and does have a performance impact in a Unicode environment due to the difference between Unicode sorting rules and non-Unicode sorting rules.</p></blockquote><blockquote><p align="justify">Though this impact is not very significant in most cases, the use of proper collations and collation keys can improve the performance of the application. This can be done at the application level (as in Java) or even at the database level. The collation you choose can significantly impact the performance of queries in the database. Collation also impacts substring matching in queries. A collation can be chosen for the quickest possible performance or for the most accurate results. Both have their pros and cons. If you want accuracy, you can choose to go with the Unicode Collation Algorithm, but it will have some performance overhead.</p></blockquote><p align="justify">In general, reducing the number of encoding conversions and the amount of string formatting can help in minimizing the performance impact of Internationalization. The choice of an appropriate encoding also plays a very important role. In the end, it is always a tradeoff between performance and ease of processing. What are your views on this subject and how does your design team deal with it? It would be good to share some best practices for handling these issues.</p>]]>
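        <![CDATA[<p align="justify">As a rough illustration of the space figures discussed under memory requirements, the Python sketch below measures how many bytes the same text occupies in UTF-8, UTF-16 and UTF-32. The sample strings and the helper name <code>encoded_sizes</code> are assumptions chosen for illustration and do not come from any particular project. The little-endian codec variants are used only so that the byte-order mark is not counted.</p>
<pre>
```python
# Illustrative sketch: compare the encoded size of short sample strings.
# The sample strings and the helper name are assumptions made for this example.

def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le variant omits the BOM
        "utf-32": len(text.encode("utf-32-le")),
    }


english = "performance"   # 11 ASCII letters
chinese = "性能影响"        # 4 common CJK characters

if __name__ == "__main__":
    for label, sample in (("English", english), ("Chinese", chinese)):
        print(label, encoded_sizes(sample))
```
</pre>
<p align="justify">For the English sample the sizes come out as 11, 22 and 44 bytes; for the Chinese sample they are 12, 8 and 16 bytes. Real data usually mixes scripts, digits and markup, so it is worth running a measurement like this on a representative sample before settling on an internal encoding.</p>]]>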
    </content>
</entry>
<entry>
    <title>Handling Data in Enterprise Mashups</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2010/01/handling_data_in_enterprise_ma_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=26" title="Handling Data in Enterprise Mashups" />
    <id>tag:www.infosysblogs.com,2010:/engineering-software//1.26</id>
    
    <published>2010-01-27T14:12:23Z</published>
    <updated>2010-01-27T14:54:47Z</updated>
    
    <summary><![CDATA[Mashups are evergreen and get attention from all stakeholders, be it the creator of&nbsp;the mashup or the user of&nbsp;the mashup. Thanks go to Google Maps, which has taken their popularity to the next level. A typical mashup application...]]></summary>
    <author>
        <name>Jayanti Vemulapati</name>
        
    </author>
            <category term="Web 2.0" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<span style="color: #333333">Mashups are evergreen and get attention from all stakeholders, be it the creator of&nbsp;the mashup or the user of&nbsp;the mashup. Thanks go to Google Maps, which has taken their popularity to the next level. A typical mashup is a web application that combines data or functionality from two or more external sources to create a new service. The term <strong>Mashup</strong> implies easy, fast integration, frequently using open APIs and data sources. An example of a mashup is the use of cartographic data to add location information to real estate data, thereby creating a new and distinct web API that was not originally provided by either source. Mashups have also found a foothold in enterprise business, and the term coined for them is &ldquo;Enterprise Mashups&rdquo;. Here, in addition to data, the process also comes into the picture. If the enterprise is SOA enabled, then we can directly use the BPM engine for process orchestration. 
An enterprise mashup consists of: <p>&nbsp;</p></span><ul style="margin-top: 0in"><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l0 level1 lfo1; tab-stops: list .5in; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">&nbsp;Web services <p>&nbsp;</p></span></li><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l0 level1 lfo1; tab-stops: list .5in; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">&nbsp;RSS feeds <p>&nbsp;</p></span></li><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l0 level1 lfo1; tab-stops: list .5in; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">&nbsp;Platform services in a cloud&nbsp; <p>&nbsp;</p></span></li><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l0 level1 lfo1; tab-stops: list .5in; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">&nbsp;Data <p>&nbsp;</p></span></li><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l0 level1 lfo1; tab-stops: list .5in; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">&nbsp;Client application&nbsp; <p>&nbsp;</p></span></li></ul><span style="font-size: 10pt; color: #333333"><p>&nbsp;</p></span><span style="color: #333333">In this blog, I will quickly touch upon the data part of mashups. Data in Enterprise Mashups can be in the form of: <p>&nbsp;</p></span><ul style="margin-top: 0in"><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l1 level1 lfo2; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">XML data residing in RSS feeds or in web services. 
<p>&nbsp;</p></span></li><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l1 level1 lfo2; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">DB data <p>&nbsp;</p></span></li><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l1 level1 lfo2; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">Unstructured data <p>&nbsp;</p></span></li><li class="MsoNormal" style="margin: 3pt 0in 0pt; color: #333333; mso-list: l1 level1 lfo2; mso-line-height-alt: 7.0pt"><span style="font-size: 10pt">JSON <p>&nbsp;</p></span></li></ul><span style="color: #333333">In mashups, the processing of data is a dynamic activity; hence the time taken to process the data may increase the overall execution time of the mashup application. To tackle this problem, distributed computing can be applied to the different kinds of data mentioned above. <p>&nbsp;</p></span><span style="color: #333333">For XML and JSON data, parallel parsers can be used to create the mashup. These could be multithreaded or use the multicore architecture of Intel chips at the hardware level <a href="http://www.intel.com/cd/software/products/asmo-na/eng/406212.htm">http://www.intel.com/cd/software/products/asmo-na/eng/406212.htm</a>. On the other hand, we can use Hadoop&rsquo;s HDFS and MapReduce for unstructured data. <br />Hadoop is a Java-based framework that supports distributed computing and scales very well for data-intensive applications. Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations. 
MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes <a href="http://wiki.apache.org/hadoop/">http://wiki.apache.org/hadoop/</a>. One good example of an enterprise mashup is &ldquo;CRM-gadget&rdquo; <a href="http://www.programmableweb.com/tag/enterprise">http://www.programmableweb.com/tag/enterprise</a>, which searches for new accounts or validates accounts in Oracle On Demand via Google local search.&nbsp; This mashup can tap the potential of Hadoop HDFS and MapReduce to reduce the time taken to search the accounts.&nbsp;</span><span style="font-size: 5.5pt; color: #333333"> <p>&nbsp;</p></span><span style="font-size: 11pt; color: #333333">&nbsp;</span><span style="color: #333333">To conclude, we need to build POCs&nbsp;and examine the dynamic&nbsp;dissection/split of data&nbsp;across parallel/distributed nodes&nbsp;to achieve almost linear speed-up. This will in turn reduce the total execution time of an Enterprise Mashup application.</span><span style="color: #333333; mso-ansi-language: EN"> </span><span style="color: #333333"><p>&nbsp;</p></span><p>&nbsp;</p>]]>
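        <![CDATA[<p>To make the MapReduce model described above a little more concrete, here is a minimal single-process sketch in Python. It does not use Hadoop or any of its APIs; the record strings and the function names (<code>map_phase</code>, <code>shuffle</code>, <code>reduce_phase</code>, <code>run_job</code>) are hypothetical and only mimic the map, shuffle and reduce steps that a framework would distribute across nodes.</p>
<pre>
```python
from collections import defaultdict

# Hypothetical input: free-text account notes, one record per string.
RECORDS = [
    "acme corp bangalore",
    "acme ltd pune",
    "globex bangalore",
]


def map_phase(record):
    """Emit a (word, 1) pair for every word in one record."""
    return [(word, 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```
</pre>
<p>On a real cluster, the calls to <code>map_phase</code> and <code>reduce_phase</code> could run on different nodes over different splits of the data, which is where any speed-up would come from; this sketch only shows the shape of the computation.</p>]]>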
        
    </content>
</entry>
<entry>
    <title>Google File System</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2010/01/google_file_system_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=25" title="Google File System" />
    <id>tag:www.infosysblogs.com,2010:/engineering-software//1.25</id>
    
    <published>2010-01-24T10:29:49Z</published>
    <updated>2010-01-24T12:07:58Z</updated>
    
    <summary>Describes Google File System.</summary>
    <author>
        <name>Manjunath Ballur</name>
        
    </author>
            <category term="General" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p><span style="font-family: Arial; font-size: 9pt">The Google File System (GFS) is a scalable distributed file system designed and developed by Google for distributed, data-intensive applications. GFS was born out of the need to meet Google&rsquo;s rapidly growing data processing needs. The design of GFS shared many of the goals of previous distributed file systems (e.g. concurrency, scalability, availability and reliability), but differed from them in order to meet the demands of the application workloads and technological environment at Google. Almost a decade later, most of Google&rsquo;s applications rely on GFS to store and process data. Although Google has not published the GFS code, the design of GFS is discussed in detail in a paper titled &ldquo;The Google File System&rdquo;, published by Google engineers. To explore the design of GFS further, read the original paper at <a href="http://labs.google.com/papers/gfs.html">http://labs.google.com/papers/gfs.html</a>. </span></p><p><span style="font-family: Arial; font-size: 9pt" /></p>]]>
        <![CDATA[<span>In the early days, engineers at Google felt that the existing distributed file systems could not satisfy the demands of their applications, so they decided to design a new file system. Several key assumptions guided the architecture of GFS. Some of the assumptions were:<br /></span><ol><li class="MsoNormal"><span>Google uses thousands of storage machines built from cheap commodity hardware (typical Linux machines), and any of these machines could fail and never recover. Hence the file system had to incorporate monitoring, error detection and recovery mechanisms. </span></li><li class="MsoNormal"><span>Google&rsquo;s Web applications generate and consume files ranging in size from a few hundred megabytes to several terabytes. Small files were almost non-existent. For example, the web crawlers employed by Google&rsquo;s search engine continuously scan the internet and store information related to millions of web pages. Hence they needed a file system that could handle huge blocks of data. </span></li><li class="MsoNormal"><span>Most operations on the files involved either large streaming reads (with very few random reads) or large sequential appends to the end of a file. Random writes within a file were almost non-existent. In large streaming reads, clients typically read 1 MB or more of data. Overall, they needed a file system optimized for reading and writing huge chunks of data in streaming mode.</span></li><li class="MsoNormal"><span>The files were often used as producer-consumer queues, with multiple producers writing into the same file. Hence the file system had to provide APIs to support concurrent appends to files with minimal synchronization overhead.&nbsp;<span>&nbsp;</span></span></li><li class="MsoNormal"><span><span><span>In the early days, Google had only its search engine, for which huge amounts of data were processed in the background (e.g. 
generating inverted indexes for web pages).<span>&nbsp; </span>At that time, they did not have any user-facing web applications like Gmail or YouTube, which are sensitive to latency (i.e. require low latency).<span>&nbsp; </span>Hence they needed a file system tailored for batch-oriented operations (high throughput) rather than latency-oriented operations (low latency).</span><span><span>&nbsp;</span></span> </span></span></li></ol><span><span><span><span>With the assumptions listed above, they designed and developed a file system consisting of three main components:</span></span></span></span><span><span><span><span><span> <ol><li class="MsoNormal"><span>A master server that maintains the file system metadata. There is one active master per cluster. </span></li><li class="MsoNormal"><span>Chunk servers that store the actual chunks of data. Each file is divided into chunks of 64 MB each, and by default each chunk is stored as 3 replicas. For example, assume that a file &ldquo;a.txt&rdquo; has chunks c1, c2 and c3. Each of these chunks will have 2 more replicas, e.g. c1&rsquo; and c1&rsquo;&rsquo;, c2&rsquo; and c2&rsquo;&rsquo;, c3&rsquo; and c3&rsquo;&rsquo;.<span>&nbsp; </span>The replicas of a chunk are usually placed on different machines to ensure availability in case of machine failures. Users can override the default replication factor of 3 and specify their own replication factor for each file.</span></li><li class="MsoNormal"><span>A GFS client, which applications use for reading, writing or deleting data. The GFS client provides APIs like create, delete, open, close, read and write. Apart from these standard APIs, GFS provides a snapshot API for creating a copy of a file or directory tree and a record append API for concurrent appends to the same file. </span></li></ol><p>&nbsp;</p><p class="MsoNormal"><span><span>The master maintains the file system metadata, for example 
file namespaces and the mapping from file names to chunk locations. Chunk servers send regular heartbeat messages to the master indicating their health and any changes in chunk status. For example, a chunk could be corrupted (detected using a checksum) or could hold outdated data (detected using the chunk version number). Whenever a chunk server dies, all the chunks present on that chunk server need to be re-replicated.<span>&nbsp; </span>The master learns of the death of a chunk server when it does not receive a heartbeat message within a configured interval. The master places chunks in such a way that the data is distributed evenly across all the machines within a cluster.</span></span></p><span><span>Certain metadata, e.g. file namespaces and the file-to-chunk mapping, is kept in persistent state on the master&rsquo;s disk. After a crash, the master recovers by reading the metadata stored on disk. This data is also replicated to shadow masters at regular intervals. If the master machine itself crashes and it is not possible to restart the master, one of the shadow masters takes over. </span><span><span><p><span>GFS implements a lazy garbage collection mechanism for removing deleted data. Deleted files are not removed immediately; they are garbage collected at a later point in time. This helps in undoing accidental deletes, which could be costly considering the size of the data. </span></p><p><span><span>A leasing mechanism is used to keep data consistent across all the replicas of a chunk. According to the GFS paper, the master grants a lease on a chunk to one of its replicas, called the primary. Until the lease expires, the primary decides the order of all mutations to that chunk. Every mutation is replicated to all the chunk replicas, and the mutations are applied in the same order on all of them.<span>&nbsp; </span>For example, 
if data blocks A, B and C are written to primary chunk c1, then secondary chunks c1&rsquo; and c1&rsquo;&rsquo; also get the data in the same order, i.e. A, B and C. This ensures data consistency across all the replicas.&nbsp;</span><span>&nbsp;</span></span></p></span><span><span><span><span>Application code is linked with the GFS client library. For any operation, the client first contacts the master to get the chunk location and, for mutations, the lease information for that chunk. Once the chunk location is obtained, the client contacts the chunk servers directly to read, write or delete the data, bypassing the master. </span><span><span><span><span><span><p><span>Google&rsquo;s publications on GFS and MapReduce (a programming model for distributed data processing) have inspired an open source project named Hadoop (<a href="http://wiki.apache.org/hadoop/HDFS?action=show&amp;redirect=DFS" target="_blank"><span>http://wiki.apache.org/hadoop/HDFS?action=show&amp;redirect=DFS</span></a>). If you want to explore Hadoop, see <a href="http://hadoop.apache.org/">http://hadoop.apache.org/</a>.</span></p></span><span><span>The exponential growth of the internet, and the proportionate growth in data, has exposed some of the drawbacks of GFS. This has prompted Google to rethink some of the initial design decisions. Some of the drawbacks of the earlier system are:</span> </span><span><ul><li class="MsoNormal"><span>It was designed mainly for batch-centric applications, i.e. applications which need to process huge amounts of data in batch mode and are not sensitive to latency. As Google Search became immensely popular, Google added other applications like Gmail and YouTube, which are sensitive to latency. 
Hence, for these applications to use GFS, certain adjustments had to be made to the file system.</span></li><li class="MsoNormal"><span>To simplify the design, GFS was implemented with a single master node, which maintains the file system metadata for the entire cluster. By initial estimates, GFS was expected to handle a few million files with sizes up to a few terabytes. But the demand for data grew from terabytes to petabytes. This increased the size of the metadata maintained by the single master, which in turn increased the processing time at the master node and limited the number of client requests that a master could handle within a given period of time.</span></li></ul><p class="MsoNormal"><span><span>Over the years, some of these drawbacks have been managed by tweaking the file system or the applications which use it. Engineers at Google have been working on a new distributed master system (as opposed to the single master design) to solve some of the problems of GFS. If you are interested in how the file system has evolved over the years, see this recently published ACM Queue article: <a href="http://queue.acm.org/detail.cfm?id=1594206">http://queue.acm.org/detail.cfm?id=1594206</a>.</span></span></p></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>]]>
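        <![CDATA[<p><span>The chunking and replication scheme described in this post can be sketched roughly as follows. This is a minimal illustration in Python, not GFS code: the 64 MB chunk size and the replication factor of 3 come from the post, while the function names and the least-loaded placement policy are assumptions made for this example. The real master uses a more elaborate placement policy.</span></p>
<pre>
```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated in the post
DEFAULT_REPLICATION = 3        # default replication factor, as stated in the post


def chunk_count(file_size_bytes):
    # Number of fixed-size chunks needed for a file; the last chunk may be partly filled.
    return (file_size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE


def place_replicas(num_chunks, servers, replication=DEFAULT_REPLICATION):
    # Toy placement policy (an assumption, not the GFS algorithm): put every replica
    # of a chunk on a different server, always preferring the least-loaded servers.
    if replication > len(servers):
        raise ValueError("need at least as many chunk servers as replicas")
    load = {server: 0 for server in servers}
    placement = {}
    for chunk_id in range(num_chunks):
        chosen = sorted(servers, key=lambda s: (load[s], s))[:replication]
        for server in chosen:
            load[server] += 1
        placement[chunk_id] = chosen
    return placement
```
</pre>
<p><span>For a 200 MB file this gives four chunks. With four chunk servers and three replicas per chunk, every server ends up holding three chunk replicas, and no two replicas of the same chunk share a machine.</span></p>]]>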
    </content>
</entry>
<entry>
    <title>Is Big Bang the right approach to Internationalization?</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2010/01/is_big_bang_the_right_approach.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=22" title="Is Big Bang the right approach to Internationalization?" />
    <id>tag:www.infosysblogs.com,2010:/engineering-software//1.22</id>
    
    <published>2010-01-11T09:12:11Z</published>
    <updated>2010-01-11T09:28:38Z</updated>
    
    <summary>Over the years our project teams have matured in the way they handle the implementation of an Internationalization project, however things were not always so smooth. There were times when the project was tested and delivered to the client, but...</summary>
    <author>
        <name>Aviraj Singh</name>
        
    </author>
            <category term="Internationalization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p align="justify">Over the years our project teams have matured in the way they handle the implementation of an Internationalization project; however, things were not always so smooth. There were times when a project was tested and delivered to the client, but it refused to work on the client&rsquo;s machines, and the offshore team just couldn&rsquo;t figure out why. A lot of firefighting effort was then required to take corrective action and get things back on track. Most of the problems were due to poor planning, lack of technical understanding and incorrect assumptions. Things are pretty much streamlined now with an i18n Center of Excellence (CoE), i18n frameworks, analysis tools, POCs and best practices in place. Here I am going to recollect my earliest Internationalization experience and what we learnt from it.</p>]]>
        <![CDATA[<p align="justify">Almost a decade ago, during one of our assignments, we were engaged with a Japanese client. They had an English product which they wanted us to <em>internationalize</em> and subsequently <em>localize</em> to Japanese. Internationalization was a known concept around that time, but we did not have adequate practical experience with such work. We had people who were familiar with the concept of Localization, and they were brought into the team. Some relatively less experienced people were also added to the project. The product was written in Visual C++, so the objective in choosing the team was to get people with an adequate understanding of C++ and train them on Internationalization and Localization concepts.</p><p align="justify">The requirements were gathered, process documents were created and the team came up with the implementation and release plan. As with all Japanese projects, time to release was a critical factor and the offshore team did not have much time to ramp up their i18n skills. At the concept level, the team understood that anything shown in English on the UI must now be shown in Japanese. So the approach was to find all the hard-coded strings in the source code and move them to an external resource file. Secondly, since the narrow-character C++ functions and data types do not support Unicode, they had to be replaced with their wide char equivalents. This means a &lsquo;char&rsquo; variable should be replaced with &lsquo;wchar_t&rsquo; and functions like &lsquo;strcpy&rsquo; should be replaced with &lsquo;wcscpy&rsquo;. None of the team members understood the repercussions of making these changes, and since time was ticking away like a bomb, it was decided to follow the Big Bang approach and do a find-and-replace on the entire source code, because analyzing the data flow in the source code to find impacted areas would have taken too much time. 
Subsequently, scripts were written to automate the whole process, substituting all the data types and functions with their wide char equivalents and all hard-coded strings with resource bundle calls. With the approach neatly lined up, the team got busy making the substitutions and compiling the individual modules. Finally all the changes were complete and the source code was compiled. Since the Localization to Japanese was not yet done, the product was tested using the English resource files, and everything worked as expected on all the offshore machines. The product was delivered right on time to the customer. It was now time to sit back and wait for the appreciation mails to flow in.</p><p align="justify">The customer installed the product on one of their Japanese machines and tried to launch it. The application crashed. No matter what combinations they tried, the application refused to launch. The customer pressed the panic button. The offshore team could not figure out the reason for the crash. They got a machine with a Japanese OS and tried running the application on it. It worked fine. After understanding the customer&rsquo;s environment, they decided to install the product in a folder with a Japanese name. This time the product crashed on launch. The code was debugged, and it was found that one of the replaced wide char functions was the culprit. Pointer arithmetic on data bytes had not been modified to reflect the fact that a character could now be represented by multiple bytes, and at some point this resulted in incorrect processing, corrupt data and eventually a crash. This happened because the team had followed the Big Bang approach and just replaced all the impacted functions with their wide char equivalents without analyzing the data processing logic. It is not enough just to use wide char functions; thought has to be given to how they are used as well. 
Subsequently, an extension was sought, corrective measures were taken and the project was eventually delivered in perfect working condition for the Japanese environment. The initial approach had backfired, and quite a few lessons were learnt from the experience:</p><ol><li><div align="justify"><strong>Have the right team</strong> - Your team might consist of people with 5+ years of experience, but when it comes to Internationalization, it is important to have a team which understands the concepts and technical aspects of Internationalization. This will shorten the development cycle, and the end product will have fewer defects.</div></li><li><div align="justify"><strong>Have the right processes in place</strong> - A Big Bang approach is always dangerous to start off with. A more mature implementation methodology is required. Checklists must be in place to ensure that when a particular change is made, all other changes related to it are also dealt with. Internationalization changes can have cascading effects on other areas of the code. Changes should be made module-wise or feature-wise so that defects are caught earlier and in a localized manner, instead of taking the Big Bang approach and messing up the entire code.</div></li><li><div align="justify"><strong>Analysis is more important than development</strong> - It is very important to have a team of experts who will analyze the source code to find all areas which need to be modified to support Unicode. It is quite possible that some functions and data types need no change because they will not be handling any Unicode data. In such cases, replacing them with their wide char equivalents is an overhead and could hurt performance. It is also important to understand the data flow in the application so that the required changes, such as encoding conversions, can be made in the functions or external interfaces. 
The memory usage of the application also increases when you support Unicode, so the code must be analyzed to increase memory allocations only in the impacted areas. The Big Bang approach doesn&rsquo;t check for any of these things, and it mostly leads to bloated code which uses more memory than necessary and under-performs at runtime.</div></li><li><div align="justify"><strong>Use the right Tools</strong> - Using the right set of tools can speed up the development process. There are a lot of commercial tools available in the market which can help with static analysis of the source code. Infosys has developed a set of in-house tools for Internationalization and Localization. Among other features, the tool set helps reduce analysis time by auto-detecting all areas in the source code where i18n changes may be required. It can also help later in assessing the i18n readiness of the product. However, it should be kept in mind that tools are not a substitute for experienced people. While they can help increase productivity, the developers should still understand the i18n concepts in order to interpret the output of the tools correctly.</div></li><li><div align="justify"><strong>Do not make assumptions regarding the input data</strong> - In the scenario above, the team assumed that since the product was working with English inputs, it would also work with Japanese inputs. It is wrong to make such assumptions. A Japanese user can input filenames in Japanese or try saving a file in a folder with a Japanese name. The code should anticipate such use cases.</div></li><li><div align="justify"><strong>Have the right test environment</strong> - The absence of a translation expert in the team does not make it adequate to test the product with English data alone. Doing so will almost certainly spring some nasty surprises later when the product is deployed in a pure Japanese environment. 
You should either plan for localization at the time of testing or use alternative approaches like pseudo-localization, and make sure the product is tested with Japanese strings as well.</div></li></ol><p align="justify">The Big Bang approach is similar to cooking a dish by throwing all the ingredients into the pan at the same time. The outcome is unpredictable and in most cases will not be the desired result. It is better to follow a systematic approach, which makes success far more likely and allows you to take corrective action as and when something appears to be going wrong, rather than waiting for disaster to happen and then starting to cook all over again.</p>]]>
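        <![CDATA[<p align="justify">The class of bug described in this post, where byte-based arithmetic is applied to text whose characters occupy more than one byte, can be reproduced in a few lines. The sketch below uses Python rather than the Visual C++ of the original project, and the function names and the sample folder name are hypothetical, chosen only for illustration. It shows that byte-based truncation which is harmless for English text can cut a Japanese character in half.</p>
<pre>
```python
def lengths(text, encoding):
    # Return (number of characters, number of bytes in the given encoding).
    return len(text), len(text.encode(encoding))


def truncate_by_bytes(text, max_bytes, encoding):
    # Naive byte-based truncation: fine for ASCII, unsafe for multi-byte text.
    return text.encode(encoding)[:max_bytes]


def decodes_cleanly(data, encoding):
    # True if the byte string is still valid text in the given encoding.
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


folder = "\u88fd\u54c1"  # hypothetical Japanese folder name (two characters)
```
</pre>
<p align="justify">With these helpers, an English name such as "app" has the same character and byte count in UTF-8, whereas the two-character Japanese folder name occupies six bytes in UTF-8 and four in UTF-16. Truncating it at four UTF-8 bytes leaves data that no longer decodes, which is why testing with English inputs alone did not expose the problem.</p>]]>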
    </content>
</entry>
<entry>
    <title>Deciding Optimal Unicode Solution for Globalization Database</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/12/deciding_optimal_unicode_solut_1.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=21" title="Deciding Optimal Unicode Solution for Globalization Database" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.21</id>
    
    <published>2009-12-31T10:31:25Z</published>
    <updated>2010-01-04T04:51:35Z</updated>
    
    <summary><![CDATA[The concept of Globalization and the estimation model has been explained very well by Aviraj Singh in his post Effort estimation for a Globalization project.&nbsp;&nbsp; Being a database person I always look at it from a different perspective, giving a...]]></summary>
    <author>
        <name>Vishal Parashar</name>
        
    </author>
            <category term="Internationalization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p align="justify">The concept of Globalization and the estimation model has been explained very well by Aviraj Singh in his post <em><u><span>Effort estimation for a Globalization project</span></u></em>. Being a database person, I always look at it from a different perspective, giving a bit of extra weight to the database. There are many fine-grained intricacies to think about before deciding on a solution for supporting Unicode data in databases. This can be achieved through a <em>Unicode database</em>, i.e. upgrading the database character set to one that stores UTF-8 encoded characters in SQL datatypes such as CHAR and VARCHAR2. Another option is <em>Unicode datatypes</em>, i.e. supporting multilingual data only in certain columns by using a Unicode national character set to store that data in SQL NCHAR datatype columns, without making any changes to the database character set. The most confusing decision in a Globalization project is whether <em>one should opt for a <strong>Unicode database</strong> or <strong>Unicode data types</strong></em> to support multiple languages in the database. This decision is key to the success of any Globalization project and also has a considerable impact on effort estimates. An incorrect choice at this stage can lead to a lot of rework and last-minute surprises. </p><p align="justify">&nbsp;</p>]]>
        <![CDATA[<p align="justify">It is always better to clarify the business and technical requirements up front, especially the need for globalization, the languages/geographies to be supported, the size of the database and its data distribution, the application downtime available for the upgrade (for an existing application), and the application code language. In addition, one should also understand the future business growth plan and the geographies to be supported. Once specific details are available, a team of experts should study the requirements closely and brainstorm to arrive at an optimal Unicode solution. The following are a few key things that should be considered in decision making:</p><p align="justify"><span><span>&Oslash;<span>&nbsp; </span></span></span><strong>Business Requirements <br /></strong></p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>Do I need to provide multilingual support for an existing database, or create the database from scratch? Which languages do I need to support?</p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>How large is the existing database? If an incremental upgrade is required, then Unicode data types may be the better option. If it needs a complete overhaul in a big bang, then changing the database character set may be the better option. </p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>Application/database downtime and future business requirements, such as the languages the application may need to support in the near future, may also influence the final decision. 
</p><p align="justify"><span><span>&Oslash;<span>&nbsp; </span></span></span>&nbsp;&nbsp;<strong>Business Data <br /></strong></p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>Type of data&nbsp; </p><p align="justify">Does the application need to support Asian languages, European languages, or both? Do these languages have supplementary characters? The answers help in deciding the optimal Unicode encoding. For example, UTF-16 provides more compact storage for Asian languages, whereas European scripts are stored more efficiently in UTF-8.</p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>Data distribution</p><p align="justify">If multilingual fields are distributed across all databases, it&rsquo;s better to go for a Unicode database than Unicode data types.</p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>Data volume and application downtime</p><p align="justify">If the data to be migrated is huge and the application downtime does not allow for a big bang migration, it&rsquo;s better to opt for an incremental upgrade with Unicode datatypes, especially for an existing DB whose database character set (say WE8ISO8859P1) is not a subset of the UTF-8 character set. In this case, a database character set upgrade will incur the additional overhead of converting data from the existing database character set to the Unicode character set.</p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>Large object data (BLOBs and CLOBs)</p><p align="justify">If there is a requirement to store different types of multilingual documents in BLOB data types and search their content, one should go for a Unicode database. BLOB data is converted into the database character set before being indexed. 
Hence, if your database character set is non-Unicode, there will be data loss if the documents contain characters that cannot be converted to the database character set.</p><p align="justify"><span><span>&Oslash;<span>&nbsp; </span></span></span><strong>Performance<br /></strong></p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>A Unicode database comes with a performance overhead due to non-optimal use of data storage (depending on the language and database/national character set compatibility) and the conversion that may be required before storing data in the database. If <em>feasible</em>, it may be better to store multilingual (Unicode) data in NCHAR/NVARCHAR data types, without changing the database character set.</p><p align="justify"><span><span>&Oslash;<span>&nbsp; </span></span></span><strong>Application</strong> <strong>code<br /></strong></p><p align="justify"><span><span>o<span>&nbsp;&nbsp; </span></span></span>Application code also plays an important role in deciding the optimal Unicode solution. For example, VC/VC++ applications on MS Windows may perform better with Unicode datatypes, as the length of a wchar_t buffer in VC/VC++ matches the length of SQL NCHAR data types in the database. This makes data comparison more efficient and may avoid buffer overflows in client applications.</p><p align="justify">One can also consider a combination of a Unicode database and Unicode datatypes, depending on project requirements. This can be ideal where the database character set of the existing database (US7ASCII) is an exact subset of the Unicode database character set (AL32UTF8) and you have Java application code running on Windows. Since both Java and Windows work natively with UTF-16, NCHAR data types with UTF-16 encoding will give better performance and will be easier to manage, so the national character set may be set to AL16UTF16. 
Upgrading the database character set to a superset Unicode character set will also be easier and faster, as it will not require any data conversion. </p><p align="justify">The choice of Unicode solution has a huge impact on the design and implementation approach, and hence on the effort estimates of a globalization project. Though it is a bit difficult to generalize the globalization effort estimation framework, since the scope and intensity of application/database code changes are largely driven by business requirements, it is still better to focus on the areas discussed above in the initial stage and then review the project estimates and implementation strategy accordingly. </p><p align="justify">I have tried to cover most of the key focus areas for choosing a Unicode solution for a globalization database. Any other thoughts on this are most welcome.</p>]]>
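        <![CDATA[<p align="justify">The storage trade-off between UTF-8 and UTF-16 mentioned under "Type of data" can be checked with a quick measurement. The sketch below uses Python's built-in codecs and is independent of any particular database. The function names and sample strings are assumptions made for this example, and actual on-disk sizes in a database also depend on column definitions and storage overheads.</p>
<pre>
```python
def storage_bytes(text):
    # Bytes needed to hold the text in each encoding (no byte-order mark).
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


def more_compact_encoding(text):
    # Name of the encoding that needs fewer bytes for this particular text.
    sizes = storage_bytes(text)
    return min(sizes, key=sizes.get)


european_sample = "M\u00fcnchen"       # German city name, 7 characters
asian_sample = "\u6771\u4eac\u90fd"    # Japanese place name, 3 characters
```
</pre>
<p align="justify">For the German sample, UTF-8 needs 8 bytes against 14 for UTF-16, while the Japanese sample needs 9 bytes in UTF-8 against 6 in UTF-16. Running the same measurement on a representative sample of real application data is a cheap way to ground the encoding decision.</p>]]>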
    </content>
</entry>
<entry>
    <title>Google Public DNS</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/12/google_public_dns.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=20" title="Google Public DNS" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.20</id>
    
    <published>2009-12-30T13:37:32Z</published>
    <updated>2010-01-04T06:23:20Z</updated>
    
    <summary>The blog describes some of the salient features of Google&apos;s Public DNS product.</summary>
    <author>
        <name>Manjunath Ballur</name>
        
    </author>
            <category term="General" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p class="Normal"><span class="Normal__Char" style="font-size: 10pt">A month ago,&nbsp;Google announced the release of Google Public DNS (Domain Name System), a free DNS resolution service. DNS is used to translate human-friendly computer names into IP addresses. When a user types the name of a website, Domain Name Servers convert this name into an IP address, and your machine uses this IP address to send requests. A DNS network contains a set of servers which maintain a cache of domain name to IP address mappings. Usually these Domain Name Servers are maintained by your Internet Service Provider (ISP). With the Public DNS service, Google wants to provide an alternative to your ISP&rsquo;s service. Public DNS leverages the existing infrastructure used by Google&rsquo;s search engine, which uses crawlers to scan through millions of websites. The DNS information cached by these web crawlers is used by Public DNS. A company named OpenDNS already offers a similar, popular DNS resolution service.</span></p><p class="Normal"><span class="Normal__Char" style="font-size: 10pt" /></p>]]>
        <![CDATA[<p class="Normal" style="margin-top: 12pt"><span class="Normal__Char" style="font-size: 10pt">These DNS services claim to provide faster (by caching relevant DNS information and hence speeding up page retrieval) and safer (preventing spoofing and denial of service (DoS) attacks) service as compared to your ISPs. </span></p><p class="Normal" style="margin-top: 12pt"><span class="Normal__Char" style="font-size: 10pt">Delay in loading a webpage could be caused by factors like geographical distance between the client and resolving servers (which could result in longer round trip time, or loss in packets due to network congestion etc.), cache misses (in this case, a resolving server does not have information about the requested domain name and needs to recursively query other servers to get the information) and heavy load on resolving servers due to under provisioning of servers or denial of service attacks (deliberate overloading of servers by malicious users, to deny service to genuine users). Public DNS claims to mitigate these delays with following approaches:</span></p><p class="Normal" style="text-indent: -18pt; margin-left: 36pt"><span class="Normal__Char" style="font-size: 10pt">1.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">Adequate provisioning of servers to handle both the genuine requests and denial of service attacks.</span></span></p><p class="Normal" style="text-indent: -18pt; margin-left: 36pt"><span class="Normal__Char" style="font-size: 10pt">2.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">Usually DNS lookup queries are load balanced amongst several name resolving servers. 
If the resolving servers are over-provisioned (as described in point 1, over-provisioning is necessary to withstand DoS attacks) and the load balancer selects servers at random, different servers can end up with entirely different sets of cached information (a fragmented cache). This results in a high percentage of cache misses and hence increased traffic between the servers, especially for popular domain names (remember that whenever a server cannot find the requested information in its cache, it has to query other servers). Public DNS handles this problem by splitting servers into two categories. One category of servers uses a global cache which contains popular domain names (e.g. Google.com). Since popular names are requested frequently, this global cache stays fresh at all times, resulting in quicker resolution. The other category of servers uses a local cache (i.e. each server maintains its own cache), which holds less popular domain names. Since these less popular domain names are requested infrequently, cache misses will not result in increased network traffic. But to serve these less popular domain names as efficiently as popular ones, Public DNS always forwards requests for a given domain name to the same server. For example, if the request is for www.indya.com, it is always forwarded to server A. If the request is for www.sify.com, it is always forwarded to server B. 
So, if user requests </span><a href="http://www.infosysblogs.com/engineering-software-mt/redir.aspx?C=5018ac95fee34c5e9a18bcc124ae7c9a&amp;URL=http%3a%2f%2fwww.indya.com%2f" target="_blank"><span class="Hyperlink__Char"><span class="Hyperlink__Char" style="font-size: 10pt">www.indya.com</span></span></a><span class="Normal__Char" style="font-size: 10pt"> repeatedly, the cached information at server A would result in quicker resolution.</span> </span></p><p class="Normal" style="text-indent: -18pt; margin-left: 36pt"><span class="Normal__Char" style="font-size: 10pt">3.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">To ensure faster resolution of domain names, Public DNS pre-fetches and periodically refreshes the names irrespective of whether user requests these names. This is implemented using an offline component which periodically selects and ranks the domain names based on factors like popularity and hit rate (number of times it is requested).&nbsp; Another runtime component resolves these pre-fetched names and refreshes them based on their time to live value. This ensures that frequently requested domain names are served quickly (even if they are not universally popular domain names like www.google.com).</span></span></p><p class="Normal" style="text-indent: -18pt; margin-left: 36pt"><span class="Normal__Char" style="font-size: 10pt">4.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">Google hosts Public DNS in its data centers across the world and routes the requests to the geographically closer mirror sites (e.g. google.co.in for requests from India), thus resulting in faster browsing experience.</span></span></p><p class="Normal" style="margin-top: 12pt"><span class="Normal__Char" style="font-size: 10pt">Another consideration for a DNS service is security. 
DNS servers could become targets of spoofing (redirecting users to malicious sites) and denial of service (DoS) attacks. Public DNS has implemented the following approaches to counter these security threats:</span></p><p class="Normal" style="text-indent: -18pt; margin-left: 36pt"><span class="Normal__Char" style="font-size: 10pt">1.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">To prevent DoS attacks:</span></span></p><p class="Normal" style="text-indent: -18pt; margin-left: 72pt"><span class="Normal__Char" style="font-size: 10pt">a.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">Public DNS enforces rate control over the amount of traffic that can be directed to other name servers. Thus it will not be possible for attackers to flood name servers with a high volume of malicious traffic. Rate control is also enforced on the responses that are sent back. </span></span></p><p class="Normal" style="text-indent: -18pt; margin-left: 72pt"><span class="Normal__Char" style="font-size: 10pt">b.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">To prevent amplification attacks (which exploit the high response-to-request ratio of name servers: attackers can inject large responses into a name server&rsquo;s cache, thus flooding the network with traffic), the response traffic is limited by applying a &ldquo;maximum average amplification factor&rdquo; to each client IP. </span></span></p><p class="Normal" style="margin-left: 54pt"><span class="Normal__Char" style="font-size: 10pt">If requests/responses exceed any of the above mentioned limits, an error is returned. 
In some cases, no response is sent for such requests.</span></p><p class="Normal" style="text-indent: -18pt; margin-left: 36pt"><span class="Normal__Char" style="font-size: 10pt">2.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">To prevent cache poisoning, basic validity checks are enforced, such as rejecting malformed responses or responses which don&rsquo;t match the attributes of the requests (e.g. source IP, port).</span></span></p><p class="Normal" style="text-indent: -18pt; margin-left: 36pt"><span class="Normal__Char" style="font-size: 10pt">3.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">To make it difficult for attackers to predict and match a combination of name servers, ports and query names, these attributes are randomized. For example, the requests are sent out on different port numbers and to different name servers (not always to the nearest name server) to add some unpredictability to the requests. Also, the letter case of the queried domain names is varied to make forged responses harder to construct. For example, 
</span><a href="http://www.infosysblogs.com/engineering-software-mt/redir.aspx?C=5018ac95fee34c5e9a18bcc124ae7c9a&amp;URL=http%3a%2f%2fwww.google.com%2f" target="_blank"><span class="Hyperlink__Char"><span class="Hyperlink__Char" style="font-size: 10pt">wwW.gooGLE.com</span></span></a><span class="Normal__Char" style="font-size: 10pt"> or </span><a href="http://www.infosysblogs.com/engineering-software-mt/redir.aspx?C=5018ac95fee34c5e9a18bcc124ae7c9a&amp;URL=http%3a%2f%2fwww.google.com%2f" target="_blank"><span class="Hyperlink__Char"><span class="Hyperlink__Char" style="font-size: 10pt">WwW.gOoGlE.cOm</span></span></a><span class="Normal__Char" style="font-size: 10pt">.</span></span></p><p class="Normal" style="text-indent: -18pt; margin-left: 36pt"><span class="Normal__Char" style="font-size: 10pt">4.</span><span style="letter-spacing: 0pt">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Normal__Char" style="font-size: 10pt">To prevent attackers from injecting multiple duplicate requests for the same name resolution, Public DNS does not allow more than one request with same query attributes (port number, destination IP).</span></span></p><p class="Normal" style="margin-top: 12pt"><span class="Normal__Char" style="font-size: 10pt">If you want to try out Public DNS, follow the instructions mentioned at: </span><a href="http://code.google.com/speed/public-dns/docs/using.html" target="_blank"><span class="Hyperlink__Char"><span class="Hyperlink__Char" style="font-size: 10pt">http://code.google.com/speed/public-dns/docs/using.html</span></span></a></p><p class="Normal"><span class="Normal__Char" style="font-size: 10pt">To try out free basic version of Open DNS, check </span><a href="http://www.infosysblogs.com/engineering-software-mt/redir.aspx?C=5018ac95fee34c5e9a18bcc124ae7c9a&amp;URL=http%3a%2f%2fwww.opendns.com%2fstart%2f" target="_blank"><span class="Hyperlink__Char"><span class="Hyperlink__Char" style="font-size: 
10pt">http://www.opendns.com/start/</span></span></a><span class="Normal__Char" style="font-size: 10pt">.</span></p>]]>
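        <![CDATA[<p class="Normal"><span class="Normal__Char" style="font-size: 10pt">Two of the ideas above, always forwarding a given name to the same server and varying the letter case of queried names, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not say how Google implements either mechanism, so the hash-and-modulo mapping, the server labels and the function names <code>pick_server</code>, <code>randomize_case</code> and <code>echoes_query</code> are all hypothetical.</span></p>
<pre>
```python
import hashlib
import random

# Hypothetical server labels, standing in for "server A" and "server B" in the text.
SERVERS = ["server-A", "server-B", "server-C"]


def pick_server(domain, servers):
    """Map a domain name to one server, the same one on every call.

    Assumption: a hash of the normalised name, taken modulo the number of
    servers. The post only says that requests for a name always reach the
    same server, not how that server is chosen.
    """
    key = domain.strip().rstrip(".").lower().encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def randomize_case(name, rng):
    """Return the name with each letter randomly upper- or lower-cased."""
    return "".join(
        ch.upper() if rng.random() >= 0.5 else ch.lower() for ch in name
    )


def echoes_query(sent_name, answered_name):
    """Accept a response only if it repeats the exact casing that was sent."""
    return sent_name == answered_name


if __name__ == "__main__":
    print(pick_server("www.indya.com", SERVERS))
    print(pick_server("www.sify.com", SERVERS))
    query = randomize_case("www.google.com", random.Random())
    print(query, echoes_query(query, query))
```
</pre>
<p class="Normal"><span class="Normal__Char" style="font-size: 10pt">A plain modulo mapping reshuffles most names whenever the number of servers changes, so a real deployment would more likely use something like consistent hashing; the sketch only shows why repeated requests for one name keep hitting the same warm cache.</span></p>]]>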
    </content>
</entry>
<entry>
    <title>Don’t think local, think locale</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/12/dont_think_local_think_locale.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=19" title="Don’t think local, think locale" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.19</id>
    
    <published>2009-12-11T12:25:24Z</published>
    <updated>2009-12-11T12:39:47Z</updated>
    
    <summary>Imagine yourself going to Japan to open a restaurant. Your market research says that your burgers are going to sell like hot cakes there, so you have planned a major investment there and drawn up plans for expansions. You land...</summary>
    <author>
        <name>Aviraj Singh</name>
        
    </author>
            <category term="Internationalization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p align="justify">Imagine yourself going to Japan to open a restaurant. Your market research says that your burgers are going to sell like hot cakes there, so you have planned a major investment there and drawn up plans for expansions. You land at the Narita airport and are absolutely clueless about how to get out of there. You look around and find that all directions and signs are in Japanese. You try to ask for directions but all you get is blank stares because no one understands English. Somehow you manage to find your way out and get busy with your work. After a lot of hard work, you finally open your restaurant but you don&rsquo;t find many people walking in. Your business goes dry and it&rsquo;s difficult to survive with so much local competition around. What is really going wrong? Didn&rsquo;t your market research say that you are bound to succeed?</p>]]>
        <![CDATA[<p align="justify">This is a big dilemma for a lot of entrepreneurs when they try to enter emerging markets. You need to cross the language and cultural barrier in order to succeed beyond your neighborhood. If you open a restaurant in Japan, you have to ensure that your menu is customized for their tastes. You have to ensure that you have a menu in Japanese as well. All posters and signboards inside or outside your restaurant must be in Japanese, else how will the Japanese people know what you are trying to sell? McDonald&rsquo;s sells their burgers in many countries, but they have customized their burgers according to their target market. While you may find a vegetarian burger in India, you will probably not find it in Japan. Instead they have a teriyaki chicken burger in Japan which they don&rsquo;t sell in Canada. Over the years companies such as McDonald&rsquo;s, Microsoft, Apple and IBM have realized the importance of customizing their offerings for the global markets.</p><p align="justify">Localization is critical when entering new markets. Localization is more than just translating your user interfaces or help documents. It also takes cultural, legal and regulatory issues into consideration. It makes sense to invest in the global markets only when you foresee an ROI from the opportunity. So which are the emerging markets in 2009 and beyond? Which geographies should you target to increase your revenues? These are the most common questions that come up, and firms like <a title="Forrester Research" href="http://www.forrester.com/" target="_blank">Forrester</a> and <a title="Gartner Technology Business Research Insight" href="http://www.gartner.com/" target="_blank">Gartner</a> have extensive market research data to answer all these questions. 
Research by <a title="Byte Level Research: Think Outside the Country" href="http://www.bytelevel.com/" target="_blank">Byte Level Research</a> says that non-English speakers will represent 79% of all the internet users by 2010. So which language will dominate the internet in future? English is currently the most popular language on the internet, but Spanish is expected to overtake it and Chinese (simplified) is quickly gaining ground. According to the World Intellectual Property Organization (<a title="World Intellectual Property Organization " href="http://www.wipo.int/" target="_blank">WIPO</a>) and the International Telecommunications Union (<a title="International Telecommunications Union " href="http://www.itu.int/" target="_blank">ITU</a>), Chinese will outrank English as the most-used language on the internet. Today the number of online users from China and Europe far exceeds the number from the United States.</p><p align="justify">The numbers don&rsquo;t lie. All these statistics have been generated by collecting information and data from hundreds of small, medium and large corporations across the globe. Many companies are expanding their already established businesses into other geographies. New players in the market are already making expansion plans for other geographies. According to a Byte Level Research study done in 2007, on average 80% of the companies interviewed see their competitors taking their business global. It is imperative for them to be proactive in such a scenario and make plans for going global themselves. It&rsquo;s a case of &lsquo;<em>Go global or perish</em>&rsquo;. Intel generates around 70% of their revenues from outside the US. Microsoft makes around one third of their revenue from outside the US. Google had already reached the 50% mark by 2008. As I have mentioned in one of my previous blogs, it&rsquo;s not enough being the best in your neighborhood anymore. <em>Don&rsquo;t think local, think locale&hellip;</em></p>]]>
    </content>
</entry>
<entry>
    <title>Green Computing and Virtualization</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/12/green_computing_and_virtualiza.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=18" title="Green Computing and Virtualization" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.18</id>
    
    <published>2009-12-09T11:55:20Z</published>
    <updated>2009-12-09T13:17:31Z</updated>
    
    <summary><![CDATA[While&nbsp;contemplating about&nbsp;the importance of virtualization in achieving green computing standards especially in organizations hosting data centers, I came across an interesting article here on how energy emissions from data centers can be used to warm homes in Scandinavian countries....]]></summary>
    <author>
        <name>Suraj Nair</name>
        
    </author>
            <category term="Virtualization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p>While contemplating the importance of virtualization in achieving green computing standards, especially in organizations hosting data centers, I came across an interesting article <a title="Cloud Computers to warm homes" href="http://timesofindia.indiatimes.com/home/environment/developmental-issues/Cloud-computers-to-warm-homes-from-six-feet-under-/articleshow/5304474.cms" target="_blank">here</a> on how energy emissions from data centers can be used to warm homes in Scandinavian countries. </p>]]>
        <![CDATA[<p>The article mentions that in a typical data center only 40-45% of the energy is used in actual computing, while the rest is used to power the equipment that cools these servers. Besides, data centers run by a search giant already seem to be using around 1% of the world's energy, and their demands seem to be rising fast every year.</p><p>This is interesting information given that the world has finally woken up to the need to put heads together and resolve issues related to global warming at the UN Climate Summit at Copenhagen, Denmark. </p><p>There is a general feeling that the summit highlights the world's awareness that technological advancements have contributed to increased emissions, and that the need of the hour is another set of eco-friendly technology advancements. This means 'virtualization' is a word that is going to be part of everyday conversation in enterprises, as the advantages of 'virtualizing' the data center are well known. As a by-product, the focus is also going to be on the development of tools that make the process of virtualizing the infrastructure and managing it much easier.</p>]]>
    </content>
</entry>
<entry>
    <title>Testing Cycles and Product Stability</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/12/testing_cycles_and_product_sta.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=17" title="Testing Cycles and Product Stability" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.17</id>
    
    <published>2009-12-08T09:49:15Z</published>
    <updated>2010-01-04T06:40:10Z</updated>
    
    <summary>Years of experience in software development have not helped reduce anxiety levels whenever a project enters the &apos;Testing&apos; phase of the Software Development Life Cycle. It feels the same as one would feel when parents accompanied you to school to...</summary>
    <author>
        <name>Suraj Nair</name>
        
    </author>
            <category term="Software Product Testing" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p>Years of experience in software development have not helped reduce anxiety levels whenever a project enters the 'Testing' phase of the Software Development Life Cycle. It feels the same as it did when your parents accompanied you to school to collect your academic results at the end of term examinations. There is always the anxiety of whether the output of the design and coding phase will be able to withstand the test case bombardment. Besides, you would also be anxious to know if there are enough test cases to traverse all paths of the software while testing functionality and QoS parameters, so as to be confident that all loose ends are covered. An even more difficult pill to swallow is a situation where you realize that a number of your tests are failing, and you will have to get back to the customer with the bad news and request an extension. But even after that, how do you guarantee that your product will be defect free? How do you guarantee that you have not unknowingly introduced defects while fixing the known ones? Experts would of course recommend a 'thorough peer code review', but even after that, you would still need a 'tested and passed' certificate before the software is passed to the customer for acceptance tests.</p>]]>
        <![CDATA[<p>I have often felt the need to be able to estimate upfront the number of test cycles that should be executed during the 'Testing' phase. What projects often do is estimate for two cycles of testing as an approximation. In Test-Cycle 1, all the test cases are run and all the defects in the software are assumed to have been detected. These defects are then fixed. Test-Cycle 2 is run to make sure that all the defects detected in Test-Cycle 1 are fixed. It is quite common that some defects are not completely fixed, especially when the number of defects detected is large. In addition, new defects may have crept in as a by-product of the defect fixes. Naturally, this calls for another test cycle once the Test-Cycle 2 defects are fixed, and you end up squeezing in a Test-Cycle 3 in the hope that it will be defect-free. (By-product defects are often seen in GUI-intensive products where, for example, regression defects are common while controlling state changes of UI elements.)</p><p>One can relate the above situation to what Bruce Powel Douglass mentioned during his July 2009 visit to the Bangalore campus. In a different context, Bruce had said that &quot;The number of defects in the software is proportional to the number of defects that you know about&quot;, which means testing and quality have to be a continuous process. </p><p>Every project in its estimation phase has to delve deeper into the possible ways to counter deficiencies in development during the SDLC. One such problem is to estimate the number of defects that would need to be ironed out during the various phases of the project and the effort required to counter them, so that the goal of a zero-defect product is achieved before delivery to the customer. The measure is a method of checking that the output of a particular phase has been reviewed and tested and that it will contribute to the final quality goal. 
</p><p>Most leading companies have software quality management metrics which are created by observing trends in defect data across projects in similar technologies with similar complexities. One such metric would be, for example, the number of defects per KLOC of code that are expected to be detected during the complete life cycle (in today's world of function point based estimation, this might be a little primitive). </p><p>Let us assume that, as per the expected standards, the total number of defects per KLOC for both GUI and non-GUI applications (developed in C or C++) could range between 15 and 20. Taking 16 defects per KLOC from that range, for an application with an estimated size of 25 KLOC, the total number of defects that could be expected in the complete life cycle of the project is 25 * 16 = 400. </p><p>The total number of defects estimated to be detected during the SDLC is divided across the various phases of the project. The confidence that justice has been done to the review of the product, and that the product is measuring up to the required quality, can be obtained by measuring defects at the end of each phase. Assuming a general spread of defects detected of 30%:40%:30% across the Requirements and Design, Coding and Testing phases (please check your company metrics for the spread, if any), 30% of the estimated defects (120 of the 400) still exist when the software enters the Component/Integration Testing phase.</p><p>The Component Testing (CT) or Integration Testing phase is high on the list of defect detection stages simply because it integrates a number of independently developed modules &ndash; and is more or less the environment in which the user is expected to use the application. 
This is the last stage of validation and verification before the application changes hands and leaves the development team.</p><p>Ideally, CT should be divided into cycles to make sure that defects creeping in because of defect fixes are caught and cleaned up. </p><p>It is quite common to see three cycles of CT being planned for certain products. The first cycle should aim at detecting 60-65% of the defects estimated in the CT stage. Assuming that your defect estimate calculations reveal that there are still 120 defects lingering in the software as you begin your CT, you should be detecting roughly 72 to 78 defects (60-65% of 120) in the first cycle of CT and 68 defects in the second cycle of CT. The third cycle of CT should be more of a confirmation cycle to confirm that there are no regression defects. </p><p>But is this approach enough to guarantee good software? Pressed for time, you are assuming and hoping that you have neatly fixed the 68-70 odd defects that are lingering in the software at the end of the second cycle of CT, thus allowing CT-3 to be more of a confirmation cycle, which is quite a risk!</p><p>A better approach would probably be the &quot;divide by two&quot; approach to determining the number of cycles of CT. In this approach, the number of defects estimated to be lurking in the software at the start of CT is repeatedly divided by 2 (until the number of defects is a single digit) to determine the number of cycles.</p><p>Hence, in this case, 120, 60, 30, 15 and 7 are the numbers of defects expected in the first five cycles, with the sixth cycle being a confirmation cycle. 
However, in this approach care must be taken to change the test cases being executed and to focus more on problem areas (considering the 80-20 rule), so that testing effort is not wasted on repeatedly testing defect-free areas.</p><p>The &quot;divide by two&quot; principle might not be acceptable in all situations, and it is understandably difficult to make the case for it in a development process which contains strong design and code review phases. But it is a good method to foresee upfront how much time and effort you would need during your testing phase to deliver an efficient and defect-free product.</p><p>P.S. External Testing is a phase where the development team completes its CT and hands the application to an external team within the unit for independent, unbiased testing. This means that even after an OK from the development team, there could still be defects lingering in the system, hidden from familiar eyes, which only show up in front of an unrelated tester. This team should not be involved in any way in the design or development of the product. The test cases developed by this team should be based completely on the FS/FD.<br /></p>]]>
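        <![CDATA[<p>The arithmetic in this post (defects per KLOC, the share of defects left for testing, and the &quot;divide by two&quot; cycle plan) can be sketched as a small Python calculation. The function names are made up for illustration, and the figures of 16 defects per KLOC, 25 KLOC and a 30% testing share are simply the example values used above, not recommended metrics.</p>
<pre>
```python
def expected_defects(kloc, defects_per_kloc):
    """Total defects expected over the whole life cycle."""
    return kloc * defects_per_kloc


def defects_entering_testing(total_defects, testing_share_percent):
    """Defects still expected when the testing phase begins."""
    return total_defects * testing_share_percent // 100


def divide_by_two_cycles(defects_at_start):
    """Expected defects per test cycle under the "divide by two" approach.

    The estimate is halved (integer division) until it reaches a single
    digit; one extra confirmation cycle is planned after the last entry.
    """
    cycles = []
    remaining = defects_at_start
    while remaining >= 10:
        cycles.append(remaining)
        remaining //= 2
    if remaining >= 1:
        cycles.append(remaining)
    return cycles


if __name__ == "__main__":
    total = expected_defects(kloc=25, defects_per_kloc=16)   # 400
    at_ct = defects_entering_testing(total, 30)              # 120
    plan = divide_by_two_cycles(at_ct)                       # [120, 60, 30, 15, 7]
    print(total, at_ct, plan)
    print("cycles including confirmation:", len(plan) + 1)   # 6
```
</pre>
<p>Feeding in your own organization's defect density and phase spread gives a quick first estimate of how many test cycles to budget for.</p>]]>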
    </content>
</entry>
<entry>
    <title>Internationalization and the development life cycle</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/11/internationalization_and_the_d.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=16" title="Internationalization and the development life cycle" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.16</id>
    
    <published>2009-11-25T08:10:02Z</published>
    <updated>2009-11-25T08:15:54Z</updated>
    
    <summary>The ideal time to start thinking about Internationalization or Localization of your product is at the conceptual stage of the product development life cycle. </summary>
    <author>
        <name>Aviraj Singh</name>
        
    </author>
            <category term="Internationalization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p align="justify">As a product company your team has come up with a brilliant concept which has tremendous marketing potential in your country. Your marketing survey shows that the concept will soon catch up with other countries across the globe and you can capture the overseas market too. The only catch is that the product will be required to be globalized before it is launched in the international markets. A global launch is still 5-6 months away, so what product strategy will you adopt? Develop the product in English and later when you have access to the global markets, think of internationalizing it or start developing an internationalized version of the product right from conceptualization stage so that you are ready to penetrate the global markets when the time comes? This is a question most product managers will face while developing a product.</p>]]>
        <![CDATA[<p align="justify">The ideal time to start thinking about Internationalization or Localization of your product is at the conceptual stage of the product development life cycle. The product management has to be clear about their vision for the product. If the product is meant for international audiences, it is a good idea to plan for it sooner rather than later, since it will definitely be more expensive to do the same thing later and you may also lose out on market share due to a delayed launch. Internationalization is often more important than localization in the development stage, since localization will normally not involve code re-engineering whenever it is done. Think about a product which is not internationalized and to which you want to introduce multilingual support. Changes will have to be made in multiple layers in order to achieve this. These changes will be costly in terms of the time taken, bugs introduced and additional testing required. However, if the product architecture had made the same provision at design time, things would have been much simpler: all the development team would have to do is get the string resources translated in order to support a new language.</p><p align="justify">What does it take to internationalize your product right from requirements to design to development and finally testing? The requirements gathering team must understand the typical i18n requirements and must evaluate the product requirements from an i18n perspective too. The design team must understand the typical i18n aspects and ensure that the product design takes care of all i18n issues along with the intended architecture and design. The development team should be experienced with making i18n related code changes, and it is important that they understand the i18n best practices in order to avoid rework later. The development of the core components and internationalization must go hand in hand. 
Pseudo-localization testing must be planned for the product to identify potential localization issues. If these practices are followed, it is more or less assured that it will be easy to adapt the product to different regions or countries as and when required with minimum cost and time to market.</p>]]>
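        <![CDATA[<p align="justify">As a rough illustration of the pseudo-localization testing mentioned above, the Python sketch below turns an English resource string into an accented, padded and bracketed variant, which makes hard-coded strings, truncated labels and encoding problems easy to spot on screen. The function name, the accent table and the 30% expansion figure are assumptions chosen for this example, not part of any particular toolkit.</p>
<pre>
```python
# Map plain vowels to accented look-alikes so the text stays readable
# while exercising non-ASCII handling. The table is an arbitrary choice.
ACCENTED = str.maketrans("aeiouAEIOU", "àéîõüÀÉÎÕÜ")


def pseudo_localize(text, expansion=0.3):
    """Return an accented, padded, bracketed copy of a UI string.

    - accents reveal encoding problems,
    - padding imitates translations that are longer than the English text,
    - brackets show whether the string is truncated or concatenated.
    Format placeholders such as {name} are NOT protected in this sketch.
    """
    accented = text.translate(ACCENTED)
    padding = "~" * max(1, round(len(text) * expansion))
    return "[" + accented + padding + "]"


if __name__ == "__main__":
    for label in ("Save file", "Cancel", "Open recent document"):
        print(pseudo_localize(label))
```
</pre>
<p align="justify">Any string that still shows up in plain English after the resources are run through such a transformation is a candidate hard-coded string that was never externalized.</p>]]>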
    </content>
</entry>
<entry>
    <title>Trading off with Design Patterns</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/11/trading_off_with_design_patter.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=15" title="Trading off with Design Patterns" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.15</id>
    
    <published>2009-11-24T10:37:51Z</published>
    <updated>2009-11-24T10:49:57Z</updated>
    
    <summary>Over the last decade or so, any queries as regards designing object oriented software systems would lead to one being advised to read the Go4&apos;s book on Design Patterns (Design Patterns: Elements of Reusable Object-Oriented Software). It is without doubt...</summary>
    <author>
        <name>Suraj Nair</name>
        
    </author>
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p>Over the last decade or so, anyone asking how to design object-oriented software systems has been advised to read the Gang of Four (Go4) book on design patterns (Design Patterns: Elements of Reusable Object-Oriented Software). It is without doubt a wonderfully written book and should be in the possession of most software designers working in object-oriented design. But what happens when an over-enthusiastic reader ends up seeing patterns in every software problem he encounters?</p>]]>
        <![CDATA[<p>I recently came across a wonderfully hilarious but true write-up on software designers and architects going overboard (<a href="http://www.joelonsoftware.com/articles/fog0000000018.html">http://www.joelonsoftware.com/articles/fog0000000018.html</a>), trying hard to resolve a future problem which may or may not exist and thus compromising on what needs to be resolved NOW. I have been a victim of the same phenomenon: the obsession with &quot;what if tomorrow .....&quot;. No, this is not a criticism of the need to look ahead; it is just to drive home the point that one must not lose focus on what needs to be solved right now.</p><p>The same applies to the choice of design patterns. Over-enthusiastic designers end up creating a problem in order to implement a design pattern they find savvy, often as a result of the &quot;what if tomorrow ....&quot; phenomenon. One manifestation of this was a case where a product was redesigned to enhance maintainability. The end result was beautifully maintainable code that was seemingly not performant enough to be launched in the market. The designers had looked too far ahead at the problems of tomorrow to actually think about its current deployment requirements. It is thus extremely important to 'trade off' between what a customer wants today as an immediate market requirement and what he is willing to compromise on at a future time.</p><p>One needs to be extremely conscious when trying to fit a design solution or pattern to a problem. The Singleton pattern is very commonly used and admittedly one of the easiest to understand. You will not have to look too hard into the problem you are trying to solve to find reasons to implement the Singleton pattern, and a lot of enthusiastic new designers will jump at the chance to implement that chapter of the Go4 book. 
But you have got to be sure that there is no thread-safety requirement that the customer has passed on without anyone realizing, which might end up making the 'Singleton' choice look amateurish.</p><p>Bruce Powel Douglass spoke at a session at the Infosys campus in Bangalore in July this year, where he stressed that architects and designers need to make a thorough analysis of the choice of design patterns and of the best fit for the system being built. He called it 'the selection of patterns using design trade-off analysis' and explained how the choice of design patterns needs to be weighed against the design criteria that the product is expected to achieve. For example, it is important to weigh the following typical design criteria against the needs of the product.</p><p>1. Worst case performance<br />2. Time to market<br />3. Memory<br />4. Reusability<br />5. Simplicity<br />6. Safety and Reliability</p><p>Given a choice of multiple design patterns, each pattern is rated on how much it would help achieve each of the above design criteria, which provides some direction for the final choices. It was a nice elucidation of a structured approach to making correct design choices.</p><p>Though Bruce spoke with reference to the world of embedded systems, this approach holds true no matter what system you intend to design. In today's world of instant solutions, doing this might seem slightly unwieldy, but it is no doubt absolutely worth the effort.</p><p>So, go ahead and master the design patterns as espoused in the many books available, but remember to use them only as tools to solve a problem that exists. Use them to solve a potential future problem only when you are sure that doing so is not going to compromise the current solution.</p>]]>
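        <![CDATA[<p>The rating exercise described above can be sketched as a simple weighted scoring matrix. This is only an illustration in Python: the weights, the two candidate names (<code>pattern_a</code>, <code>pattern_b</code>) and their ratings are made-up numbers, not figures from Bruce's session, and a weighted sum is just one plausible way of combining the ratings.</p>
<pre>
```python
# Illustrative sketch only: all weights and ratings are hypothetical.

# How much each design criterion matters for this product (higher = more).
WEIGHTS = {
    "worst_case_performance": 5,
    "time_to_market": 4,
    "memory": 3,
    "reusability": 2,
    "simplicity": 3,
    "safety_and_reliability": 5,
}

# How well each candidate pattern helps achieve each criterion (0 to 5).
CANDIDATES = {
    "pattern_a": {
        "worst_case_performance": 4, "time_to_market": 2, "memory": 3,
        "reusability": 5, "simplicity": 2, "safety_and_reliability": 4,
    },
    "pattern_b": {
        "worst_case_performance": 3, "time_to_market": 5, "memory": 4,
        "reusability": 2, "simplicity": 5, "safety_and_reliability": 3,
    },
}


def weighted_score(ratings, weights):
    """Sum of weight * rating over every design criterion."""
    return sum(weights[criterion] * ratings[criterion] for criterion in weights)


def rank_patterns(candidates, weights):
    """Return (name, score) pairs sorted from best to worst score."""
    scored = [(name, weighted_score(ratings, weights))
              for name, ratings in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank_patterns(CANDIDATES, WEIGHTS))
```
</pre>
<p>With these made-up numbers the more reusable <code>pattern_a</code> loses to <code>pattern_b</code> because time to market and simplicity carry more weight, which is exactly the kind of trade-off the analysis is meant to surface.</p>]]>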
    </content>
</entry>
<entry>
    <title>Effort estimation for a Globalization project</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/11/effort_estimation_for_a_global.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=14" title="Effort estimation for a Globalization project" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.14</id>
    
    <published>2009-11-11T08:19:04Z</published>
    <updated>2009-11-11T08:41:31Z</updated>
    
    <summary>The blog talks about the considerations which need to be applied while doing effort estimates for a Globalization project. It covers the various decisions which need to be made, the kind of data which needs to be collected for estimation, etc.</summary>
    <author>
        <name>Aviraj Singh</name>
        
    </author>
            <category term="Internationalization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        <![CDATA[<p class="MsoNormal" style="margin: 0in 0in 0pt" align="justify"><span style="font-size: 10pt; font-family: Arial">Effort estimation is the first step in undertaking any software project, and a Globalization project is no different. Effort estimation for a product or application which needs to be globalized follows more or less the same principles as for regular maintenance projects, yet there are no defined methods specifically for estimating the amount of I18N or L10N changes required. While working on the proposal for a Globalization project for one of our clients, we faced the dilemma of either adopting standard methodologies like <em>SMC based</em> estimation or <em>FP based</em> estimation, or creating a hybrid: our own estimation model which follows the same principles but is tailored to globalization projects. Finally we came up with a raw estimation model which was fine-tuned over time and gave us estimates which were statistically in line with the results from other maintenance projects.</span></p>]]>
        <![CDATA[<p align="justify"><span style="font-size: 10pt; font-family: Arial">The first step in estimation is to understand the underlying product. Embarking on a project without complete information generally leads to disaster later. In the initial meetings with the client it is important to understand the current scope of the product. It is useful to know the target geographies where the product is going to be sold, the current degree of internationalization (if any), the platforms which need to be supported, the product architecture, etc. Each requirement brings further challenges in terms of estimation. The technical people involved in the estimation should have prior Globalization experience and understand the various I18N impact points in the code. They should be able to isolate the code which needs I18N-related changes from the rest of the code. Of course, this is a very daunting task when the code base is huge, which is the typical scenario with a product, so we need tools and utilities which can find all the impact points in the code. There are static analysis tools available which can do this to a certain degree. They can help find the number of hard-coded strings in the product, the number of non-Unicode APIs and data types used, etc., and produce reports which can be analysed further and used during estimation. At Infosys we use an Internationalization tool developed in-house, which is rule based and analyses code against the specific set of rules that we define. This way the reports contain very relevant information which can be used directly in the estimation model.</span></p>
<p align="justify"><span style="font-size: 10pt; font-family: Arial">At the time of estimation, it is important for the architect to decide on the encoding which will be supported by the product. This decision directly determines the impact points in the code. If the application has to support UTF-16, most of the string APIs and data types in a C++ application have to be replaced with their wide-char equivalents, while if the application has to support UTF-8, only a handful of string-related APIs are impacted. The choice of encoding is very important, since switching to a different encoding later, at the implementation stage, can prove very expensive and introduces risks to the quality and schedule of the project. Every encoding has its pros and cons, and the choice must be well debated before a decision is made. If there is database support in the product, the database layer should be analysed so that data flowing in and out of the database is in the required encoding. All internal and external interfaces of the application must be analysed so that the data flowing between modules or applications has an encoding which the communication layer can understand. The tools which help in estimation have a limited scope, and the rest depends on the expertise of the person analysing the code and design documents.</span></p>
<p align="justify"><span style="font-size: 10pt; font-family: Arial">The software estimation process breaks down the requirements into sub-requirements which are made as granular as possible. At a very granular level, if we know the number of APIs or data types we need to change, we can roughly estimate the effort required to make those changes. If we know the third-party tools the application interfaces with, we can estimate the effort required to internationalize the external interfaces or upgrade the third-party tools to their Unicode-enabled versions. A simple requirement like Unicode support for the UI translates to creating resource files for all locales, counting the strings which need to be externalized into those resource files, creating a library for reading and writing the resource files, etc. In this way we estimate at the most granular levels, always taking into account our past experience of making similar changes and the organization-wide <em>PCB</em> (Process Capability Baseline) metrics. This estimation model is based on the bottom-up approach, where estimates at the lowest level add up to give the total development effort. To this we add the usual project management and testing efforts to arrive at a final estimate.</span></p>
<p align="justify"><span style="font-size: 10pt; font-family: Arial">The key to the whole estimation process is understanding the product, coming up with an exhaustive list of I18N impact areas, and breaking them down into measurable entities which can be analysed manually or using tools. Like any other estimation process, this may or may not be very accurate at first, but after it has been applied to several Globalization projects the model becomes better defined and the estimates become much more accurate. I am sure there are other estimation models people have experimented with while estimating effort for Globalization projects. It would be interesting to discuss alternative models and understand the pros and cons of each.</span></p>]]>
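        <![CDATA[<p align="justify"><span style="font-size: 10pt; font-family: Arial">The bottom-up model described above can be sketched in a few lines of Python. Every number here is a hypothetical placeholder: in practice the impact-point counts would come from the static analysis reports, the per-change effort from past project data and PCB metrics, and the project management and testing overheads from whatever the organization normally applies. The names <code>IMPACT_POINTS</code>, <code>development_effort</code> and <code>total_effort</code> are invented for this illustration.</span></p>
<pre>
```python
# Illustrative sketch only: counts, per-unit efforts and overhead
# percentages are hypothetical placeholders, not real project data.

# impact area -> (number of occurrences, person-hours per occurrence)
IMPACT_POINTS = {
    "hard_coded_strings": (1200, 0.25),
    "non_unicode_api_calls": (800, 0.5),
    "non_unicode_data_types": (400, 0.5),
    "external_interfaces": (6, 40.0),
}


def development_effort(impact_points):
    """Bottom-up sum: occurrences * effort per occurrence, over all areas."""
    return sum(count * hours for count, hours in impact_points.values())


def total_effort(impact_points, pm_overhead=0.15, testing_overhead=0.30):
    """Add project management and testing effort on top of development.

    The overheads are expressed as assumed fractions of development effort.
    """
    dev = development_effort(impact_points)
    return dev * (1 + pm_overhead + testing_overhead)


print("development hours:", development_effort(IMPACT_POINTS))
print("total hours:", round(total_effort(IMPACT_POINTS)))
```
</pre>
<p align="justify"><span style="font-size: 10pt; font-family: Arial">Keeping the counts and the per-unit efforts in separate columns makes it easy to re-run the estimate when, for example, the encoding decision changes the number of impacted APIs.</span></p>]]>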
    </content>
</entry>
<entry>
    <title>Embrace Parallelism with Virtual Machines</title>
    <link rel="alternate" type="text/html" href="http://www.infosysblogs.com/engineering-software/2009/11/embrace_parallelism_with_virtu.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.infosysblogs.com/engineering-software-mt/mt-atom.cgi/weblog/blog_id=1/entry_id=13" title="Embrace Parallelism with Virtual Machines" />
    <id>tag:www.infosysblogs.com,2009:/engineering-software//1.13</id>
    
    <published>2009-11-10T14:21:15Z</published>
    <updated>2009-11-10T14:27:36Z</updated>
    
    <summary>For solutions in a wider canvas involving large software, virtual machines can be used to parallelize solutions - especially in today&apos;s cloud computing environment.</summary>
    <author>
        <name>Suraj Nair</name>
        
    </author>
            <category term="Virtualization" />
    
    <content type="html" xml:lang="en" xml:base="http://www.infosysblogs.com/engineering-software/">
        Parallelism has until recently been a term associated with the world of high performance computing. Though humans have been endowed naturally with the ability to &apos;parallelize&apos; worldly activities (one dangerous manifestation of which is the tendency to talk on the cell while driving your car), designing systems to embrace parallelism has always required that extra bit of mental effort. 
        <![CDATA[<p>Software architects and designers have been spoilt by the scale-up in processor speeds over the years, thanks to Moore's Law. But that is changing at a very fast pace. Today's hardware designs (multicore processors) require software to be designed with parallelism built in if its performance is to scale over the next generation of hardware (with the promise of multiplying cores). Herb Sutter has very beautifully explained why software design strategy needs to change in his famous article &quot;The Free Lunch is Over&quot;, published in 2005.</p><p>Let us consider a software system running on a given server on modern multicore hardware. The system is considered to scale well if it is able to spawn enough threads to keep all the available cores busy and hence speed up execution (for now, let's ignore specifics like the overhead of thread management as against the benefit of parallelized execution). Is there a limit to the ability to scale here? Yes: it is the number of cores available on the server. Ideally, you would want to dynamically make more and more cores available to consume the tasks that are parallelizable but waiting for a core to be freed, but you are limited by the number of cores the server has.</p><p>Moving the above problem to a wider canvas on a larger scale (the cloud computing environment, for example), software that can scale across multiple servers exhibits parallelism in its own way. In this context, the processing element is a 'server' as against a 'core'. Of course, making more and more server machines available to feed the software's hunger for parallelism is easier said than done (considering the high costs of server procurement, server management overhead, power management etc.) 
- unless you have considered virtualization.</p><p>Virtualization solutions to such problems involve the ability to distribute incoming requests across a number of virtual machines instead of physical servers. Such solutions can generate additional processing elements (virtual machines) as the workload increases, so that all available parallelizable tasks are catered to. Conversely, when the workload drops, unused virtual machines can be made dormant, saving on power costs etc. Notice that the ability to scale efficiently is no longer hindered by processor configurations (as it was in the standalone software system illustration above). Considerations along these lines are probably important in architecting cloud computing solutions.</p><p>The concept of a 'processing element' in parallel programming patterns, which was usually a processing core, can now be extended to include virtual machines too. With that, it becomes easier to relate common parallel design patterns like Task Decomposition and Data Decomposition to solving large problems using virtual machines. Embracing parallelism with virtual machines is a reality today.</p>]]>
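        <![CDATA[<p>As a rough illustration of the scale-out idea above, the Python sketch below works out how many virtual machines to start or make dormant for a given backlog of parallelizable tasks. The function names, the tasks-per-VM capacity and the pool limits are assumptions made for this example; they are not part of any particular virtualization product's API.</p>
<pre>
```python
# Illustrative sketch only: capacities and pool limits are assumptions.
import math


def vms_needed(pending_tasks, tasks_per_vm, min_vms=1, max_vms=20):
    """Number of virtual machines to keep active for the current workload."""
    wanted = math.ceil(pending_tasks / tasks_per_vm)
    # Never drop below the minimum pool or grow beyond the allowed maximum.
    return max(min_vms, min(max_vms, wanted))


def rebalance(active_vms, pending_tasks, tasks_per_vm):
    """Positive result: VMs to start. Negative result: VMs to make dormant."""
    return vms_needed(pending_tasks, tasks_per_vm) - active_vms


print(rebalance(active_vms=4, pending_tasks=95, tasks_per_vm=10))
print(rebalance(active_vms=10, pending_tasks=20, tasks_per_vm=10))
```
</pre>
<p>Here the virtual machine plays the role of the 'processing element': the first call asks for more of them as the backlog grows, and the second releases them when the workload drops.</p>]]>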
    </content>
</entry>

</feed> 

