Is Big Bang the right approach to Internationalization?
Over the years our project teams have matured in the way they handle Internationalization projects, but things were not always so smooth. There were times when a project was tested and delivered to the client, only to refuse to work on the client's machines, and the offshore team just couldn't figure out why. A lot of firefighting was then needed to take corrective action and get things back on track. Most of the problems came down to poor planning, gaps in technical understanding and incorrect assumptions. Things are pretty much streamlined now, with an i18n Center of Excellence (CoE), i18n frameworks, analysis tools, POCs and best practices in place. Here I am going to recollect my earliest Internationalization experience and what we learnt from it.
Almost a decade back, we were engaged with a Japanese client on one of our assignments. They had an English product which they wanted us to internationalize and subsequently localize into Japanese. Internationalization was a known concept at the time, but we did not have adequate hands-on experience with such work. People who were familiar with the concept of Localization were brought into the team, along with some less experienced members. The product was written in Visual C++, so the idea was to pick people with a sound understanding of C++ and train them on Internationalization and Localization concepts.
The requirements were gathered, process documents were created, and the team came up with an implementation and release plan. As with all Japanese projects, time to release was a critical factor, and the offshore team did not have much time to ramp up their i18n skills. At the concept level, the team understood that anything shown in English on the UI must now be shown in Japanese. So the approach was to find all the hard-coded strings in the source code and move them to an external resource file. Secondly, since the narrow-char C++ functions and data types do not support Unicode, they had to be replaced with their wide-char equivalents: a 'char' variable becomes 'wchar_t', and functions like 'strcpy' become 'wcscpy'. None of the team members understood the repercussions of making these changes, and since the clock was ticking away like a bomb, it was decided to follow the Big Bang approach and do a find-replace across the entire source code; analyzing the data flow to find the impacted areas would have taken too much time. Scripts were written to automate the whole process, substituting all the data types and functions with their wide-char equivalents and all hard-coded strings with resource bundle calls.

With the approach neatly lined up, the team got busy making the substitutions and compiling the individual modules. Finally all the changes were complete and the source code was compiled. Since the Localization into Japanese was not yet done, the product was tested using the English resource files, and everything worked as expected on all the offshore machines. The product was delivered right on time to the customer. It was now time to sit back and wait for the appreciation mails to flow in.
The customer installed the product on one of their Japanese machines and tried to launch it. The application crashed. No matter what combinations they tried, the application refused to launch. The customer pressed the panic button. The offshore team could not figure out the reason for the crash. They got a machine with a Japanese OS and ran the application on it. It worked fine. After studying the customer's environment, they installed the product in a folder with a Japanese name. The product failed to launch and crashed. The code was debugged, and one of the replaced wide-char functions turned out to be the culprit: pointer arithmetic on data bytes had not been updated to reflect the fact that a character could now span multiple bytes, so at some point the processing went wrong, the data was corrupted, and the application crashed. This happened because the team had followed the Big Bang approach and simply replaced all the impacted functions with their wide-char equivalents without analyzing the data-processing logic. It is not enough to use wide-char functions; thought has to be given to how they are used as well. An extension was sought, corrective measures were taken, and the project was eventually delivered in perfect working condition for the Japanese environment. The initial approach had backfired, and quite a few lessons were learnt from the experience:
- Have the right team - Your team may comprise people with 5+ years of experience, but when it comes to Internationalization, it is important to have a team that understands its concepts and technical aspects. It will shorten the development cycle, and the end product will have fewer defects.
- Have the right processes in place - A Big Bang approach is a dangerous way to start; a more mature implementation methodology is required. Checklists must be in place to ensure that when a particular change is made, all other changes related to it are dealt with too, since Internationalization changes can have cascading effects on other areas of the code. Changes should be made module-wise or feature-wise so that defects are caught earlier and in a localized manner, instead of taking the Big Bang approach and destabilizing the entire code base.
- Analysis is more important than development - It is very important to have a team of experts analyze the source code to find all the areas that need to be modified to support Unicode. It is quite possible that some functions and data types need no change because they will never handle Unicode data; replacing them with wide-char equivalents is an overhead and can contribute to a performance hit. It is also important to understand the data flow in the application so that the required changes, such as encoding conversions at functions or external interfaces, are made in the right places. The memory usage of the application also increases when you support Unicode, so the code must be analyzed to increase memory allocations only in the impacted areas. The Big Bang approach checks for none of this, and it mostly leads to bloated code that uses more memory than necessary and under-performs at runtime.
- Use the right tools - Using the right set of tools during development can speed up the development process. There are many commercial tools on the market that help with static analysis of the source code. Infosys has developed a set of in-house tools for Internationalization and Localization; among other features, the tool set helps reduce analysis time by auto-detecting all the areas in the source code where i18n changes are likely required, and it can later help assess the i18n readiness of the product. However, keep in mind that tools are not a substitute for experienced people. While they can increase productivity, developers should still understand i18n concepts in order to interpret the tools' output correctly.
- Do not make assumptions about the input data - In the scenario above, the team assumed that since the product worked with English inputs, it would also work with Japanese inputs. It is wrong to make such assumptions. A Japanese user can enter filenames in Japanese or try saving a file in a folder with a Japanese name. The code should anticipate such use-cases.
- Have the right test environment - The absence of a language translation expert on the team is no excuse for testing the product with English data alone. That will definitely spring some nasty surprises later, when the product is deployed in a pure Japanese environment. Either plan for localization to be available at testing time or use alternate approaches like Pseudo-localization, and make sure the product is tested with Japanese strings as well.
The Big Bang approach is like cooking a dish by throwing all the ingredients into the pan at once: the outcome is unpredictable and in most cases will not get you the desired result. It is better to follow a systematic approach that improves your odds of success and lets you take corrective action as soon as something appears to be going wrong, rather than waiting for disaster to strike and starting to cook all over again.