Infosys delivers concept-to-market software engineering services across the engineering value chain. Our blog will discuss the latest trends in software product engineering, outsourcing, technologies, and address business challenges.


Software service in the Japanese market

The Japanese market remains an important target in the localization schedules of internationalized products. Although China has replaced Japan as the world's second-largest economy, requests to support Japanese localization of internationalized software products as a priority show no sign of an immediate decline. Understanding Japanese encoding schemes takes a great deal of effort, especially for someone who has been spoilt by the simple elegance of ASCII-encoded text. Though Unicode is THE way ahead, thousands of legacy products are still in use in the domestic market, and completely rewriting them is not a viable business option in Japan's current economic climate. As more important Japanese businesses outsource legacy software maintenance and enhancement to service providers, handling such software well will require a solid understanding of Japanese text representation and encoding schemes.

Japanese is written mainly with three writing systems: hiragana, katakana and kanji. Kanji are logographic characters borrowed from Chinese script, while hiragana and katakana are syllabaries (kana) in which each character represents a syllable. Hiragana is used for words of Japanese origin and for grammatical endings (the word "nihon", meaning "Japan", can be written in hiragana as にほん, although it is usually written in kanji as 日本). Katakana is used for words of foreign origin (you would use katakana to write "tabako", タバコ, which means cigarette and is derived from the Portuguese "tabaco", tobacco). Kanji are very commonly used in everyday Japanese text. The Japanese character set, ideally, therefore comprises all of the hiragana, katakana and kanji characters used in the Japanese writing system.
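
In Unicode the three scripts occupy separate blocks, which allows a quick sanity check when reviewing test data. The Python sketch below labels each character by block. It covers only the main blocks (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF), so half-width katakana and kanji from the extension blocks are reported as "other". The helper names script_of and label are illustrative, not from any library.

```python
# Main Unicode blocks for the three Japanese scripts (an assumption of this
# sketch: extension blocks and half-width katakana are not covered).
SCRIPT_RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),  # CJK Unified Ideographs
}


def script_of(ch: str) -> str:
    """Return a rough writing-system label for a single character."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    return "other"


def label(text: str) -> list[tuple[str, str]]:
    """Pair every character of text with its script label."""
    return [(ch, script_of(ch)) for ch in text]


if __name__ == "__main__":
    for word in ("にほん", "日本", "タバコ"):
        print(word, label(word))
```

Running it labels every character of にほん as hiragana, 日本 as kanji and タバコ as katakana.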

In computers, communications equipment and other devices that handle text, English text (letters, digits, punctuation and so on) has traditionally been represented with the ASCII character-encoding scheme. Similarly, before the advent of Unicode, Japanese text on computerized interfaces was represented with the JIS (Japanese Industrial Standard) character sets, published by the Japanese Standards Association.
Interestingly, the JIS character set is actually a family of several standard character sets: JIS X 0201 (Roman characters and half-width katakana), JIS X 0208 (full-width katakana, hiragana, punctuation and 6,355 kanji), JIS X 0212 (supplementary rare kanji, accented European characters and so on) and JIS X 0213 (an extended character set first published in 2000).
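
The split between JIS X 0201 and JIS X 0208 is still visible in byte counts and in Unicode character properties. The following Python sketch compares half-width (JIS X 0201) and full-width (JIS X 0208) katakana using only the standard codecs and unicodedata modules; the sample strings are arbitrary, and the counts in the comments assume CPython's built-in shift_jis and euc_jp codecs.

```python
import unicodedata

half = "ｶﾀｶﾅ"      # JIS X 0201 half-width katakana (Unicode U+FF61 to U+FF9F)
full = "カタカナ"  # JIS X 0208 full-width katakana

# Shift JIS: one byte per JIS X 0201 katakana, two per JIS X 0208 character.
print(len(half.encode("shift_jis")), len(full.encode("shift_jis")))  # 4 8

# EUC-JP: half-width katakana need a 0x8E (SS2) prefix, so each takes two bytes.
print(len(half.encode("euc_jp")))  # 8

# Unicode East Asian Width property: "H" = halfwidth, "W" = wide.
print(unicodedata.east_asian_width("ｶ"), unicodedata.east_asian_width("カ"))  # H W

# NFKC maps half-width katakana to full width and merges a separate
# half-width voiced sound mark into the preceding kana.
print(unicodedata.normalize("NFKC", half) == full)  # True
print(unicodedata.normalize("NFKC", "ｶﾞ"))  # ガ
```

Normalizing with NFKC before comparison is one way to make test checks insensitive to which katakana width an input method or legacy system produced.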

There are essentially three encoding schemes for the JIS character sets: Shift JIS, EUC-JP and ISO-2022-JP. For software work on Japanese-localized systems, Shift JIS and EUC-JP are the main concern: Shift JIS is the common encoding on Windows platforms, while EUC-JP is common on UNIX systems. Shift JIS is supported on UNIX too, however (the PCK locale on Solaris, for example). CP932 is Microsoft's extension of Shift JIS that adds NEC special characters and IBM extensions.
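
A minimal Python sketch showing how the same text comes out under each scheme. It assumes Python's standard codec names (shift_jis, euc_jp, iso2022_jp, cp932), which are Python's labels rather than part of the JIS standards, and it uses CIRCLED DIGIT ONE (①) as one example of an NEC special character that CP932 covers and plain Shift JIS does not.

```python
text = "日本"  # "nihon": two JIS X 0208 kanji

# The same characters under the three JIS encoding schemes
# (codec names are Python standard-library codec names).
for codec in ("shift_jis", "euc_jp", "iso2022_jp"):
    print(f"{codec:<10} {text.encode(codec).hex(' ')}")

# CP932 adds NEC special characters such as CIRCLED DIGIT ONE (U+2460),
# which plain Shift JIS, limited to JIS X 0208, cannot encode.
circled = "\u2460"
print("cp932     ", circled.encode("cp932").hex(" "))
try:
    circled.encode("shift_jis")
except UnicodeEncodeError as exc:
    print("shift_jis  cannot encode U+2460:", exc.reason)
```

On CPython, 日本 should come out as 93 fa 96 7b in Shift JIS, c6 fc cb dc in EUC-JP, and 1b 24 42 46 7c 4b 5c 1b 28 42 in ISO-2022-JP, where the escape sequences 1b 24 42 and 1b 28 42 switch into JIS X 0208 and back to ASCII. Note that EUC-JP is simply the raw JIS X 0208 code (46 7c 4b 5c) with the high bit set on each byte, while Shift JIS rearranges the code points.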

The rules for parsing Japanese text vary with the encoding scheme in use. Because most Japanese characters are represented using multiple bytes, the rules that decide what is a single-byte character, what is a lead byte and what is a trail byte differ from one encoding to another. A more viable option in today's Unicode world is to convert received text to a defined Unicode encoding, use widely available Unicode libraries for the necessary string processing, and then convert back to the original native encoding for display or third-party interfacing.
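
To illustrate how the lead-byte rules differ, here is a minimal Python sketch of two character splitters, one for Shift JIS and one for EUC-JP. The byte ranges follow the commonly documented layouts (Shift JIS lead bytes 0x81 to 0x9F and 0xE0 to 0xFC, the wider range also covering CP932; EUC-JP single-shift prefixes 0x8E and 0x8F, and 0xA1 to 0xFE for JIS X 0208), but split_sjis and split_eucjp are hypothetical helpers that skip full validation. The demo also shows the classic pitfall: ソ is 0x83 0x5C in Shift JIS, so a byte-level search finds a backslash that is not really there.

```python
def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift JIS bytes into one chunk per character (no full validation)."""
    chars, i = [], 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            # Lead byte: the character is this byte plus one trail byte.
            if i + 1 >= len(data):
                raise ValueError(f"truncated character at offset {i}")
            trail = data[i + 1]
            if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC):
                raise ValueError(f"bad trail byte 0x{trail:02X} at offset {i + 1}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            # Single byte: 0x00-0x7F (ASCII / JIS X 0201 Roman) or
            # 0xA1-0xDF (half-width katakana); other values pass through as-is.
            chars.append(data[i:i + 1])
            i += 1
    return chars


def split_eucjp(data: bytes) -> list[bytes]:
    """Split EUC-JP bytes into one chunk per character (no full validation)."""
    chars, i = [], 0
    while i < len(data):
        b = data[i]
        if b == 0x8F:
            n = 3  # SS3 + two bytes: JIS X 0212 supplementary character
        elif b == 0x8E or 0xA1 <= b <= 0xFE:
            n = 2  # SS2 + half-width katakana, or a JIS X 0208 character
        else:
            n = 1  # ASCII
        if i + n > len(data):
            raise ValueError(f"truncated character at offset {i}")
        chars.append(data[i:i + n])
        i += n
    return chars


if __name__ == "__main__":
    text = "ソA\uff71"  # katakana SO, ASCII "A", half-width katakana A
    sjis = text.encode("shift_jis")
    # A naive byte search finds 0x5C ("\"), which is really ソ's trail byte.
    print(b"\\" in sjis)
    print(split_sjis(sjis))
    print(split_eucjp(text.encode("euc_jp")))
```

Maintaining hand-written rules like these for every encoding is exactly the burden that converting to Unicode and letting a tested codec do the work avoids.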

If you maintain legacy software originating from Japanese companies, you will certainly need to know the various encoding schemes used to represent Japanese text. A working knowledge of a few Japanese words, and of how to enter them as hiragana, katakana and kanji, helps in running basic tests to validate source code changes in the application. In addition, the ability to generate text in a specified encoding (using editors such as Hidemaru, for example) is extremely helpful when testing software for the Japanese market.
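
Test files in a given encoding can also be generated programmatically. The sketch below assumes Python's standard codecs; the word list, the file names and both replace_backslashes helpers are hypothetical. It writes the same test words in each encoding discussed above and contrasts a byte-level edit with the decode, process, re-encode approach on a Shift JIS path containing ソ, whose trail byte is 0x5C.

```python
import tempfile
from pathlib import Path

# Hypothetical test words: hiragana, katakana and kanji, plus "ソフト",
# whose first character has 0x5C ("\") as its Shift JIS trail byte.
WORDS = ["にほん", "タバコ", "日本", "ソフト"]
CODECS = ["shift_jis", "cp932", "euc_jp", "iso2022_jp"]


def write_fixtures(out_dir: Path) -> list[Path]:
    """Write WORDS, one per line, into one file per codec."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for codec in CODECS:
        path = out_dir / f"words.{codec}.txt"
        # newline="\n" keeps LF line endings on every platform.
        with open(path, "w", encoding=codec, newline="\n") as f:
            f.write("\n".join(WORDS) + "\n")
        paths.append(path)
    return paths


def replace_backslashes(data: bytes, codec: str) -> bytes:
    """Hypothetical processing step done via Unicode:
    decode, operate on characters, re-encode."""
    return data.decode(codec).replace("\\", "/").encode(codec)


def replace_backslashes_naive(data: bytes) -> bytes:
    """The same step done on raw bytes; it corrupts Shift JIS text
    whenever a trail byte happens to be 0x5C."""
    return data.replace(b"\\", b"/")


if __name__ == "__main__":
    for p in write_fixtures(Path(tempfile.mkdtemp())):
        print(p.name, p.read_bytes().hex(" "))

    raw = "C:\\ソフト\\".encode("shift_jis")
    print(replace_backslashes(raw, "shift_jis").decode("shift_jis"))
    print(replace_backslashes_naive(raw).decode("shift_jis", errors="replace"))
```

The Unicode-based helper turns the path into C:/ソフト/, while the byte-level one also rewrites ソ's trail byte, leaving an invalid Shift JIS sequence. Words such as ソ are therefore worth including in any Japanese test set.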
