20 resources on (migrating to) Unicode with Delphi

Are you maintaining an application that has been around for more than five or six years? Using a pre-Unicode Delphi version (pre-D2009) to do so?

Would you like to switch to a newer Delphi version to gain the advantages of Unicode, generics, extended RTTI, 64-bit compilation, the REST client library and other such niceties? So you can build new features for your users in ways that are not possible, or not cost-effective, with your current version? Or perhaps so you can offer your application as a multi-platform solution?

Or are you already on a Unicode-enabled Delphi version and now faced with textual data coming at you from all sorts of sources in all sorts of character encodings (ASCII/Ansi being just one of them)?

Is having to deal with the Ansi/ASCII to Unicode conversion holding you back?

Do you dread having to deal with all the string types when reading or writing your data?

You shouldn’t.

Strings are still strings, albeit with a different encoding: since Delphi 2009, string is an alias for UnicodeString, which stores text as UTF-16.

As long as you haven’t pulled any fancy tricks that assume one character equals one byte, or (ab)used arrays of chars where you should have used arrays of bytes, your application should make the transition from the pre-Unicode world without too much hassle.
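A minimal console sketch of the char-versus-byte point (it assumes Delphi XE2 or later for the unit-scoped name System.SysUtils; older Unicode versions use plain SysUtils). Char is now a two-byte UTF-16 code unit, so code that treats Length(S) as a byte count, or keeps binary data in an array of Char, breaks silently. TBytes is the natural home for encoded or binary data.

```delphi
program CharVsByte;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  S: string;
  Utf8: TBytes;
begin
  // Since Delphi 2009, Char is WideChar: one UTF-16 code unit of two bytes
  Writeln('SizeOf(Char)     = ', SizeOf(Char));
  Writeln('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));

  S := 'Hello';
  // Length counts Chars, not bytes
  Writeln('Length(S)        = ', Length(S));
  // The string's characters therefore occupy twice as many bytes in memory
  Writeln('Bytes in memory  = ', Length(S) * SizeOf(Char));

  // Encoded data (for a file, socket or blob) belongs in TBytes, not in an array of Char
  Utf8 := TEncoding.UTF8.GetBytes(S);
  Writeln('UTF-8 bytes      = ', Length(Utf8));
end.
```

Any pre-2009 code that does Move(S[1], Buffer, Length(S)) is the typical candidate for this bug: it now copies only half of the text.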

The additions to the RTL to support Unicode, such as the TEncoding class and the encoding-aware overloads of LoadFromFile and SaveToFile, make dealing with files in different character encodings and Unicode transformation formats relatively straightforward.
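As a sketch of those RTL additions, the program below writes a line to a UTF-8 file and reads it back with TStringList, passing TEncoding.UTF8 explicitly in both directions. The file name unicode-demo.txt is a hypothetical placeholder, and Delphi XE2 or later unit naming is assumed.

```delphi
program EncodingRoundTrip;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  // Hypothetical file name, created in the current directory
  DemoFile = 'unicode-demo.txt';
  // 'Grüße aus Köln', spelled with character codes so the source file encoding does not matter
  Greeting = 'Gr'#$00FC#$00DF'e aus K'#$00F6'ln';

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add(Greeting);
    // Write the text as UTF-8 (by default this also writes a UTF-8 byte order mark)
    Lines.SaveToFile(DemoFile, TEncoding.UTF8);

    Lines.Clear;
    // Read it back, stating the expected encoding instead of relying on detection
    Lines.LoadFromFile(DemoFile, TEncoding.UTF8);
    Writeln('Round trip intact: ', Lines[0] = Greeting);
  finally
    Lines.Free;
    DeleteFile(DemoFile);
  end;
end.
```

Stating the encoding on both sides keeps the behavior independent of the machine’s Ansi code page.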

Below are 20 resources to help you deal with your data in a Unicode world.

1

Computerphile provides a thoroughly enjoyable explanation of why Unicode came to be in the first place. In the video, Tom Scott also illustrates the “greatest hack”, UTF-8, which nowadays is Unicode’s most ubiquitous transformation format:

Characters, Symbols and the Unicode Miracle – Computerphile (video)
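To see the variable-length design of UTF-8 from Delphi, the sketch below prints the bytes that TEncoding.UTF8.GetBytes produces for a few characters: plain ASCII stays a single byte, while other characters take two to four bytes. The characters are written as character codes (the last one as a UTF-16 surrogate pair) so the source file’s own encoding does not matter; Delphi XE2 or later unit naming is assumed.

```delphi
program Utf8Lengths;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Prints how many bytes UTF-8 needs for S, followed by the bytes in hex
procedure ShowUtf8(const Description, S: string);
var
  Bytes: TBytes;
  I: Integer;
begin
  Bytes := TEncoding.UTF8.GetBytes(S);
  Write(Description, ': ', Length(Bytes), ' byte(s):');
  for I := 0 to High(Bytes) do
    Write(' ', IntToHex(Bytes[I], 2));
  Writeln;
end;

begin
  ShowUtf8('Latin capital A, U+0041', 'A');        // 41, identical to ASCII
  ShowUtf8('e with acute, U+00E9', #$00E9);        // C3 A9
  ShowUtf8('Euro sign, U+20AC', #$20AC);           // E2 82 AC
  // U+1F4A9 lies outside the BMP, so the UTF-16 string holds it as a surrogate pair
  ShowUtf8('Pile of poo, U+1F4A9', #$D83D#$DCA9);  // F0 9F 92 A9
end.
```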

2 and 3

The top two resources on Unicode and Delphi are Cary Jensen’s white paper Delphi Unicode Migration for Mere Mortals: Stories and Advice from the Front Lines (direct link to the PDF) and Marco Cantù’s white paper Delphi and Unicode (direct link to the PDF).

Both include a technical overview of how Delphi implements Unicode support and which parts of your application may be affected by it. Marco’s was the original white paper that accompanied the Delphi 2009 release. Cary’s was published about a year later and has the benefit of including advice based on practical experience.

4

A list of Unicode resources isn’t complete without a reference to Joel Spolsky’s The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

A must-read for every developer, even if you are not that technically minded and leave the nitty-gritty of reading and writing characters to your component and database vendors or to colleagues who are more technically inclined.

5, 6 and 7

Nick Hodges wrote a triad of articles on Delphi and Unicode to accompany the Delphi 2009 launch. They are not as comprehensive as Marco’s and Cary’s white papers, but they do give a quick overview of Delphi’s Unicode capabilities and introduce users of pre-D2009 versions to a couple of interesting additions to the RTL.

8, 9, 10, 11, 12 and 13

Delphi’s Unicode implementation comes with a gotcha. Many text functions come in two flavors, a plain one and an Ansi one: for example, CompareText and AnsiCompareText. CompareText compares text without giving any thought to the locale in which the text is used; it only treats the ASCII letters A to Z case-insensitively. AnsiCompareText is locale-sensitive.

When these functions were introduced, their names seemed like a good idea. After all, the way to deal with locale issues was to use the Ansi “extensions” of the ASCII character encoding.

With the introduction of Unicode support, it became obvious that naming locale-sensitive functions after the mechanism that implemented that locale sensitivity wasn’t the brightest idea, especially as the names had to be kept for backward compatibility.

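In the Unicode world, where you need to deal with the difference between Unicode and ASCII/Ansi character encodings, having to use Ansi-named functions for locale sensitivity is confusing to say the least, all the more so because in Unicode Delphi versions these functions take ordinary (Unicode) string parameters.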
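A small sketch of the difference, assuming a Windows console application built with Delphi XE2 or later: the two words differ only in the case of a non-ASCII letter. CompareText only folds A to Z, so it should report them as different, while AnsiCompareText delegates to the operating system’s locale-aware comparison, which on Windows should treat them as equal when ignoring case.

```delphi
program CompareDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // 'ÄPFEL' and 'äpfel', spelled with character codes so the source file encoding does not matter
  UpperWord = #$00C4'PFEL';
  LowerWord = #$00E4'pfel';

begin
  // Locale-independent: only the ASCII letters A..Z are compared case-insensitively
  Writeln('CompareText equal:     ', CompareText(UpperWord, LowerWord) = 0);
  // Locale-sensitive: uses the OS comparison for the user's locale, ignoring case
  Writeln('AnsiCompareText equal: ', AnsiCompareText(UpperWord, LowerWord) = 0);
end.
```

The Ansi prefix says nothing about the string type here: both calls receive the same UnicodeString values.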

Another couple of interesting tidbits and useful experiences can be found in these posts:

14

If converting to and from Unicode is something that you need to do a lot, then the DIConverters library (LGPL open source) may be of help to you. It is a Delphi character conversion library that provides conversion functions for a dozen dozen character encodings.
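For occasional conversions, the RTL’s own TEncoding class may already be enough. The sketch below re-encodes the word café from code page 1252 (assumed here purely as an example of a legacy Western European Ansi encoding) to UTF-8 with TEncoding.GetEncoding and TEncoding.Convert. Note that GetEncoding returns a new object that the caller has to free; Delphi XE2 or later unit naming is assumed.

```delphi
program ConvertDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Win1252: TEncoding;
  AnsiBytes, Utf8Bytes: TBytes;
  I: Integer;
begin
  // Code page 1252 is only an example of a legacy Ansi encoding
  Win1252 := TEncoding.GetEncoding(1252);
  try
    // 'café' as a pre-Unicode application would store it in code page 1252
    SetLength(AnsiBytes, 4);
    AnsiBytes[0] := $63; // c
    AnsiBytes[1] := $61; // a
    AnsiBytes[2] := $66; // f
    AnsiBytes[3] := $E9; // e with acute: a single byte in code page 1252

    // Decode from code page 1252 and re-encode as UTF-8
    Utf8Bytes := TEncoding.Convert(Win1252, TEncoding.UTF8, AnsiBytes);

    for I := 0 to High(Utf8Bytes) do
      Write(IntToHex(Utf8Bytes[I], 2), ' ');
    Writeln; // expected: 63 61 66 C3 A9
  finally
    // GetEncoding created this instance, so we own it
    Win1252.Free;
  end;
end.
```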

15

Working with Unicode in XML files can present some challenges. Guidelines by the World Wide Web Consortium (W3C) can be found in Unicode in XML and other Markup Languages.

16, 17, 18, 19 and 20

The Unicode specification is incredibly extensive. You can quite literally get lost in there. The pages and “entry points” I have found most useful are:

Bonus

The Unicode specification contains not only letters, but also punctuation marks, diacritics, mathematical symbols, technical symbols, arrows, dingbats, emoji, etc. Version 7.0 provides codes for 112,956 characters from the world’s alphabets, ideograph sets, and symbol collections.

A few examples from the character code charts are: Braille patterns, dingbats, chess symbols, domino tiles, Mahjong tiles, playing cards, musical symbols, cuneiform, technical symbols, transport and map symbols, mathematical symbols and operators, and much, much more.

You name it and Unicode probably has it. Just have a look through the Miscellaneous Symbols And Pictographs chart. It would seem that there is hardly anything left that you can’t depict with a single Unicode character.

Just one caveat: if you want to display these characters, the font you use needs to support them.

Cases in point are the pile of poo and the dove of peace. My Chrome doesn’t render these characters, so it is just as well that the web page provides a server-generated image.
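Characters such as these sit outside Unicode’s Basic Multilingual Plane, which matters for Delphi code: a Delphi string is UTF-16, so each of them occupies two Chars (a surrogate pair) and Length reports 2. The sketch below shows this and combines each pair back into its code point using the standard UTF-16 formula. CombineSurrogates is a hypothetical helper written for this example, and 1-based string indexing (the desktop compiler default) plus Delphi XE2 or later unit naming are assumed.

```delphi
program SurrogatePairs;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: turns a UTF-16 high/low surrogate pair into its code point
function CombineSurrogates(HighSurrogate, LowSurrogate: Char): Integer;
begin
  Result := ((Ord(HighSurrogate) - $D800) shl 10) + (Ord(LowSurrogate) - $DC00) + $10000;
end;

// Prints the Char count of S and, for a single surrogate pair, the code point it encodes
procedure Describe(const Name, S: string);
begin
  Write(Name, ': Length = ', Length(S));
  if (Length(S) = 2) and (Ord(S[1]) >= $D800) and (Ord(S[1]) <= $DBFF) then
    Write(', code point U+', IntToHex(CombineSurrogates(S[1], S[2]), 5));
  Writeln;
end;

begin
  Describe('Latin capital A', 'A');
  Describe('Pile of poo', #$D83D#$DCA9);    // U+1F4A9
  Describe('Dove of peace', #$D83D#$DD4A);  // U+1F54A
end.
```

Code that walks a string Char by Char, or truncates it at an arbitrary index, can therefore split such a character in half.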

Posted in Software Development
2 comments on “20 resources on (migrating to) Unicode with Delphi”
  1. Darian Miller says:

    Possible additions:
    source converter:
    http://www.innovasolutions.com.au/delphistuf/ADUGStringToAnsiStringConv.htm

    scanner with statistics on possible upgrade issues:
    http://cc.embarcadero.com/item/27398
