
Chapter 8  Code set conversion

omniORB supports full code set negotiation, used to select and translate between different character sets, for the transmission of chars, strings, wchars and wstrings. The support is mostly transparent to application code, but there are a number of options that can be selected. This chapter covers the options, and also gives some pointers about how to implement your own code sets, in case the ones that come with omniORB are not sufficient.

8.1  Native code set

For the ORB to know how to handle strings given to it by the application, it must know what code set they are represented with, so it can properly translate them if need be.
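To see why the code set matters, here is a minimal sketch in plain Python, with no omniORB calls. It encodes the same text under two code sets, then reads the bytes back under the wrong assumption. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
text = u"caf\u00e9"  # "cafe" with an acute accent on the final e

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # the accented e takes two bytes

# Reading UTF-8 bytes as if they were Latin-1 succeeds but garbles the text.
misread = utf8_bytes.decode("iso-8859-1")

# Reading Latin-1 bytes as if they were UTF-8 fails outright.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Neither set of bytes says which code set produced it, which is why the ORB has to be told.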

For Python 2.x, the default is ISO 8859-1 (Latin 1). A different code set can be chosen at initialisation time with the nativeCharCodeSet parameter. The supported code sets are printed out at initialisation time if the ORB traceLevel is 15 or greater. Some applications may need to set the native char code set to UTF-8, allowing the full Unicode range to be supported in strings.
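As a hedged sketch, one way to pass these parameters is through the argument list given to CORBA.ORB_init(), assuming omniORB's usual -ORB<parameter> <value> command-line form. The helpers add_orb_options and init_orb are hypothetical names introduced for this example and are not part of omniORBpy.

```python
import sys

def add_orb_options(argv, options):
    """Return a copy of argv with '-ORB<name> <value>' pairs appended.

    Hypothetical helper; 'options' is a list of (name, value) tuples.
    """
    args = list(argv)
    for name, value in options:
        args.extend(["-ORB" + name, str(value)])
    return args

def init_orb(argv):
    # Requires omniORBpy. Imported here so the helper above works without it.
    from omniORB import CORBA
    return CORBA.ORB_init(argv, CORBA.ORB_ID)

orb_args = add_orb_options(
    sys.argv[:1],
    [("nativeCharCodeSet", "UTF-8"),  # native code set for char and string
     ("traceLevel", 15)],             # high enough to list supported code sets
)
# orb = init_orb(orb_args)  # uncomment where omniORBpy is installed
```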

In Python 3.x, all Python strings are Unicode, so it always behaves as if the native char code set is UTF-8.

wchar and wstring are always represented by the Python Unicode type, so there is no need to select a native code set for wchar.

8.2  Default code sets

In CORBA communication, code set conversion is meant to work as follows. Each client and server has a native code set that it uses for character data in application code, and supports a number of transmission code sets that it uses for communication. When a client connects to a server, the client picks one of the server’s transmission code sets to use for the interaction. For that to work, the client has to know which transmission code sets the server supports.

Code set information from servers is embedded in IORs. A client with an IOR from a server should therefore know what transmission code sets the server supports. This approach can fail for two reasons:

  1. A corbaloc URI (see chapter 7) does not contain any code set information.
  2. Some badly-behaved servers that do support code set conversion fail to put code set information in their IORs.

The CORBA standard says that if a server has not specified transmission code set information, clients must assume that it only supports ISO 8859-1 for char and string, and does not support wchar and wstring at all. The effect is that client code receives DATA_CONVERSION or BAD_PARAM exceptions.
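The selection and fallback described above can be modelled roughly as follows. This is a deliberately simplified sketch, not omniORB's implementation: real CORBA negotiation also takes native and conversion code sets into account. The function pick_transmission_code_set and its parameters are hypothetical.

```python
ISO_8859_1 = "ISO-8859-1"

def pick_transmission_code_set(client_supported, server_advertised, assumed=None):
    """Return the first client-supported code set that the server accepts.

    server_advertised=None models a server whose code sets are unknown,
    for example a corbaloc URI; 'assumed' is what the client must then
    presume the server supports. Returns None if nothing matches.
    """
    if server_advertised is None:
        server_advertised = [assumed] if assumed else []
    for code_set in client_supported:
        if code_set in server_advertised:
            return code_set
    return None

# The IOR carries code set information: the client can choose UTF-8.
with_ior_info = pick_transmission_code_set(
    ["UTF-8", ISO_8859_1], [ISO_8859_1, "UTF-8"])

# No information for char data: the client must assume ISO 8859-1.
char_without_info = pick_transmission_code_set(
    ["UTF-8", ISO_8859_1], None, assumed=ISO_8859_1)

# No information for wchar data: nothing may be assumed, so nothing matches.
wchar_without_info = pick_transmission_code_set(["UTF-16"], None)
```

In this model a None result corresponds to the case where a real client would see an exception.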

To avoid this issue, omniORB allows you to configure default code sets that are used as a server’s transmission code sets if they are not otherwise known. Set defaultCharCodeSet for char and string data, and defaultWCharCodeSet for wchar and wstring data.
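A minimal sketch of setting both defaults at initialisation, assuming the -ORB<parameter> <value> argument form and assuming that UTF-8 and UTF-16 suit the servers being contacted. The name init_orb_with_defaults is hypothetical.

```python
import sys

# Assumed values: choose the code sets your servers actually use.
default_code_set_args = [
    "-ORBdefaultCharCodeSet", "UTF-8",    # char and string
    "-ORBdefaultWCharCodeSet", "UTF-16",  # wchar and wstring
]

def init_orb_with_defaults(argv):
    # Requires omniORBpy; imported lazily so this file loads without it.
    from omniORB import CORBA
    return CORBA.ORB_init(list(argv) + default_code_set_args, CORBA.ORB_ID)

# orb = init_orb_with_defaults(sys.argv)  # uncomment where omniORBpy is installed
```

These defaults only apply when a server's transmission code sets are not otherwise known. An IOR that carries code set information still takes precedence.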

8.3  Code set library

To save space in the main ORB core library, most of the code set implementations are in a separate library. To load it from Python, you must import the omniORB.codesets module before calling CORBA.ORB_init().
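The import order can be sketched as below. The wrapper function init_orb_with_code_set_library is a hypothetical name, used only to keep the example importable on systems without omniORBpy.

```python
import sys

def init_orb_with_code_set_library(argv):
    from omniORB import CORBA
    # Load the extra code set implementations before the ORB is initialised.
    import omniORB.codesets
    return CORBA.ORB_init(list(argv), CORBA.ORB_ID)

# orb = init_orb_with_code_set_library(sys.argv)  # uncomment where omniORBpy is installed
```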

8.4  Implementing new code sets

Code sets must currently be implemented in C++. See the omniORB for C++ documentation for details.

