Wednesday, October 14, 2015

Dealing with German Characters in XML/Oracle

There are a couple of ways to handle German characters such as Ü, Ä, Ö, ä, ö, ü and ß in XML and Oracle queries.

For XML, we use the XMLSerialize function with an ENCODING clause.

The encoding options used in the examples below for printing German characters are ISO-8859-1 and windows-1252.

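UTF-8, the most commonly used encoding, can represent German characters, but it writes each of them as two bytes, so they show up garbled when the output is read as a single-byte character set.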

The following queries demonstrate these encoding options:

SELECT XMLSerialize(DOCUMENT XMLType('<BODY>äÄßÜüÜberwachung</BODY>') as BLOB ENCODING 'UTF-8' VERSION '1.0' INDENT SIZE = 2)
  AS xmlserialize_doc FROM DUAL;

With the UTF-8 encoding option, the German characters did not display correctly in this test.
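
One way to see what each encoding actually writes is to look at the raw bytes. The query below is a minimal sketch, assuming the UTL_I18N package is available to your user; it uses UNISTR so that the literal does not depend on the client character set. The column aliases are only illustrative.

```sql
-- Hex bytes of the character ä (U+00E4) in three encodings.
-- Oracle character set names: AL32UTF8 = UTF-8,
-- WE8ISO8859P1 = ISO-8859-1, WE8MSWIN1252 = windows-1252.
SELECT RAWTOHEX(UTL_I18N.STRING_TO_RAW(UNISTR('\00E4'), 'AL32UTF8'))     AS utf8_bytes,
       RAWTOHEX(UTL_I18N.STRING_TO_RAW(UNISTR('\00E4'), 'WE8ISO8859P1')) AS iso_8859_1_bytes,
       RAWTOHEX(UTL_I18N.STRING_TO_RAW(UNISTR('\00E4'), 'WE8MSWIN1252')) AS windows_1252_bytes
  FROM DUAL;
```

In UTF-8 the letter ä is the two bytes C3 A4, while ISO-8859-1 and windows-1252 both use the single byte E4. A viewer that reads the UTF-8 bytes as a single-byte character set shows Ã¤ instead of ä, which is the usual reason the UTF-8 output looks broken.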

Let's look at the other options.

SELECT XMLSerialize(DOCUMENT XMLType('<BODY>äÄßÜüÜberwachung</BODY>') as BLOB ENCODING 'ISO-8859-1' VERSION '1.0' INDENT SIZE = 2)
  AS xmlserialize_doc FROM DUAL;

This encoding works :)

SELECT XMLSerialize(DOCUMENT XMLType('<BODY>äÄßÜüÜberwachung</BODY>') as BLOB ENCODING 'windows-1252' VERSION '1.0' INDENT SIZE = 2)
  AS xmlserialize_doc FROM DUAL;

This option works as well.

The only problem with this approach is that the output is a BLOB, so we need a function to convert it into a CLOB before we can use it as text.
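
Such a conversion function could look roughly like the sketch below. It relies on DBMS_LOB.CONVERTTOCLOB; the function name blob_to_clob and its parameter names are hypothetical, and the character set passed in is assumed to match the ENCODING used in XMLSerialize (WE8ISO8859P1 for ISO-8859-1, WE8MSWIN1252 for windows-1252).

```sql
-- Hypothetical helper: converts a BLOB holding encoded text into a CLOB.
-- p_charset is an Oracle character set name describing the BLOB's bytes.
CREATE OR REPLACE FUNCTION blob_to_clob (
   p_blob    IN BLOB,
   p_charset IN VARCHAR2 DEFAULT 'WE8ISO8859P1')
   RETURN CLOB
IS
   l_clob         CLOB;
   l_dest_offset  INTEGER := 1;
   l_src_offset   INTEGER := 1;
   l_lang_context INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
   l_warning      INTEGER;
BEGIN
   -- Nothing to convert for a NULL or empty BLOB.
   IF p_blob IS NULL OR DBMS_LOB.GETLENGTH(p_blob) = 0 THEN
      RETURN NULL;
   END IF;

   DBMS_LOB.CREATETEMPORARY(l_clob, TRUE);
   DBMS_LOB.CONVERTTOCLOB(
      l_clob,                     -- destination CLOB
      p_blob,                     -- source BLOB
      DBMS_LOB.LOBMAXSIZE,        -- convert the whole BLOB
      l_dest_offset,              -- start position in the CLOB
      l_src_offset,               -- start position in the BLOB
      NLS_CHARSET_ID(p_charset),  -- character set of the BLOB bytes
      l_lang_context,
      l_warning);
   RETURN l_clob;
END blob_to_clob;
/

-- Usage: the character set must match the ENCODING clause.
SELECT blob_to_clob(
          XMLSerialize(DOCUMENT XMLType('<BODY>äÄßÜüÜberwachung</BODY>') AS BLOB ENCODING 'ISO-8859-1' VERSION '1.0' INDENT SIZE = 2),
          'WE8ISO8859P1') AS xml_text
  FROM DUAL;
```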

Alternatively, we can use the approach below inside Oracle objects such as packages or functions:

SELECT REPLACE (
          REPLACE (
             REPLACE (
                REPLACE (
                   REPLACE (
                      REPLACE (REPLACE ('ÜäÖößü', 'ß', 'ss'),
                               'Ü',
                               'Ue'),
                      'Ö',
                      'Oe'),
                   'Ä',
                   'Ae'),
                'ä',
                'ae'),
             'ü',
             'ue'),
          'ö',
          'oe')
  FROM DUAL;

This replaces the German characters with their plain ASCII equivalents (for example, Ü becomes Ue and ß becomes ss).

This approach is useful when a downstream system, such as an interface, does not support the required encoding.
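
To reuse the replacement in a package or other code, the nested REPLACE calls can be wrapped in a function. The following is a minimal sketch; the name transliterate_german is hypothetical, and it covers only the seven characters handled in the query above.

```sql
-- Hypothetical helper: replaces German umlauts and ß with ASCII equivalents.
CREATE OR REPLACE FUNCTION transliterate_german (p_text IN VARCHAR2)
   RETURN VARCHAR2
   DETERMINISTIC
IS
   l_result VARCHAR2(32767) := p_text;
BEGIN
   -- One REPLACE per character keeps the mapping easy to read and extend.
   l_result := REPLACE(l_result, 'ß', 'ss');
   l_result := REPLACE(l_result, 'Ä', 'Ae');
   l_result := REPLACE(l_result, 'Ö', 'Oe');
   l_result := REPLACE(l_result, 'Ü', 'Ue');
   l_result := REPLACE(l_result, 'ä', 'ae');
   l_result := REPLACE(l_result, 'ö', 'oe');
   l_result := REPLACE(l_result, 'ü', 'ue');
   RETURN l_result;
END transliterate_german;
/

-- Usage: returns 'Ueberwachung'
SELECT transliterate_german('Überwachung') AS ascii_text FROM DUAL;
```

Sequential assignments give the same result as the nested form because none of the replacement strings contains a character that a later step would replace again.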

The following is the HTML notation for the German characters and some other common symbols:

  • ä -> &auml;
  • Ä -> &Auml;
  • ö -> &ouml;
  • Ö -> &Ouml;
  • ü -> &uuml;
  • Ü -> &Uuml;
  • ß -> &szlig;
  • € -> &euro;
  • & -> &amp;
  • < -> &lt;
  • > -> &gt;
  • " -> &quot;
  • © -> &copy;
  • • -> &bull;
  • ™ -> &trade;
  • ® -> &reg;
  • § -> &sect;
  • | -> &#124;
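
The same REPLACE technique can produce this HTML notation. The sketch below is a hypothetical helper (the name to_html_entities does not come from any Oracle package) and covers only the German letters and the markup characters from the list. The ampersand has to be replaced first; otherwise the & inside the entities added by the other steps would itself be escaped.

```sql
-- In SQL*Plus or SQL Developer, run SET DEFINE OFF first so that '&'
-- is not treated as a substitution variable.
CREATE OR REPLACE FUNCTION to_html_entities (p_text IN VARCHAR2)
   RETURN VARCHAR2
   DETERMINISTIC
IS
   l_result VARCHAR2(32767) := p_text;
BEGIN
   -- '&' must come first so the entities added below stay intact.
   l_result := REPLACE(l_result, '&', '&amp;');
   l_result := REPLACE(l_result, '<', '&lt;');
   l_result := REPLACE(l_result, '>', '&gt;');
   l_result := REPLACE(l_result, '"', '&quot;');
   -- German letters.
   l_result := REPLACE(l_result, 'ä', '&auml;');
   l_result := REPLACE(l_result, 'Ä', '&Auml;');
   l_result := REPLACE(l_result, 'ö', '&ouml;');
   l_result := REPLACE(l_result, 'Ö', '&Ouml;');
   l_result := REPLACE(l_result, 'ü', '&uuml;');
   l_result := REPLACE(l_result, 'Ü', '&Uuml;');
   l_result := REPLACE(l_result, 'ß', '&szlig;');
   RETURN l_result;
END to_html_entities;
/

-- Usage: returns 'Gr&ouml;&szlig;e &lt; 5'
SELECT to_html_entities('Größe < 5') AS html_text FROM DUAL;
```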







