C++ Boost

Boost.Regex

Unicode Regular Expressions.

Boost.Regex Index


There are two ways to use Boost.Regex with Unicode strings:

Rely on wchar_t

If your platform's wchar_t type can hold Unicode strings, and your platform's C/C++ runtime correctly handles wide character constants (when passed to std::iswspace std::iswlower etc), then you can use boost::wregex to process Unicode.  However, there are several disadvantages to this approach:

Use a Unicode Aware Regular Expression Type.

If you have the ICU library, then Boost.Regex can be configured to make use of it, and provide a distinct regular expression type (boost::u32regex), that supports both Unicode specific character properties, and the searching of text that is encoded in either UTF-8, UTF-16, or UTF-32.  See: ICU string class support.


Revised  04 Jan 2005 

© Copyright John Maddock 2005

Use, modification and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)