icu::final Class Reference

This class allows one to iterate through all the strings that are canonically equivalent to a given string.

#include <caniter.h>

Inheritance diagram for icu::final:
icu::UnicodeFilter, icu::UnicodeFunctor, icu::UnicodeMatcher, icu::UObject, icu::UMemory

Data Structures

struct  final
 Access to the list of edits.

Public Types

enum  { MIN_VALUE = 0, MAX_VALUE = 0x10ffff }
enum  ESerialization { kSerialized }

Public Member Functions

 CanonicalIterator (const UnicodeString &source, UErrorCode &status)
 Construct a CanonicalIterator object.
virtual ~CanonicalIterator ()
 Destructor. Cleans up pieces.
UnicodeString getSource ()
 Gets the NFD form of the current source we are iterating over.
void reset ()
 Resets the iterator so that one can start again from the beginning.
UnicodeString next ()
 Get the next canonically equivalent string.
void setSource (const UnicodeString &newSource, UErrorCode &status)
 Set a new source for this iterator.
virtual UClassID getDynamicClassID () const
 ICU "poor man's RTTI", returns a UClassID for the actual class.
 Char16Ptr (char16_t *p)
 Copies the pointer.
 Char16Ptr (uint16_t *p)
 Converts the pointer to char16_t *.
 Char16Ptr (wchar_t *p)
 Converts the pointer to char16_t *.
 Char16Ptr (std::nullptr_t p)
 nullptr constructor.
 ~Char16Ptr ()
 Destructor.
char16_t * get () const
 Pointer access.
 operator char16_t * () const
 char16_t pointer access via type conversion (e.g., static_cast).
 ConstChar16Ptr (const char16_t *p)
 Copies the pointer.
 ConstChar16Ptr (const uint16_t *p)
 Converts the pointer to char16_t *.
 ConstChar16Ptr (const wchar_t *p)
 Converts the pointer to char16_t *.
 ConstChar16Ptr (const std::nullptr_t p)
 nullptr constructor.
 ~ConstChar16Ptr ()
 Destructor.
const char16_t * get () const
 Pointer access.
 operator const char16_t * () const
 char16_t pointer access via type conversion (e.g., static_cast).
 Edits ()
 Constructs an empty object.
 Edits (const Edits &other)
 Copy constructor.
 Edits (Edits &&src) U_NOEXCEPT
 Move constructor, might leave src empty.
 ~Edits ()
 Destructor.
Edits & operator= (const Edits &other)
 Assignment operator.
Edits & operator= (Edits &&src) U_NOEXCEPT
 Move assignment operator, might leave src empty.
void reset () U_NOEXCEPT
 Resets the data but may not release memory.
void addUnchanged (int32_t unchangedLength)
 Adds a no-change edit: a record for an unchanged segment of text.
void addReplace (int32_t oldLength, int32_t newLength)
 Adds a change edit: a record for a text replacement/insertion/deletion.
UBool copyErrorTo (UErrorCode &outErrorCode) const
 Sets the UErrorCode if an error occurred while recording edits.
int32_t lengthDelta () const
 How much longer is the new text compared with the old text?
UBool hasChanges () const
int32_t numberOfChanges () const
Iterator getCoarseChangesIterator () const
 Returns an Iterator for coarse-grained change edits (adjacent change edits are treated as one).
Iterator getCoarseIterator () const
 Returns an Iterator for coarse-grained change and no-change edits (adjacent change edits are treated as one).
Iterator getFineChangesIterator () const
 Returns an Iterator for fine-grained change edits (full granularity of change edits is retained).
Iterator getFineIterator () const
 Returns an Iterator for fine-grained change and no-change edits (full granularity of change edits is retained).
Edits & mergeAndAppend (const Edits &ab, const Edits &bc, UErrorCode &errorCode)
 Merges the two input Edits and appends the result to this object.
 SimpleFormatter ()
 Default constructor.
 SimpleFormatter (const UnicodeString &pattern, UErrorCode &errorCode)
 Constructs a formatter from the pattern string.
 SimpleFormatter (const UnicodeString &pattern, int32_t min, int32_t max, UErrorCode &errorCode)
 Constructs a formatter from the pattern string.
 SimpleFormatter (const SimpleFormatter &other)
 Copy constructor.
SimpleFormatter & operator= (const SimpleFormatter &other)
 Assignment operator.
 ~SimpleFormatter ()
 Destructor.
UBool applyPattern (const UnicodeString &pattern, UErrorCode &errorCode)
 Changes this object according to the new pattern.
UBool applyPatternMinMaxArguments (const UnicodeString &pattern, int32_t min, int32_t max, UErrorCode &errorCode)
 Changes this object according to the new pattern.
int32_t getArgumentLimit () const
UnicodeString & format (const UnicodeString &value0, UnicodeString &appendTo, UErrorCode &errorCode) const
 Formats the given value, appending to the appendTo builder.
UnicodeString & format (const UnicodeString &value0, const UnicodeString &value1, UnicodeString &appendTo, UErrorCode &errorCode) const
 Formats the given values, appending to the appendTo builder.
UnicodeString & format (const UnicodeString &value0, const UnicodeString &value1, const UnicodeString &value2, UnicodeString &appendTo, UErrorCode &errorCode) const
 Formats the given values, appending to the appendTo builder.
UnicodeString & formatAndAppend (const UnicodeString *const *values, int32_t valuesLength, UnicodeString &appendTo, int32_t *offsets, int32_t offsetsLength, UErrorCode &errorCode) const
 Formats the given values, appending to the appendTo string.
UnicodeString & formatAndReplace (const UnicodeString *const *values, int32_t valuesLength, UnicodeString &result, int32_t *offsets, int32_t offsetsLength, UErrorCode &errorCode) const
 Formats the given values, replacing the contents of the result string.
UnicodeString getTextWithNoArguments () const
 Returns the pattern text with none of the arguments.
UnicodeString getTextWithNoArguments (int32_t *offsets, int32_t offsetsLength) const
 Returns the pattern text with none of the arguments.
UBool isBogus (void) const
 Determine if this object contains a valid set.
void setToBogus ()
 Make this UnicodeSet object invalid.
 UnicodeSet ()
 Constructs an empty set.
 UnicodeSet (UChar32 start, UChar32 end)
 Constructs a set containing the given range.
 UnicodeSet (const uint16_t buffer[], int32_t bufferLen, ESerialization serialization, UErrorCode &status)
 Constructs a set from the output of serialize().
 UnicodeSet (const UnicodeString &pattern, UErrorCode &status)
 Constructs a set from the given pattern.
 UnicodeSet (const UnicodeString &pattern, uint32_t options, const SymbolTable *symbols, UErrorCode &status)
 Constructs a set from the given pattern.
 UnicodeSet (const UnicodeString &pattern, ParsePosition &pos, uint32_t options, const SymbolTable *symbols, UErrorCode &status)
 Constructs a set from the given pattern.
 UnicodeSet (const UnicodeSet &o)
 Constructs a set that is identical to the given UnicodeSet.
virtual ~UnicodeSet ()
 Destructs the set.
UnicodeSet & operator= (const UnicodeSet &o)
 Assigns this object to be a copy of another.
virtual UBool operator== (const UnicodeSet &o) const
 Compares the specified object with this set for equality.
UBool operator!= (const UnicodeSet &o) const
 Compares the specified object with this set for inequality.
virtual UnicodeSet * clone () const
 Returns a copy of this object.
virtual int32_t hashCode (void) const
 Returns the hash code value for this set.
USet * toUSet ()
 Produce a USet * pointer for this UnicodeSet.
const USet * toUSet () const
 Produce a const USet * pointer for this UnicodeSet.
UBool isFrozen () const
 Determines whether the set has been frozen (made immutable) or not.
UnicodeSet * freeze ()
 Freeze the set (make it immutable).
UnicodeSet * cloneAsThawed () const
 Clone the set and make the clone mutable.
UnicodeSet & set (UChar32 start, UChar32 end)
 Make this object represent the range `start - end`.
UnicodeSet & applyPattern (const UnicodeString &pattern, UErrorCode &status)
 Modifies this set to represent the set specified by the given pattern, ignoring Unicode Pattern_White_Space characters.
UnicodeSet & applyPattern (const UnicodeString &pattern, uint32_t options, const SymbolTable *symbols, UErrorCode &status)
 Modifies this set to represent the set specified by the given pattern, optionally ignoring Unicode Pattern_White_Space characters.
UnicodeSet & applyPattern (const UnicodeString &pattern, ParsePosition &pos, uint32_t options, const SymbolTable *symbols, UErrorCode &status)
 Parses the given pattern, starting at the given position.
virtual UnicodeString & toPattern (UnicodeString &result, UBool escapeUnprintable=FALSE) const
 Returns a string representation of this set.
UnicodeSet & applyIntPropertyValue (UProperty prop, int32_t value, UErrorCode &ec)
 Modifies this set to contain those code points which have the given value for the given binary or enumerated property, as returned by u_getIntPropertyValue.
UnicodeSet & applyPropertyAlias (const UnicodeString &prop, const UnicodeString &value, UErrorCode &ec)
 Modifies this set to contain those code points which have the given value for the given property.
virtual int32_t size (void) const
 Returns the number of elements in this set (its cardinality).
virtual UBool isEmpty (void) const
 Returns true if this set contains no elements.
virtual UBool contains (UChar32 c) const
 Returns true if this set contains the given character.
virtual UBool contains (UChar32 start, UChar32 end) const
 Returns true if this set contains every character of the given range.
UBool contains (const UnicodeString &s) const
 Returns true if this set contains the given multicharacter string.
virtual UBool containsAll (const UnicodeSet &c) const
 Returns true if this set contains all the characters and strings of the given set.
UBool containsAll (const UnicodeString &s) const
 Returns true if this set contains all the characters of the given string.
UBool containsNone (UChar32 start, UChar32 end) const
 Returns true if this set contains none of the characters of the given range.
UBool containsNone (const UnicodeSet &c) const
 Returns true if this set contains none of the characters and strings of the given set.
UBool containsNone (const UnicodeString &s) const
 Returns true if this set contains none of the characters of the given string.
UBool containsSome (UChar32 start, UChar32 end) const
 Returns true if this set contains one or more of the characters in the given range.
UBool containsSome (const UnicodeSet &s) const
 Returns true if this set contains one or more of the characters and strings of the given set.
UBool containsSome (const UnicodeString &s) const
 Returns true if this set contains one or more of the characters of the given string.
int32_t span (const char16_t *s, int32_t length, USetSpanCondition spanCondition) const
 Returns the length of the initial substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).
int32_t span (const UnicodeString &s, int32_t start, USetSpanCondition spanCondition) const
 Returns the end of the substring of the input string according to the USetSpanCondition.
int32_t spanBack (const char16_t *s, int32_t length, USetSpanCondition spanCondition) const
 Returns the start of the trailing substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).
int32_t spanBack (const UnicodeString &s, int32_t limit, USetSpanCondition spanCondition) const
 Returns the start of the substring of the input string according to the USetSpanCondition.
int32_t spanUTF8 (const char *s, int32_t length, USetSpanCondition spanCondition) const
 Returns the length of the initial substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).
int32_t spanBackUTF8 (const char *s, int32_t length, USetSpanCondition spanCondition) const
 Returns the start of the trailing substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).
virtual UMatchDegree matches (const Replaceable &text, int32_t &offset, int32_t limit, UBool incremental)
 Implement UnicodeMatcher::matches().
virtual void addMatchSetTo (UnicodeSet &toUnionTo) const
 Implementation of UnicodeMatcher API.
int32_t indexOf (UChar32 c) const
 Returns the index of the given character within this set, where the set is ordered by ascending code point.
UChar32 charAt (int32_t index) const
 Returns the character at the given index within this set, where the set is ordered by ascending code point.
virtual UnicodeSet & add (UChar32 start, UChar32 end)
 Adds the specified range to this set if it is not already present.
UnicodeSet & add (UChar32 c)
 Adds the specified character to this set if it is not already present.
UnicodeSet & add (const UnicodeString &s)
 Adds the specified multicharacter to this set if it is not already present.
UnicodeSet & addAll (const UnicodeString &s)
 Adds each of the characters in this string to the set.
UnicodeSet & retainAll (const UnicodeString &s)
 Retains EACH of the characters in this string.
UnicodeSet & complementAll (const UnicodeString &s)
 Complement EACH of the characters in this string.
UnicodeSet & removeAll (const UnicodeString &s)
 Remove EACH of the characters in this string.
virtual UnicodeSet & retain (UChar32 start, UChar32 end)
 Retain only the elements in this set that are contained in the specified range.
UnicodeSet & retain (UChar32 c)
 Retain the specified character from this set if it is present.
virtual UnicodeSet & remove (UChar32 start, UChar32 end)
 Removes the specified range from this set if it is present.
UnicodeSet & remove (UChar32 c)
 Removes the specified character from this set if it is present.
UnicodeSet & remove (const UnicodeString &s)
 Removes the specified string from this set if it is present.
virtual UnicodeSet & complement (void)
 Inverts this set.
virtual UnicodeSet & complement (UChar32 start, UChar32 end)
 Complements the specified range in this set.
UnicodeSet & complement (UChar32 c)
 Complements the specified character in this set.
UnicodeSet & complement (const UnicodeString &s)
 Complement the specified string in this set.
virtual UnicodeSet & addAll (const UnicodeSet &c)
 Adds all of the elements in the specified set to this set if they're not already present.
virtual UnicodeSet & retainAll (const UnicodeSet &c)
 Retains only the elements in this set that are contained in the specified set.
virtual UnicodeSet & removeAll (const UnicodeSet &c)
 Removes from this set all of its elements that are contained in the specified set.
virtual UnicodeSet & complementAll (const UnicodeSet &c)
 Complements in this set all elements contained in the specified set.
virtual UnicodeSet & clear (void)
 Removes all of the elements from this set.
UnicodeSet & closeOver (int32_t attribute)
 Close this set over the given attribute.
virtual UnicodeSet & removeAllStrings ()
 Remove all strings from this set.
virtual int32_t getRangeCount (void) const
 Iteration method that returns the number of ranges contained in this set.
virtual UChar32 getRangeStart (int32_t index) const
 Iteration method that returns the first character in the specified range of this set.
virtual UChar32 getRangeEnd (int32_t index) const
 Iteration method that returns the last character in the specified range of this set.
int32_t serialize (uint16_t *dest, int32_t destCapacity, UErrorCode &ec) const
 Serializes this set into an array of 16-bit integers.
virtual UnicodeSet & compact ()
 Reallocate this object's internal structures to take up the least possible space, without changing this object's value.
virtual UClassID getDynamicClassID (void) const
 Implement UnicodeFunctor API.

Static Public Member Functions

static void permute (UnicodeString &source, UBool skipZeros, Hashtable *result, UErrorCode &status)
 Dumb recursive implementation of permutation.
static UClassID getStaticClassID ()
 ICU "poor man's RTTI", returns a UClassID for this class.
static int32_t toLower (const char *locale, uint32_t options, const char16_t *src, int32_t srcLength, char16_t *dest, int32_t destCapacity, Edits *edits, UErrorCode &errorCode)
 Lowercases a UTF-16 string and optionally records edits.
static int32_t toUpper (const char *locale, uint32_t options, const char16_t *src, int32_t srcLength, char16_t *dest, int32_t destCapacity, Edits *edits, UErrorCode &errorCode)
 Uppercases a UTF-16 string and optionally records edits.
static int32_t toTitle (const char *locale, uint32_t options, BreakIterator *iter, const char16_t *src, int32_t srcLength, char16_t *dest, int32_t destCapacity, Edits *edits, UErrorCode &errorCode)
 Titlecases a UTF-16 string and optionally records edits.
static int32_t fold (uint32_t options, const char16_t *src, int32_t srcLength, char16_t *dest, int32_t destCapacity, Edits *edits, UErrorCode &errorCode)
 Case-folds a UTF-16 string and optionally records edits.
static void utf8ToLower (const char *locale, uint32_t options, StringPiece src, ByteSink &sink, Edits *edits, UErrorCode &errorCode)
 Lowercases a UTF-8 string and optionally records edits.
static void utf8ToUpper (const char *locale, uint32_t options, StringPiece src, ByteSink &sink, Edits *edits, UErrorCode &errorCode)
 Uppercases a UTF-8 string and optionally records edits.
static void utf8ToTitle (const char *locale, uint32_t options, BreakIterator *iter, StringPiece src, ByteSink &sink, Edits *edits, UErrorCode &errorCode)
 Titlecases a UTF-8 string and optionally records edits.
static void utf8Fold (uint32_t options, StringPiece src, ByteSink &sink, Edits *edits, UErrorCode &errorCode)
 Case-folds a UTF-8 string and optionally records edits.
static int32_t utf8ToLower (const char *locale, uint32_t options, const char *src, int32_t srcLength, char *dest, int32_t destCapacity, Edits *edits, UErrorCode &errorCode)
 Lowercases a UTF-8 string and optionally records edits.
static int32_t utf8ToUpper (const char *locale, uint32_t options, const char *src, int32_t srcLength, char *dest, int32_t destCapacity, Edits *edits, UErrorCode &errorCode)
 Uppercases a UTF-8 string and optionally records edits.
static int32_t utf8ToTitle (const char *locale, uint32_t options, BreakIterator *iter, const char *src, int32_t srcLength, char *dest, int32_t destCapacity, Edits *edits, UErrorCode &errorCode)
 Titlecases a UTF-8 string and optionally records edits.
static int32_t utf8Fold (uint32_t options, const char *src, int32_t srcLength, char *dest, int32_t destCapacity, Edits *edits, UErrorCode &errorCode)
 Case-folds a UTF-8 string and optionally records edits.
static UnicodeSet * fromUSet (USet *uset)
 Get a UnicodeSet pointer from a USet.
static const UnicodeSet * fromUSet (const USet *uset)
 Get a UnicodeSet pointer from a const USet.
static UBool resemblesPattern (const UnicodeString &pattern, int32_t pos)
 Return true if the given position, in the given pattern, appears to be the start of a UnicodeSet pattern.
static UnicodeSet * createFrom (const UnicodeString &s)
 Makes a set from a multicharacter string.
static UnicodeSet * createFromAll (const UnicodeString &s)
 Makes a set from each of the characters in the string.
static UClassID getStaticClassID (void)
 Return the class ID for this class.

Friends

class number::impl::SimpleModifier
class USetAccess
class RBBIRuleScanner
class UnicodeSetIterator

Detailed Description

This class allows one to iterate through all the strings that are canonically equivalent to a given string.

A mutable set of Unicode characters and multicharacter strings.

Formats simple patterns like "{1} was born in {0}".

Records lengths of string edits but not replacement text.

const char16_t * wrapper with implicit conversion from distinct but bit-compatible pointer types.

char16_t * wrapper with implicit conversion from distinct but bit-compatible pointer types.

Low-level C++ case mapping functions.

For example, here are some sample results:

Results for: {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D}{COMBINING DOT ABOVE}{COMBINING CEDILLA}
1: \u0041\u030A\u0064\u0307\u0327 = {LATIN CAPITAL LETTER A}{COMBINING RING ABOVE}{LATIN SMALL LETTER D}{COMBINING DOT ABOVE}{COMBINING CEDILLA}
2: \u0041\u030A\u0064\u0327\u0307 = {LATIN CAPITAL LETTER A}{COMBINING RING ABOVE}{LATIN SMALL LETTER D}{COMBINING CEDILLA}{COMBINING DOT ABOVE}
3: \u0041\u030A\u1E0B\u0327 = {LATIN CAPITAL LETTER A}{COMBINING RING ABOVE}{LATIN SMALL LETTER D WITH DOT ABOVE}{COMBINING CEDILLA}
4: \u0041\u030A\u1E11\u0307 = {LATIN CAPITAL LETTER A}{COMBINING RING ABOVE}{LATIN SMALL LETTER D WITH CEDILLA}{COMBINING DOT ABOVE}
5: \u00C5\u0064\u0307\u0327 = {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D}{COMBINING DOT ABOVE}{COMBINING CEDILLA}
6: \u00C5\u0064\u0327\u0307 = {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D}{COMBINING CEDILLA}{COMBINING DOT ABOVE}
7: \u00C5\u1E0B\u0327 = {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D WITH DOT ABOVE}{COMBINING CEDILLA}
8: \u00C5\u1E11\u0307 = {LATIN CAPITAL LETTER A WITH RING ABOVE}{LATIN SMALL LETTER D WITH CEDILLA}{COMBINING DOT ABOVE}
9: \u212B\u0064\u0307\u0327 = {ANGSTROM SIGN}{LATIN SMALL LETTER D}{COMBINING DOT ABOVE}{COMBINING CEDILLA}
10: \u212B\u0064\u0327\u0307 = {ANGSTROM SIGN}{LATIN SMALL LETTER D}{COMBINING CEDILLA}{COMBINING DOT ABOVE}
11: \u212B\u1E0B\u0327 = {ANGSTROM SIGN}{LATIN SMALL LETTER D WITH DOT ABOVE}{COMBINING CEDILLA}
12: \u212B\u1E11\u0307 = {ANGSTROM SIGN}{LATIN SMALL LETTER D WITH CEDILLA}{COMBINING DOT ABOVE}

Note: the code is intended for use with small strings and is not suitable for larger ones, since it has not been optimized for that situation.
Note: CanonicalIterator is not intended to be subclassed.
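
Results 1 and 2 above differ only in the order of the two combining marks. The following is a minimal sketch of why both orders are canonically equivalent, assuming a tiny hand-written table of canonical combining classes for three marks. The names combiningClass, canonicalOrder, and equivalentOrders are hypothetical and are not part of ICU; the sketch covers only mark reordering, not composition or decomposition, and it is not how CanonicalIterator is implemented.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumption: a hand-written table of canonical combining classes (ccc)
// for three marks only; real code would query the Unicode Character Database.
// The sketch expects input that contains only these marks.
int combiningClass(char32_t c) {
    switch (c) {
        case 0x0307: return 230;  // COMBINING DOT ABOVE
        case 0x0308: return 230;  // COMBINING DIAERESIS
        case 0x0327: return 202;  // COMBINING CEDILLA
        default:     return 0;
    }
}

// Canonical ordering of a run of marks: a stable sort by combining class.
std::u32string canonicalOrder(std::u32string marks) {
    std::stable_sort(marks.begin(), marks.end(),
                     [](char32_t a, char32_t b) {
                         return combiningClass(a) < combiningClass(b);
                     });
    return marks;
}

// Hypothetical helper: every ordering of `marks` that canonically reorders
// to the same sequence as the input.
std::vector<std::u32string> equivalentOrders(const std::u32string& marks) {
    const std::u32string target = canonicalOrder(marks);
    std::u32string perm = marks;
    std::sort(perm.begin(), perm.end());  // lowest permutation first
    std::vector<std::u32string> result;
    do {
        if (canonicalOrder(perm) == target) {
            result.push_back(perm);
        }
    } while (std::next_permutation(perm.begin(), perm.end()));
    return result;
}
```

With DOT ABOVE (class 230) and CEDILLA (class 202), both orders qualify. Two marks of the same class, such as DOT ABOVE and DIAERESIS, may not be swapped, so only the original order is returned.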

Author:
M. Davis
C++ port by V. Weinstein
Stable:
ICU 2.4
Stable:
ICU 59

Supports replacements, insertions, deletions in linear progression. Does not support moving/reordering of text.

There are two types of edits: change edits and no-change edits. Add edits to instances of this class using addReplace(int32_t, int32_t) (for change edits) and addUnchanged(int32_t) (for no-change edits). Change edits are retained with full granularity, whereas adjacent no-change edits are always merged together. In no-change edits, there is a one-to-one mapping between code points in the source and destination strings.

After all edits have been added, instances of this class should be considered immutable, and an Edits::Iterator can be used for queries.

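There are four flavors of Edits::Iterator:

getFineIterator() retains the full granularity of change edits and returns both change and no-change edits.
getFineChangesIterator() retains the full granularity of change edits and returns only change edits.
getCoarseIterator() treats adjacent change edits as one and returns both change and no-change edits.
getCoarseChangesIterator() treats adjacent change edits as one and returns only change edits.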

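For example, consider the string "abcßDeF", which case-folds to "abcssdef". This string has the following fine edits:

abc -> abc (no-change)
ß -> ss (change)
D -> d (change)
e -> e (no-change)
F -> f (change)

and the following coarse edits (note how adjacent change edits get merged together):

abc -> abc (no-change)
ßD -> ssd (change)
e -> e (no-change)
F -> f (change)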

The "fine changes" and "coarse changes" iterators will step through only the change edits when their `EditsIterator::next()` methods are called. They are identical to the non-change iterators when their `EditsIterator::findSourceIndex()` or `EditsIterator::findDestinationIndex()` methods are used to walk through the string.
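
The relationship between fine and coarse edits can be sketched with a plain vector of records. This is an illustrative model only: EditRecord, toCoarse, lengthDelta, and sampleFineEdits are hypothetical names, lengths are counted in UTF-16 code units, and ICU's Edits class stores its data differently.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record type for one edit; not ICU's internal representation.
struct EditRecord {
    bool changed;            // true for a change edit, false for a no-change edit
    std::int32_t oldLength;  // length in the source string
    std::int32_t newLength;  // length in the destination string
};

// Coarse view: adjacent edits of the same kind are merged into one record.
std::vector<EditRecord> toCoarse(const std::vector<EditRecord>& fine) {
    std::vector<EditRecord> coarse;
    for (const EditRecord& e : fine) {
        if (!coarse.empty() && coarse.back().changed == e.changed) {
            coarse.back().oldLength += e.oldLength;
            coarse.back().newLength += e.newLength;
        } else {
            coarse.push_back(e);
        }
    }
    return coarse;
}

// How much longer the new text is than the old text.
std::int32_t lengthDelta(const std::vector<EditRecord>& edits) {
    std::int32_t delta = 0;
    for (const EditRecord& e : edits) {
        delta += e.newLength - e.oldLength;
    }
    return delta;
}

// Fine edits for case-folding "abcßDeF" to "abcssdef".
std::vector<EditRecord> sampleFineEdits() {
    return {
        {false, 3, 3},  // abc unchanged
        {true, 1, 2},   // ß -> ss
        {true, 1, 1},   // D -> d
        {false, 1, 1},  // e unchanged
        {true, 1, 1},   // F -> f
    };
}
```

Applied to the sample, toCoarse merges the second and third records into a single change edit covering two source units and three destination units, and lengthDelta is 1 for both the fine and the coarse view.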

For examples of how to use this class, see the test `TestCaseMapEditsIteratorDocs` in UCharacterCaseTest.java.

An Edits object tracks a separate UErrorCode, but ICU string transformation functions (e.g., case mapping functions) merge any such errors into their API's UErrorCode.

Stable:
ICU 59

Minimal subset of MessageFormat; fast, simple, minimal dependencies. Supports only numbered arguments with no type nor style parameters, and formats only string values. Quoting via ASCII apostrophe compatible with ICU MessageFormat default behavior.

Factory methods set error codes for syntax errors and for too few or too many arguments/placeholders.

SimpleFormatter objects are thread-safe except for assignment and applying new patterns.

Example:

 UErrorCode errorCode = U_ZERO_ERROR;
 SimpleFormatter fmt("{1} '{born}' in {0}", errorCode);
 UnicodeString result;
 // Output: "paul {born} in england"
 fmt.format("england", "paul", result, errorCode);
 

This class is not intended for public subclassing.
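
The pattern handling described above (numbered arguments, apostrophe quoting) can be approximated in a few lines of standard C++. This is a simplified sketch, not ICU's implementation: formatSimple is a hypothetical helper, it works on narrow strings, and where SimpleFormatter would set an error code it simply copies malformed placeholders through as literal text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces {N} with values[N]. An apostrophe directly
// before '{' or '}' starts quoted literal text, and '' is a literal apostrophe.
std::string formatSimple(const std::string& pattern,
                         const std::vector<std::string>& values) {
    std::string out;
    bool inQuote = false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        const bool hasNext = i + 1 < pattern.size();
        if (c == '\'') {
            if (hasNext && pattern[i + 1] == '\'') {
                out += '\'';  // doubled apostrophe
                ++i;
            } else if (inQuote) {
                inQuote = false;  // closing quote
            } else if (hasNext && (pattern[i + 1] == '{' || pattern[i + 1] == '}')) {
                inQuote = true;  // opening quote
            } else {
                out += '\'';  // lone apostrophe stays literal
            }
        } else if (!inQuote && c == '{') {
            std::size_t j = i + 1;
            std::size_t n = 0;
            while (j < pattern.size() &&
                   std::isdigit(static_cast<unsigned char>(pattern[j]))) {
                n = n * 10 + static_cast<std::size_t>(pattern[j] - '0');
                ++j;
            }
            if (j > i + 1 && j < pattern.size() && pattern[j] == '}' &&
                n < values.size()) {
                out += values[n];
                i = j;  // continue after the closing '}'
            } else {
                out += c;  // sketch only: malformed placeholder kept as text
            }
        } else {
            out += c;
        }
    }
    return out;
}
```

Calling formatSimple("{1} '{born}' in {0}", {"england", "paul"}) reproduces the output shown in the example above.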

See also:
MessageFormat
UMessagePatternApostropheMode
Stable:
ICU 57

Objects of this class represent character classes used in regular expressions. A character class specifies a subset of Unicode code points. Legal code points are U+0000 to U+10FFFF, inclusive.

The UnicodeSet class is not designed to be subclassed.

UnicodeSet supports two APIs. The first is the operand API that allows the caller to modify the value of a UnicodeSet object. It conforms to Java 2's java.util.Set interface, although UnicodeSet does not actually implement that interface. All methods of Set are supported, with the modification that they take a character range or single character instead of an Object, and they take a UnicodeSet instead of a Collection. The operand API may be thought of in terms of boolean logic: a boolean OR is implemented by add, a boolean AND is implemented by retain, a boolean XOR is implemented by complement taking an argument, and a boolean NOT is implemented by complement with no argument. In terms of traditional set theory function names, add is a union, retain is an intersection, remove is an asymmetric difference, and complement with no argument is a set complement with respect to the superset range MIN_VALUE-MAX_VALUE.
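
The boolean-logic reading of the operand API can be illustrated with std::set and the standard set algorithms. This is a sketch under stated assumptions: CodePointSet and the four free functions are hypothetical names, they return new sets instead of modifying an object in place, and they do not model multicharacter strings or the no-argument complement.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

using CodePointSet = std::set<char32_t>;  // hypothetical stand-in for a set of code points

// add / addAll: boolean OR, set union.
CodePointSet unionOf(const CodePointSet& a, const CodePointSet& b) {
    CodePointSet r;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(r, r.end()));
    return r;
}

// retain / retainAll: boolean AND, set intersection.
CodePointSet intersectionOf(const CodePointSet& a, const CodePointSet& b) {
    CodePointSet r;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(r, r.end()));
    return r;
}

// remove / removeAll: asymmetric difference.
CodePointSet differenceOf(const CodePointSet& a, const CodePointSet& b) {
    CodePointSet r;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(r, r.end()));
    return r;
}

// complement with an argument / complementAll: boolean XOR.
CodePointSet symmetricDifferenceOf(const CodePointSet& a, const CodePointSet& b) {
    CodePointSet r;
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::inserter(r, r.end()));
    return r;
}
```

For a = {a, b, c} and b = {b, c, d}, the four functions yield {a, b, c, d}, {b, c}, {a}, and {a, d} respectively.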

The second API is the applyPattern()/toPattern() API from the java.text.Format-derived classes. Unlike the methods that add characters, add categories, and control the logic of the set, the method applyPattern() sets all attributes of a UnicodeSet at once, based on a string pattern.

Pattern syntax

Patterns are accepted by the constructors and the applyPattern() methods and returned by the toPattern() method. These patterns follow a syntax similar to that employed by version 8 regular expression character classes. Here are some simple examples:

[]

No characters

[a]

The character 'a'

[ae]

The characters 'a' and 'e'

[a-e]

The characters 'a' through 'e' inclusive, in Unicode code point order

[\u4E01]

The character U+4E01

[a{ab}{ac}]

The character 'a' and the multicharacter strings "ab" and "ac"

[\p{Lu}]

All characters in the general category Uppercase Letter

Any character may be preceded by a backslash in order to remove any special meaning. White space characters, as defined by UCharacter.isWhitespace(), are ignored, unless they are escaped.

Property patterns specify a set of characters having a certain property as defined by the Unicode standard. Both the POSIX-like "[:Lu:]" and the Perl-like syntax "\\p{Lu}" are recognized. For a complete list of supported property patterns, see the User's Guide for UnicodeSet at http://icu-project.org/userguide/unicodeSet.html. Actual determination of property data is defined by the underlying Unicode database as implemented by UCharacter.

Patterns specify individual characters, ranges of characters, and Unicode property sets. When elements are concatenated, they specify their union. To complement a set, place a '^' immediately after the opening '['. Property patterns are inverted by modifying their delimiters; "[:^foo:]" and "\\P{foo}". In any other location, '^' has no special meaning.

Ranges are indicated by placing a '-' between two characters, as in "a-z". This specifies the range of all characters from the left to the right, in Unicode order. If the left character is greater than or equal to the right character it is a syntax error. If a '-' occurs as the first character after the opening '[' or '[^', or if it occurs as the last character before the closing ']', then it is taken as a literal. Thus "[a\-b]", "[-ab]", and "[ab-]" all indicate the same set of three characters, 'a', 'b', and '-'.
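
The range and literal-hyphen rules can be sketched as a small parser for flat patterns. This is a hedged illustration, not ICU's parser: parseFlatSet is a hypothetical helper that handles only ASCII patterns without '^', nested sets, properties, or strings in braces, and it reports syntax errors by throwing instead of setting a UErrorCode.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses a flat pattern such as "[a-e]" or "[-ab]".
std::set<char> parseFlatSet(const std::string& p) {
    if (p.size() < 2 || p.front() != '[' || p.back() != ']') {
        throw std::invalid_argument("pattern must be enclosed in [ ]");
    }
    const std::size_t end = p.size() - 1;  // index of the closing ']'
    std::set<char> result;

    // Reads one character, honoring a backslash escape, and advances i.
    auto readChar = [&](std::size_t& i) -> char {
        if (p[i] == '\\') {
            if (i + 1 >= end) throw std::invalid_argument("dangling backslash");
            i += 2;
            return p[i - 1];
        }
        return p[i++];
    };

    std::size_t i = 1;
    while (i < end) {
        if (p[i] == '-') {
            // An unescaped '-' is literal only right after '[' or right before ']'.
            if (i != 1 && i != end - 1) throw std::invalid_argument("misplaced '-'");
            result.insert('-');
            ++i;
            continue;
        }
        const char lo = readChar(i);
        if (i + 1 < end && p[i] == '-') {
            ++i;  // skip the range operator
            const char hi = readChar(i);
            const int first = static_cast<unsigned char>(lo);
            const int last = static_cast<unsigned char>(hi);
            if (first >= last) {
                throw std::invalid_argument("range start must be less than range end");
            }
            for (int c = first; c <= last; ++c) result.insert(static_cast<char>(c));
        } else {
            result.insert(lo);
        }
    }
    return result;
}
```

In this sketch the three spellings from the paragraph above, written as the C++ literals "[a\\-b]", "[-ab]", and "[ab-]", all parse to the same three-element set.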

Sets may be intersected using the '&' operator, and the asymmetric set difference may be taken using the '-' operator. For example, "[[:L:]&[\\u0000-\\u0FFF]]" indicates the set of all Unicode letters with values less than 4096. Operators ('&' and '-') have equal precedence and bind left-to-right. Thus "[[:L:]-[a-z]-[\\u0100-\\u01FF]]" is equivalent to "[[[:L:]-[a-z]]-[\\u0100-\\u01FF]]". This only really matters for difference; intersection is commutative.

[a]

The set containing 'a'

[a-z]

The set containing 'a' through 'z' and all letters in between, in Unicode order

[^a-z]

The set containing all characters but 'a' through 'z', that is, U+0000 through 'a'-1 and 'z'+1 through U+10FFFF

[[pat1][pat2]]

The union of sets specified by pat1 and pat2

[[pat1]&[pat2]]

The intersection of sets specified by pat1 and pat2

[[pat1]-[pat2]]

The asymmetric difference of sets specified by pat1 and pat2

[:Lu:] or \p{Lu}

The set of characters having the specified Unicode property; in this case, Unicode uppercase letters

[:^Lu:] or \P{Lu}

The set of characters not having the given Unicode property

Warning: you cannot add an empty string ("") to a UnicodeSet.

Formal syntax

pattern := 

('[' '^'? item* ']') | property

item := 

char | (char '-' char) | pattern-expr

pattern-expr := 

pattern | pattern-expr pattern | pattern-expr op pattern

op := 

'&' | '-'

special := 

'[' | ']' | '-'

char := 

any character that is not special
| ('\'
any character)
| ('\u' hex hex hex hex)

hex := 

any character for which Character.digit(c, 16) returns a non-negative result

property := 

a Unicode property set pattern


Legend:

a := b

 

a may be replaced by b

a?

zero or one instance of a

a*

zero or more instances of a

a | b

either a or b

'a'

the literal string between the quotes

Author:
Alan Liu
Stable:
ICU 2.0

Definition at line 76 of file caniter.h.


Member Enumeration Documentation

anonymous enum
Enumerator:
MIN_VALUE 

Minimum value that can be stored in a UnicodeSet.

Stable:
ICU 2.4
MAX_VALUE 

Maximum value that can be stored in a UnicodeSet.

Stable:
ICU 2.4

Definition at line 354 of file uniset.h.

Internal:
Do not use. This API is for internal use only.

Definition at line 394 of file uniset.h.


Constructor & Destructor Documentation

virtual icu::final::~CanonicalIterator (  )  [virtual]

Destructor. Cleans up pieces.

Stable:
ICU 2.4
icu::final::~Char16Ptr (  )  [inline]

Destructor.

Stable:
ICU 59
icu::final::~ConstChar16Ptr (  )  [inline]

Destructor.

Stable:
ICU 59
icu::final::~Edits (  ) 

Destructor.

Stable:
ICU 59
icu::final::~SimpleFormatter (  ) 

Destructor.

Stable:
ICU 57
virtual icu::final::~UnicodeSet (  )  [virtual]

Destructs the set.

Stable:
ICU 2.0

Member Function Documentation

UnicodeSet& icu::final::add ( const UnicodeString &  s  ) 

Adds the specified multicharacter to this set if it is not already present.

If this set already contains the multicharacter, the call leaves this set unchanged. Thus "ch" => {"ch"}.
Warning: you cannot add an empty string ("") to a UnicodeSet. A frozen set will not be modified.

Parameters:
s the source string
Returns:
this object, for chaining
Stable:
ICU 2.4
UnicodeSet& icu::final::add ( UChar32  c  ) 

Adds the specified character to this set if it is not already present.

If this set already contains the specified character, the call leaves this set unchanged. A frozen set will not be modified.

Stable:
ICU 2.0
virtual UnicodeSet& icu::final::add ( UChar32  start,
UChar32  end 
) [virtual]

Adds the specified range to this set if it is not already present.

If this set already contains the specified range, the call leaves this set unchanged. If start > end then an empty range is added, leaving the set unchanged. This is equivalent to a boolean logic OR, or a set UNION. A frozen set will not be modified.

Parameters:
start first character, inclusive, of range to be added to this set.
end last character, inclusive, of range to be added to this set.
Stable:
ICU 2.0
virtual UnicodeSet& icu::final::addAll ( const UnicodeSet &  c  )  [virtual]

Adds all of the elements in the specified set to this set if they're not already present.

This operation effectively modifies this set so that its value is the union of the two sets. The behavior of this operation is unspecified if the specified collection is modified while the operation is in progress. A frozen set will not be modified.

Parameters:
c set whose elements are to be added to this set.
See also:
add(UChar32, UChar32)
Stable:
ICU 2.0
UnicodeSet& icu::final::addAll ( const UnicodeString s  ) 

Adds each of the characters in this string to the set.

Thus "ch" => {"c", "h"}. If this set already contains any particular character, the call has no effect on that character. A frozen set will not be modified.

Parameters:
s the source string
Returns:
this object, for chaining
Stable:
ICU 2.4
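
The difference between add(s) and addAll(s) can be sketched with a toy model. The following is only an illustration built on std::set, not ICU code: ToyStringSet, toyAddString and toyAddAllChars are hypothetical names, and the model treats each char as one character, so it is only meaningful for ASCII input.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a set of strings; this is NOT icu::UnicodeSet.
using ToyStringSet = std::set<std::string>;

// Models add(s): the whole string becomes a single element ("ch" => {"ch"}).
void toyAddString(ToyStringSet& set, const std::string& s) {
    set.insert(s);  // no effect if the string is already present
}

// Models addAll(s): each character becomes its own element ("ch" => {"c", "h"}).
// Assumption: one char is one character, so only ASCII input is meaningful.
void toyAddAllChars(ToyStringSet& set, const std::string& s) {
    for (char c : s) {
        set.insert(std::string(1, c));  // no effect for characters already present
    }
}
```

With this model, adding "ch" through toyAddString yields one element, while toyAddAllChars yields two.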
virtual void icu::final::addMatchSetTo ( UnicodeSet &  toUnionTo  )  const [virtual]

Implementation of UnicodeMatcher API.

Union the set of all characters that may be matched by this object into the given set.

Parameters:
toUnionTo the set into which to union the source characters
Stable:
ICU 2.4

Implements icu::UnicodeMatcher.

void icu::final::addReplace ( int32_t  oldLength,
int32_t  newLength 
)

Adds a change edit: a record for a text replacement/insertion/deletion.

Normally called from inside ICU string transformation functions, not user code.

Stable:
ICU 59
void icu::final::addUnchanged ( int32_t  unchangedLength  ) 

Adds a no-change edit: a record for an unchanged segment of text.

Normally called from inside ICU string transformation functions, not user code.

Stable:
ICU 59
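
How change and no-change records relate to lengthDelta(), hasChanges() and numberOfChanges() can be sketched as follows. This is a simplified model, not the ICU implementation: ToyEdits and its Record type are hypothetical, the records are stored uncompressed, and no argument validation is attempted.

```cpp
#include <cassert>
#include <vector>

// Toy model of an edit log; this is NOT icu::Edits.
struct ToyEdits {
    struct Record {
        bool changed;   // true for a change edit, false for a no-change edit
        int oldLength;  // length of the segment in the original text
        int newLength;  // length of the segment in the transformed text
    };
    std::vector<Record> records;

    // Records an unchanged segment: old and new lengths are equal.
    void addUnchanged(int unchangedLength) {
        records.push_back(Record{false, unchangedLength, unchangedLength});
    }

    // Records a replacement, insertion (oldLength 0) or deletion (newLength 0).
    void addReplace(int oldLength, int newLength) {
        records.push_back(Record{true, oldLength, newLength});
    }

    // New text length minus old text length, summed over all records.
    int lengthDelta() const {
        int delta = 0;
        for (const Record& r : records) delta += r.newLength - r.oldLength;
        return delta;
    }

    // Counts only the change edits.
    int numberOfChanges() const {
        int n = 0;
        for (const Record& r : records) {
            if (r.changed) ++n;
        }
        return n;
    }

    bool hasChanges() const { return numberOfChanges() > 0; }
};
```

For example, recording three unchanged units, one unit replaced by two, and one more unchanged unit gives a length delta of 1 and exactly one change.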
UnicodeSet& icu::final::applyIntPropertyValue ( UProperty  prop,
int32_t  value,
UErrorCode ec 
)

Modifies this set to contain those code points which have the given value for the given binary or enumerated property, as returned by u_getIntPropertyValue.

Prior contents of this set are lost. A frozen set will not be modified.

Parameters:
prop a property in the range UCHAR_BIN_START..UCHAR_BIN_LIMIT-1 or UCHAR_INT_START..UCHAR_INT_LIMIT-1 or UCHAR_MASK_START..UCHAR_MASK_LIMIT-1.
value a value in the range u_getIntPropertyMinValue(prop).. u_getIntPropertyMaxValue(prop), with one exception. If prop is UCHAR_GENERAL_CATEGORY_MASK, then value should not be a UCharCategory, but rather a mask value produced by U_GET_GC_MASK(). This allows grouped categories such as [:L:] to be represented.
ec error code input/output parameter
Returns:
a reference to this set
Stable:
ICU 2.4
UnicodeSet& icu::final::applyPattern ( const UnicodeString pattern,
ParsePosition pos,
uint32_t  options,
const SymbolTable symbols,
UErrorCode status 
)

Parses the given pattern, starting at the given position.

The character at pattern.charAt(pos.getIndex()) must be '[', or the parse fails. Parsing continues until the corresponding closing ']'. If a syntax error is encountered between the opening and closing brackets, the parse fails. Upon return from a successful parse, the ParsePosition is updated to point to the character following the closing ']', and a reference to this set is returned. This method calls itself recursively to parse embedded subpatterns. The set is emptied before the pattern is applied. A frozen set will not be modified.

Parameters:
pattern the string containing the pattern to be parsed. The portion of the string from pos.getIndex(), which must be a '[', to the corresponding closing ']', is parsed.
pos upon entry, the position at which to begin parsing. The character at pattern.charAt(pos.getIndex()) must be a '['. Upon return from a successful parse, pos.getIndex() is either the character after the closing ']' of the parsed pattern, or pattern.length() if the closing ']' is the last character of the pattern string.
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
symbols a symbol table mapping variable names to values and stand-ins to UnicodeSets; may be NULL
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error.
Returns:
a reference to this
Stable:
ICU 2.8
UnicodeSet& icu::final::applyPattern ( const UnicodeString pattern,
uint32_t  options,
const SymbolTable symbols,
UErrorCode status 
)

Modifies this set to represent the set specified by the given pattern, optionally ignoring Unicode Pattern_White_Space characters.

See the class description for the syntax of the pattern language. A frozen set will not be modified.

Parameters:
pattern a string specifying what characters are in the set
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
symbols a symbol table mapping variable names to values and stand-ins to UnicodeSets; may be NULL
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error. Empties the set passed before applying the pattern.
Returns:
a reference to this
Internal:
Do not use. This API is for internal use only.
UnicodeSet& icu::final::applyPattern ( const UnicodeString pattern,
UErrorCode status 
)

Modifies this set to represent the set specified by the given pattern, ignoring Unicode Pattern_White_Space characters.

See the class description for the syntax of the pattern language. A frozen set will not be modified.

Parameters:
pattern a string specifying what characters are in the set
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error. Empties the set passed before applying the pattern.
Returns:
a reference to this
Stable:
ICU 2.0
UBool icu::final::applyPattern ( const UnicodeString pattern,
UErrorCode errorCode 
) [inline]

Changes this object according to the new pattern.

Parameters:
pattern The pattern string.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call. Set to U_ILLEGAL_ARGUMENT_ERROR for bad argument syntax.
Returns:
TRUE if U_SUCCESS(errorCode).
Stable:
ICU 57

Definition at line 131 of file simpleformatter.h.

References INT32_MAX.

UBool icu::final::applyPatternMinMaxArguments ( const UnicodeString pattern,
int32_t  min,
int32_t  max,
UErrorCode errorCode 
)

Changes this object according to the new pattern.

The number of arguments checked against the given limits is the highest argument number plus one, not the number of occurrences of arguments.

Parameters:
pattern The pattern string.
min The pattern must have at least this many arguments.
max The pattern must have at most this many arguments.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call. Set to U_ILLEGAL_ARGUMENT_ERROR for bad argument syntax and too few or too many arguments.
Returns:
TRUE if U_SUCCESS(errorCode).
Stable:
ICU 57
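
The rule that the checked count is the highest argument number plus one, not the number of argument occurrences, can be illustrated with a small sketch. This is not the SimpleFormatter parser: toyArgumentLimit and toyWithinLimits are hypothetical helpers that recognize only plain {N} placeholders and ignore any quoting or escaping rules of the real pattern syntax.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the highest argument number plus one, or 0 if there are no
// {N} placeholders. Simplification: only decimal {N} is recognized.
int toyArgumentLimit(const std::string& pattern) {
    int limit = 0;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] != '{') continue;
        std::size_t j = i + 1;
        int n = 0;
        while (j < pattern.size() &&
               std::isdigit(static_cast<unsigned char>(pattern[j]))) {
            n = n * 10 + (pattern[j] - '0');
            ++j;
        }
        // Accept only "{digits}" with at least one digit.
        if (j > i + 1 && j < pattern.size() && pattern[j] == '}') {
            if (n + 1 > limit) limit = n + 1;
            i = j;  // continue scanning after the closing brace
        }
    }
    return limit;
}

// Models the min/max check: the limit, not the occurrence count, is compared.
bool toyWithinLimits(const std::string& pattern, int min, int max) {
    int limit = toyArgumentLimit(pattern);
    return min <= limit && limit <= max;
}
```

Under this model "{0} and {2} and {0}" has a limit of 3 even though {1} never occurs, and "{0}{0}{0}" has a limit of 1 despite three occurrences.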
UnicodeSet& icu::final::applyPropertyAlias ( const UnicodeString prop,
const UnicodeString value,
UErrorCode ec 
)

Modifies this set to contain those code points which have the given value for the given property.

Prior contents of this set are lost. A frozen set will not be modified.

Parameters:
prop a property alias, either short or long. The name is matched loosely. See PropertyAliases.txt for names and a description of loose matching. If the value string is empty, then this string is interpreted as either a General_Category value alias, a Script value alias, a binary property alias, or a special ID. Special IDs are matched loosely and correspond to the following sets: "ANY" = [\u0000-\U0010FFFF], "ASCII" = [\u0000-\u007F], "Assigned" = [:^Cn:].
value a value alias, either short or long. The name is matched loosely. See PropertyValueAliases.txt for names and a description of loose matching. In addition to aliases listed, numeric values and canonical combining classes may be expressed numerically, e.g., ("nv", "0.5") or ("ccc", "220"). The value string may also be empty.
ec error code input/output parameter
Returns:
a reference to this set
Stable:
ICU 2.4
icu::final::CanonicalIterator ( const UnicodeString source,
UErrorCode status 
)

Construct a CanonicalIterator object.

Parameters:
source string to get results for
status Fill-in parameter which receives the status of this operation.
Stable:
ICU 2.4
icu::final::Char16Ptr ( std::nullptr_t  p  )  [inline]

nullptr constructor.

Parameters:
p nullptr
Stable:
ICU 59
icu::final::Char16Ptr ( wchar_t *  p  )  [inline]

Converts the pointer to char16_t *.

(Only defined if U_SIZEOF_WCHAR_T==2.)

Parameters:
p pointer to be converted
Stable:
ICU 59
icu::final::Char16Ptr ( uint16_t *  p  )  [inline]

Converts the pointer to char16_t *.

Parameters:
p pointer to be converted
Stable:
ICU 59
icu::final::Char16Ptr ( char16_t *  p  )  [inline]

Copies the pointer.

Parameters:
p pointer
Stable:
ICU 59
UChar32 icu::final::charAt ( int32_t  index  )  const

Returns the character at the given index within this set, where the set is ordered by ascending code point.

If the index is out of range, return (UChar32)-1. The inverse of this method is indexOf().

Parameters:
index an index from 0..size()-1
Returns:
the character at the given index, or (UChar32)-1.
Stable:
ICU 2.4
virtual UnicodeSet& icu::final::clear ( void   )  [virtual]

Removes all of the elements from this set.

This set will be empty after this call returns. A frozen set will not be modified.

Stable:
ICU 2.0
virtual UnicodeSet* icu::final::clone (  )  const [virtual]

Returns a copy of this object.

All UnicodeFunctor objects have to support cloning in order to allow classes using UnicodeFunctors, such as Transliterator, to implement cloning. If this set is frozen, then the clone will be frozen as well. Use cloneAsThawed() for a mutable clone of a frozen set.

See also:
cloneAsThawed
Stable:
ICU 2.0

Implements icu::UnicodeFilter.

UnicodeSet* icu::final::cloneAsThawed (  )  const

Clone the set and make the clone mutable.

See the ICU4J Freezable interface for details.

Returns:
the mutable clone
See also:
freeze
isFrozen
Stable:
ICU 3.8
UnicodeSet& icu::final::closeOver ( int32_t  attribute  ) 

Close this set over the given attribute.

For the attribute USET_CASE, the result is to modify this set so that:

1. For each character or string 'a' in this set, all strings or characters 'b' such that foldCase(a) == foldCase(b) are added to this set.

2. For each string 'e' in the resulting set, if e != foldCase(e), 'e' will be removed.

Example: [aq\u00DF{Bc}{bC}{Fi}] => [aAqQ\u00DF\uFB01{ss}{bc}{fi}]

(Here foldCase(x) refers to the operation u_strFoldCase, and a == b denotes that the contents are the same, not pointer comparison.)

A frozen set will not be modified.

Parameters:
attribute bitmask for attributes to close over. Currently only the USET_CASE bit is supported. Any undefined bits are ignored.
Returns:
a reference to this set.
Stable:
ICU 4.2
virtual UnicodeSet& icu::final::compact (  )  [virtual]

Reallocate this object's internal structures to take up the least possible space, without changing this object's value.

A frozen set will not be modified.

Stable:
ICU 2.4
UnicodeSet& icu::final::complement ( const UnicodeString s  ) 

Complement the specified string in this set.

The string will be removed if it is in this set, or will be added if it is not in this set.
Warning: you cannot add an empty string ("") to a UnicodeSet. A frozen set will not be modified.

Parameters:
s the string to complement
Returns:
this object, for chaining
Stable:
ICU 2.4
UnicodeSet& icu::final::complement ( UChar32  c  ) 

Complements the specified character in this set.

The character will be removed if it is in this set, or will be added if it is not in this set. A frozen set will not be modified.

Stable:
ICU 2.0
virtual UnicodeSet& icu::final::complement ( UChar32  start,
UChar32  end 
) [virtual]

Complements the specified range in this set.

Any character in the range will be removed if it is in this set, or will be added if it is not in this set. If start > end then an empty range is complemented, leaving the set unchanged. This is equivalent to a boolean logic XOR. A frozen set will not be modified.

Parameters:
start first character, inclusive, of range to be complemented in this set.
end last character, inclusive, of range to be complemented in this set.
Stable:
ICU 2.0
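
The OR semantics of add(start, end) and the XOR semantics of complement(start, end), including the empty-range case, can be sketched with a toy model. This is an illustration on top of std::set, not how UnicodeSet stores its ranges; ToyRangeSet, toyAddRange and toyComplementRange are hypothetical names, and the per-element loop is only practical for small ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy stand-in for a code point set; this is NOT icu::UnicodeSet.
using ToyRangeSet = std::set<std::int32_t>;

// Unions the inclusive range [start, end] into the set (boolean OR).
// When start > end the loop body never runs, so the set is unchanged.
void toyAddRange(ToyRangeSet& set, std::int32_t start, std::int32_t end) {
    for (std::int32_t c = start; c <= end; ++c) {
        set.insert(c);
    }
}

// Toggles membership of every code point in [start, end] (boolean XOR).
// When start > end the range is empty and the set is unchanged.
void toyComplementRange(ToyRangeSet& set, std::int32_t start, std::int32_t end) {
    for (std::int32_t c = start; c <= end; ++c) {
        if (set.erase(c) == 0) {
            set.insert(c);  // was absent, so add it
        }
    }
}
```

Starting from {a, b, c}, complementing the range b..d removes b and c and adds d, which leaves {a, d}.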
virtual UnicodeSet& icu::final::complement ( void   )  [virtual]

Inverts this set.

This operation modifies this set so that its value is its complement. This is equivalent to complement(MIN_VALUE, MAX_VALUE). A frozen set will not be modified.

Stable:
ICU 2.0
virtual UnicodeSet& icu::final::complementAll ( const UnicodeSet &  c  )  [virtual]

Complements in this set all elements contained in the specified set.

Any character in the other set will be removed if it is in this set, or will be added if it is not in this set. A frozen set will not be modified.

Parameters:
c set that defines which elements will be xor'ed with this set.
Stable:
ICU 2.4
UnicodeSet& icu::final::complementAll ( const UnicodeString s  ) 

Complement EACH of the characters in this string.

Note: "ch" == {"c", "h"}. Each character of the string is removed if this set already contains it, and added otherwise. A frozen set will not be modified.

Parameters:
s the source string
Returns:
this object, for chaining
Stable:
ICU 2.4
icu::final::ConstChar16Ptr ( const std::nullptr_t  p  )  [inline]

nullptr constructor.

Parameters:
p nullptr
Stable:
ICU 59
icu::final::ConstChar16Ptr ( const wchar_t *  p  )  [inline]

Converts the pointer to char16_t *.

(Only defined if U_SIZEOF_WCHAR_T==2.)

Parameters:
p pointer to be converted
Stable:
ICU 59
icu::final::ConstChar16Ptr ( const uint16_t *  p  )  [inline]

Converts the pointer to char16_t *.

Parameters:
p pointer to be converted
Stable:
ICU 59
icu::final::ConstChar16Ptr ( const char16_t *  p  )  [inline]

Copies the pointer.

Parameters:
p pointer
Stable:
ICU 59
UBool icu::final::contains ( const UnicodeString s  )  const

Returns true if this set contains the given multicharacter string.

Parameters:
s string to be checked for containment
Returns:
true if this set contains the specified string
Stable:
ICU 2.4
virtual UBool icu::final::contains ( UChar32  start,
UChar32  end 
) const [virtual]

Returns true if this set contains every character of the given range.

Parameters:
start first character, inclusive, of the range
end last character, inclusive, of the range
Returns:
true if the test condition is met
Stable:
ICU 2.0
virtual UBool icu::final::contains ( UChar32  c  )  const [virtual]

Returns true if this set contains the given character.

This function works faster with a frozen set.

Parameters:
c character to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.0

Implements icu::UnicodeFilter.

UBool icu::final::containsAll ( const UnicodeString s  )  const

Returns true if this set contains all the characters of the given string.

Parameters:
s string containing characters to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.4
virtual UBool icu::final::containsAll ( const UnicodeSet &  c  )  const [virtual]

Returns true if this set contains all the characters and strings of the given set.

Parameters:
c set to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.4
UBool icu::final::containsNone ( const UnicodeString s  )  const

Returns true if this set contains none of the characters of the given string.

Parameters:
s string containing characters to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.4
UBool icu::final::containsNone ( const UnicodeSet &  c  )  const

Returns true if this set contains none of the characters and strings of the given set.

Parameters:
c set to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.4
UBool icu::final::containsNone ( UChar32  start,
UChar32  end 
) const

Returns true if this set contains none of the characters of the given range.

Parameters:
start first character, inclusive, of the range
end last character, inclusive, of the range
Returns:
true if the test condition is met
Stable:
ICU 2.4
UBool icu::final::containsSome ( const UnicodeString s  )  const [inline]

Returns true if this set contains one or more of the characters of the given string.

Parameters:
s string containing characters to be checked for containment
Returns:
true if the condition is met
Stable:
ICU 2.4
UBool icu::final::containsSome ( const UnicodeSet &  s  )  const [inline]

Returns true if this set contains one or more of the characters and strings of the given set.

Parameters:
s The set to be checked for containment
Returns:
true if the condition is met
Stable:
ICU 2.4
UBool icu::final::containsSome ( UChar32  start,
UChar32  end 
) const [inline]

Returns true if this set contains one or more of the characters in the given range.

Parameters:
start first character, inclusive, of the range
end last character, inclusive, of the range
Returns:
true if the condition is met
Stable:
ICU 2.4
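
The relationship between the string overloads of containsAll, containsNone and containsSome can be sketched as follows. This is a toy model over std::set<char32_t>, not ICU code, and the helper names are hypothetical. It defines "some" as the negation of "none", which matches the documented meaning of "one or more".

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a set of code points; this is NOT icu::UnicodeSet.
using ToyCodePointSet = std::set<char32_t>;

// True if every character of s is in the set (vacuously true for "").
bool toyContainsAll(const ToyCodePointSet& set, const std::u32string& s) {
    return std::all_of(s.begin(), s.end(),
                       [&set](char32_t c) { return set.count(c) != 0; });
}

// True if no character of s is in the set (vacuously true for "").
bool toyContainsNone(const ToyCodePointSet& set, const std::u32string& s) {
    return std::none_of(s.begin(), s.end(),
                        [&set](char32_t c) { return set.count(c) != 0; });
}

// True if at least one character of s is in the set.
bool toyContainsSome(const ToyCodePointSet& set, const std::u32string& s) {
    return !toyContainsNone(set, s);
}
```

For the set {a, b, c}, the string "cat" fails the "all" test because of 't' but passes the "some" test.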
UBool icu::final::copyErrorTo ( UErrorCode outErrorCode  )  const

Sets the UErrorCode if an error occurred while recording edits.

Preserves older error codes in the outErrorCode. Normally called from inside ICU string transformation functions, not user code.

Parameters:
outErrorCode Set to an error code if it does not contain one already and an error occurred while recording edits. Otherwise unchanged.
Returns:
TRUE if U_FAILURE(outErrorCode)
Stable:
ICU 59
static UnicodeSet* icu::final::createFrom ( const UnicodeString s  )  [static]

Makes a set from a multicharacter string.

Thus "ch" => {"ch"}
Warning: you cannot add an empty string ("") to a UnicodeSet.

Parameters:
s the source string
Returns:
a newly created set containing the given string. The caller owns the return object and is responsible for deleting it.
Stable:
ICU 2.4
static UnicodeSet* icu::final::createFromAll ( const UnicodeString s  )  [static]

Makes a set from each of the characters in the string.

Thus "ch" => {"c", "h"}

Parameters:
s the source string
Returns:
a newly created set containing the given characters. The caller owns the return object and is responsible for deleting it.
Stable:
ICU 2.4
icu::final::Edits ( Edits &&  src  )  [inline]

Move constructor, might leave src empty.

This object will have the same contents that the source object had.

Parameters:
src source edits
Stable:
ICU 60

Definition at line 106 of file edits.h.

icu::final::Edits ( const Edits &  other  )  [inline]

Copy constructor.

Parameters:
other source edits
Stable:
ICU 60

Definition at line 94 of file edits.h.

icu::final::Edits (  )  [inline]

Constructs an empty object.

Stable:
ICU 59

Definition at line 86 of file edits.h.

static int32_t icu::final::fold ( uint32_t  options,
const char16_t *  src,
int32_t  srcLength,
char16_t *  dest,
int32_t  destCapacity,
Edits *  edits,
UErrorCode errorCode 
) [static]

Case-folds a UTF-16 string and optionally records edits.

Case folding is locale-independent and not context-sensitive, but there is an option for whether to include or exclude mappings for dotted I and dotless i that are marked with 'T' in CaseFolding.txt.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT, U_EDITS_NO_RESET, U_FOLD_CASE_DEFAULT, U_FOLD_CASE_EXCLUDE_SPECIAL_I.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity The size of the buffer (number of char16_ts). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents are undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
Returns:
The length of the result string, if successful. When the result would be longer than destCapacity, the full length is returned and a U_BUFFER_OVERFLOW_ERROR is set.
See also:
u_strFoldCase
Stable:
ICU 59
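
The destCapacity and return-value contract above supports a two-pass "preflight" call pattern: call once with a capacity of 0 to learn the required length, then call again with a buffer of that size. The sketch below models only that contract. It is not ICU code: ToyError, toyFoldAscii and ASCII lowercasing are stand-ins for UErrorCode, the real function and real case folding.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical stand-in for the relevant UErrorCode values.
enum ToyError { TOY_ZERO_ERROR, TOY_BUFFER_OVERFLOW_ERROR };

// Lowercases ASCII text into dest and returns the full result length.
// Mirrors the documented contract: srcLength -1 means NUL-terminated input,
// dest may be null when destCapacity is 0, and an overflow is reported
// while the full length is still returned.
int toyFoldAscii(const char* src, int srcLength,
                 char* dest, int destCapacity, ToyError& errorCode) {
    if (errorCode != TOY_ZERO_ERROR) return 0;  // in/out error code convention
    if (srcLength < 0) srcLength = static_cast<int>(std::strlen(src));
    for (int i = 0; i < srcLength && i < destCapacity; ++i) {
        dest[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(src[i])));
    }
    if (srcLength < destCapacity) {
        dest[srcLength] = '\0';  // terminate only when there is room
    } else if (srcLength > destCapacity) {
        errorCode = TOY_BUFFER_OVERFLOW_ERROR;
    }
    // When srcLength == destCapacity the toy reports nothing; the result
    // is complete but not NUL-terminated.
    return srcLength;
}
```

A caller would reset the error code after the preflight pass before making the second call, since the function returns immediately when the incoming code already indicates a failure.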
UnicodeString& icu::final::format ( const UnicodeString value0,
const UnicodeString value1,
const UnicodeString value2,
UnicodeString appendTo,
UErrorCode errorCode 
) const

Formats the given values, appending to the appendTo builder.

An argument value must not be the same object as appendTo. getArgumentLimit() must be at most 3.

Parameters:
value0 Value for argument {0}.
value1 Value for argument {1}.
value2 Value for argument {2}.
appendTo Gets the formatted pattern and values appended.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
Returns:
appendTo
Stable:
ICU 57
UnicodeString& icu::final::format ( const UnicodeString value0,
const UnicodeString value1,
UnicodeString appendTo,
UErrorCode errorCode 
) const

Formats the given values, appending to the appendTo builder.

An argument value must not be the same object as appendTo. getArgumentLimit() must be at most 2.

Parameters:
value0 Value for argument {0}.
value1 Value for argument {1}.
appendTo Gets the formatted pattern and values appended.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
Returns:
appendTo
Stable:
ICU 57
UnicodeString& icu::final::format ( const UnicodeString value0,
UnicodeString appendTo,
UErrorCode errorCode 
) const

Formats the given value, appending to the appendTo builder.

The argument value must not be the same object as appendTo. getArgumentLimit() must be at most 1.

Parameters:
value0 Value for argument {0}.
appendTo Gets the formatted pattern and value appended.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
Returns:
appendTo
Stable:
ICU 57
UnicodeString& icu::final::formatAndAppend ( const UnicodeString *const *  values,
int32_t  valuesLength,
UnicodeString appendTo,
int32_t *  offsets,
int32_t  offsetsLength,
UErrorCode errorCode 
) const

Formats the given values, appending to the appendTo string.

Parameters:
values The argument values. An argument value must not be the same object as appendTo. Can be NULL if valuesLength==getArgumentLimit()==0.
valuesLength The length of the values array. Must be at least getArgumentLimit().
appendTo Gets the formatted pattern and values appended.
offsets offsets[i] receives the offset of where values[i] replaced pattern argument {i}. Can be shorter or longer than values. Can be NULL if offsetsLength==0. If there is no {i} in the pattern, then offsets[i] is set to -1.
offsetsLength The length of the offsets array.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
Returns:
appendTo
Stable:
ICU 57
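
The offsets behavior described for formatAndAppend can be illustrated with a simplified model. This is a sketch, not the ICU implementation: toyFormatAndAppend is a hypothetical helper that handles only single-digit {N} placeholders, reports offsets relative to the whole appendTo string (an assumption), and records the first occurrence when an argument repeats (an arbitrary choice for this toy).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Appends the pattern to appendTo, substituting {N} with values[N].
// offsets[i] receives the position in appendTo where values[i] was
// inserted, or -1 if the pattern has no {i}.
std::string& toyFormatAndAppend(const std::string& pattern,
                                const std::vector<std::string>& values,
                                std::string& appendTo,
                                std::vector<int>& offsets) {
    for (int& o : offsets) o = -1;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        // Simplification: only single-digit placeholders such as {0}..{9}.
        if (pattern[i] == '{' && i + 2 < pattern.size() &&
            std::isdigit(static_cast<unsigned char>(pattern[i + 1])) &&
            pattern[i + 2] == '}') {
            std::size_t n = static_cast<std::size_t>(pattern[i + 1] - '0');
            if (n < values.size()) {
                if (n < offsets.size() && offsets[n] == -1) {
                    offsets[n] = static_cast<int>(appendTo.size());
                }
                appendTo += values[n];
                i += 2;  // skip the rest of the placeholder
                continue;
            }
        }
        appendTo += pattern[i];  // literal text is copied unchanged
    }
    return appendTo;
}
```

The offsets vector may be shorter or longer than the values vector; entries with no matching placeholder stay at -1.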
UnicodeString& icu::final::formatAndReplace ( const UnicodeString *const *  values,
int32_t  valuesLength,
UnicodeString result,
int32_t *  offsets,
int32_t  offsetsLength,
UErrorCode errorCode 
) const

Formats the given values, replacing the contents of the result string.

May optimize by actually appending to the result if it is the same object as the value corresponding to the initial argument in the pattern.

Parameters:
values The argument values. An argument value may be the same object as result. Can be NULL if valuesLength==getArgumentLimit()==0.
valuesLength The length of the values array. Must be at least getArgumentLimit().
result Gets its contents replaced by the formatted pattern and values.
offsets offsets[i] receives the offset of where values[i] replaced pattern argument {i}. Can be shorter or longer than values. Can be NULL if offsetsLength==0. If there is no {i} in the pattern, then offsets[i] is set to -1.
offsetsLength The length of the offsets array.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call.
Returns:
result
Stable:
ICU 57
UnicodeSet* icu::final::freeze (  ) 

Freeze the set (make it immutable).

Once frozen, it cannot be unfrozen and is therefore thread-safe until it is deleted. See the ICU4J Freezable interface for details. Freezing the set may also make some operations faster, for example contains() and span(). A frozen set will not be modified. (It remains frozen.)

Returns:
this set.
See also:
isFrozen
cloneAsThawed
Stable:
ICU 3.8
static const UnicodeSet* icu::final::fromUSet ( const USet uset  )  [inline, static]

Get a UnicodeSet pointer from a const USet.

Parameters:
uset a const USet (the ICU plain C type for UnicodeSet)
Returns:
the corresponding UnicodeSet pointer.
Stable:
ICU 4.2
static UnicodeSet* icu::final::fromUSet ( USet uset  )  [inline, static]

Get a UnicodeSet pointer from a USet.

Parameters:
uset a USet (the ICU plain C type for UnicodeSet)
Returns:
the corresponding UnicodeSet pointer.
Stable:
ICU 4.2
const char16_t* icu::final::get (  )  const [inline]

Pointer access.

Returns:
the wrapped pointer
Stable:
ICU 59
char16_t* icu::final::get (  )  const [inline]

Pointer access.

Returns:
the wrapped pointer
Stable:
ICU 59
int32_t icu::final::getArgumentLimit (  )  const [inline]
Returns:
The max argument number + 1.
Stable:
ICU 57

Definition at line 157 of file simpleformatter.h.

Iterator icu::final::getCoarseChangesIterator (  )  const [inline]

Returns an Iterator for coarse-grained change edits (adjacent change edits are treated as one).

Can be used to perform simple string updates. Skips no-change edits.

Returns:
an Iterator that merges adjacent changes.
Stable:
ICU 59

Definition at line 438 of file edits.h.

References TRUE.

Iterator icu::final::getCoarseIterator (  )  const [inline]

Returns an Iterator for coarse-grained change and no-change edits (adjacent change edits are treated as one).

Can be used to perform simple string updates. Adjacent change edits are treated as one edit.

Returns:
an Iterator that merges adjacent changes.
Stable:
ICU 59

Definition at line 450 of file edits.h.

References FALSE, and TRUE.

virtual UClassID icu::final::getDynamicClassID ( void   )  const [virtual]

Implement UnicodeFunctor API.

Returns:
The class ID for this object. All objects of a given class have the same class ID. Objects of other classes have different class IDs.
Stable:
ICU 2.4

Reimplemented from icu::UObject.

virtual UClassID icu::final::getDynamicClassID (  )  const [virtual]

ICU "poor man's RTTI", returns a UClassID for the actual class.

Stable:
ICU 2.2

Reimplemented from icu::UObject.

Iterator icu::final::getFineChangesIterator (  )  const [inline]

Returns an Iterator for fine-grained change edits (full granularity of change edits is retained).

Can be used for modifying styled text. Skips no-change edits.

Returns:
an Iterator that separates adjacent changes.
Stable:
ICU 59

Definition at line 462 of file edits.h.

References FALSE, and TRUE.

Iterator icu::final::getFineIterator (  )  const [inline]

Returns an Iterator for fine-grained change and no-change edits (full granularity of change edits is retained).

Can be used for modifying styled text.

Returns:
an Iterator that separates adjacent changes.
Stable:
ICU 59

Definition at line 473 of file edits.h.

References FALSE.

virtual int32_t icu::final::getRangeCount ( void   )  const [virtual]

Iteration method that returns the number of ranges contained in this set.

See also:
getRangeStart
getRangeEnd
Stable:
ICU 2.4
virtual UChar32 icu::final::getRangeEnd ( int32_t  index  )  const [virtual]

Iteration method that returns the last character in the specified range of this set.

See also:
getRangeCount
getRangeStart
Stable:
ICU 2.4
virtual UChar32 icu::final::getRangeStart ( int32_t  index  )  const [virtual]

Iteration method that returns the first character in the specified range of this set.

See also:
getRangeCount
getRangeEnd
Stable:
ICU 2.4
UnicodeString icu::final::getSource (  ) 

Gets the NFD form of the current source we are iterating over.

Returns:
the source string. Note that it is the NFD form of the source.
Stable:
ICU 2.4
static UClassID icu::final::getStaticClassID ( void   )  [static]

Return the class ID for this class.

This is useful only for comparing to a return value from getDynamicClassID(). For example:

    Base* polymorphic_pointer = createPolymorphicObject();
    if (polymorphic_pointer->getDynamicClassID() ==
        Derived::getStaticClassID()) ...
 
Returns:
The class ID for all objects of this class.
Stable:
ICU 2.0

Reimplemented from icu::UnicodeFilter.

static UClassID icu::final::getStaticClassID (  )  [static]

ICU "poor man's RTTI", returns a UClassID for this class.

Stable:
ICU 2.2

Reimplemented from icu::UnicodeFilter.

UnicodeString icu::final::getTextWithNoArguments ( int32_t *  offsets,
int32_t  offsetsLength 
) const [inline]

Returns the pattern text with none of the arguments.

Like formatting with all-empty string values.

TODO(ICU-20406): Replace this with an Iterator interface.

Parameters:
offsets offsets[i] receives the offset of where {i} was located before it was replaced by an empty string. For example, "a{0}b{1}" produces offset 1 for i=0 and 2 for i=1. Can be nullptr if offsetsLength==0. If there is no {i} in the pattern, then offsets[i] is set to -1.
offsetsLength The length of the offsets array.
Internal:
Do not use. This API is for internal use only.

Definition at line 294 of file simpleformatter.h.

UnicodeString icu::final::getTextWithNoArguments (  )  const [inline]

Returns the pattern text with none of the arguments.

Like formatting with all-empty string values.

Stable:
ICU 57

Definition at line 270 of file simpleformatter.h.

UBool icu::final::hasChanges (  )  const [inline]
Returns:
TRUE if there are any change edits
Stable:
ICU 59

Definition at line 177 of file edits.h.

virtual int32_t icu::final::hashCode ( void   )  const [virtual]

Returns the hash code value for this set.

Returns:
the hash code value for this set.
See also:
Object::hashCode()
Stable:
ICU 2.0
int32_t icu::final::indexOf ( UChar32  c  )  const

Returns the index of the given character within this set, where the set is ordered by ascending code point.

If the character is not in this set, return -1. The inverse of this method is charAt().

Returns:
an index from 0..size()-1, or -1
Stable:
ICU 2.4
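
That charAt() and indexOf() are inverses over a set ordered by ascending code point can be sketched with a list of inclusive ranges. This is a toy model, not the UnicodeSet implementation: ToyRanges, toyCharAt and toyIndexOf are hypothetical, and the ranges are assumed to be sorted and non-overlapping.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, non-overlapping inclusive ranges (start, end) of code points.
using ToyRanges = std::vector<std::pair<std::int32_t, std::int32_t>>;

// Returns the code point at the given index, or -1 if out of range.
std::int32_t toyCharAt(const ToyRanges& ranges, std::int32_t index) {
    if (index < 0) return -1;
    for (const auto& range : ranges) {
        std::int32_t count = range.second - range.first + 1;
        if (index < count) return range.first + index;
        index -= count;  // skip this whole range
    }
    return -1;
}

// Returns the index of code point c, or -1 if c is not in the set.
std::int32_t toyIndexOf(const ToyRanges& ranges, std::int32_t c) {
    std::int32_t base = 0;  // number of code points in earlier ranges
    for (const auto& range : ranges) {
        if (c < range.first) return -1;  // falls in a gap before this range
        if (c <= range.second) return base + (c - range.first);
        base += range.second - range.first + 1;
    }
    return -1;
}
```

For every valid index i in this model, toyIndexOf(ranges, toyCharAt(ranges, i)) returns i.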
UBool icu::final::isBogus ( void   )  const [inline]

Determine if this object contains a valid set.

A bogus set has no value. It is different from an empty set. It can be used to indicate that no set value is available.

Returns:
TRUE if the set is bogus/invalid, FALSE otherwise
See also:
setToBogus()
Stable:
ICU 4.0
virtual UBool icu::final::isEmpty ( void   )  const [virtual]

Returns true if this set contains no elements.

Returns:
true if this set contains no elements.
Stable:
ICU 2.0
UBool icu::final::isFrozen (  )  const [inline]

Determines whether the set has been frozen (made immutable) or not.

See the ICU4J Freezable interface for details.

Returns:
TRUE/FALSE for whether the set has been frozen
See also:
freeze
cloneAsThawed
Stable:
ICU 3.8
int32_t icu::final::lengthDelta (  )  const [inline]

How much longer is the new text compared with the old text?

Returns:
new length minus old length
Stable:
ICU 59

Definition at line 172 of file edits.h.

virtual UMatchDegree icu::final::matches ( const Replaceable text,
int32_t &  offset,
int32_t  limit,
UBool  incremental 
) [virtual]

Implement UnicodeMatcher::matches().

Stable:
ICU 2.4

Reimplemented from icu::UnicodeFilter.

Edits& icu::final::mergeAndAppend ( const Edits &  ab,
const Edits &  bc,
UErrorCode errorCode 
)

Merges the two input Edits and appends the result to this object.

Consider two string transformations (for example, normalization and case mapping) where each records Edits in addition to writing an output string.
Edits ab reflect how substrings of input string a map to substrings of intermediate string b.
Edits bc reflect how substrings of intermediate string b map to substrings of output string c.
This function merges ab and bc such that the additional edits recorded in this object reflect how substrings of input string a map to substrings of output string c.

If unrelated Edits are passed in where the output string of the first has a different length than the input string of the second, then a U_ILLEGAL_ARGUMENT_ERROR is reported.

Parameters:
ab reflects how substrings of input string a map to substrings of intermediate string b.
bc reflects how substrings of intermediate string b map to substrings of output string c.
errorCode ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Returns:
*this, with the merged edits appended
Stable:
ICU 60
UnicodeString icu::final::next (  ) 

Get the next canonically equivalent string.


Warning: The strings are not guaranteed to be in any particular order.

Returns:
the next string that is canonically equivalent. A bogus string is returned when the iteration is done.
Stable:
ICU 2.4
int32_t icu::final::numberOfChanges (  )  const [inline]
Returns:
the number of change edits
Stable:
ICU 60

Definition at line 183 of file edits.h.

icu::final::operator char16_t * (  )  const [inline]

char16_t pointer access via type conversion (e.g., static_cast).

Returns:
the wrapped pointer
Stable:
ICU 59

Definition at line 90 of file char16ptr.h.

icu::final::operator const char16_t * (  )  const [inline]

char16_t pointer access via type conversion (e.g., static_cast).

Returns:
the wrapped pointer
Stable:
ICU 59

Definition at line 198 of file char16ptr.h.

UBool icu::final::operator!= ( const UnicodeSet &  o  )  const [inline]

Compares the specified object with this set for inequality.

Returns true if the specified set is not equal to this set.

Stable:
ICU 2.0
UnicodeSet& icu::final::operator= ( const UnicodeSet &  o  ) 

Assigns this object to be a copy of another.

A frozen set will not be modified.

Stable:
ICU 2.0
SimpleFormatter& icu::final::operator= ( const SimpleFormatter &  other  ) 

Assignment operator.

Stable:
ICU 57
Edits& icu::final::operator= ( Edits &&  src  ) 

Move assignment operator, might leave src empty.

This object will have the same contents that the source object had. The behavior is undefined if *this and src are the same object.

Parameters:
src source edits
Returns:
*this
Stable:
ICU 60
Edits& icu::final::operator= ( const Edits &  other  ) 

Assignment operator.

Parameters:
other source edits
Returns:
*this
Stable:
ICU 60
virtual UBool icu::final::operator== ( const UnicodeSet &  o  )  const [virtual]

Compares the specified object with this set for equality.

Returns true if the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set).

Parameters:
o set to be compared for equality with this set.
Returns:
true if the specified set is equal to this set.
Stable:
ICU 2.0
static void icu::final::permute ( UnicodeString &  source,
UBool  skipZeros,
Hashtable *  result,
UErrorCode &  status 
) [static]

Dumb recursive implementation of permutation.

TODO: optimize

Parameters:
source the string to find permutations for
skipZeros determines whether zeros are skipped
result the results in a set.
status Fill-in parameter which receives the status of this operation.
Internal:
Do not use. This API is for internal use only.
UnicodeSet& icu::final::remove ( const UnicodeString &  s  ) 

Removes the specified string from this set if it is present.

The set will not contain the specified string once the call returns. A frozen set will not be modified.

Parameters:
s the source string
Returns:
this object, for chaining
Stable:
ICU 2.4
UnicodeSet& icu::final::remove ( UChar32  c  ) 

Removes the specified character from this set if it is present.

The set will not contain the specified character once the call returns. A frozen set will not be modified.

Stable:
ICU 2.0
virtual UnicodeSet& icu::final::remove ( UChar32  start,
UChar32  end 
) [virtual]

Removes the specified range from this set if it is present.

The set will not contain the specified range once the call returns. If start > end then an empty range is removed, leaving the set unchanged. A frozen set will not be modified.

Parameters:
start first character, inclusive, of range to be removed from this set.
end last character, inclusive, of range to be removed from this set.
Stable:
ICU 2.0
virtual UnicodeSet& icu::final::removeAll ( const UnicodeSet &  c  )  [virtual]

Removes from this set all of its elements that are contained in the specified set.

This operation effectively modifies this set so that its value is the asymmetric set difference of the two sets. A frozen set will not be modified.

Parameters:
c set that defines which elements will be removed from this set.
Stable:
ICU 2.0
UnicodeSet& icu::final::removeAll ( const UnicodeString &  s  ) 

Remove EACH of the characters in this string.

Note: "ch" == {"c", "h"}. If this set does not contain a particular character, the call has no effect on that character. A frozen set will not be modified.

Parameters:
s the source string
Returns:
this object, for chaining
Stable:
ICU 2.4
virtual UnicodeSet& icu::final::removeAllStrings (  )  [virtual]

Remove all strings from this set.

Returns:
a reference to this set.
Stable:
ICU 4.2
static UBool icu::final::resemblesPattern ( const UnicodeString &  pattern,
int32_t  pos 
) [static]

Return true if the given position, in the given pattern, appears to be the start of a UnicodeSet pattern.

Stable:
ICU 2.4
void icu::final::reset (  ) 

Resets the data but may not release memory.

Stable:
ICU 59
void icu::final::reset (  ) 

Resets the iterator so that one can start again from the beginning.

Stable:
ICU 2.4
UnicodeSet& icu::final::retain ( UChar32  c  ) 

Retain the specified character from this set if it is present.

A frozen set will not be modified.

Stable:
ICU 2.0
virtual UnicodeSet& icu::final::retain ( UChar32  start,
UChar32  end 
) [virtual]

Retain only the elements in this set that are contained in the specified range.

If start > end then an empty range is retained, leaving the set empty. This is equivalent to a boolean logic AND, or a set INTERSECTION. A frozen set will not be modified.

Parameters:
start first character, inclusive, of range to be retained to this set.
end last character, inclusive, of range to be retained to this set.
Stable:
ICU 2.0
virtual UnicodeSet& icu::final::retainAll ( const UnicodeSet &  c  )  [virtual]

Retains only the elements in this set that are contained in the specified set.

In other words, removes from this set all of its elements that are not contained in the specified set. This operation effectively modifies this set so that its value is the intersection of the two sets. A frozen set will not be modified.

Parameters:
c set that defines which elements this set will retain.
Stable:
ICU 2.0
UnicodeSet& icu::final::retainAll ( const UnicodeString &  s  ) 

Retains EACH of the characters in this string.

Note: "ch" == {"c", "h"}. If this set already contains any particular character, it has no effect on that character. A frozen set will not be modified.

Parameters:
s the source string
Returns:
this object, for chaining
Stable:
ICU 2.4
int32_t icu::final::serialize ( uint16_t *  dest,
int32_t  destCapacity,
UErrorCode &  ec 
) const

Serializes this set into an array of 16-bit integers.

Serialization (currently) only records the characters in the set; multicharacter strings are ignored.

The array has the following format (each line is one 16-bit integer):

  length = (n+2*m) | (m!=0?0x8000:0)
  bmpLength = n; present if m!=0
  bmp[0]
  bmp[1]
  ...
  bmp[n-1]
  supp-high[0]
  supp-low[0]
  supp-high[1]
  supp-low[1]
  ...
  supp-high[m-1]
  supp-low[m-1]

The array starts with a header. After the header are n bmp code points, then m supplementary code points. Either n or m or both may be zero. n+2*m is always <= 0x7FFF.

If there are no supplementary characters (if m==0) then the header is one 16-bit integer, 'length', with value n.

If there are supplementary characters (if m!=0) then the header is two 16-bit integers. The first, 'length', has value (n+2*m)|0x8000. The second, 'bmpLength', has value n.

After the header the code points are stored in ascending order. Supplementary code points are stored as most significant 16 bits followed by least significant 16 bits.

Parameters:
dest pointer to buffer of destCapacity 16-bit integers. May be NULL only if destCapacity is zero.
destCapacity size of dest, or zero. Must not be negative.
ec error code. Will be set to U_INDEX_OUTOFBOUNDS_ERROR if n+2*m > 0x7FFF. Will be set to U_BUFFER_OVERFLOW_ERROR if n+2*m+(m!=0?2:1) > destCapacity.
Returns:
the total length of the serialized format, including the header, that is, n+2*m+(m!=0?2:1), or 0 on error other than U_BUFFER_OVERFLOW_ERROR.
Stable:
ICU 2.4
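
The header and payload layout described above can be reproduced in a few lines. This is a minimal sketch, not the library's implementation: serializeSketch is a hypothetical helper, and it simply takes the ascending list of code point values to be stored as its input. Which values the library actually stores for a given set is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the 16-bit array from code point values given in ascending order.
// Returns an empty vector if n+2*m exceeds 0x7FFF (the documented limit).
std::vector<uint16_t> serializeSketch(const std::vector<char32_t>& values) {
    std::vector<uint16_t> bmp;   // n entries, one unit each
    std::vector<uint16_t> supp;  // m entries, two units each
    for (char32_t c : values) {
        assert(c <= 0x10FFFF);
        if (c <= 0xFFFF) {
            bmp.push_back(static_cast<uint16_t>(c));
        } else {
            supp.push_back(static_cast<uint16_t>(c >> 16));     // most significant 16 bits
            supp.push_back(static_cast<uint16_t>(c & 0xFFFF));  // least significant 16 bits
        }
    }
    const std::size_t n = bmp.size();
    const std::size_t twoM = supp.size();
    if (n + twoM > 0x7FFF) {
        return {};
    }
    std::vector<uint16_t> out;
    if (twoM == 0) {
        out.push_back(static_cast<uint16_t>(n));  // one-unit header: length
    } else {
        out.push_back(static_cast<uint16_t>((n + twoM) | 0x8000));  // length with flag bit
        out.push_back(static_cast<uint16_t>(n));                    // bmpLength
    }
    out.insert(out.end(), bmp.begin(), bmp.end());
    out.insert(out.end(), supp.begin(), supp.end());
    return out;
}
```

With one BMP value and one supplementary value the result has n+2*m+2 = 5 units, matching the documented return value for the m!=0 case.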
UnicodeSet& icu::final::set ( UChar32  start,
UChar32  end 
)

Make this object represent the range `start - end`.

If `start > end` then this object is set to an empty range. A frozen set will not be modified.

Parameters:
start first character in the set, inclusive
end last character in the set, inclusive
Stable:
ICU 2.4
void icu::final::setSource ( const UnicodeString &  newSource,
UErrorCode &  status 
)

Set a new source for this iterator.

Allows object reuse.

Parameters:
newSource the source string to iterate against. This allows the same iterator to be used while changing the source string, saving object creation.
status Fill-in parameter which receives the status of this operation.
Stable:
ICU 2.4
void icu::final::setToBogus (  ) 

Make this UnicodeSet object invalid.

The set will test TRUE with isBogus().

A bogus set has no value. It is different from an empty set. It can be used to indicate that no set value is available.

This utility function is used throughout the UnicodeSet implementation to indicate that a UnicodeSet operation failed, and may be used in other functions, especially but not exclusively when such functions do not take a UErrorCode for simplicity.

See also:
isBogus()
Stable:
ICU 4.0
icu::final::SimpleFormatter ( const SimpleFormatter &  other  )  [inline]

Copy constructor.

Stable:
ICU 57

Definition at line 106 of file simpleformatter.h.

icu::final::SimpleFormatter ( const UnicodeString &  pattern,
int32_t  min,
int32_t  max,
UErrorCode &  errorCode 
) [inline]

Constructs a formatter from the pattern string.

The number of arguments checked against the given limits is the highest argument number plus one, not the number of occurrences of arguments.

Parameters:
pattern The pattern string.
min The pattern must have at least this many arguments.
max The pattern must have at most this many arguments.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call. Set to U_ILLEGAL_ARGUMENT_ERROR for bad argument syntax and too few or too many arguments.
Stable:
ICU 57

Definition at line 97 of file simpleformatter.h.
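
The counting rule above (highest argument number plus one, not the number of occurrences) is easy to misread, so here is a small sketch of it. The helpers argumentLimitSketch and withinLimitsSketch are hypothetical, and the parser is deliberately simplified: it assumes placeholders of the form {N} with small N and ignores any quoting or escaping rules of the real pattern syntax.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the highest argument number plus one for placeholders such as
// {0} or {12}. Simplification: quoting is not handled, and anything that
// is not a well-formed {digits} group is treated as literal text.
int32_t argumentLimitSketch(const std::string& pattern) {
    int32_t limit = 0;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] != '{') {
            continue;
        }
        std::size_t j = i + 1;
        int32_t number = 0;
        bool hasDigits = false;
        while (j < pattern.size() &&
               std::isdigit(static_cast<unsigned char>(pattern[j]))) {
            number = number * 10 + (pattern[j] - '0');  // assumes small numbers
            hasDigits = true;
            ++j;
        }
        if (hasDigits && j < pattern.size() && pattern[j] == '}') {
            if (number + 1 > limit) {
                limit = number + 1;
            }
            i = j;  // continue after the closing brace
        }
    }
    return limit;
}

// Mirrors the min/max check described for the constructor.
bool withinLimitsSketch(const std::string& pattern, int32_t min, int32_t max) {
    assert(min <= max);
    const int32_t limit = argumentLimitSketch(pattern);
    return min <= limit && limit <= max;
}
```

Under this rule a pattern that uses {0} twice counts as one argument, while a pattern that uses only {2} counts as three.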

icu::final::SimpleFormatter ( const UnicodeString &  pattern,
UErrorCode &  errorCode 
) [inline]

Constructs a formatter from the pattern string.

Parameters:
pattern The pattern string.
errorCode ICU error code in/out parameter. Must fulfill U_SUCCESS before the function call. Set to U_ILLEGAL_ARGUMENT_ERROR for bad argument syntax.
Stable:
ICU 57

Definition at line 79 of file simpleformatter.h.

icu::final::SimpleFormatter (  )  [inline]

Default constructor.

Stable:
ICU 57

Definition at line 68 of file simpleformatter.h.

virtual int32_t icu::final::size ( void   )  const [virtual]

Returns the number of elements in this set (its cardinality).

Note that the elements of a set may include both individual code points and strings.

Returns:
the number of elements in this set (its cardinality).
Stable:
ICU 2.0
int32_t icu::final::span ( const UnicodeString &  s,
int32_t  start,
USetSpanCondition  spanCondition 
) const [inline]

Returns the end of the substring of the input string according to the USetSpanCondition.

Same as start+span(s.getBuffer()+start, s.length()-start, spanCondition) after pinning start to 0<=start<=s.length().

Parameters:
s the string
start the start index in the string for the span operation
spanCondition specifies the containment condition
Returns:
the exclusive end of the substring according to the spanCondition; the substring s.tempSubStringBetween(start, end) fulfills the spanCondition
Stable:
ICU 4.4
See also:
USetSpanCondition
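
The pinning step and the "exclusive end" return value can be sketched as follows. This is an illustration under simplifying assumptions, not the library's code: spanFromSketch is a hypothetical helper, the set is a plain std::set of UTF-16 code units (no strings, no surrogate handling), and a bool stands in for the span condition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Returns the exclusive end of the run that begins at the pinned start.
// `contained` selects a run of members (true) or of non-members (false).
int32_t spanFromSketch(const std::set<char16_t>& members,
                       const std::u16string& s, int32_t start, bool contained) {
    const int32_t length = static_cast<int32_t>(s.size());
    // Pin start into 0..length so out-of-range values never index past the text.
    start = std::max<int32_t>(0, std::min<int32_t>(start, length));
    int32_t end = start;
    while (end < length && (members.count(s[end]) != 0) == contained) {
        ++end;
    }
    assert(start <= end && end <= length);
    return end;
}
```

A start beyond the end of the text is pinned to the text length, so the call returns the length and the resulting substring is empty.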
int32_t icu::final::span ( const char16_t *  s,
int32_t  length,
USetSpanCondition  spanCondition 
) const

Returns the length of the initial substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).

See USetSpanCondition for details. Similar to the strspn() C library function. Unpaired surrogates are treated according to contains() of their surrogate code points. This function works faster with a frozen set and with a non-negative string length argument.

Parameters:
s start of the string
length the length of the string; can be -1 for NUL-terminated
spanCondition specifies the containment condition
Returns:
the length of the initial substring according to the spanCondition; 0 if the start of the string does not fit the spanCondition
Stable:
ICU 3.8
See also:
USetSpanCondition
int32_t icu::final::spanBack ( const UnicodeString &  s,
int32_t  limit,
USetSpanCondition  spanCondition 
) const [inline]

Returns the start of the substring of the input string according to the USetSpanCondition.

Same as spanBack(s.getBuffer(), limit, spanCondition) after pinning limit to 0<=limit<=s.length().

Parameters:
s the string
limit the exclusive-end index in the string for the span operation (use s.length() or INT32_MAX for spanning back from the end of the string)
spanCondition specifies the containment condition
Returns:
the start of the substring according to the spanCondition; the substring s.tempSubStringBetween(start, limit) fulfills the spanCondition
Stable:
ICU 4.4
See also:
USetSpanCondition
int32_t icu::final::spanBack ( const char16_t *  s,
int32_t  length,
USetSpanCondition  spanCondition 
) const

Returns the start of the trailing substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).

See USetSpanCondition for details. Unpaired surrogates are treated according to contains() of their surrogate code points. This function works faster with a frozen set and with a non-negative string length argument.

Parameters:
s start of the string
length the length of the string; can be -1 for NUL-terminated
spanCondition specifies the containment condition
Returns:
the start of the trailing substring according to the spanCondition; the string length if the end of the string does not fit the spanCondition
Stable:
ICU 3.8
See also:
USetSpanCondition
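
The return convention for the backward span (start of the trailing run, or the string length when the end of the string does not qualify) can be shown with a small sketch. spanBackSketch is a hypothetical helper that works on single UTF-16 code units only; multi-character strings in the set and surrogate handling are left out, and a bool stands in for the span condition.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Returns the start index of the trailing run of s that satisfies the
// condition. A negative length means s is NUL-terminated.
int32_t spanBackSketch(const std::set<char16_t>& members, const char16_t* s,
                       int32_t length, bool contained) {
    assert(s != nullptr);
    if (length < 0) {
        length = 0;
        while (s[length] != 0) {
            ++length;
        }
    }
    int32_t start = length;
    // Walk backwards while the unit just before `start` still qualifies.
    while (start > 0 && (members.count(s[start - 1]) != 0) == contained) {
        --start;
    }
    return start;
}
```

When the last code unit does not satisfy the condition the loop never moves, so the function returns the string length, which is the "does not fit" case described above.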
int32_t icu::final::spanBackUTF8 ( const char *  s,
int32_t  length,
USetSpanCondition  spanCondition 
) const

Returns the start of the trailing substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).

See USetSpanCondition for details. Malformed byte sequences are treated according to contains(0xfffd). This function works faster with a frozen set and with a non-negative string length argument.

Parameters:
s start of the string (UTF-8)
length the length of the string; can be -1 for NUL-terminated
spanCondition specifies the containment condition
Returns:
the start of the trailing substring according to the spanCondition; the string length if the end of the string does not fit the spanCondition
Stable:
ICU 3.8
See also:
USetSpanCondition
int32_t icu::final::spanUTF8 ( const char *  s,
int32_t  length,
USetSpanCondition  spanCondition 
) const

Returns the length of the initial substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).

See USetSpanCondition for details. Similar to the strspn() C library function. Malformed byte sequences are treated according to contains(0xfffd). This function works faster with a frozen set and with a non-negative string length argument.

Parameters:
s start of the string (UTF-8)
length the length of the string; can be -1 for NUL-terminated
spanCondition specifies the containment condition
Returns:
the length of the initial substring according to the spanCondition; 0 if the start of the string does not fit the spanCondition
Stable:
ICU 3.8
See also:
USetSpanCondition
static int32_t icu::final::toLower ( const char *  locale,
uint32_t  options,
const char16_t *  src,
int32_t  srcLength,
char16_t *  dest,
int32_t  destCapacity,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Lowercases a UTF-16 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents is undefined in case of failure.
destCapacity The size of the buffer (number of char16_ts). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents is undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
Returns:
The length of the result string, if successful. When the result would be longer than destCapacity, the full length is returned and a U_BUFFER_OVERFLOW_ERROR is set.
See also:
u_strToLower
Stable:
ICU 59
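
The destCapacity rules above allow a two-pass calling pattern: ask for the length with a zero capacity, then allocate and call again. The sketch below imitates that buffer protocol with a hypothetical toy transform (toyLowerSketch, ASCII-only lowercasing) and a plain bool in place of the error code. It is not the library's casing algorithm and exists only to show the caller-side pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy transform that follows the documented buffer protocol:
// returns the full result length, writes only if the result fits, and
// NUL-terminates when there is room. `overflow` stands in for the
// buffer-overflow error report.
int32_t toyLowerSketch(const char16_t* src, int32_t srcLength,
                       char16_t* dest, int32_t destCapacity, bool& overflow) {
    assert(src != nullptr);
    assert(destCapacity >= 0 && (dest != nullptr || destCapacity == 0));
    if (srcLength < 0) {  // -1 means NUL-terminated input
        srcLength = 0;
        while (src[srcLength] != 0) {
            ++srcLength;
        }
    }
    overflow = srcLength > destCapacity;
    if (!overflow) {
        for (int32_t i = 0; i < srcLength; ++i) {
            const char16_t c = src[i];
            dest[i] = (c >= u'A' && c <= u'Z') ? static_cast<char16_t>(c + 32) : c;
        }
        if (srcLength < destCapacity) {
            dest[srcLength] = 0;
        }
    }
    return srcLength;
}

// Caller-side pattern: preflight with capacity 0, then allocate and convert.
std::u16string lowerWithPreflightSketch(const std::u16string& src) {
    bool overflow = false;
    const int32_t srcLength = static_cast<int32_t>(src.size());
    const int32_t needed = toyLowerSketch(src.data(), srcLength, nullptr, 0, overflow);
    std::vector<char16_t> buffer(static_cast<std::size_t>(needed) + 1);
    const int32_t written = toyLowerSketch(src.data(), srcLength, buffer.data(),
                                           static_cast<int32_t>(buffer.size()), overflow);
    assert(!overflow && written == needed);
    return std::u16string(buffer.data(), static_cast<std::size_t>(written));
}
```

The first call reports the required length without writing anything; the second call is given one extra unit so the result can also be NUL-terminated.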
virtual UnicodeString& icu::final::toPattern ( UnicodeString &  result,
UBool  escapeUnprintable = FALSE 
) const [virtual]

Returns a string representation of this set.

If the result of calling this function is passed to a UnicodeSet constructor, it will produce another set that is equal to this one. A frozen set will not be modified.

Parameters:
result the string to receive the rules. Previous contents will be deleted.
escapeUnprintable if TRUE then convert unprintable character to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E.
Stable:
ICU 2.0

Implements icu::UnicodeMatcher.
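
The escapeUnprintable rule can be written down directly from the description: everything other than U+000A and U+0020..U+007E is replaced by a hex escape. The sketch below uses hypothetical helper names and covers only that rule; pattern syntax characters and their quoting are not handled, and the use of uppercase hex digits is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// True for code points that the description treats as unprintable.
bool isUnprintableSketch(char32_t c) {
    return !(c == 0x000A || (c >= 0x0020 && c <= 0x007E));
}

// Appends c to out, replacing it with \uxxxx or \Uxxxxxxxx when unprintable.
void appendEscapedSketch(std::string& out, char32_t c) {
    assert(c <= 0x10FFFF);
    if (!isUnprintableSketch(c)) {
        out.push_back(static_cast<char>(c));  // printable ASCII or line feed
        return;
    }
    char buf[16];
    if (c <= 0xFFFF) {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
    } else {
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(c));
    }
    out += buf;
}
```

BMP code points get the four-digit form and supplementary code points the eight-digit form, so the escaped text stays within printable ASCII.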

static int32_t icu::final::toTitle ( const char *  locale,
uint32_t  options,
BreakIterator *  iter,
const char16_t *  src,
int32_t  srcLength,
char16_t *  dest,
int32_t  destCapacity,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Titlecases a UTF-16 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with options bits.)

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT, U_EDITS_NO_RESET, U_TITLECASE_NO_LOWERCASE, U_TITLECASE_NO_BREAK_ADJUSTMENT, U_TITLECASE_ADJUST_TO_CASED, U_TITLECASE_WHOLE_STRING, U_TITLECASE_SENTENCES.
iter A break iterator to find the first characters of words that are to be titlecased. It is set to the source string (setText()) and used one or more times for iteration (first() and next()). If NULL, then a word break iterator for the locale is used (or something equivalent).
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents is undefined in case of failure.
destCapacity The size of the buffer (number of char16_ts). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents is undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
Returns:
The length of the result string, if successful. When the result would be longer than destCapacity, the full length is returned and a U_BUFFER_OVERFLOW_ERROR is set.
See also:
u_strToTitle
ucasemap_toTitle
Stable:
ICU 59
static int32_t icu::final::toUpper ( const char *  locale,
uint32_t  options,
const char16_t *  src,
int32_t  srcLength,
char16_t *  dest,
int32_t  destCapacity,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Uppercases a UTF-16 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents is undefined in case of failure.
destCapacity The size of the buffer (number of char16_ts). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents is undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
Returns:
The length of the result string, if successful. When the result would be longer than destCapacity, the full length is returned and a U_BUFFER_OVERFLOW_ERROR is set.
See also:
u_strToUpper
Stable:
ICU 59
const USet* icu::final::toUSet (  )  const [inline]

Produce a const USet * pointer for this UnicodeSet.

USet is the plain C type for UnicodeSet

Returns:
a const USet pointer for this UnicodeSet
Stable:
ICU 4.2
USet* icu::final::toUSet (  )  [inline]

Produce a USet * pointer for this UnicodeSet.

USet is the plain C type for UnicodeSet

Returns:
a USet pointer for this UnicodeSet
Stable:
ICU 4.2
icu::final::UnicodeSet ( const UnicodeSet &  o  ) 

Constructs a set that is identical to the given UnicodeSet.

Stable:
ICU 2.0
icu::final::UnicodeSet ( const UnicodeString &  pattern,
ParsePosition &  pos,
uint32_t  options,
const SymbolTable *  symbols,
UErrorCode &  status 
)

Constructs a set from the given pattern.

See the class description for the syntax of the pattern language.

Parameters:
pattern a string specifying what characters are in the set
pos on input, the position in pattern at which to start parsing. On output, the position after the last character parsed.
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
symbols a symbol table mapping variable names to values and stand-in characters to UnicodeSets; may be NULL
status input-output error code
Stable:
ICU 2.8
icu::final::UnicodeSet ( const UnicodeString &  pattern,
uint32_t  options,
const SymbolTable *  symbols,
UErrorCode &  status 
)

Constructs a set from the given pattern.

See the class description for the syntax of the pattern language.

Parameters:
pattern a string specifying what characters are in the set
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
symbols a symbol table mapping variable names to values and stand-in characters to UnicodeSets; may be NULL
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error.
Internal:
Do not use. This API is for internal use only.
icu::final::UnicodeSet ( const UnicodeString &  pattern,
UErrorCode &  status 
)

Constructs a set from the given pattern.

See the class description for the syntax of the pattern language.

Parameters:
pattern a string specifying what characters are in the set
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error.
Stable:
ICU 2.0
icu::final::UnicodeSet ( const uint16_t  buffer[],
int32_t  bufferLen,
ESerialization  serialization,
UErrorCode &  status 
)

Constructs a set from the output of serialize().

Parameters:
buffer the 16 bit array
bufferLen the original length returned from serialize()
serialization the value 'kSerialized'
status error code
Internal:
Do not use. This API is for internal use only.
icu::final::UnicodeSet ( UChar32  start,
UChar32  end 
)

Constructs a set containing the given range.

If end < start then an empty set is created.

Parameters:
start first character, inclusive, of range
end last character, inclusive, of range
Stable:
ICU 2.4
icu::final::UnicodeSet (  ) 

Constructs an empty set.

Stable:
ICU 2.0
static int32_t icu::final::utf8Fold ( uint32_t  options,
const char *  src,
int32_t  srcLength,
char *  dest,
int32_t  destCapacity,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Case-folds a UTF-8 string and optionally records edits.

Case folding is locale-independent and not context-sensitive, but there is an option for whether to include or exclude mappings for dotted I and dotless i that are marked with 'T' in CaseFolding.txt.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT, U_EDITS_NO_RESET, U_FOLD_CASE_DEFAULT, U_FOLD_CASE_EXCLUDE_SPECIAL_I.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents is undefined in case of failure.
destCapacity The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents is undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
Returns:
The length of the result string, if successful. When the result would be longer than destCapacity, the full length is returned and a U_BUFFER_OVERFLOW_ERROR is set.
See also:
ucasemap_utf8FoldCase
Stable:
ICU 59
static void icu::final::utf8Fold ( uint32_t  options,
StringPiece  src,
ByteSink &  sink,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Case-folds a UTF-8 string and optionally records edits.

Case folding is locale-independent and not context-sensitive, but there is an option for whether to include or exclude mappings for dotted I and dotless i that are marked with 'T' in CaseFolding.txt.

The result may be longer or shorter than the original.

Parameters:
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
src The original string.
sink A ByteSink to which the result string is written. sink.Flush() is called at the end.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents is undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
See also:
ucasemap_utf8FoldCase
Stable:
ICU 60
static int32_t icu::final::utf8ToLower ( const char *  locale,
uint32_t  options,
const char *  src,
int32_t  srcLength,
char *  dest,
int32_t  destCapacity,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Lowercases a UTF-8 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents is undefined in case of failure.
destCapacity The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents is undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
Returns:
The length of the result string, if successful. When the result would be longer than destCapacity, the full length is returned and a U_BUFFER_OVERFLOW_ERROR is set.
See also:
ucasemap_utf8ToLower
Stable:
ICU 59
static void icu::final::utf8ToLower ( const char *  locale,
uint32_t  options,
StringPiece  src,
ByteSink &  sink,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Lowercases a UTF-8 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original.

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
src The original string.
sink A ByteSink to which the result string is written. sink.Flush() is called at the end.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents is undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
See also:
ucasemap_utf8ToLower
Stable:
ICU 60
static int32_t icu::final::utf8ToTitle ( const char *  locale,
uint32_t  options,
BreakIterator *  iter,
const char *  src,
int32_t  srcLength,
char *  dest,
int32_t  destCapacity,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Titlecases a UTF-8 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with options bits.)

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT, U_EDITS_NO_RESET, U_TITLECASE_NO_LOWERCASE, U_TITLECASE_NO_BREAK_ADJUSTMENT, U_TITLECASE_ADJUST_TO_CASED, U_TITLECASE_WHOLE_STRING, U_TITLECASE_SENTENCES.
iter A break iterator to find the first characters of words that are to be titlecased. It is set to the source string (setUText()) and used one or more times for iteration (first() and next()). If NULL, then a word break iterator for the locale is used (or something equivalent).
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents are undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
Returns:
The length of the result string, if successful. When the result would be longer than destCapacity, the full length is returned and a U_BUFFER_OVERFLOW_ERROR is set.
See also:
ucasemap_utf8ToTitle
Stable:
ICU 59
static void icu::final::utf8ToTitle ( const char *  locale,
uint32_t  options,
BreakIterator *  iter,
StringPiece  src,
ByteSink &  sink,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Titlecases a UTF-8 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original.

Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with options bits.)

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT, U_EDITS_NO_RESET, U_TITLECASE_NO_LOWERCASE, U_TITLECASE_NO_BREAK_ADJUSTMENT, U_TITLECASE_ADJUST_TO_CASED, U_TITLECASE_WHOLE_STRING, U_TITLECASE_SENTENCES.
iter A break iterator to find the first characters of words that are to be titlecased. It is set to the source string (setUText()) and used one or more times for iteration (first() and next()). If NULL, then a word break iterator for the locale is used (or something equivalent).
src The original string.
sink A ByteSink to which the result string is written. sink.Flush() is called at the end.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents are undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
See also:
ucasemap_utf8ToTitle
Stable:
ICU 60
static int32_t icu::final::utf8ToUpper ( const char *  locale,
uint32_t  options,
const char *  src,
int32_t  srcLength,
char *  dest,
int32_t  destCapacity,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Uppercases a UTF-8 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents are undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
Returns:
The length of the result string, if successful. When the result would be longer than destCapacity, the full length is returned and a U_BUFFER_OVERFLOW_ERROR is set.
See also:
ucasemap_utf8ToUpper
Stable:
ICU 59
static void icu::final::utf8ToUpper ( const char *  locale,
uint32_t  options,
StringPiece  src,
ByteSink &  sink,
Edits *  edits,
UErrorCode &  errorCode 
) [static]

Uppercases a UTF-8 string and optionally records edits.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original.

Parameters:
locale The locale ID. ("" = root locale, NULL = default locale.)
options Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
src The original string.
sink A ByteSink to which the result string is written. sink.Flush() is called at the end.
edits Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents are undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be NULL.
errorCode Reference to an in/out error code value which must not indicate a failure before the function call.
See also:
ucasemap_utf8ToUpper
Stable:
ICU 60

Generated on 3 Aug 2020 for ICU 67.1 by  doxygen 1.6.1