uspoof.h File Reference

Unicode Security and Spoofing Detection, C API.

#include "unicode/utypes.h"
#include "unicode/uset.h"
#include "unicode/parseerr.h"
#include "unicode/localpointer.h"

Typedefs

typedef struct USpoofChecker USpoofChecker
 typedef for C of USpoofChecker
typedef struct USpoofCheckResult USpoofCheckResult

Enumerations

enum  USpoofChecks {
  USPOOF_SINGLE_SCRIPT_CONFUSABLE = 1,
  USPOOF_MIXED_SCRIPT_CONFUSABLE = 2,
  USPOOF_WHOLE_SCRIPT_CONFUSABLE = 4,
  USPOOF_CONFUSABLE = USPOOF_SINGLE_SCRIPT_CONFUSABLE | USPOOF_MIXED_SCRIPT_CONFUSABLE | USPOOF_WHOLE_SCRIPT_CONFUSABLE,
  USPOOF_ANY_CASE = 8,
  USPOOF_RESTRICTION_LEVEL = 16,
  USPOOF_SINGLE_SCRIPT = USPOOF_RESTRICTION_LEVEL,
  USPOOF_INVISIBLE = 32,
  USPOOF_CHAR_LIMIT = 64,
  USPOOF_MIXED_NUMBERS = 128,
  USPOOF_HIDDEN_OVERLAY = 256,
  USPOOF_ALL_CHECKS = 0xFFFF,
  USPOOF_AUX_INFO = 0x40000000
}
 

Enum for the kinds of checks that USpoofChecker can perform.

enum  URestrictionLevel {
  USPOOF_ASCII = 0x10000000,
  USPOOF_SINGLE_SCRIPT_RESTRICTIVE = 0x20000000,
  USPOOF_HIGHLY_RESTRICTIVE = 0x30000000,
  USPOOF_MODERATELY_RESTRICTIVE = 0x40000000,
  USPOOF_MINIMALLY_RESTRICTIVE = 0x50000000,
  USPOOF_UNRESTRICTIVE = 0x60000000,
  USPOOF_RESTRICTION_LEVEL_MASK = 0x7F000000,
  USPOOF_UNDEFINED_RESTRICTIVE = -1
}
 

Constants from UTS #39 for use in uspoof_setRestrictionLevel, and for returned identifier restriction levels in check results.


Functions

USpoofChecker * uspoof_open (UErrorCode *status)
 Create a Unicode Spoof Checker, configured to perform all checks except for USPOOF_LOCALE_LIMIT and USPOOF_CHAR_LIMIT.
USpoofChecker * uspoof_openFromSerialized (const void *data, int32_t length, int32_t *pActualLength, UErrorCode *pErrorCode)
 Open a Spoof checker from its serialized form, stored in 32-bit-aligned memory.
USpoofChecker * uspoof_openFromSource (const char *confusables, int32_t confusablesLen, const char *confusablesWholeScript, int32_t confusablesWholeScriptLen, int32_t *errType, UParseError *pe, UErrorCode *status)
 Open a Spoof Checker from the source form of the spoof data.
void uspoof_close (USpoofChecker *sc)
 Close a Spoof Checker, freeing any memory that was being held by its implementation.
USpoofChecker * uspoof_clone (const USpoofChecker *sc, UErrorCode *status)
 Clone a Spoof Checker.

Detailed Description

Unicode Security and Spoofing Detection, C API.

This class, based on Unicode Technical Report #36 and Unicode Technical Standard #39, has two main functions:

  1. Checking whether two strings are visually confusable with each other, such as "Harvest" and "Ηarvest", where the second string starts with the Greek capital letter Eta.
  2. Checking whether an individual string is likely to be an attempt at confusing the reader (spoof detection), such as "paypal" with some Latin characters substituted with Cyrillic look-alikes.

Although originally designed as a method for flagging suspicious identifier strings such as URLs, USpoofChecker has a number of other practical use cases, such as preventing attempts to evade bad-word content filters.

The functions of this class are exposed as C API, with a handful of syntactical conveniences for C++.

Confusables

The following example shows how to use USpoofChecker to check for confusability between two strings:

 {.c}
 UErrorCode status = U_ZERO_ERROR;
 UChar* str1 = (UChar*) u"Harvest";
 UChar* str2 = (UChar*) u"\u0397arvest";  // with U+0397 GREEK CAPITAL LETTER ETA

 USpoofChecker* sc = uspoof_open(&status);
 uspoof_setChecks(sc, USPOOF_CONFUSABLE, &status);

 int32_t bitmask = uspoof_areConfusable(sc, str1, -1, str2, -1, &status);
 UBool result = bitmask != 0;
 // areConfusable: 1 (status: U_ZERO_ERROR)
 printf("areConfusable: %d (status: %s)\n", result, u_errorName(status));
 uspoof_close(sc);

The call to uspoof_open creates a USpoofChecker object; the call to uspoof_setChecks enables confusable checking and disables all other checks; the call to uspoof_areConfusable performs the confusability test; and the following line extracts the result out of the return value. For best performance, the instance should be created once (e.g., upon application startup), and the efficient uspoof_areConfusable method can be used at runtime.

The type LocalUSpoofCheckerPointer is exposed for C++ programmers. It will automatically call uspoof_close when the object goes out of scope:

 {.cpp}
 UErrorCode status = U_ZERO_ERROR;
 LocalUSpoofCheckerPointer sc(uspoof_open(&status));
 uspoof_setChecks(sc.getAlias(), USPOOF_CONFUSABLE, &status);
 // ...

UTS 39 defines two strings to be confusable if they map to the same skeleton string. A skeleton can be thought of as a "hash code". uspoof_getSkeleton computes the skeleton for a particular string, so the following snippet is equivalent to the example above:

 {.c}
 UErrorCode status = U_ZERO_ERROR;
 UChar* str1 = (UChar*) u"Harvest";
 UChar* str2 = (UChar*) u"\u0397arvest";  // with U+0397 GREEK CAPITAL LETTER ETA

 USpoofChecker* sc = uspoof_open(&status);
 uspoof_setChecks(sc, USPOOF_CONFUSABLE, &status);

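 // Get skeleton 1. The first call preflights: with a NULL buffer it only returns the
 // required length and sets status to U_BUFFER_OVERFLOW_ERROR, so status is reset below.
 int32_t skel1Len = uspoof_getSkeleton(sc, 0, str1, -1, NULL, 0, &status);
 UChar* skel1 = (UChar*) malloc(++skel1Len * sizeof(UChar));  // +1 for the terminating NUL
 status = U_ZERO_ERROR;
 uspoof_getSkeleton(sc, 0, str1, -1, skel1, skel1Len, &status);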

 // Get skeleton 2
 int32_t skel2Len = uspoof_getSkeleton(sc, 0, str2, -1, NULL, 0, &status);
 UChar* skel2 = (UChar*) malloc(++skel2Len * sizeof(UChar));
 status = U_ZERO_ERROR;
 uspoof_getSkeleton(sc, 0, str2, -1, skel2, skel2Len, &status);

 // Are the skeletons the same?
 UBool result = u_strcmp(skel1, skel2) == 0;
 // areConfusable: 1 (status: U_ZERO_ERROR)
 printf("areConfusable: %d (status: %s)\n", result, u_errorName(status));
 uspoof_close(sc);
 free(skel1);
 free(skel2);
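
To make the "hash code" analogy concrete, here is a toy sketch of skeleton-based comparison that runs without ICU. Everything in it is an assumption made for illustration: the four-entry mapping table is hypothetical and is not the Unicode confusables data, the helper names toy_prototype, toy_skeleton and toy_confusable do not exist in ICU, and it handles ASCII only. Real code should call uspoof_getSkeleton as shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy mapping table; NOT the Unicode confusables data. */
static const char *toy_prototype(char c) {
    switch (c) {
        case '1': return "l";
        case 'I': return "l";
        case '0': return "O";
        case 'm': return "rn";
        default:  return NULL;   /* the character maps to itself */
    }
}

/* Writes the toy skeleton of src into dst as a NUL-terminated string.
 * Returns 0 on success, -1 if dst is too small. */
static int toy_skeleton(const char *src, char *dst, size_t dstSize) {
    size_t used = 0;
    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        const char *proto = toy_prototype(*src);
        char self[2] = { *src, '\0' };
        const char *piece = (proto != NULL) ? proto : self;
        size_t n = strlen(piece);
        if (used + n + 1 > dstSize) { return -1; }  /* keep room for the NUL */
        memcpy(dst + used, piece, n);
        used += n;
    }
    if (used + 1 > dstSize) { return -1; }
    dst[used] = '\0';
    return 0;
}

/* Two strings are "toy confusable" when their toy skeletons are identical. */
static int toy_confusable(const char *a, const char *b) {
    char sa[64], sb[64];
    if (toy_skeleton(a, sa, sizeof sa) != 0 || toy_skeleton(b, sb, sizeof sb) != 0) {
        return 0;  /* input too long for this sketch */
    }
    return strcmp(sa, sb) == 0;
}
```

With this table, toy_confusable("lorem", "1orern") returns 1 because both strings reduce to the same skeleton, while toy_confusable("lorem", "ipsum") returns 0. The comparison is a plain string equality on skeletons, which is what makes precomputing and storing skeletons for a dictionary practical.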

If you need to check if a string is confusable with any string in a dictionary of many strings, rather than calling uspoof_areConfusable many times in a loop, uspoof_getSkeleton can be used instead, as shown below:

 {.c}
 UErrorCode status = U_ZERO_ERROR;
 #define DICTIONARY_LENGTH 2
 UChar* dictionary[DICTIONARY_LENGTH] = { (UChar*) u"lorem", (UChar*) u"ipsum" };
 UChar* skeletons[DICTIONARY_LENGTH];
 UChar* str = (UChar*) u"1orern";

 // Setup:
 USpoofChecker* sc = uspoof_open(&status);
 uspoof_setChecks(sc, USPOOF_CONFUSABLE, &status);
 for (size_t i=0; i<DICTIONARY_LENGTH; i++) {
     UChar* word = dictionary[i];
     int32_t len = uspoof_getSkeleton(sc, 0, word, -1, NULL, 0, &status);
     skeletons[i] = (UChar*) malloc(++len * sizeof(UChar));
     status = U_ZERO_ERROR;
     uspoof_getSkeleton(sc, 0, word, -1, skeletons[i], len, &status);
 }

 // Live Check:
 {
     int32_t len = uspoof_getSkeleton(sc, 0, str, -1, NULL, 0, &status);
     UChar* skel = (UChar*) malloc(++len * sizeof(UChar));
     status = U_ZERO_ERROR;
     uspoof_getSkeleton(sc, 0, str, -1, skel, len, &status);
     UBool result = FALSE;
     for (size_t i=0; i<DICTIONARY_LENGTH; i++) {
         result = u_strcmp(skel, skeletons[i]) == 0;
         if (result == TRUE) { break; }
     }
     // Has confusable in dictionary: 1 (status: U_ZERO_ERROR)
     printf("Has confusable in dictionary: %d (status: %s)\n", result, u_errorName(status));
     free(skel);
 }

 for (size_t i=0; i<DICTIONARY_LENGTH; i++) {
     free(skeletons[i]);
 }
 uspoof_close(sc);

Note: Since the Unicode confusables mapping table is frequently updated, confusable skeletons are not guaranteed to be the same between ICU releases. We therefore recommend that you always compute confusable skeletons at runtime and do not rely on creating a permanent, or difficult to update, database of skeletons.

Spoof Detection

The following snippet shows a minimal example of using USpoofChecker to perform spoof detection on a string:

 {.c}
 UErrorCode status = U_ZERO_ERROR;
 UChar* str = (UChar*) u"p\u0430ypal";  // with U+0430 CYRILLIC SMALL LETTER A

 // Get the default set of allowable characters:
 USet* allowed = uset_openEmpty();
 uset_addAll(allowed, uspoof_getRecommendedSet(&status));
 uset_addAll(allowed, uspoof_getInclusionSet(&status));

 USpoofChecker* sc = uspoof_open(&status);
 uspoof_setAllowedChars(sc, allowed, &status);
 uspoof_setRestrictionLevel(sc, USPOOF_MODERATELY_RESTRICTIVE);

 int32_t bitmask = uspoof_check(sc, str, -1, NULL, &status);
 UBool result = bitmask != 0;
 // fails checks: 1 (status: U_ZERO_ERROR)
 printf("fails checks: %d (status: %s)\n", result, u_errorName(status));
 uspoof_close(sc);
 uset_close(allowed);

As in the case for confusability checking, it is good practice to create one USpoofChecker instance at startup, and call the cheaper uspoof_check online. We specify the set of allowed characters to be those with type RECOMMENDED or INCLUSION, according to the recommendation in UTS 39.

In addition to uspoof_check, the function uspoof_checkUTF8 is exposed for UTF8-encoded char* strings, and uspoof_checkUnicodeString is exposed for C++ programmers.

If the USPOOF_AUX_INFO check is enabled, a limited amount of information on why a string failed the checks is available in the returned bitmask. For complete information, use the uspoof_check2 class of functions with a USpoofCheckResult parameter:

 {.c}
 UErrorCode status = U_ZERO_ERROR;
 UChar* str = (UChar*) u"p\u0430ypal";  // with U+0430 CYRILLIC SMALL LETTER A

 // Get the default set of allowable characters:
 USet* allowed = uset_openEmpty();
 uset_addAll(allowed, uspoof_getRecommendedSet(&status));
 uset_addAll(allowed, uspoof_getInclusionSet(&status));

 USpoofChecker* sc = uspoof_open(&status);
 uspoof_setAllowedChars(sc, allowed, &status);
 uspoof_setRestrictionLevel(sc, USPOOF_MODERATELY_RESTRICTIVE);

 USpoofCheckResult* checkResult = uspoof_openCheckResult(&status);
 int32_t bitmask = uspoof_check2(sc, str, -1, checkResult, &status);

 int32_t failures1 = bitmask;
 int32_t failures2 = uspoof_getCheckResultChecks(checkResult, &status);
 assert(failures1 == failures2);
 // checks that failed: 0x00000010 (status: U_ZERO_ERROR)
 printf("checks that failed: %#010x (status: %s)\n", failures1, u_errorName(status));

 // Cleanup:
 uspoof_close(sc);
 uset_close(allowed);
 uspoof_closeCheckResult(checkResult);

C++ users can take advantage of a few syntactical conveniences. The following snippet is functionally equivalent to the one above:

 {.cpp}
 UErrorCode status = U_ZERO_ERROR;
 UnicodeString str((UChar*) u"p\u0430ypal");  // with U+0430 CYRILLIC SMALL LETTER A

 // Get the default set of allowable characters:
 UnicodeSet allowed;
 allowed.addAll(*uspoof_getRecommendedUnicodeSet(&status));
 allowed.addAll(*uspoof_getInclusionUnicodeSet(&status));

 LocalUSpoofCheckerPointer sc(uspoof_open(&status));
 uspoof_setAllowedChars(sc.getAlias(), allowed.toUSet(), &status);
 uspoof_setRestrictionLevel(sc.getAlias(), USPOOF_MODERATELY_RESTRICTIVE);

 LocalUSpoofCheckResultPointer checkResult(uspoof_openCheckResult(&status));
 int32_t bitmask = uspoof_check2UnicodeString(sc.getAlias(), str, checkResult.getAlias(), &status);

 int32_t failures1 = bitmask;
 int32_t failures2 = uspoof_getCheckResultChecks(checkResult.getAlias(), &status);
 assert(failures1 == failures2);
 // checks that failed: 0x00000010 (status: U_ZERO_ERROR)
 printf("checks that failed: %#010x (status: %s)\n", failures1, u_errorName(status));

 // Explicit cleanup not necessary.

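The return value is a bitmask of the checks that failed. In this case, there was one check that failed: USPOOF_RESTRICTION_LEVEL, corresponding to the fifth bit (16). The possible checks are:

  • USPOOF_RESTRICTION_LEVEL: flags strings that are looser than the configured restriction level (see URestrictionLevel).
  • USPOOF_INVISIBLE: flags strings that contain invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark.
  • USPOOF_CHAR_LIMIT: flags strings that contain characters outside of a specified set of acceptable characters. See uspoof_setAllowedChars and uspoof_setAllowedLocales.
  • USPOOF_MIXED_NUMBERS: flags strings that mix numbers from different numbering systems.
  • USPOOF_HIDDEN_OVERLAY: flags strings in which a combining character follows a character that would hide it.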

These checks can be enabled independently of each other. For example, if you were interested in checking for only the INVISIBLE and MIXED_NUMBERS conditions, you could do:

 {.c}
 UErrorCode status = U_ZERO_ERROR;
 UChar* str = (UChar*) u"8\u09EA";  // 8 mixed with U+09EA BENGALI DIGIT FOUR

 USpoofChecker* sc = uspoof_open(&status);
 uspoof_setChecks(sc, USPOOF_INVISIBLE | USPOOF_MIXED_NUMBERS, &status);

 int32_t bitmask = uspoof_check2(sc, str, -1, NULL, &status);
 UBool result = bitmask != 0;
 // fails checks: 1 (status: U_ZERO_ERROR)
 printf("fails checks: %d (status: %s)\n", result, u_errorName(status));
 uspoof_close(sc);
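
The bitmask returned by the check functions can be decoded one bit at a time. The sketch below is illustrative only: the constants are copied from the USpoofChecks listing in this file so that it compiles without the ICU headers (real code would include unicode/uspoof.h and use the USPOOF_* names), and the helper describe_failures is a hypothetical name, not part of ICU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the USpoofChecks enum documented in this file.
 * Assumption: real code includes "unicode/uspoof.h" instead of redefining them. */
enum {
    SKETCH_RESTRICTION_LEVEL = 16,
    SKETCH_INVISIBLE         = 32,
    SKETCH_CHAR_LIMIT        = 64,
    SKETCH_MIXED_NUMBERS     = 128,
    SKETCH_HIDDEN_OVERLAY    = 256
};

/* Hypothetical helper: writes a space-separated list of the failed checks
 * into buf and returns how many names were written. */
static int describe_failures(int32_t bitmask, char *buf, size_t bufSize) {
    static const struct { int32_t bit; const char *name; } table[] = {
        { SKETCH_RESTRICTION_LEVEL, "RESTRICTION_LEVEL" },
        { SKETCH_INVISIBLE,         "INVISIBLE" },
        { SKETCH_CHAR_LIMIT,        "CHAR_LIMIT" },
        { SKETCH_MIXED_NUMBERS,     "MIXED_NUMBERS" },
        { SKETCH_HIDDEN_OVERLAY,    "HIDDEN_OVERLAY" }
    };
    size_t used = 0;
    int count = 0;
    assert(buf != NULL && bufSize > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if ((bitmask & table[i].bit) == 0) { continue; }
        int n = snprintf(buf + used, bufSize - used, "%s%s",
                         count > 0 ? " " : "", table[i].name);
        if (n < 0 || (size_t) n >= bufSize - used) {
            buf[used] = '\0';   /* drop the partially written name and stop */
            break;
        }
        used += (size_t) n;
        count++;
    }
    return count;
}
```

For the example above, a result of USPOOF_MIXED_NUMBERS (128) would be described as "MIXED_NUMBERS", and a result of 32 | 128 as "INVISIBLE MIXED_NUMBERS".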

Here is an example in C++ showing how to compute the restriction level of a string:

 {.cpp}
 UErrorCode status = U_ZERO_ERROR;
 UnicodeString str((UChar*) u"p\u0430ypal");  // with U+0430 CYRILLIC SMALL LETTER A

 // Get the default set of allowable characters:
 UnicodeSet allowed;
 allowed.addAll(*uspoof_getRecommendedUnicodeSet(&status));
 allowed.addAll(*uspoof_getInclusionUnicodeSet(&status));

 LocalUSpoofCheckerPointer sc(uspoof_open(&status));
 uspoof_setAllowedChars(sc.getAlias(), allowed.toUSet(), &status);
 uspoof_setRestrictionLevel(sc.getAlias(), USPOOF_MODERATELY_RESTRICTIVE);
 uspoof_setChecks(sc.getAlias(), USPOOF_RESTRICTION_LEVEL | USPOOF_AUX_INFO, &status);

 LocalUSpoofCheckResultPointer checkResult(uspoof_openCheckResult(&status));
 int32_t bitmask = uspoof_check2UnicodeString(sc.getAlias(), str, checkResult.getAlias(), &status);

 URestrictionLevel restrictionLevel = uspoof_getCheckResultRestrictionLevel(checkResult.getAlias(), &status);
 // Since USPOOF_AUX_INFO was enabled, the restriction level is also available in the upper bits of the bitmask:
 assert((restrictionLevel & bitmask) == restrictionLevel);
 // Restriction level: 0x50000000 (status: U_ZERO_ERROR)
 printf("Restriction level: %#010x (status: %s)\n", restrictionLevel, u_errorName(status));

The code '0x50000000' corresponds to the restriction level USPOOF_MINIMALLY_RESTRICTIVE. Since USPOOF_MINIMALLY_RESTRICTIVE is weaker than USPOOF_MODERATELY_RESTRICTIVE, the string fails the check.
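
The arithmetic behind this can be sketched without ICU. The constants below are copied from the USpoofChecks and URestrictionLevel listings in this file, and the sketch_* helper names are hypothetical. The comparison assumes that a numerically larger level is a looser one, which matches the ordering of the enum values listed here but should be treated as an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the USpoofChecks and URestrictionLevel listings in this file.
 * Assumption: real code includes "unicode/uspoof.h" and uses the USPOOF_* names. */
#define SKETCH_ALL_CHECKS              INT32_C(0x0000FFFF)
#define SKETCH_RESTRICTION_LEVEL_MASK  INT32_C(0x7F000000)
#define SKETCH_MODERATELY_RESTRICTIVE  INT32_C(0x40000000)
#define SKETCH_MINIMALLY_RESTRICTIVE   INT32_C(0x50000000)

/* Lower bits: the checks that failed. */
static int32_t sketch_failed_checks(int32_t result) {
    return result & SKETCH_ALL_CHECKS;
}

/* Upper bits: the restriction level reported when USPOOF_AUX_INFO is enabled. */
static int32_t sketch_restriction_level(int32_t result) {
    return result & SKETCH_RESTRICTION_LEVEL_MASK;
}

/* Assumption: levels are ordered numerically from strictest to loosest. */
static int sketch_is_looser_than(int32_t level, int32_t limit) {
    assert((level & ~SKETCH_RESTRICTION_LEVEL_MASK) == 0);
    return level > limit;
}
```

For a result of 0x50000010, sketch_restriction_level yields 0x50000000 and sketch_failed_checks yields 0x10. Masking with the all-checks value is also how a caller can test for success when auxiliary information is present in the upper bits.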

Note: The Restriction Level is the most powerful of the checks. The full logic is documented in UTS 39, but the basic idea is that strings are restricted to contain characters from only a single script, except that most scripts are allowed to have Latin characters interspersed. Although the default restriction level is HIGHLY_RESTRICTIVE, it is recommended that users set their restriction level to MODERATELY_RESTRICTIVE, which allows Latin mixed with all other scripts except Cyrillic, Greek, and Cherokee, with which it is often confusable. For more details on the levels, see UTS 39 or URestrictionLevel. The Restriction Level test is aware of the set of allowed characters set in uspoof_setAllowedChars. Note that characters which have script code COMMON or INHERITED, such as numbers and punctuation, are ignored when computing whether a string has multiple scripts.

Additional Information

A USpoofChecker instance may be used repeatedly to perform checks on any number of identifiers.

Thread Safety: The test functions for checking a single identifier, or for testing whether two identifiers are possibly confusable, are thread safe. They may be called concurrently, from multiple threads, using the same USpoofChecker instance.

More generally, the standard ICU thread safety rules apply: functions that take a const USpoofChecker parameter are thread safe. Those that take a non-const USpoofChecker are not thread safe.

Stable:
ICU 4.6

Definition in file uspoof.h.


Typedef Documentation

typedef struct USpoofChecker USpoofChecker

typedef for C of USpoofChecker

Stable:
ICU 4.2

Definition at line 360 of file uspoof.h.

typedef struct USpoofCheckResult USpoofCheckResult

See also:
uspoof_openCheckResult
Stable:
ICU 58

Definition at line 367 of file uspoof.h.


Enumeration Type Documentation

enum URestrictionLevel

Constants from UTS #39 for use in uspoof_setRestrictionLevel, and for returned identifier restriction levels in check results.

Stable:
ICU 51
See also:
uspoof_setRestrictionLevel
uspoof_check
Enumerator:
USPOOF_ASCII 

All characters in the string are in the identifier profile and all characters in the string are in the ASCII range.

Stable:
ICU 51
USPOOF_SINGLE_SCRIPT_RESTRICTIVE 

The string classifies as ASCII-Only, or all characters in the string are in the identifier profile and the string is single-script, according to the definition in UTS 39 section 5.1.

Stable:
ICU 53
USPOOF_HIGHLY_RESTRICTIVE 

The string classifies as Single Script, or all characters in the string are in the identifier profile and the string is covered by any of the following sets of scripts, according to the definition in UTS 39 section 5.1:

  • Latin + Han + Bopomofo (or equivalently: Latn + Hanb)
  • Latin + Han + Hiragana + Katakana (or equivalently: Latn + Jpan)
  • Latin + Han + Hangul (or equivalently: Latn + Kore)

This is the default restriction in ICU.

Stable:
ICU 51
USPOOF_MODERATELY_RESTRICTIVE 

The string classifies as Highly Restrictive, or all characters in the string are in the identifier profile and the string is covered by Latin and any one other Recommended or Aspirational script, except Cyrillic, Greek, and Cherokee.

Stable:
ICU 51
USPOOF_MINIMALLY_RESTRICTIVE 

All characters in the string are in the identifier profile.

Allow arbitrary mixtures of scripts.

Stable:
ICU 51
USPOOF_UNRESTRICTIVE 

Any valid identifiers, including characters outside of the Identifier Profile.

Stable:
ICU 51
USPOOF_RESTRICTION_LEVEL_MASK 

Mask for selecting the Restriction Level bits from the return value of uspoof_check.

Stable:
ICU 53
USPOOF_UNDEFINED_RESTRICTIVE 

An undefined restriction level.

Internal:
Do not use. This API is for internal use only.

Definition at line 530 of file uspoof.h.

enum USpoofChecks

Enum for the kinds of checks that USpoofChecker can perform.

These enum values are used both to select the set of checks that will be performed, and to report results from the check function.

Stable:
ICU 4.2
Enumerator:
USPOOF_SINGLE_SCRIPT_CONFUSABLE 

When performing the two-string uspoof_areConfusable test, this flag in the return value indicates that the two strings are visually confusable and that they are from the same script, according to UTS 39 section 4.

See also:
uspoof_areConfusable
Stable:
ICU 4.2
USPOOF_MIXED_SCRIPT_CONFUSABLE 

When performing the two-string uspoof_areConfusable test, this flag in the return value indicates that the two strings are visually confusable and that they are not from the same script, according to UTS 39 section 4.

See also:
uspoof_areConfusable
Stable:
ICU 4.2
USPOOF_WHOLE_SCRIPT_CONFUSABLE 

When performing the two-string uspoof_areConfusable test, this flag in the return value indicates that the two strings are visually confusable and that they are not from the same script but both of them are single-script strings, according to UTS 39 section 4.

See also:
uspoof_areConfusable
Stable:
ICU 4.2
USPOOF_CONFUSABLE 

Enable this flag in uspoof_setChecks to turn on all types of confusables.

You may set the checks to some subset of SINGLE_SCRIPT_CONFUSABLE, MIXED_SCRIPT_CONFUSABLE, or WHOLE_SCRIPT_CONFUSABLE to make uspoof_areConfusable return only those types of confusables.

See also:
uspoof_areConfusable
uspoof_getSkeleton
Stable:
ICU 58
USPOOF_ANY_CASE 

This flag is deprecated and no longer affects the behavior of USpoofChecker.

Deprecated:
ICU 58 Any case confusable mappings were removed from UTS 39; the corresponding ICU API was deprecated.
USPOOF_RESTRICTION_LEVEL 

Check that an identifier is no looser than the specified RestrictionLevel.

The default if uspoof_setRestrictionLevel is not called is HIGHLY_RESTRICTIVE.

If USPOOF_AUX_INFO is enabled the actual restriction level of the identifier being tested will also be returned by uspoof_check().

See also:
URestrictionLevel
uspoof_setRestrictionLevel
USPOOF_AUX_INFO
Stable:
ICU 51
USPOOF_SINGLE_SCRIPT 

Check that an identifier contains only characters from a single script (plus characters from the Common and Inherited scripts).

Applies to checks of a single identifier only.

Deprecated:
ICU 51 Use RESTRICTION_LEVEL instead.
USPOOF_INVISIBLE 

Check an identifier for the presence of invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark.

This check does not test the input string as a whole for conformance to any particular syntax for identifiers.

USPOOF_CHAR_LIMIT 

Check that an identifier contains only characters from a specified set of acceptable characters.

See uspoof_setAllowedChars and uspoof_setAllowedLocales. Note that a string that fails this check will also fail the USPOOF_RESTRICTION_LEVEL check.

USPOOF_MIXED_NUMBERS 

Check that an identifier does not mix numbers from different numbering systems.

For more information, see UTS 39 section 5.3.

Stable:
ICU 51
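
As a rough illustration of the idea, the sketch below identifies the numbering system of each decimal digit by the code point of that system's zero digit and reports a mix when two different zero digits are seen. It is a simplification under stated assumptions: only four numbering systems are listed (ASCII, Arabic-Indic, Devanagari, Bengali), the input is an array of code points, and the sketch_* names are hypothetical. ICU's own check covers the full Unicode digit repertoire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero digits of four decimal numbering systems:
 * ASCII, Arabic-Indic, Devanagari, Bengali. Illustrative subset only. */
static const uint32_t kSketchZeroDigits[] = { 0x0030, 0x0660, 0x0966, 0x09E6 };

/* Returns the zero digit of the numbering system cp belongs to,
 * or 0 if cp is not a digit in the table above. */
static uint32_t sketch_zero_digit_of(uint32_t cp) {
    for (size_t i = 0; i < sizeof kSketchZeroDigits / sizeof kSketchZeroDigits[0]; i++) {
        uint32_t zero = kSketchZeroDigits[i];
        if (cp >= zero && cp <= zero + 9) { return zero; }
    }
    return 0;
}

/* Returns 1 if the code points contain digits from more than one
 * numbering system in the table, 0 otherwise. */
static int sketch_mixes_numbers(const uint32_t *cps, size_t n) {
    uint32_t seen = 0;
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t zero = sketch_zero_digit_of(cps[i]);
        if (zero == 0) { continue; }            /* not a known digit */
        if (seen == 0) { seen = zero; }         /* first digit decides the system */
        else if (seen != zero) { return 1; }    /* a second system appeared */
    }
    return 0;
}
```

The sequence '8' followed by U+09EA BENGALI DIGIT FOUR is reported as mixed, while "89" or two Arabic-Indic digits are not.
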
USPOOF_HIDDEN_OVERLAY 

Check that an identifier does not have a combining character following a character in which that combining character would be hidden; for example 'i' followed by a U+0307 combining dot.

More specifically, the following characters are forbidden from preceding a U+0307:

  • Those with the Soft_Dotted Unicode property (which includes 'i' and 'j')
  • Latin lowercase letter 'l'
  • Dotless 'i' and 'j' ('ı' and 'ȷ', U+0131 and U+0237)
  • Any character whose confusable prototype ends with such a character (Soft_Dotted, 'l', 'ı', or 'ȷ')

In addition, combining characters are allowed between the above characters and U+0307 except those with combining class 0 or combining class "Above" (230, same class as U+0307).

This list and the number of combining characters considered by this check may grow over time.

Draft:
This API may be changed in the future versions and was introduced in ICU 62
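
A deliberately simplified sketch of the rule follows. It only flags U+0307 when it immediately follows one of the base characters named explicitly above ('i', 'j', 'l', U+0131, U+0237). It does not consult the Soft_Dotted property, confusable prototypes, or intervening combining marks, so it covers a subset of what the real check does. The function names are hypothetical and the input is assumed to be an array of code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of the base characters named in the description above.
 * Assumption: the full Soft_Dotted set and confusable prototypes are ignored. */
static int sketch_is_listed_base(uint32_t cp) {
    return cp == 'i' || cp == 'j' || cp == 'l' || cp == 0x0131 || cp == 0x0237;
}

/* Returns 1 if U+0307 COMBINING DOT ABOVE directly follows a listed base
 * character anywhere in the sequence, 0 otherwise. */
static int sketch_has_hidden_overlay(const uint32_t *cps, size_t n) {
    assert(cps != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (cps[i] == 0x0307 && sketch_is_listed_base(cps[i - 1])) {
            return 1;
        }
    }
    return 0;
}
```

For example, the sequence 'i', U+0307 is flagged, while 'a', U+0307 is not.
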
USPOOF_ALL_CHECKS 

Enable all spoof checks.

Stable:
ICU 4.6
USPOOF_AUX_INFO 

Enable the return of auxiliary (non-error) information in the upper bits of the check results value.

If this "check" is not enabled, the results of uspoof_check will be zero when an identifier passes all of the enabled checks.

If this "check" is enabled, (uspoof_check() & USPOOF_ALL_CHECKS) will be zero when an identifier passes all checks.

Stable:
ICU 51

Definition at line 376 of file uspoof.h.


Function Documentation

USpoofChecker* uspoof_clone ( const USpoofChecker *  sc,
UErrorCode *  status 
)

Clone a Spoof Checker.

The clone will be set to perform the same checks as the original source.

Parameters:
sc The source USpoofChecker
status The error code, set if this function encounters a problem.
Returns:
the newly created clone
Stable:
ICU 4.2
void uspoof_close ( USpoofChecker *  sc  ) 

Close a Spoof Checker, freeing any memory that was being held by its implementation.

Stable:
ICU 4.2
USpoofChecker* uspoof_open ( UErrorCode *  status  ) 

Create a Unicode Spoof Checker, configured to perform all checks except for USPOOF_LOCALE_LIMIT and USPOOF_CHAR_LIMIT.

Note that additional checks may be added in the future, resulting in the changes to the default checking behavior.

Parameters:
status The error code, set if this function encounters a problem.
Returns:
the newly created Spoof Checker
Stable:
ICU 4.2
USpoofChecker* uspoof_openFromSerialized ( const void *  data,
int32_t  length,
int32_t *  pActualLength,
UErrorCode *  pErrorCode 
)

Open a Spoof checker from its serialized form, stored in 32-bit-aligned memory.

Inverse of uspoof_serialize(). The memory containing the serialized data must remain valid and unchanged as long as the spoof checker, or any cloned copies of the spoof checker, are in use. Ownership of the memory remains with the caller. The spoof checker (and any clones) must be closed prior to deleting the serialized data.

Parameters:
data a pointer to 32-bit-aligned memory containing the serialized form of spoof data
length the number of bytes available at data; can be more than necessary
pActualLength receives the actual number of bytes at data taken up by the data; can be NULL
pErrorCode ICU error code
Returns:
the spoof checker.
See also:
uspoof_open
uspoof_serialize
Stable:
ICU 4.2
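
The alignment requirement can be checked, and satisfied, with standard C alone. The sketch below is a suggestion rather than part of the ICU API: both helper names are hypothetical, and the alignment test relies on the common (implementation-defined) mapping from pointers to integers. Memory returned by malloc is suitably aligned for any fundamental type, so a copy made this way meets the 32-bit requirement. The caller still owns the copy and must free it only after closing the spoof checker and any clones.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns nonzero when p lies on a 4-byte (32-bit) boundary. */
static int sketch_is_32bit_aligned(const void *p) {
    return ((uintptr_t) p % 4u) == 0;
}

/* Hypothetical helper: copies length bytes into freshly allocated memory.
 * Returns NULL on allocation failure; the caller frees the result. */
static void *sketch_copy_to_aligned(const void *bytes, size_t length) {
    void *copy;
    assert(bytes != NULL || length == 0);
    copy = malloc(length > 0 ? length : 1);
    if (copy != NULL && length > 0) {
        memcpy(copy, bytes, length);
    }
    return copy;
}
```

A buffer read from a file at an odd offset could be passed through sketch_copy_to_aligned before being handed to uspoof_openFromSerialized.
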
USpoofChecker* uspoof_openFromSource ( const char *  confusables,
int32_t  confusablesLen,
const char *  confusablesWholeScript,
int32_t  confusablesWholeScriptLen,
int32_t *  errType,
UParseError *  pe,
UErrorCode *  status 
)

Open a Spoof Checker from the source form of the spoof data.

The input corresponds to the Unicode data file confusables.txt as described in Unicode UTS #39. The syntax of the source data is as described in UTS #39 for this file, and the content of this file is acceptable input.

The character encoding of the (char *) input text is UTF-8.

Parameters:
confusables a pointer to the confusable characters definitions, as found in file confusables.txt from unicode.org.
confusablesLen The length of the confusables text, or -1 if the input string is zero terminated.
confusablesWholeScript Deprecated in ICU 58. No longer used.
confusablesWholeScriptLen Deprecated in ICU 58. No longer used.
errType In the event of an error in the input, indicates which of the input files contains the error. The value is one of USPOOF_SINGLE_SCRIPT_CONFUSABLE or USPOOF_WHOLE_SCRIPT_CONFUSABLE, or zero if no errors are found.
pe In the event of an error in the input, receives the position in the input text (line, offset) of the error.
status an in/out ICU UErrorCode. Among the possible errors is U_PARSE_ERROR, which is used to report syntax errors in the input.
Returns:
A spoof checker that uses the rules from the input files.
Stable:
ICU 4.2

Generated on 12 Nov 2018 for ICU 63.1 by  doxygen 1.6.1