Unicode Security and Spoofing Detection, C API.
#include "unicode/utypes.h"
#include "unicode/uset.h"
#include "unicode/parseerr.h"
#include "unicode/localpointer.h"
Typedefs
| typedef struct USpoofChecker | USpoofChecker | typedef for C of USpoofChecker |
| typedef struct USpoofCheckResult | USpoofCheckResult | |
Enumerations
| enum | USpoofChecks { USPOOF_SINGLE_SCRIPT_CONFUSABLE = 1, USPOOF_MIXED_SCRIPT_CONFUSABLE = 2, USPOOF_WHOLE_SCRIPT_CONFUSABLE = 4, USPOOF_CONFUSABLE = USPOOF_SINGLE_SCRIPT_CONFUSABLE | USPOOF_MIXED_SCRIPT_CONFUSABLE | USPOOF_WHOLE_SCRIPT_CONFUSABLE, USPOOF_ANY_CASE = 8, USPOOF_RESTRICTION_LEVEL = 16, USPOOF_SINGLE_SCRIPT = USPOOF_RESTRICTION_LEVEL, USPOOF_INVISIBLE = 32, USPOOF_CHAR_LIMIT = 64, USPOOF_MIXED_NUMBERS = 128, USPOOF_HIDDEN_OVERLAY = 256, USPOOF_ALL_CHECKS = 0xFFFF, USPOOF_AUX_INFO = 0x40000000 } | Enum for the kinds of checks that USpoofChecker can perform. |
| enum | URestrictionLevel { USPOOF_ASCII = 0x10000000, USPOOF_SINGLE_SCRIPT_RESTRICTIVE = 0x20000000, USPOOF_HIGHLY_RESTRICTIVE = 0x30000000, USPOOF_MODERATELY_RESTRICTIVE = 0x40000000, USPOOF_MINIMALLY_RESTRICTIVE = 0x50000000, USPOOF_UNRESTRICTIVE = 0x60000000, USPOOF_RESTRICTION_LEVEL_MASK = 0x7F000000, USPOOF_UNDEFINED_RESTRICTIVE = -1 } | Constants from UTS #39 for use in uspoof_setRestrictionLevel, and for returned identifier restriction levels in check results. |
Functions
| USpoofChecker * | uspoof_open (UErrorCode *status) | Create a Unicode Spoof Checker, configured to perform all checks except for USPOOF_LOCALE_LIMIT and USPOOF_CHAR_LIMIT. |
| USpoofChecker * | uspoof_openFromSerialized (const void *data, int32_t length, int32_t *pActualLength, UErrorCode *pErrorCode) | Open a Spoof checker from its serialized form, stored in 32-bit-aligned memory. |
| USpoofChecker * | uspoof_openFromSource (const char *confusables, int32_t confusablesLen, const char *confusablesWholeScript, int32_t confusablesWholeScriptLen, int32_t *errType, UParseError *pe, UErrorCode *status) | Open a Spoof Checker from the source form of the spoof data. |
| void | uspoof_close (USpoofChecker *sc) | Close a Spoof Checker, freeing any memory that was being held by its implementation. |
| USpoofChecker * | uspoof_clone (const USpoofChecker *sc, UErrorCode *status) | Clone a Spoof Checker. |
Unicode Security and Spoofing Detection, C API.
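This class, based on Unicode Technical Report #36 and Unicode Technical Standard #39, has two main functions:
1. Checking whether two strings are visually confusable with each other, such as "Harvest" and "Ηarvest", where the second string begins with the Greek capital letter Eta.
2. Checking whether an individual string is likely to be an attempt at confusing the reader (spoof detection), such as "paypal" with some Latin characters replaced by Cyrillic look-alikes.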
Although originally designed as a method for flagging suspicious identifier strings such as URLs, USpoofChecker has a number of other practical use cases, such as preventing attempts to evade bad-word content filters.
The functions of this class are exposed as C API, with a handful of syntactical conveniences for C++.
The following example shows how to use USpoofChecker to check for confusability between two strings:
{.c}
UErrorCode status = U_ZERO_ERROR;
UChar* str1 = (UChar*) u"Harvest";
UChar* str2 = (UChar*) u"\u0397arvest"; // with U+0397 GREEK CAPITAL LETTER ETA
USpoofChecker* sc = uspoof_open(&status);
uspoof_setChecks(sc, USPOOF_CONFUSABLE, &status);
int32_t bitmask = uspoof_areConfusable(sc, str1, -1, str2, -1, &status);
UBool result = bitmask != 0;
// areConfusable: 1 (status: U_ZERO_ERROR)
printf("areConfusable: %d (status: %s)\n", result, u_errorName(status));
uspoof_close(sc);
The call to uspoof_open creates a USpoofChecker object; the call to uspoof_setChecks enables confusable checking and disables all other checks; the call to uspoof_areConfusable performs the confusability test; and the following line extracts the result out of the return value. For best performance, the instance should be created once (e.g., upon application startup), and the efficient uspoof_areConfusable method can be used at runtime.
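The return value of uspoof_areConfusable is more than a yes/no answer: it is a bitmask built from the confusable flags of USpoofChecks. The following is a minimal sketch of decoding such a value. It does not call ICU; the CONF_* constants are local copies of the enum values listed in this file, and ex_describe_confusable is a hypothetical helper name introduced only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Local copies of the confusable flags from USpoofChecks (assumed to match
 * the values listed in this file); not the ICU definitions themselves. */
enum {
    CONF_SINGLE_SCRIPT = 1,
    CONF_MIXED_SCRIPT  = 2,
    CONF_WHOLE_SCRIPT  = 4
};

/* Hypothetical helper: writes a space-separated list of the confusable kinds
 * set in bitmask into buf, or "none" if no flag is set.
 * Returns the number of flags written. */
int ex_describe_confusable(int32_t bitmask, char *buf, size_t bufSize) {
    static const struct { int32_t flag; const char *name; } kinds[] = {
        { CONF_SINGLE_SCRIPT, "SINGLE_SCRIPT" },
        { CONF_MIXED_SCRIPT,  "MIXED_SCRIPT"  },
        { CONF_WHOLE_SCRIPT,  "WHOLE_SCRIPT"  }
    };
    int count = 0;
    size_t used = 0;
    if (bufSize == 0) return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
        if (bitmask & kinds[i].flag) {
            int n = snprintf(buf + used, bufSize - used, "%s%s",
                             count ? " " : "", kinds[i].name);
            if (n < 0 || (size_t)n >= bufSize - used) {
                buf[used] = '\0';  /* out of room: drop the partial name */
                break;
            }
            used += (size_t)n;
            count++;
        }
    }
    if (count == 0) snprintf(buf, bufSize, "none");
    return count;
}
```

With this helper, a bitmask of 2 would be reported as MIXED_SCRIPT, and a bitmask of 0 (not confusable) as none.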
The type LocalUSpoofCheckerPointer is exposed for C++ programmers. It will automatically call uspoof_close when the object goes out of scope:
{.cpp}
UErrorCode status = U_ZERO_ERROR;
LocalUSpoofCheckerPointer sc(uspoof_open(&status));
uspoof_setChecks(sc.getAlias(), USPOOF_CONFUSABLE, &status);
// ...
UTS 39 defines two strings to be confusable if they map to the same skeleton string. A skeleton can be thought of as a "hash code". uspoof_getSkeleton computes the skeleton for a particular string, so the following snippet is equivalent to the example above:
{.c}
UErrorCode status = U_ZERO_ERROR;
UChar* str1 = (UChar*) u"Harvest";
UChar* str2 = (UChar*) u"\u0397arvest"; // with U+0397 GREEK CAPITAL LETTER ETA
USpoofChecker* sc = uspoof_open(&status);
uspoof_setChecks(sc, USPOOF_CONFUSABLE, &status);
// Get skeleton 1
int32_t skel1Len = uspoof_getSkeleton(sc, 0, str1, -1, NULL, 0, &status);
UChar* skel1 = (UChar*) malloc(++skel1Len * sizeof(UChar));
status = U_ZERO_ERROR;
uspoof_getSkeleton(sc, 0, str1, -1, skel1, skel1Len, &status);
// Get skeleton 2
int32_t skel2Len = uspoof_getSkeleton(sc, 0, str2, -1, NULL, 0, &status);
UChar* skel2 = (UChar*) malloc(++skel2Len * sizeof(UChar));
status = U_ZERO_ERROR;
uspoof_getSkeleton(sc, 0, str2, -1, skel2, skel2Len, &status);
// Are the skeletons the same?
UBool result = u_strcmp(skel1, skel2) == 0;
// areConfusable: 1 (status: U_ZERO_ERROR)
printf("areConfusable: %d (status: %s)\n", result, u_errorName(status));
uspoof_close(sc);
free(skel1);
free(skel2);
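To make the "hash code" idea concrete, here is a toy sketch of skeleton comparison that does not use ICU at all. The mapping table is a hypothetical four-entry ASCII table chosen for illustration; it is not the Unicode confusables data, and the real skeleton algorithm also involves normalization, which is omitted here. The function names are likewise invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Toy prototype table (illustrative only, NOT the real confusables data):
 * maps a character to the string it should be replaced with, or NULL. */
static const char *ex_prototype(char c) {
    switch (c) {
        case '1': return "l";
        case 'I': return "l";
        case '0': return "O";
        case 'm': return "rn";
        default:  return NULL;
    }
}

/* Writes the toy skeleton of src into dst (NUL-terminated).
 * Returns the skeleton length, or -1 if dst is too small. */
int ex_toy_skeleton(const char *src, char *dst, size_t dstSize) {
    size_t used = 0;
    for (; *src != '\0'; src++) {
        char single[2] = { *src, '\0' };
        const char *rep = ex_prototype(*src);
        if (rep == NULL) rep = single;          /* unmapped: keep as-is */
        size_t n = strlen(rep);
        if (used + n + 1 > dstSize) return -1;  /* leave room for the NUL */
        memcpy(dst + used, rep, n);
        used += n;
    }
    if (dstSize == 0) return -1;
    dst[used] = '\0';
    return (int)used;
}

/* Two strings count as confusable here when their toy skeletons are equal. */
int ex_toy_confusable(const char *a, const char *b) {
    char sa[128], sb[128];
    if (ex_toy_skeleton(a, sa, sizeof sa) < 0) return 0;
    if (ex_toy_skeleton(b, sb, sizeof sb) < 0) return 0;
    return strcmp(sa, sb) == 0;
}
```

Under this toy table both "lorem" and "1orern" reduce to "lorern", so they compare as confusable, while "ipsum" and "1orern" do not.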
If you need to check if a string is confusable with any string in a dictionary of many strings, rather than calling uspoof_areConfusable many times in a loop, uspoof_getSkeleton can be used instead, as shown below:
{.c}
UErrorCode status = U_ZERO_ERROR;
#define DICTIONARY_LENGTH 2
UChar* dictionary[DICTIONARY_LENGTH] = { (UChar*) u"lorem", (UChar*) u"ipsum" };
UChar* skeletons[DICTIONARY_LENGTH];
UChar* str = (UChar*) u"1orern";
// Setup:
USpoofChecker* sc = uspoof_open(&status);
uspoof_setChecks(sc, USPOOF_CONFUSABLE, &status);
for (size_t i=0; i<DICTIONARY_LENGTH; i++) {
    UChar* word = dictionary[i];
    int32_t len = uspoof_getSkeleton(sc, 0, word, -1, NULL, 0, &status);
    skeletons[i] = (UChar*) malloc(++len * sizeof(UChar));
    status = U_ZERO_ERROR;
    uspoof_getSkeleton(sc, 0, word, -1, skeletons[i], len, &status);
}
// Live Check:
{
    int32_t len = uspoof_getSkeleton(sc, 0, str, -1, NULL, 0, &status);
    UChar* skel = (UChar*) malloc(++len * sizeof(UChar));
    status = U_ZERO_ERROR;
    uspoof_getSkeleton(sc, 0, str, -1, skel, len, &status);
    UBool result = FALSE;
    for (size_t i=0; i<DICTIONARY_LENGTH; i++) {
        result = u_strcmp(skel, skeletons[i]) == 0;
        if (result == TRUE) { break; }
    }
    // Has confusable in dictionary: 1 (status: U_ZERO_ERROR)
    printf("Has confusable in dictionary: %d (status: %s)\n", result, u_errorName(status));
    free(skel);
}
for (size_t i=0; i<DICTIONARY_LENGTH; i++) {
    free(skeletons[i]);
}
uspoof_close(sc);
Note: Since the Unicode confusables mapping table is frequently updated, confusable skeletons are not guaranteed to be the same between ICU releases. We therefore recommend that you always compute confusable skeletons at runtime and do not rely on creating a permanent, or difficult to update, database of skeletons.
The following snippet shows a minimal example of using USpoofChecker to perform spoof detection on a string:
{.c}
UErrorCode status = U_ZERO_ERROR;
UChar* str = (UChar*) u"p\u0430ypal"; // with U+0430 CYRILLIC SMALL LETTER A
// Get the default set of allowable characters:
USet* allowed = uset_openEmpty();
uset_addAll(allowed, uspoof_getRecommendedSet(&status));
uset_addAll(allowed, uspoof_getInclusionSet(&status));
USpoofChecker* sc = uspoof_open(&status);
uspoof_setAllowedChars(sc, allowed, &status);
uspoof_setRestrictionLevel(sc, USPOOF_MODERATELY_RESTRICTIVE);
int32_t bitmask = uspoof_check(sc, str, -1, NULL, &status);
UBool result = bitmask != 0;
// fails checks: 1 (status: U_ZERO_ERROR)
printf("fails checks: %d (status: %s)\n", result, u_errorName(status));
uspoof_close(sc);
uset_close(allowed);
As in the case for confusability checking, it is good practice to create one USpoofChecker instance at startup, and call the cheaper uspoof_check online. We specify the set of allowed characters to be those with type RECOMMENDED or INCLUSION, according to the recommendation in UTS 39.
In addition to uspoof_check, the function uspoof_checkUTF8 is exposed for UTF8-encoded char* strings, and uspoof_checkUnicodeString is exposed for C++ programmers.
If the USPOOF_AUX_INFO check is enabled, a limited amount of information on why a string failed the checks is available in the returned bitmask. For complete information, use the uspoof_check2 class of functions with a USpoofCheckResult parameter:
{.c}
UErrorCode status = U_ZERO_ERROR;
UChar* str = (UChar*) u"p\u0430ypal"; // with U+0430 CYRILLIC SMALL LETTER A
// Get the default set of allowable characters:
USet* allowed = uset_openEmpty();
uset_addAll(allowed, uspoof_getRecommendedSet(&status));
uset_addAll(allowed, uspoof_getInclusionSet(&status));
USpoofChecker* sc = uspoof_open(&status);
uspoof_setAllowedChars(sc, allowed, &status);
uspoof_setRestrictionLevel(sc, USPOOF_MODERATELY_RESTRICTIVE);
USpoofCheckResult* checkResult = uspoof_openCheckResult(&status);
int32_t bitmask = uspoof_check2(sc, str, -1, checkResult, &status);
int32_t failures1 = bitmask;
int32_t failures2 = uspoof_getCheckResultChecks(checkResult, &status);
assert(failures1 == failures2);
// checks that failed: 0x00000010 (status: U_ZERO_ERROR)
printf("checks that failed: %#010x (status: %s)\n", failures1, u_errorName(status));
// Cleanup:
uspoof_close(sc);
uset_close(allowed);
uspoof_closeCheckResult(checkResult);
C++ users can take advantage of a few syntactical conveniences. The following snippet is functionally equivalent to the one above:
{.cpp}
UErrorCode status = U_ZERO_ERROR;
UnicodeString str((UChar*) u"p\u0430ypal"); // with U+0430 CYRILLIC SMALL LETTER A
// Get the default set of allowable characters:
UnicodeSet allowed;
allowed.addAll(*uspoof_getRecommendedUnicodeSet(&status));
allowed.addAll(*uspoof_getInclusionUnicodeSet(&status));
LocalUSpoofCheckerPointer sc(uspoof_open(&status));
uspoof_setAllowedChars(sc.getAlias(), allowed.toUSet(), &status);
uspoof_setRestrictionLevel(sc.getAlias(), USPOOF_MODERATELY_RESTRICTIVE);
LocalUSpoofCheckResultPointer checkResult(uspoof_openCheckResult(&status));
int32_t bitmask = uspoof_check2UnicodeString(sc.getAlias(), str, checkResult.getAlias(), &status);
int32_t failures1 = bitmask;
int32_t failures2 = uspoof_getCheckResultChecks(checkResult.getAlias(), &status);
assert(failures1 == failures2);
// checks that failed: 0x00000010 (status: U_ZERO_ERROR)
printf("checks that failed: %#010x (status: %s)\n", failures1, u_errorName(status));
// Explicit cleanup not necessary.
The return value is a bitmask of the checks that failed. In this case, there was one check that failed: USPOOF_RESTRICTION_LEVEL, corresponding to the fifth bit (16). The possible checks are:
RESTRICTION_LEVEL: flags strings that violate the Restriction Level test as specified in UTS 39; in most cases, this means flagging strings that contain characters from multiple different scripts.
INVISIBLE: flags strings that contain invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark.
CHAR_LIMIT: flags strings that contain characters outside of a specified set of acceptable characters. See uspoof_setAllowedChars and uspoof_setAllowedLocales.
MIXED_NUMBERS: flags strings that contain digits from multiple different numbering systems.
These checks can be enabled independently of each other. For example, if you were interested in checking for only the INVISIBLE and MIXED_NUMBERS conditions, you could do:
{.c}
UErrorCode status = U_ZERO_ERROR;
UChar* str = (UChar*) u"8\u09EA"; // 8 mixed with U+09EA BENGALI DIGIT FOUR
USpoofChecker* sc = uspoof_open(&status);
uspoof_setChecks(sc, USPOOF_INVISIBLE | USPOOF_MIXED_NUMBERS, &status);
int32_t bitmask = uspoof_check2(sc, str, -1, NULL, &status);
UBool result = bitmask != 0;
// fails checks: 1 (status: U_ZERO_ERROR)
printf("fails checks: %d (status: %s)\n", result, u_errorName(status));
uspoof_close(sc);
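As a rough illustration of what the MIXED_NUMBERS condition in the example above means, the sketch below looks for digits from more than one numbering system in an array of code points. It is a simplified stand-in, not ICU's implementation: it knows only four decimal digit ranges (an illustrative subset), and the names ex_zero_for and ex_has_mixed_numbers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero digits of a few decimal numbering systems (illustrative subset only). */
static const uint32_t ex_zero_digits[] = {
    0x0030, /* ASCII digits        U+0030..U+0039 */
    0x0660, /* Arabic-Indic digits U+0660..U+0669 */
    0x0966, /* Devanagari digits   U+0966..U+096F */
    0x09E6  /* Bengali digits      U+09E6..U+09EF */
};

/* Returns the zero digit of the numbering system cp belongs to,
 * or 0 if cp is not a digit in the subset above. */
static uint32_t ex_zero_for(uint32_t cp) {
    for (size_t i = 0; i < sizeof(ex_zero_digits) / sizeof(ex_zero_digits[0]); i++) {
        if (cp >= ex_zero_digits[i] && cp <= ex_zero_digits[i] + 9) {
            return ex_zero_digits[i];
        }
    }
    return 0;
}

/* Returns 1 if the code points contain digits from more than one of the
 * known numbering systems, 0 otherwise. Non-digits are ignored. */
int ex_has_mixed_numbers(const uint32_t *cps, size_t len) {
    uint32_t seen = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t zero = ex_zero_for(cps[i]);
        if (zero == 0) continue;          /* not a digit we recognize */
        if (seen == 0) seen = zero;       /* first numbering system seen */
        else if (seen != zero) return 1;  /* a second system: mixed */
    }
    return 0;
}
```

For the code points { 0x0038, 0x09EA } (ASCII 8 followed by BENGALI DIGIT FOUR) the function returns 1; for a string that uses only ASCII digits it returns 0.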
Here is an example in C++ showing how to compute the restriction level of a string:
{.cpp}
UErrorCode status = U_ZERO_ERROR;
UnicodeString str((UChar*) u"p\u0430ypal"); // with U+0430 CYRILLIC SMALL LETTER A
// Get the default set of allowable characters:
UnicodeSet allowed;
allowed.addAll(*uspoof_getRecommendedUnicodeSet(&status));
allowed.addAll(*uspoof_getInclusionUnicodeSet(&status));
LocalUSpoofCheckerPointer sc(uspoof_open(&status));
uspoof_setAllowedChars(sc.getAlias(), allowed.toUSet(), &status);
uspoof_setRestrictionLevel(sc.getAlias(), USPOOF_MODERATELY_RESTRICTIVE);
uspoof_setChecks(sc.getAlias(), USPOOF_RESTRICTION_LEVEL | USPOOF_AUX_INFO, &status);
LocalUSpoofCheckResultPointer checkResult(uspoof_openCheckResult(&status));
int32_t bitmask = uspoof_check2UnicodeString(sc.getAlias(), str, checkResult.getAlias(), &status);
URestrictionLevel restrictionLevel = uspoof_getCheckResultRestrictionLevel(checkResult.getAlias(), &status);
// Since USPOOF_AUX_INFO was enabled, the restriction level is also available in the upper bits of the bitmask:
assert((restrictionLevel & bitmask) == restrictionLevel);
// Restriction level: 0x50000000 (status: U_ZERO_ERROR)
printf("Restriction level: %#010x (status: %s)\n", restrictionLevel, u_errorName(status));
The code '0x50000000' corresponds to the restriction level USPOOF_MINIMALLY_RESTRICTIVE. Since USPOOF_MINIMALLY_RESTRICTIVE is weaker than USPOOF_MODERATELY_RESTRICTIVE, the string fails the check.
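The "weaker than" comparison can be read directly off the numeric values of URestrictionLevel, which grow as the level becomes looser. The sketch below assumes the check amounts to a plain numeric comparison, which the ordering of the values suggests; the LVL_* constants are local copies of the values listed in this file and ex_fails_restriction_level is a hypothetical helper.

```c
#include <stdint.h>

/* Local copies of the URestrictionLevel values listed in this file. */
enum {
    LVL_ASCII                     = 0x10000000,
    LVL_SINGLE_SCRIPT_RESTRICTIVE = 0x20000000,
    LVL_HIGHLY_RESTRICTIVE        = 0x30000000,
    LVL_MODERATELY_RESTRICTIVE    = 0x40000000,
    LVL_MINIMALLY_RESTRICTIVE     = 0x50000000,
    LVL_UNRESTRICTIVE             = 0x60000000
};

/* Returns 1 when the identifier's actual level is looser (numerically larger)
 * than the configured level, i.e. when the restriction-level check would be
 * expected to fail under the assumption stated above. */
int ex_fails_restriction_level(int32_t actual, int32_t configured) {
    return actual > configured;
}
```

With a configured level of LVL_MODERATELY_RESTRICTIVE, an actual level of LVL_MINIMALLY_RESTRICTIVE fails, while LVL_ASCII or LVL_MODERATELY_RESTRICTIVE itself passes.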
Note: The Restriction Level is the most powerful of the checks. The full logic is documented in UTS 39, but the basic idea is that strings are restricted to contain characters from only a single script, except that most scripts are allowed to have Latin characters interspersed. Although the default restriction level is HIGHLY_RESTRICTIVE, it is recommended that users set their restriction level to MODERATELY_RESTRICTIVE, which allows Latin mixed with all other scripts except Cyrillic, Greek, and Cherokee, with which it is often confusable. For more details on the levels, see UTS 39 or URestrictionLevel. The Restriction Level test is aware of the set of allowed characters set in uspoof_setAllowedChars. Note that characters which have script code COMMON or INHERITED, such as numbers and punctuation, are ignored when computing whether a string has multiple scripts.
A USpoofChecker instance may be used repeatedly to perform checks on any number of identifiers.
Thread Safety: The test functions for checking a single identifier, or for testing whether two identifiers are possibly confusable, are thread safe. They may be called concurrently, from multiple threads, using the same USpoofChecker instance.
More generally, the standard ICU thread safety rules apply: functions that take a const USpoofChecker parameter are thread safe. Those that take a non-const USpoofChecker are not thread safe.
Definition in file uspoof.h.
| typedef struct USpoofChecker USpoofChecker |
| typedef struct USpoofCheckResult USpoofCheckResult |
| enum URestrictionLevel |
Constants from UTS #39 for use in uspoof_setRestrictionLevel, and for returned identifier restriction levels in check results.
| USPOOF_ASCII |
All characters in the string are in the identifier profile and all characters in the string are in the ASCII range.
|
| USPOOF_SINGLE_SCRIPT_RESTRICTIVE |
The string classifies as ASCII-Only, or all characters in the string are in the identifier profile and the string is single-script, according to the definition in UTS 39 section 5.1.
|
| USPOOF_HIGHLY_RESTRICTIVE |
The string classifies as Single Script, or all characters in the string are in the identifier profile and the string is covered by any of the following sets of scripts, according to the definition in UTS 39 section 5.1: Latin + Han + Bopomofo; Latin + Han + Hiragana + Katakana; Latin + Han + Hangul.
This is the default restriction in ICU.
|
| USPOOF_MODERATELY_RESTRICTIVE |
The string classifies as Highly Restrictive, or all characters in the string are in the identifier profile and the string is covered by Latin and any one other Recommended or Aspirational script, except Cyrillic, Greek, and Cherokee.
|
| USPOOF_MINIMALLY_RESTRICTIVE |
All characters in the string are in the identifier profile. Allow arbitrary mixtures of scripts.
|
| USPOOF_UNRESTRICTIVE |
Any valid identifiers, including characters outside of the Identifier Profile.
|
| USPOOF_RESTRICTION_LEVEL_MASK |
Mask for selecting the Restriction Level bits from the return value of uspoof_check.
|
| USPOOF_UNDEFINED_RESTRICTIVE |
An undefined restriction level.
|
| enum USpoofChecks |
Enum for the kinds of checks that USpoofChecker can perform.
These enum values are used both to select the set of checks that will be performed, and to report results from the check function.
| USPOOF_SINGLE_SCRIPT_CONFUSABLE |
When performing the two-string uspoof_areConfusable test, this flag in the return value indicates that the two strings are visually confusable and that they are from the same script, according to UTS 39 section 4.
|
| USPOOF_MIXED_SCRIPT_CONFUSABLE |
When performing the two-string uspoof_areConfusable test, this flag in the return value indicates that the two strings are visually confusable and that they are not from the same script, according to UTS 39 section 4.
|
| USPOOF_WHOLE_SCRIPT_CONFUSABLE |
When performing the two-string uspoof_areConfusable test, this flag in the return value indicates that the two strings are visually confusable and that they are not from the same script but both of them are single-script strings, according to UTS 39 section 4.
|
| USPOOF_CONFUSABLE |
Enable this flag in uspoof_setChecks to turn on all types of confusables. You may set the checks to some subset of SINGLE_SCRIPT_CONFUSABLE, MIXED_SCRIPT_CONFUSABLE, or WHOLE_SCRIPT_CONFUSABLE to make uspoof_areConfusable return only those types of confusables.
|
| USPOOF_ANY_CASE |
This flag is deprecated and no longer affects the behavior of SpoofChecker.
|
| USPOOF_RESTRICTION_LEVEL |
Check that an identifier is no looser than the specified RestrictionLevel. The default if uspoof_setRestrictionLevel is not called is HIGHLY_RESTRICTIVE. If USPOOF_AUX_INFO is enabled the actual restriction level of the identifier being tested will also be returned by uspoof_check().
|
| USPOOF_SINGLE_SCRIPT |
Check that an identifier contains only characters from a single script (plus characters from the Common and Inherited scripts). Applies to checks of a single identifier only.
|
| USPOOF_INVISIBLE |
Check an identifier for the presence of invisible characters, such as zero-width spaces, or character sequences that are likely not to display, such as multiple occurrences of the same non-spacing mark. This check does not test the input string as a whole for conformance to any particular syntax for identifiers. |
| USPOOF_CHAR_LIMIT |
Check that an identifier contains only characters from a specified set of acceptable characters. See uspoof_setAllowedChars and uspoof_setAllowedLocales. Note that a string that fails this check will also fail the USPOOF_RESTRICTION_LEVEL check. |
| USPOOF_MIXED_NUMBERS |
Check that an identifier does not mix numbers from different numbering systems. For more information, see UTS 39 section 5.3.
|
| USPOOF_HIDDEN_OVERLAY |
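Check that an identifier does not have a combining character following a character in which that combining character would be hidden; for example 'i' followed by a U+0307 combining dot. More specifically, the following characters are forbidden from preceding a U+0307:
Those with the Soft_Dotted Unicode property (which includes 'i' and 'j').
Latin lowercase letter 'l'.
Dotless 'i' and 'j' ('ı' and 'ȷ', U+0131 and U+0237).
Any character whose confusable prototype ends with such a character (Soft_Dotted, 'l', 'ı', or 'ȷ').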
In addition, combining characters are allowed between the above characters and U+0307 except those with combining class 0 or combining class "Above" (230, same class as U+0307). This list and the number of combining characters considered by this check may grow over time.
|
| USPOOF_ALL_CHECKS |
Enable all spoof checks.
|
| USPOOF_AUX_INFO |
Enable the return of auxiliary (non-error) information in the upper bits of the check results value. If this "check" is not enabled, the results of uspoof_check will be zero when an identifier passes all of the enabled checks. If this "check" is enabled, (uspoof_check() & USPOOF_ALL_CHECKS) will be zero when an identifier passes all checks.
|
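The USPOOF_AUX_INFO description implies that a check result has two parts: the low bits report failed checks, and the upper bits may carry the restriction level. A minimal sketch of splitting such a value follows. It does not call ICU; the RES_* macros are local copies of USPOOF_ALL_CHECKS and USPOOF_RESTRICTION_LEVEL_MASK as listed in this file, and the function names are hypothetical.

```c
#include <stdint.h>

/* Local copies of two constants listed in this file. */
#define RES_ALL_CHECKS             0xFFFF
#define RES_RESTRICTION_LEVEL_MASK 0x7F000000

/* Returns 1 if no check failed. Masking makes this valid whether or not
 * auxiliary information is present in the upper bits. */
int ex_passes_all_checks(int32_t result) {
    return (result & RES_ALL_CHECKS) == 0;
}

/* Returns only the restriction-level bits of a check result. */
int32_t ex_level_bits(int32_t result) {
    return result & RES_RESTRICTION_LEVEL_MASK;
}
```

For a result of 0x50000010, ex_passes_all_checks returns 0 (bit 16, USPOOF_RESTRICTION_LEVEL, is set) and ex_level_bits returns 0x50000000. A result of 0x10000000 is non-zero yet still passes, which is why comparing the raw value against zero is only safe when USPOOF_AUX_INFO is disabled.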
USpoofChecker* uspoof_clone(const USpoofChecker *sc, UErrorCode *status)
Clone a Spoof Checker.
The clone will be set to perform the same checks as the original source.
| sc | The source USpoofChecker |
| status | The error code, set if this function encounters a problem. |
void uspoof_close(USpoofChecker *sc)
Close a Spoof Checker, freeing any memory that was being held by its implementation.
USpoofChecker* uspoof_open(UErrorCode *status)
Create a Unicode Spoof Checker, configured to perform all checks except for USPOOF_LOCALE_LIMIT and USPOOF_CHAR_LIMIT.
Note that additional checks may be added in the future, resulting in changes to the default checking behavior.
| status | The error code, set if this function encounters a problem. |
USpoofChecker* uspoof_openFromSerialized(const void *data, int32_t length, int32_t *pActualLength, UErrorCode *pErrorCode)
Open a Spoof checker from its serialized form, stored in 32-bit-aligned memory.
Inverse of uspoof_serialize(). The memory containing the serialized data must remain valid and unchanged as long as the spoof checker, or any cloned copies of the spoof checker, are in use. Ownership of the memory remains with the caller. The spoof checker (and any clones) must be closed prior to deleting the serialized data.
| data | a pointer to 32-bit-aligned memory containing the serialized form of spoof data |
| length | the number of bytes available at data; can be more than necessary |
| pActualLength | receives the actual number of bytes at data taken up by the data; can be NULL |
| pErrorCode | ICU error code |
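Because the data pointer must be 32-bit aligned, a caller that obtains serialized data from an arbitrary byte buffer may want to verify the alignment before passing it on. The following is a small sketch of such a check; ex_is_32bit_aligned is a hypothetical helper, not part of ICU, and it assumes a platform that provides uintptr_t.

```c
#include <stdint.h>

/* Returns 1 if p is aligned to a 4-byte (32-bit) boundary, 0 otherwise. */
int ex_is_32bit_aligned(const void *p) {
    return ((uintptr_t)p % 4u) == 0;
}
```

Memory returned by malloc is suitably aligned for this purpose, whereas a pointer to an arbitrary offset inside a larger byte buffer may not be; in that case the data would need to be copied to aligned storage first.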
USpoofChecker* uspoof_openFromSource(const char *confusables, int32_t confusablesLen, const char *confusablesWholeScript, int32_t confusablesWholeScriptLen, int32_t *errType, UParseError *pe, UErrorCode *status)
Open a Spoof Checker from the source form of the spoof data.
The input corresponds to the Unicode data file confusables.txt as described in Unicode UTS #39. The syntax of the source data is as described in UTS #39 for this file, and the content of this file is acceptable input.
The character encoding of the (char *) input text is UTF-8.
| confusables | a pointer to the confusable characters definitions, as found in file confusables.txt from unicode.org. |
| confusablesLen | The length of the confusables text, or -1 if the input string is zero terminated. |
| confusablesWholeScript | Deprecated in ICU 58. No longer used. |
| confusablesWholeScriptLen | Deprecated in ICU 58. No longer used. |
| errType | In the event of an error in the input, indicates which of the input files contains the error. The value is one of USPOOF_SINGLE_SCRIPT_CONFUSABLE or USPOOF_WHOLE_SCRIPT_CONFUSABLE, or zero if no errors are found. |
| pe | In the event of an error in the input, receives the position in the input text (line, offset) of the error. |
| status | an in/out ICU UErrorCode. Among the possible errors is U_PARSE_ERROR, which is used to report syntax errors in the input. |