Locale

A <code>Locale</code> object represents a specific geographical, political, or cultural region. An operation that requires a <code>Locale</code> to perform its task is called <em>locale-sensitive</em> and uses the <code>Locale</code> to tailor information for the user. For example, displaying a number is a locale-sensitive operation: the number should be formatted according to the customs and conventions of the user's native country, region, or culture.

<p> The {@code Locale} class implements IETF BCP 47 which is composed of <a href="http://tools.ietf.org/html/rfc4647">RFC 4647 "Matching of Language Tags"</a> and <a href="http://tools.ietf.org/html/rfc5646">RFC 5646 "Tags for Identifying Languages"</a> with support for the LDML (UTS#35, "Unicode Locale Data Markup Language") BCP 47-compatible extensions for locale data exchange.

<p> A <code>Locale</code> object logically consists of the fields described below.

<dl> <dt><a id="def_language"><b>language</b></a></dt>

<dd>ISO 639 alpha-2 or alpha-3 language code, or registered language subtags up to 8 alpha letters (for future enhancements). When a language has both an alpha-2 code and an alpha-3 code, the alpha-2 code must be used. You can find a full list of valid language codes in the IANA Language Subtag Registry (search for "Type: language"). The language field is case insensitive, but <code>Locale</code> always canonicalizes to lower case.</dd>

<dd>Well-formed language values have the form <code>[a-zA-Z]{2,8}</code>. Note that this is not the full BCP 47 language production, since it excludes extlang. Extlang subtags are not needed because modern three-letter language codes replace them.</dd>

<dd>Example: "en" (English), "ja" (Japanese), "kok" (Konkani)</dd>

<dt><a id="def_script"><b>script</b></a></dt>

<dd>ISO 15924 alpha-4 script code. You can find a full list of valid script codes in the IANA Language Subtag Registry (search for "Type: script"). The script field is case insensitive, but <code>Locale</code> always canonicalizes to title case (the first letter is upper case and the rest of the letters are lower case).</dd>

<dd>Well-formed script values have the form <code>[a-zA-Z]{4}</code></dd>

<dd>Example: "Latn" (Latin), "Cyrl" (Cyrillic)</dd>

<dt><a id="def_region"><b>country (region)</b></a></dt>

<dd>ISO 3166 alpha-2 country code or UN M.49 numeric-3 area code. You can find a full list of valid country and region codes in the IANA Language Subtag Registry (search for "Type: region"). The country (region) field is case insensitive, but <code>Locale</code> always canonicalizes to upper case.</dd>

<dd>Well-formed country/region values have the form <code>[a-zA-Z]{2} | [0-9]{3}</code></dd>

<dd>Example: "US" (United States), "FR" (France), "029" (Caribbean)</dd>

<dt><a id="def_variant"><b>variant</b></a></dt>

<dd>Any arbitrary value used to indicate a variation of a <code>Locale</code>. Where there are two or more variant values, each indicating its own semantics, these values should be ordered by importance, with the most important first, separated by an underscore ('_'). The variant field is case sensitive.</dd>

<dd>Note: IETF BCP 47 places syntactic restrictions on variant subtags. In addition, BCP 47 variant subtags are strictly used to indicate additional variations that define a language or its dialects and that are not covered by any combination of language, script, and region subtags. You can find a full list of valid variant codes in the IANA Language Subtag Registry (search for "Type: variant").

<p>However, the variant field in <code>Locale</code> has historically been used for any kind of variation, not just language variations. For example, some supported variants available in Java SE Runtime Environments indicate alternative cultural behaviors such as calendar type or number script. In BCP 47 this kind of information, which does not identify the language, is supported by extension subtags or private use subtags.</dd>

<dd>Well-formed variant values have the form <code>SUBTAG (('_'|'-') SUBTAG)*</code> where <code>SUBTAG = [0-9][0-9a-zA-Z]{3} | [0-9a-zA-Z]{5,8}</code>. (Note: BCP 47 only uses the hyphen ('-') as a delimiter; <code>Locale</code> is more lenient and also accepts the underscore.)</dd>

<dd>Example: "polyton" (Polytonic Greek), "POSIX"</dd>

<dt><a id="def_extensions"><b>extensions</b></a></dt>

<dd>A map from single character keys to string values, indicating extensions apart from language identification. The extensions in <code>Locale</code> implement the semantics and syntax of BCP 47 extension subtags and private use subtags. The extensions are case insensitive, but <code>Locale</code> canonicalizes all extension keys and values to lower case. Note that extensions cannot have empty values.</dd>

<dd>Well-formed keys are single characters from the set <code>[0-9a-zA-Z]</code>. Well-formed values have the form <code>SUBTAG ('-' SUBTAG)*</code> where for the key 'x' <code>SUBTAG = [0-9a-zA-Z]{1,8}</code> and for other keys <code>SUBTAG = [0-9a-zA-Z]{2,8}</code> (that is, 'x' allows single-character subtags).</dd>

<dd>Example: key="u"/value="ca-japanese" (Japanese Calendar), key="x"/value="java-1-7"</dd> </dl>

<b>Note:</b> Although BCP 47 requires field values to be registered in the IANA Language Subtag Registry, the <code>Locale</code> class does not provide any validation features. The <code>Builder</code> only checks if an individual field satisfies the syntactic requirement (is well-formed), but does not validate the value itself. See {@link Builder} for details.
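<p>The field rules above can be illustrated with a minimal sketch. It assumes the <code>java.util.Locale</code> API from which this documentation is derived; the class listed on this page may expose only a subset of it. The class name <code>LocaleFieldsDemo</code> and its helper methods are hypothetical names chosen for illustration.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class LocaleFieldsDemo {
    // Builds a locale from deliberately mis-cased input to show canonicalization.
    static Locale build() {
        return new Locale.Builder()
                .setLanguage("SR")   // language is canonicalized to lower case
                .setScript("LATN")   // script is canonicalized to title case
                .setRegion("rs")     // region is canonicalized to upper case
                .build();
    }

    // Returns true if the Builder rejects the language as syntactically ill-formed.
    static boolean builderRejectsLanguage(String language) {
        try {
            new Locale.Builder().setLanguage(language);
            return false;
        } catch (IllformedLocaleException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Locale locale = build();
        System.out.println(locale.getLanguage());    // sr
        System.out.println(locale.getScript());      // Latn
        System.out.println(locale.getCountry());     // RS
        System.out.println(locale.toLanguageTag());  // sr-Latn-RS

        // Only the syntax is checked, not registration in the IANA registry.
        System.out.println(builderRejectsLanguage("e"));   // true: one letter is too short
        System.out.println(builderRejectsLanguage("zz"));  // false: two letters are well-formed
    }
}
```

<p>The second helper reflects the note above: a value passes as long as it is well-formed, whether or not it is a registered subtag.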

<h3><a id="def_locale_extension">Unicode locale/language extension</a></h3>

<p>UTS#35, "Unicode Locale Data Markup Language" defines optional attributes and keywords to override or refine the default behavior associated with a locale. A keyword is represented by a pair of key and type. For example, "nu-thai" indicates that Thai local digits (value:"thai") should be used for formatting numbers (key:"nu").

<p>The keywords are mapped to a BCP 47 extension value using the extension key 'u' ({@link #UNICODE_LOCALE_EXTENSION}). The above example, "nu-thai", becomes the extension "u-nu-thai".

<p>Thus, when a <code>Locale</code> object contains Unicode locale attributes and keywords, <code>getExtension(UNICODE_LOCALE_EXTENSION)</code> will return a string representing this information, for example, "nu-thai". The <code>Locale</code> class also provides {@link #getUnicodeLocaleAttributes}, {@link #getUnicodeLocaleKeys}, and {@link #getUnicodeLocaleType}, which allow you to access Unicode locale attributes and key/type pairs directly. When represented as a string, the Unicode Locale Extension lists attributes alphabetically, followed by key/type sequences with keys listed alphabetically (the order of subtags comprising a key's type is fixed when the type is defined).

<p>A well-formed locale key has the form <code>[0-9a-zA-Z]{2}</code>. A well-formed locale type has the form <code>"" | [0-9a-zA-Z]{3,8} ('-' [0-9a-zA-Z]{3,8})*</code> (it can be empty, or a series of subtags 3-8 alphanums in length). A well-formed locale attribute has the form <code>[0-9a-zA-Z]{3,8}</code> (it is a single subtag with the same form as a locale type subtag).

<p>The Unicode locale extension specifies optional behavior in locale-sensitive services. Although the LDML specification defines various keys and values, actual locale-sensitive service implementations in a Java Runtime Environment might not support any particular Unicode locale attributes or key/type pairs.
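<p>As a minimal sketch of the "nu-thai" example discussed in this section, the following assumes the <code>java.util.Locale</code> API that this documentation describes (<code>forLanguageTag</code>, <code>getExtension</code>, <code>getUnicodeLocaleKeys</code> and <code>Locale.Builder</code> are not all listed in the class summary on this page). <code>UnicodeExtensionDemo</code> is a hypothetical name.

```java
import java.util.Locale;

final class UnicodeExtensionDemo {
    // Parses a BCP 47 tag that carries the Unicode locale keyword nu=thai.
    static Locale fromTag() {
        return Locale.forLanguageTag("th-TH-u-nu-thai");
    }

    // Builds the same locale field by field.
    static Locale fromBuilder() {
        return new Locale.Builder()
                .setLanguage("th")
                .setRegion("TH")
                .setUnicodeLocaleKeyword("nu", "thai")
                .build();
    }

    public static void main(String[] args) {
        Locale locale = fromTag();
        // The raw extension value stored under the key 'u'.
        System.out.println(locale.getExtension(Locale.UNICODE_LOCALE_EXTENSION)); // nu-thai
        // Direct access to the key/type pair.
        System.out.println(locale.getUnicodeLocaleKeys());       // [nu]
        System.out.println(locale.getUnicodeLocaleType("nu"));   // thai
        // Both construction routes yield equal locales.
        System.out.println(locale.equals(fromBuilder()));        // true
    }
}
```

<p>Whether a formatter actually honors the keyword depends on the runtime, as the paragraph above notes.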

<h4>Creating a Locale</h4>

<p>There are several different ways to create a <code>Locale</code> object.

<h5>Builder</h5>

<p>Using {@link Builder} you can construct a <code>Locale</code> object that conforms to BCP 47 syntax.

<h5>Constructors</h5>

<p>The <code>Locale</code> class provides three constructors: <blockquote> <pre>
 {@link #Locale(string language)}
 {@link #Locale(string language, string country)}
 {@link #Locale(string language, string country, string variant)}
</pre> </blockquote> These constructors allow you to create a <code>Locale</code> object with language, country and variant, but you cannot specify script or extensions.

<h5>Factory Methods</h5>

<p>The method {@link #forLanguageTag} creates a <code>Locale</code> object for a well-formed BCP 47 language tag.

<h5>Locale Constants</h5>

<p>The <code>Locale</code> class provides a number of convenient constants that you can use to create <code>Locale</code> objects for commonly used locales. For example, the following creates a <code>Locale</code> object for the United States: <blockquote> <pre> Locale.US </pre> </blockquote>
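<p>The four approaches described in this section can be compared side by side. This is a sketch against the <code>java.util.Locale</code> API, assuming it matches the behavior described here; <code>CreateLocaleDemo</code> and its method names are hypothetical. Recent Java releases mark the constructors as deprecated, hence the suppressed warning.

```java
import java.util.Locale;

final class CreateLocaleDemo {
    // Builder: checks that every field is well-formed.
    static Locale viaBuilder() {
        return new Locale.Builder().setLanguage("en").setRegion("US").build();
    }

    // Constructor: language and country only, no syntactic checks.
    @SuppressWarnings("deprecation")
    static Locale viaConstructor() {
        return new Locale("en", "US");
    }

    // Factory method: parses a BCP 47 language tag.
    static Locale viaLanguageTag() {
        return Locale.forLanguageTag("en-US");
    }

    // Constant: predefined locale for the United States.
    static Locale viaConstant() {
        return Locale.US;
    }

    public static void main(String[] args) {
        // All four routes produce equal Locale objects.
        System.out.println(viaBuilder().equals(viaConstant()));      // true
        System.out.println(viaConstructor().equals(viaConstant()));  // true
        System.out.println(viaLanguageTag().equals(viaConstant()));  // true
    }
}
```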

<h4><a id="LocaleMatching">Locale Matching</a></h4>

<p>If an application or a system is internationalized and provides localized resources for multiple locales, it sometimes needs to find one or more locales (or language tags) which meet each user's specific preferences. Note that the term "language tag" is used interchangeably with "locale" in this locale matching documentation.

<p>To match a user's preferred locales against a set of language tags, <a href="http://tools.ietf.org/html/rfc4647">RFC 4647 Matching of Language Tags</a> defines two mechanisms: filtering and lookup. <em>Filtering</em> is used to get all matching locales, whereas <em>lookup</em> is used to choose the best matching locale. Matching is done case-insensitively. These matching mechanisms are described in the following sections.

<p>A user's preference is called a <em>Language Priority List</em> and is expressed as a list of language ranges. There are syntactically two types of language ranges: basic and extended. See {@link Locale.LanguageRange Locale.LanguageRange} for details.

<h5>Filtering</h5>

<p>The filtering operation returns all matching language tags. It is defined in RFC 4647 as follows: "In filtering, each language range represents the least specific language tag (that is, the language tag with fewest number of subtags) that is an acceptable match. All of the language tags in the matching set of tags will have an equal or greater number of subtags than the language range. Every non-wildcard subtag in the language range will appear in every one of the matching language tags."

<p>There are two types of filtering: filtering for basic language ranges (called "basic filtering") and filtering for extended language ranges (called "extended filtering"). They may return different results depending on which kinds of language ranges are included in the given Language Priority List. {@link Locale.FilteringMode} is a parameter that specifies how filtering should be done.
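<p>A minimal sketch of basic filtering follows, assuming the <code>Locale.filter</code> and <code>Locale.LanguageRange</code> APIs of <code>java.util.Locale</code>. <code>FilteringDemo</code> and its <code>filter</code> helper are hypothetical names. The range is built with the <code>LanguageRange</code> constructor so that the priority list contains exactly the one range given.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class FilteringDemo {
    // Returns the language tags of all candidates that match a single basic range.
    static List<String> filter(String range, String... candidateTags) {
        List<Locale.LanguageRange> priorityList =
                Collections.singletonList(new Locale.LanguageRange(range));

        List<Locale> candidates = new ArrayList<>();
        for (String tag : candidateTags) {
            candidates.add(Locale.forLanguageTag(tag));
        }

        // With only basic ranges in the list, the default mode performs basic filtering.
        List<String> matches = new ArrayList<>();
        for (Locale match : Locale.filter(priorityList, candidates)) {
            matches.add(match.toLanguageTag());
        }
        return matches;
    }

    public static void main(String[] args) {
        // "de" has fewer subtags than the range, and "de-Deva" only shares a
        // character prefix with it, so neither matches.
        System.out.println(filter("de-DE", "de", "de-DE", "de-DE-1996", "de-Deva", "en-US"));
        // [de-DE, de-DE-1996]
    }
}
```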

<h5>Lookup</h5>

<p>The lookup operation returns the best matching language tag. It is defined in RFC 4647 as follows: "By contrast with filtering, each language range represents the most specific tag that is an acceptable match. The first matching tag found, according to the user's priority, is considered the closest match and is the item returned."

<p>For example, if a Language Priority List consists of two language ranges, {@code "zh-Hant-TW"} and {@code "en-US"}, in prioritized order, the lookup method progressively searches the language tags below in order to find the best matching language tag. <blockquote> <pre>
 1. zh-Hant-TW
 2. zh-Hant
 3. zh
 4. en-US
 5. en
</pre> </blockquote> If a language tag completely matches one of the language ranges above, that language tag is returned.

<p>{@code "*"} is the special language range, and it is ignored in lookup.

<p>If multiple language tags match as a result of the subtag {@code '*'} included in a language range, the first matching language tag returned by an {@link Iterator} over a {@link Collection} of language tags is treated as the best matching one.
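<p>The progressive search described above can be sketched as follows, assuming the <code>Locale.lookup</code> API of <code>java.util.Locale</code>. <code>LookupDemo</code> and its <code>lookup</code> helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LookupDemo {
    // Returns the tag of the best matching candidate, or null if nothing matches.
    static String lookup(List<String> ranges, List<String> candidateTags) {
        // Ranges are kept in the given order, which is the user's priority order.
        List<Locale.LanguageRange> priorityList = new ArrayList<>();
        for (String range : ranges) {
            priorityList.add(new Locale.LanguageRange(range));
        }

        List<Locale> candidates = new ArrayList<>();
        for (String tag : candidateTags) {
            candidates.add(Locale.forLanguageTag(tag));
        }

        Locale best = Locale.lookup(priorityList, candidates);
        return best == null ? null : best.toLanguageTag();
    }

    public static void main(String[] args) {
        List<String> ranges = Arrays.asList("zh-Hant-TW", "en-US");

        // "zh-Hant-TW" is not available, so the search falls back to "zh-Hant".
        System.out.println(lookup(ranges, Arrays.asList("en", "zh", "zh-Hant"))); // zh-Hant

        // "zh" is tried before "en-US" and "en" because its range has higher priority.
        System.out.println(lookup(ranges, Arrays.asList("en", "zh")));            // zh

        // No candidate matches any step of the search.
        System.out.println(lookup(ranges, Arrays.asList("fr")));                  // null
    }
}
```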

<h4>Use of Locale</h4>

<p>Once you've created a <code>Locale</code> you can query it for information about itself. Use <code>getCountry</code> to get the country (or region) code and <code>getLanguage</code> to get the language code. You can use <code>getDisplayCountry</code> to get the name of the country suitable for displaying to the user. Similarly, you can use <code>getDisplayLanguage</code> to get the name of the language suitable for displaying to the user. Interestingly, the <code>getDisplayXXX</code> methods are themselves locale-sensitive and have two versions: one that uses the default {@link LocaleCategory#DISPLAY DISPLAY} locale and one that uses the locale specified as an argument.

<p>The Java Platform provides a number of classes that perform locale-sensitive operations. For example, the <code>NumberFormat</code> class formats numbers, currency, and percentages in a locale-sensitive manner. Classes such as <code>NumberFormat</code> have several convenience methods for creating a default object of that type. For example, the <code>NumberFormat</code> class provides these three convenience methods for creating a default <code>NumberFormat</code> object: <blockquote> <pre>
 NumberFormat.getInstance()
 NumberFormat.getCurrencyInstance()
 NumberFormat.getPercentInstance()
</pre> </blockquote> Each of these methods has two variants, one with an explicit locale and one without; the latter uses the default {@link LocaleCategory#FORMAT FORMAT} locale. The variants that take an explicit locale are: <blockquote> <pre>
 NumberFormat.getInstance(myLocale)
 NumberFormat.getCurrencyInstance(myLocale)
 NumberFormat.getPercentInstance(myLocale)
</pre> </blockquote> A <code>Locale</code> is the mechanism for identifying the kind of object (<code>NumberFormat</code>) that you would like to get. The locale is <STRONG>just</STRONG> a mechanism for identifying objects, <STRONG>not</STRONG> a container for the objects themselves.
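<p>To make the explicit-locale variant concrete, here is a short sketch using <code>java.text.NumberFormat</code>. <code>NumberFormatDemo</code> and its <code>format</code> helper are hypothetical names.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class NumberFormatDemo {
    // Formats a number with the conventions of the given locale.
    static String format(double value, Locale locale) {
        // The locale only selects which formatter to obtain; it holds no formatting data itself.
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        System.out.println(format(value, Locale.US));      // 1,234,567.891
        System.out.println(format(value, Locale.GERMANY)); // 1.234.567,891
    }
}
```

<p>The same value is rendered with different grouping and decimal separators, which is what makes number formatting a locale-sensitive operation.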

<h4>Compatibility</h4>

<p>In order to maintain compatibility with existing usage, Locale's constructors retain their behavior prior to the Java Runtime Environment version 1.7. The same is largely true for the <code>toString</code> method. Thus Locale objects can continue to be used as they were. In particular, clients who parse the output of toString into language, country, and variant fields can continue to do so (although this is strongly discouraged), but the variant field will have additional information in it if script or extensions are present.

<p>In addition, BCP 47 imposes syntax restrictions that are not imposed by Locale's constructors. This means that conversions between some Locales and BCP 47 language tags cannot be made without losing information. Thus <code>toLanguageTag</code> cannot represent the state of locales whose language, country, or variant do not conform to BCP 47.

<p>Because of these issues, it is recommended that clients migrate away from constructing non-conforming locales and use the <code>forLanguageTag</code> and <code>Locale.Builder</code> APIs instead. Clients desiring a string representation of the complete locale can then always rely on <code>toLanguageTag</code> for this purpose.

<h5><a id="special_cases_constructor">Special cases</a></h5>

<p>For compatibility reasons, two non-conforming locales are treated as special cases. These are <b>{@code ja_JP_JP}</b> and <b>{@code th_TH_TH}</b>. These are ill-formed in BCP 47 since the variants are too short. To ease migration to BCP 47, these are treated specially during construction. These two cases (and only these) cause a constructor to generate an extension; all other values behave exactly as they did prior to Java 7.

<p>Java has used {@code ja_JP_JP} to represent Japanese as used in Japan together with the Japanese Imperial calendar. This is now representable using a Unicode locale extension, by specifying the Unicode locale key {@code ca} (for "calendar") and type {@code japanese}. When the Locale constructor is called with the arguments "ja", "JP", "JP", the extension "u-ca-japanese" is automatically added.

<p>Java has used {@code th_TH_TH} to represent Thai as used in Thailand together with Thai digits. This is also now representable using a Unicode locale extension, by specifying the Unicode locale key {@code nu} (for "number") and type {@code thai}. When the Locale constructor is called with the arguments "th", "TH", "TH", the extension "u-nu-thai" is automatically added.
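<p>The two special cases can be observed with a short sketch, assuming the <code>java.util.Locale</code> constructors behave as described above. <code>SpecialCasesDemo</code> is a hypothetical name, and the deprecation warning that recent Java releases attach to the constructors is suppressed.

```java
import java.util.Locale;

final class SpecialCasesDemo {
    @SuppressWarnings("deprecation")
    static Locale japaneseImperial() {
        return new Locale("ja", "JP", "JP");
    }

    @SuppressWarnings("deprecation")
    static Locale thaiDigits() {
        return new Locale("th", "TH", "TH");
    }

    public static void main(String[] args) {
        Locale ja = japaneseImperial();
        System.out.println(ja.getVariant());                 // JP (the legacy variant is kept)
        System.out.println(ja.getExtension('u'));            // ca-japanese
        System.out.println(ja.getUnicodeLocaleType("ca"));   // japanese

        Locale th = thaiDigits();
        System.out.println(th.getVariant());                 // TH
        System.out.println(th.getUnicodeLocaleType("nu"));   // thai
    }
}
```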

<h5>Serialization</h5>

<p>During serialization, writeObject writes all fields to the output stream, including extensions.

<p>During deserialization, readResolve adds extensions as described in <a href="#special_cases_constructor">Special Cases</a>, only for the two cases th_TH_TH and ja_JP_JP.

<h5>Legacy language codes</h5>

<p>Locale's constructor has always converted three language codes to their earlier, obsoleted forms: {@code he} maps to {@code iw}, {@code yi} maps to {@code ji}, and {@code id} maps to {@code in}. This continues to be the case, in order to not break backwards compatibility.

<p>The APIs added in 1.7 map between the old and new language codes, maintaining the old codes internal to Locale (so that <code>getLanguage</code> and <code>toString</code> reflect the old code), but using the new codes in the BCP 47 language tag APIs (so that <code>toLanguageTag</code> reflects the new one). This preserves the equivalence between Locales no matter which code or API is used to construct them. Java's default resource bundle lookup mechanism also implements this mapping, so that resources can be named using either convention; see {@link ResourceBundle.Control}.
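<p>The equivalence between old and new codes can be checked with a small sketch against <code>java.util.Locale</code>. <code>LegacyCodesDemo</code> is a hypothetical name. The result of <code>getLanguage</code> is printed without a stated value, on the assumption that it can depend on the runtime version and configuration; this documentation describes the old code being returned.

```java
import java.util.Locale;

final class LegacyCodesDemo {
    @SuppressWarnings("deprecation")
    static Locale fromLanguage(String language) {
        return new Locale(language);
    }

    public static void main(String[] args) {
        Locale oldCode = fromLanguage("iw");
        Locale newCode = fromLanguage("he");

        // Both spellings map to the same internal code, so the locales are equal.
        System.out.println(oldCode.equals(newCode));   // true

        // The BCP 47 language tag always uses the new code.
        System.out.println(oldCode.toLanguageTag());   // he
        System.out.println(newCode.toLanguageTag());   // he

        // Internal code as reported by getLanguage (value not asserted here).
        System.out.println(oldCode.getLanguage());
    }
}
```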

<h5>Three-letter language/country(region) codes</h5>

<p>The Locale constructors have always specified that the language and country parameters be two characters in length, although in practice they have accepted any length. The specification has now been relaxed to allow language codes of two to eight characters and country (region) codes of two to three characters, and in particular, three-letter language codes and three-digit region codes as specified in the IANA Language Subtag Registry. For compatibility, the implementation still does not impose a length constraint.

@see Builder @see ResourceBundle @see java.text.Format @see java.text.NumberFormat @see java.text.Collator @author Mark Davis

final

class Locale {

static Locale ENGLISH();

static Locale FRENCH();

static Locale GERMAN();

static Locale ITALIAN();

static Locale JAPANESE();

static Locale KOREAN();

static Locale CHINESE();

static Locale SIMPLIFIED_CHINESE();

static Locale TRADITIONAL_CHINESE();

static Locale FRANCE();

static Locale GERMANY();

static Locale ITALY();

static Locale JAPAN();

static Locale KOREA();

alias CHINA = SIMPLIFIED_CHINESE;

alias PRC = SIMPLIFIED_CHINESE;

alias TAIWAN = TRADITIONAL_CHINESE;

static Locale UK();

static Locale US();

static Locale CANADA();

static Locale CANADA_FRENCH();

static Locale ROOT();

enum char PRIVATE_USE_EXTENSION;

enum char UNICODE_LOCALE_EXTENSION;

class IsoCountryCode;

this(string language, string country, string variant);

this(string language, string country);

this(string language);

static size_t hashOf(string language, string country, string script, string variant);

static Locale getInstance(string language, string country, string variant);

static Locale getInstance(string language, string script, string country, string variant);

static Locale getDefault();

static Locale getDefault(LocaleCategory category);

static void setDefault(Locale newLocale);

static void setDefault(LocaleCategory category, Locale newLocale);

string getLanguage();

string getScript();

string getCountry();

string getVariant();

bool hasExtensions();

string getUnicodeLocaleType(string key);

string toString();

}

Constructors

this this(string language, string country, string variant): Construct a locale from language, country and variant. This constructor normalizes the language value to lowercase and the country value to uppercase. <p> <b>Note:</b> <ul> <li>ISO 639 is not a stable standard; some of the language codes it defines (specifically "iw", "ji", and "in") have changed. This constructor accepts both the old codes ("iw", "ji", and "in") and the new codes ("he", "yi", and "id"), but all other API on Locale will return only the OLD codes. <li>For backward compatibility reasons, this constructor does not make any syntactic checks on the input. <li>The two cases ("ja", "JP", "JP") and ("th", "TH", "TH") are handled specially, see <a href="#special_cases_constructor">Special Cases</a> for more information. </ul>
this this(string language, string country): Construct a locale from language and country. This constructor normalizes the language value to lowercase and the country value to uppercase. <p> <b>Note:</b> <ul> <li>ISO 639 is not a stable standard; some of the language codes it defines (specifically "iw", "ji", and "in") have changed. This constructor accepts both the old codes ("iw", "ji", and "in") and the new codes ("he", "yi", and "id"), but all other API on Locale will return only the OLD codes. <li>For backward compatibility reasons, this constructor does not make any syntactic checks on the input. </ul>
this this(string language): Construct a locale from a language code. This constructor normalizes the language value to lowercase. <p> <b>Note:</b> <ul> <li>ISO 639 is not a stable standard; some of the language codes it defines (specifically "iw", "ji", and "in") have changed. This constructor accepts both the old codes ("iw", "ji", and "in") and the new codes ("he", "yi", and "id"), but all other API on Locale will return only the OLD codes. <li>For backward compatibility reasons, this constructor does not make any syntactic checks on the input. </ul>

Members

Aliases

CHINA alias CHINA = SIMPLIFIED_CHINESE: Useful constant for country.
PRC alias PRC = SIMPLIFIED_CHINESE: Useful constant for country.
TAIWAN alias TAIWAN = TRADITIONAL_CHINESE: Useful constant for country.

Classes

IsoCountryCode class IsoCountryCode: Enum for specifying the type defined in ISO 3166. This enum is used to retrieve the two-letter ISO 3166-1 alpha-2, three-letter ISO 3166-1 alpha-3, and four-letter ISO 3166-3 country codes.

Functions

getCountry string getCountry(): Returns the country/region code for this locale, which should either be the empty string, an uppercase ISO 3166 2-letter code, or a UN M.49 3-digit code.
getLanguage string getLanguage(): Returns the language code of this Locale.
getScript string getScript(): Returns the script for this locale, which should either be the empty string or an ISO 15924 4-letter script code. The first letter is uppercase and the rest are lowercase, for example, 'Latn', 'Cyrl'.
getUnicodeLocaleType string getUnicodeLocaleType(string key): Undocumented in source. Be warned that the author may not have intended to support it.
getVariant string getVariant(): Returns the variant code for this locale.
hasExtensions bool hasExtensions(): Returns {@code true} if this {@code Locale} has any <a href="#def_extensions"> extensions</a>.
toString string toString(): Returns a string representation of this <code>Locale</code> object, consisting of language, country, variant, script, and extensions as below: <blockquote> language ~ "_" ~ country ~ "_" ~ (variant ~ "_#" | "#") ~ script ~ "_" ~ extensions </blockquote>
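
The toString layout can be illustrated with a few inputs. This is a sketch against java.util.Locale, assuming the class listed here follows the same layout; ToStringDemo and its describe helper are hypothetical names.

```java
import java.util.Locale;

final class ToStringDemo {
    // Parses a BCP 47 tag and returns the legacy toString form of the locale.
    static String describe(String languageTag) {
        return Locale.forLanguageTag(languageTag).toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("en-US"));             // en_US
        // A script is appended after "_#".
        System.out.println(describe("zh-Hans-CN"));        // zh_CN_#Hans
        // Extensions without a script are also introduced by "_#".
        System.out.println(describe("th-TH-u-nu-thai"));   // th_TH_#u-nu-thai
        // With both script and extensions, the extensions follow after "_".
        System.out.println(describe("zh-Hant-TW-x-java")); // zh_TW_#Hant_x-java
    }
}
```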

Static functions

CANADA Locale CANADA(): Useful constant for country.
CANADA_FRENCH Locale CANADA_FRENCH(): Useful constant for country.
CHINESE Locale CHINESE(): Useful constant for language.
ENGLISH Locale ENGLISH(): Useful constant for language.
FRANCE Locale FRANCE(): Useful constant for country.
FRENCH Locale FRENCH(): Useful constant for language.
GERMAN Locale GERMAN(): Useful constant for language.
GERMANY Locale GERMANY(): Useful constant for country.
ITALIAN Locale ITALIAN(): Useful constant for language.
ITALY Locale ITALY(): Useful constant for country.
JAPAN Locale JAPAN(): Useful constant for country.
JAPANESE Locale JAPANESE(): Useful constant for language.
KOREA Locale KOREA(): Useful constant for country.
KOREAN Locale KOREAN(): Useful constant for language.
ROOT Locale ROOT(): Useful constant for the root locale. The root locale is the locale whose language, country, and variant are empty ("") strings. It is regarded as the base locale of all locales, and is used as the language/country neutral locale for locale-sensitive operations.
SIMPLIFIED_CHINESE Locale SIMPLIFIED_CHINESE(): Useful constant for language.
TRADITIONAL_CHINESE Locale TRADITIONAL_CHINESE(): Useful constant for language.
UK Locale UK(): Useful constant for country.
US Locale US(): Useful constant for country.
getDefault Locale getDefault(): Gets the current value of the default locale for this instance of the Java Virtual Machine. <p> The Java Virtual Machine sets the default locale during startup based on the host environment. It is used by many locale-sensitive methods if no locale is explicitly specified. It can be changed using the {@link #setDefault(java.util.Locale) setDefault} method.
getDefault Locale getDefault(LocaleCategory category): Gets the current value of the default locale for the specified Category for this instance of the Java Virtual Machine. <p> The Java Virtual Machine sets the default locale during startup based on the host environment. It is used by many locale-sensitive methods if no locale is explicitly specified. It can be changed using the setDefault(LocaleCategory, Locale) method.
getInstance Locale getInstance(string language, string country, string variant): Returns a <code>Locale</code> constructed from the given <code>language</code>, <code>country</code> and <code>variant</code>. If the same <code>Locale</code> instance is available in the cache, then that instance is returned. Otherwise, a new <code>Locale</code> instance is created and cached.
getInstance Locale getInstance(string language, string script, string country, string variant): Undocumented in source. Be warned that the author may not have intended to support it.
hashOf size_t hashOf(string language, string country, string script, string variant): Undocumented in source. Be warned that the author may not have intended to support it.
setDefault void setDefault(Locale newLocale): Sets the default locale for this instance of the Java Virtual Machine. This does not affect the host locale. <p> If there is a security manager, its <code>checkPermission</code> method is called with a <code>PropertyPermission("user.language", "write")</code> permission before the default locale is changed. <p> The Java Virtual Machine sets the default locale during startup based on the host environment. It is used by many locale-sensitive methods if no locale is explicitly specified. <p> Since changing the default locale may affect many different areas of functionality, this method should only be used if the caller is prepared to reinitialize locale-sensitive code running within the same Java Virtual Machine. <p> By setting the default locale with this method, all of the default locales for each Category are also set to the specified default locale.
setDefault void setDefault(LocaleCategory category, Locale newLocale): Sets the default locale for the specified Category for this instance of the Java Virtual Machine. This does not affect the host locale. <p> If there is a security manager, its checkPermission method is called with a PropertyPermission("user.language", "write") permission before the default locale is changed. <p> The Java Virtual Machine sets the default locale during startup based on the host environment. It is used by many locale-sensitive methods if no locale is explicitly specified. <p> Since changing the default locale may affect many different areas of functionality, this method should only be used if the caller is prepared to reinitialize locale-sensitive code running within the same Java Virtual Machine.
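
Because changing a default locale affects every caller in the same virtual machine, one cautious pattern is to save and restore it. The sketch below assumes the java.util.Locale API, where the category type is Locale.Category (listed on this page as LocaleCategory); DefaultLocaleDemo and its helper are hypothetical names.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class DefaultLocaleDemo {
    // Formats a value under a temporary FORMAT default, then restores the previous one.
    static String formatWithTemporaryDefault(Locale temporary, double value) {
        Locale saved = Locale.getDefault(Locale.Category.FORMAT);
        try {
            Locale.setDefault(Locale.Category.FORMAT, temporary);
            // The no-argument factory picks up the default FORMAT locale.
            return NumberFormat.getInstance().format(value);
        } finally {
            Locale.setDefault(Locale.Category.FORMAT, saved);
        }
    }

    public static void main(String[] args) {
        Locale before = Locale.getDefault(Locale.Category.FORMAT);
        System.out.println(formatWithTemporaryDefault(Locale.GERMANY, 1234.5)); // 1.234,5
        // The original default is back in place afterwards.
        System.out.println(before.equals(Locale.getDefault(Locale.Category.FORMAT))); // true
    }
}
```

Passing an explicit locale to the formatter, where possible, avoids touching shared state at all.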

Variables

PRIVATE_USE_EXTENSION enum char PRIVATE_USE_EXTENSION;: The key for the private use extension ('x').
UNICODE_LOCALE_EXTENSION enum char UNICODE_LOCALE_EXTENSION;: The key for Unicode locale extension ('u').