A Unicode escape is written as a backslash, the letter u and four hex digits, for example "\u4a44".
The Chinese characters handled here lie in the range 0x3400 to 0x9FA5, which includes both simplified and traditional characters.
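The four hex digits after `\u` are just the code point written in base 16. As a minimal sketch in Python (the names `decode_escape` and `in_chinese_range` are hypothetical, not part of the original code), here is how one escape is parsed and checked against the 0x3400 to 0x9FA5 range:

```python
import string

CJK_FIRST = 0x3400  # start of the range given above
CJK_LAST = 0x9FA5   # end of the range given above


def decode_escape(escape: str) -> str:
    """Turn a six-character escape such as \\u4a44 into the character it names."""
    digits = escape[2:]
    if (len(escape) != 6 or not escape.startswith("\\u")
            or not all(c in string.hexdigits for c in digits)):
        raise ValueError(f"not a \\uXXXX escape: {escape!r}")
    return chr(int(digits, 16))  # "4a44" -> 0x4A44 -> the character U+4A44


def in_chinese_range(ch: str) -> bool:
    """True if the character's code point lies in 0x3400..0x9FA5."""
    return CJK_FIRST <= ord(ch) <= CJK_LAST


if __name__ == "__main__":
    ch = decode_escape(r"\u4e07")
    print(ch, hex(ord(ch)), in_chinese_range(ch))
```

The explicit hex-digit check is there because `int(..., 16)` on its own also accepts inputs such as a leading `+` or an underscore.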
Online tool: Unicode to Chinese converter
C#
Stores the characters in a single string, in code-point order starting at 0x3400. For each escape in the input, convert its hex digits to an integer, subtract 0x3400 (13312) to get an index into that string, and replace the escape with the character found there.
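using System;
using System.Text.RegularExpressions;
// Characters U+3400..U+9FA5 in code-point order, so index 0 is U+3400 (13312).
private static readonly string ZHTable = "㐀㐁㐂㐃㐄㐅㐆㐇㐈㐉㐊㐋㐌㐍㐎㐏㐐... ...";
private string GetChar(string unicode)
{
    // "\u4a44" -> "4a44" -> 0x4A44, then offset into the table
    var index = Convert.ToInt32(unicode.Replace("\\u", ""), 16) - 0x3400;
    if (index < 0 || index >= ZHTable.Length)
        return unicode;   // outside the table: keep the escape instead of throwing
    return ZHTable[index].ToString();
}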
public string ToChinese(string text)
{
    // A literal backslash, "u", then four hex digits in either case.
    // Regex.Replace calls GetChar once per match, so no separate match loop is needed.
    return Regex.Replace(text, @"\\u[0-9a-fA-F]{4}", m => GetChar(m.Value));
}
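The table lookup depends on the string holding exactly one character per code point, in order, so that index = code point - 0x3400. As a sketch of that offset arithmetic in Python, this version assumes the table is generated from the range rather than pasted in. The names `ZH_TABLE` and `get_char` are hypothetical, and the input is assumed to have already matched the escape pattern:

```python
TABLE_START = 0x3400  # 13312, the offset subtracted in GetChar
TABLE_END = 0x9FA5

# One character per code point, in order; index 0 is U+3400.
ZH_TABLE = "".join(chr(cp) for cp in range(TABLE_START, TABLE_END + 1))


def get_char(escape: str) -> str:
    """Look up a \\uXXXX escape by subtracting the table's first code point."""
    index = int(escape[2:], 16) - TABLE_START
    if not 0 <= index < len(ZH_TABLE):
        return escape  # outside the table: leave the escape unchanged
    return ZH_TABLE[index]


if __name__ == "__main__":
    print(get_char(r"\u4a44"), get_char(r"\u0041"))
```

Every code point in this range is in the Basic Multilingual Plane, so one string index corresponds to one character here. The same holds for a C# `string`, where each of these characters is a single UTF-16 unit.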
PHP
Stores the Unicode data in an associative (dictionary-like) array keyed by code point. For each escape in the string, look up the array to get the right character and replace the escape with it.
$unicodedata = [
... ...
0x3437 => '㐷',
0x3438 => '㐸',
0x3439 => '㐹',
0x343a => '㐺',
0x343b => '㐻',
0x343c => '㐼',
... ...
];
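Writing this array out by hand is tedious and error-prone. As a sketch in Python, the same code-point-to-character map can be generated from the range, and the lookup then becomes a plain dictionary access. The names `UNICODE_DATA` and `lookup` are hypothetical:

```python
from typing import Optional

# Same shape as $unicodedata: code point -> character, e.g. 0x3437 -> '㐷'.
UNICODE_DATA = {cp: chr(cp) for cp in range(0x3400, 0x9FA5 + 1)}


def lookup(code_point: int) -> Optional[str]:
    """Return the character for a code point, or None if it is not in the map."""
    return UNICODE_DATA.get(code_point)


if __name__ == "__main__":
    print(lookup(0x3437), lookup(0x0041))
```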
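To convert the four hex digits to a number, weight each digit by a power of 16 according to its position:
for a digit 0-9, use its value directly;
for a letter a-f, take its ASCII code minus 87, so that a = 10 and f = 15;
any other character is not a hex digit, so throw an exception.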
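$u = strtolower($u);   // ord() - 87 only maps lower-case a-f to 10-15
$val = 0;
for($i = 0; $i < 4; $i++){
    if($this->is_number($u[$i])){
        $val += $u[$i] * pow(16, 3-$i);
    }
    elseif($this->is_hexletter($u[$i])){
        $val += (ord($u[$i]) - 87) * pow(16, 3-$i);   // subtract 87 before multiplying
    }
    else throw new Exception("Invalid hex digit: " . $u[$i]);
}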
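The digit-by-digit rule can be checked on its own. Below is a Python sketch of the same conversion (the name `hex4_to_int` is hypothetical): digits keep their value, a to f map to ord(c) - 87, and each digit is weighted by 16 to the power of its distance from the right end. The subtraction must happen before the multiplication. Otherwise the result would be ord(c) minus 87 times the weight.

```python
def hex4_to_int(u: str) -> int:
    """Convert exactly four hex digits to an integer, one digit at a time."""
    if len(u) != 4:
        raise ValueError("expected four hex digits")
    u = u.lower()  # ord(c) - 87 only maps lower-case a..f to 10..15
    val = 0
    for i, c in enumerate(u):
        if "0" <= c <= "9":
            digit = ord(c) - ord("0")
        elif "a" <= c <= "f":
            digit = ord(c) - 87  # ord('a') == 97, so 'a' -> 10 and 'f' -> 15
        else:
            raise ValueError(f"invalid hex digit: {c!r}")
        val += digit * 16 ** (3 - i)
    return val


if __name__ == "__main__":
    print(hex4_to_int("4a44"), int("4a44", 16))  # 19012 19012
```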
To find every escape, match a backslash and "u" followed by four characters, each a digit 0-9 or a letter a-f (in either case). Then replace each match with the decoded character.
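// '\\\\' in single quotes reaches PCRE as \\, which matches one literal backslash
$pattern = '/\\\\u[\da-f]{4}/i';
preg_match_all($pattern, $text, $matches);
foreach(array_unique($matches[0]) as $m){
    $char = $this->get_char($m);
    $text = str_replace($m, $char, $text);   // $m already starts with the backslash
}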
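Instead of collecting the matches first and then calling a string replace for each one, the replacement can be done in a single pass with a callback. Here is a Python sketch of that variant using `re.sub`. The name `to_chinese` is hypothetical, and leaving escapes outside 0x3400 to 0x9FA5 untouched is an assumption of this sketch:

```python
import re

# A literal backslash, 'u', then four hex digits in either case.
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def to_chinese(text: str) -> str:
    """Replace each \\uXXXX escape in the 0x3400..0x9FA5 range with its character."""
    def replace(m):
        code = int(m.group(1), 16)
        if 0x3400 <= code <= 0x9FA5:
            return chr(code)
        return m.group(0)  # leave other escapes as they were
    return ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(to_chinese(r"\u4e07\u9526\u534e\u4eba and \u0041"))
```

Because `re.sub` scans the text once, repeated escapes cost nothing extra. A replacement character can never be matched again, since it is not a backslash sequence.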
Web API
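For a web API, it is more efficient to have the client send a JSON array that contains only the escapes and have the server return a JSON dictionary mapping each escape to its character. This keeps both the request and the response small. The front end then replaces the escapes itself, for example in the AJAX success callback.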
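Here is a sketch of that request/response shape in Python, using only the standard `json` and `re` modules and no web framework. The function name `handle_request` is hypothetical. In JSON, the escape text itself has to travel with its backslash escaped, so the request body looks like `["\\u4e07", "\\u4eba"]`:

```python
import json
import re

# A literal backslash, 'u', then four hex digits.
ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")


def handle_request(body: str) -> str:
    r"""Map a JSON array such as ["\\u4e07", "\\u4eba"] to a JSON object.

    Each array element must be the escape text itself, so in JSON its
    backslash is written as \\ (as shown above).
    """
    escapes = json.loads(body)
    if not isinstance(escapes, list):
        raise ValueError("request body must be a JSON array")
    result = {}
    for esc in escapes:
        if not isinstance(esc, str) or ESCAPE.fullmatch(esc) is None:
            raise ValueError(f"not a \\uXXXX escape: {esc!r}")
        result[esc] = chr(int(esc[2:], 16))  # repeated escapes just overwrite
    # ensure_ascii=False leaves the characters readable; json.dumps still
    # writes each key's backslash as \\, so the client gets the escape text back.
    return json.dumps(result, ensure_ascii=False)


if __name__ == "__main__":
    request_body = json.dumps([r"\u4e07", r"\u9526", r"\u534e", r"\u4eba"])
    print(handle_request(request_body))
```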
C#
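using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public Dictionary<string, string> ToChinese(string[] unicodes) {
    foreach (var uc in unicodes) {
        // Anchored: each entry must be exactly one \uXXXX escape.
        if (!Regex.IsMatch(uc, @"^\\u[0-9a-fA-F]{4}$"))
            throw new ArgumentException("The array contains an entry that is not a \\uXXXX escape.");
    }
    var dict = new Dictionary<string, string>();
    foreach (var uc in unicodes) {
        // GetChar is the lookup helper from the C# section above; the indexer
        // (unlike Add) does not throw when the same escape appears twice.
        dict[uc] = GetChar(uc);
    }
    return dict;
}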
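// Best fed with a compact input that holds only the escapes, e.g. @"\u4e07\u9526\u534e\u4eba"
public string ToChineseAPI(string unicodejson) {
    var matches = Regex.Matches(unicodejson, @"\\u[0-9a-fA-F]{4}")
                       .Cast<Match>().Select(m => m.Value);
    var dict = ToChinese(matches.ToArray());
    // JSON needs quoted strings with escaped backslashes and quotes: "\\u4e07":"万"
    Func<string, string> esc = s => s.Replace("\\", "\\\\").Replace("\"", "\\\"");
    var entries = dict.Select(d => string.Format("\"{0}\":\"{1}\"", esc(d.Key), esc(d.Value)));
    return "{" + string.Join(",", entries) + "}";
}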
PHP
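function convertjson($text){
    // '\\\\' in single quotes reaches PCRE as \\, which matches one literal backslash
    $pattern = '/\\\\u[\da-f]{4}/i';
    preg_match_all($pattern, $text, $matches);
    $arr = array();
    foreach($matches[0] as $m){
        // $m already includes the backslash (e.g. \u4e07); json_encode emits it as "\\u4e07"
        $arr[$m] = $this->get_char($m);
    }
    return json_encode($arr, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
}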
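On the consumer side, the returned dictionary is applied to the original text. The suggestion above is to do this in an AJAX callback in the browser, but to keep every example in one language, here is the same step sketched in Python. The name `apply_mapping` is hypothetical, and the sketch assumes the response keys are the escape text, such as `\u4e07`:

```python
import json
import re

# A literal backslash, 'u', then four hex digits.
ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")


def apply_mapping(text: str, response_json: str) -> str:
    """Replace each escape in text using the server's {escape: character} map."""
    mapping = json.loads(response_json)
    # Escapes missing from the map (including case variants) are left as they are.
    return ESCAPE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    response = json.dumps({r"\u4e07": "万", r"\u4eba": "人"}, ensure_ascii=False)
    print(apply_mapping(r"\u4e07\u4eba!", response))
```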