Unicode to Chinese conversion notes

2020-09-18

A Unicode escape is written like "\u4a44": a backslash, the letter u, and four hexadecimal digits.

Chinese characters, both simplified and traditional, are located from 0x3400 to 0x9fa5 (CJK Unified Ideographs Extension A followed by the basic CJK Unified Ideographs block).
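As a quick illustration of the two facts above, here is a minimal Python sketch that decodes one escape and checks whether the result falls in the quoted range. The names decode_escape and in_chinese_range are made up for this example and do not come from the original code.

```python
CJK_START = 0x3400  # first code point of the range quoted above
CJK_END = 0x9FA5    # last code point of the range quoted above

def decode_escape(escape: str) -> str:
    # Expect exactly six characters: a backslash, "u", and four hex digits.
    if len(escape) != 6 or not escape.startswith("\\u"):
        raise ValueError(f"not a \\uXXXX escape: {escape!r}")
    return chr(int(escape[2:], 16))

def in_chinese_range(ch: str) -> bool:
    # True when the character's code point lies inside the quoted range.
    return CJK_START <= ord(ch) <= CJK_END

ch = decode_escape(r"\u4e07")
print(ch, in_chinese_range(ch))  # 万 True
```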

Online tool: Unicode to Chinese converter

C#

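Store the Chinese characters in a single string, ordered by code point and starting at 0x3400. For each Unicode escape in the text, convert its hex digits to an integer, subtract 13312 (0x3400) to get the index into that string, and replace the escape with the character found there.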

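// Lookup table of Chinese characters ordered by code point, starting at U+3400 (truncated here).
string ZHTable = "㐀㐁㐂㐃㐄㐅㐆㐇㐈㐉㐊㐋㐌㐍㐎㐏㐐... ...";

// Converts a single escape such as "\u4e07" to its character.
private string GetChar(string unicode)
{
    // Strip the "\u" prefix, leaving the four hex digits.
    unicode = unicode.Replace("\\u", "");
    var code = Convert.ToInt32(unicode, 16);
    // 13312 == 0x3400, the code point of the first table entry.
    return ZHTable[code - 13312].ToString();
}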

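// Requires: using System.Text.RegularExpressions;
public string ToChinese(string text) {
    // Match every \uXXXX escape; hex digits may be lower or upper case.
    var matches = Regex.Matches(text, @"\\u[a-fA-F0-9]{4}");

    string result = text;
    foreach (Match m in matches) {
        var key = m.Value;
        // Replace every occurrence of this escape with its character.
        result = result.Replace(key, GetChar(key));
    }
    return result;
}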
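The same offset-table idea can be sketched in Python. This is only an illustration under one assumption: that the table contains every code point from 0x3400 to 0x9fa5 in order, so it can be generated rather than pasted in. The names ZH_TABLE and to_chinese are hypothetical, and escapes outside the table are left untouched instead of raising an error.

```python
import re

TABLE_START = 0x3400  # 13312, the code point of the first table entry
TABLE_END = 0x9FA5

# Assumption: the table is a contiguous run of code points in order.
ZH_TABLE = "".join(chr(cp) for cp in range(TABLE_START, TABLE_END + 1))

# A backslash, "u", then four hex digits (captured for conversion).
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def to_chinese(text: str) -> str:
    def lookup(match: re.Match) -> str:
        index = int(match.group(1), 16) - TABLE_START
        if 0 <= index < len(ZH_TABLE):
            return ZH_TABLE[index]
        return match.group(0)  # outside the table: keep the escape as-is
    return ESCAPE.sub(lookup, text)

print(to_chinese(r"\u4e07\u9526 ok"))
```

Using a replacement callback converts the text in one pass, instead of calling a string replace once per match.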

PHP

Store the Unicode data in a dictionary-like array keyed by code point. For each Unicode escape in the string, look up the array to get the character that replaces it.

$unicodedata = [
... ...
0x3437 => '㐷',
0x3438 => '㐸',
0x3439 => '㐹',
0x343a => '㐺',
0x343b => '㐻',
0x343c => '㐼',
... ...
];

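Each of the four hex digits is weighted by a power of 16 according to its position: 16^3 for the first digit down to 16^0 for the last.

A decimal digit contributes its own value multiplied by that power of 16.

A letter a-f is first converted to its numeric value by subtracting 87 from its ASCII code, so a = 10 and f = 15, and is then multiplied by the same power of 16.

Only the letters a-f are valid hex letters; any other character throws an exception.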

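if($this->is_number($u[$i])){
	$val += $u[$i] * pow(16, 3-$i);
}
elseif($this->is_hexletter($u[$i])){
	// Parenthesize the subtraction: "*" binds tighter than "-".
	// strtolower() keeps the "- 87" offset valid for A-F as well.
	$val += (ord(strtolower($u[$i])) - 87) * pow(16, 3-$i);
}
else{
	throw new InvalidArgumentException("Invalid hex digit: " . $u[$i]);
}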
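To make the arithmetic concrete, here is a small Python sketch of the same digit-by-digit conversion. The function name hex4_to_int is invented for this example; in practice Python's built-in int(digits, 16) does the same job.

```python
def hex4_to_int(digits: str) -> int:
    # Convert exactly four hex digits to an integer, one position at a time.
    if len(digits) != 4:
        raise ValueError("expected exactly four hex digits")
    value = 0
    for i, ch in enumerate(digits.lower()):
        if "0" <= ch <= "9":
            digit = ord(ch) - ord("0")
        elif "a" <= ch <= "f":
            digit = ord(ch) - 87  # ord("a") == 97, so a -> 10 ... f -> 15
        else:
            raise ValueError(f"invalid hex digit: {ch!r}")
        # Position 0 is weighted by 16**3, position 3 by 16**0.
        value += digit * 16 ** (3 - i)
    return value

print(hex4_to_int("343a"))  # 13370
```

For "343a" the sum is 3*4096 + 4*256 + 3*16 + 10 = 13370, which is 0x343a.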

Match every Unicode escape, that is, \u followed by four characters that are digits or the letters a to f, and replace each match with its final character value.

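// Single-quoted so the regex engine receives \\u, which matches a literal backslash followed by "u".
// (A bare \u is not a valid PCRE escape.)
$pattern = '/\\\\u[\da-f]{4}/i';
preg_match_all($pattern, $text, $matches);

foreach($matches[0] as $m){
	$char = $this->get_char($m);
	// Searches for a backslash followed by $m, so this assumes each escape
	// in $text is preceded by an extra backslash (for example \\u4e07).
	$text = str_replace('\\'.$m, $char, $text);
}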
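For comparison, the dictionary lookup can be combined with the pattern match in a single pass. The Python sketch below is an illustration, not the post's implementation: UNICODE_DATA is a three-entry excerpt of the array shown earlier, replace_escapes is a hypothetical name, and the input is assumed to contain single-backslash escapes.

```python
import re

# Excerpt of the lookup array above, keyed by code point.
UNICODE_DATA = {0x3437: "㐷", 0x3438: "㐸", 0x3439: "㐹"}

# A backslash, "u", then four hex digits (captured for conversion).
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def replace_escapes(text: str, table: dict) -> str:
    def lookup(match: re.Match) -> str:
        # Fall back to the original escape when the code point is not in the table.
        return table.get(int(match.group(1), 16), match.group(0))
    return ESCAPE.sub(lookup, text)

print(replace_escapes(r"a\u3437b\u3439", UNICODE_DATA))
```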

Web API

For a web API, performance is better when the consumer sends the Unicode escapes as a JSON array and receives a JSON dictionary that maps each escape to its character. The front end can then replace the escapes in an AJAX callback.
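The request and response shapes described above might look like the following Python sketch. It is only an outline: to_chinese_api is a hypothetical name, the request is assumed to be a JSON array of strings, and chr() stands in for the table lookup used elsewhere in this post.

```python
import json
import re

# Exactly one escape: a backslash, "u", then four hex digits.
ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")

def to_chinese_api(request_json: str) -> str:
    escapes = json.loads(request_json)
    result = {}
    for esc in escapes:
        if not isinstance(esc, str) or not ESCAPE.fullmatch(esc):
            raise ValueError(f"not a \\uXXXX escape: {esc!r}")
        # Stand-in for the table lookup: chr() maps the code point directly.
        result[esc] = chr(int(esc[2:], 16))
    # ensure_ascii=False keeps the characters readable in the response.
    return json.dumps(result, ensure_ascii=False)

request = json.dumps(["\\u4e07", "\\u9526"])
print(to_chinese_api(request))
```

Because the result is a dictionary, an escape that appears several times in the request is returned only once.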

C#

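// Requires: using System.Linq; using System.Text.RegularExpressions;
// GetChineseWord is not shown in this post; it is presumably the same table lookup as GetChar.
public Dictionary<string,string> ToChinese(string[] unicodes) {
    foreach (var uc in unicodes) {
        // Anchored so each entry must be exactly one \uXXXX escape.
        if (!Regex.IsMatch(uc, @"^\\u[a-fA-F0-9]{4}$"))
            throw new ArgumentException("The array contains non-unicode characters.");
    }

    Dictionary<string, string> dict = new Dictionary<string, string>();

    foreach(var uc in unicodes) {
        // Indexer assignment tolerates repeated escapes; Add would throw on a duplicate key.
        dict[uc] = GetChineseWord(uc).ToString();
    }
    return dict;
}
// Best fed with a compact input such as @"\u4e07\u9526\u534e\u4eba"
public string ToChineseAPI(string unicodejson) {
    var matches = Regex.Matches(unicodejson, @"\\u[a-fA-F0-9]{4}").Cast<Match>().Select(x => x.Value);

    var dict = ToChinese(matches.ToArray());
    // Quote the value and escape the backslash in the key so the output is valid JSON.
    var entries = dict.Select(d => string.Format("\"{0}\": \"{1}\"", d.Key.Replace("\\", "\\\\"), d.Value));
    return "{" + string.Join(",", entries) + "}";
}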

PHP

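function convertjson($text){
    // Single-quoted so the regex engine receives \\u, which matches a literal backslash followed by "u".
    $pattern = '/\\\\u[\da-f]{4}/i';
    preg_match_all($pattern, $text, $matches);

    $arr = array();
    foreach($matches[0] as $m){
        // The key is $m with one extra leading backslash, as in the replace loop above.
        $arr['\\'.$m] = $this->get_char($m);
    }

    return json_encode($arr, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
}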
Author: IT Team
Tags: Language
Last updated: 2020-09-18

COPYRIGHT © 2021 Hostlike IT Blog. All rights reserved.