Unicode to Chinese conversion notes

2020-09-18

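A Unicode escape is written like "\u4a44": a backslash, the letter u, and four hexadecimal digits.

Chinese characters are located from 0x3400 to 0x9fa5, covering both simplified and traditional characters. 0x3400 is 13312 in decimal.

Online tool: Unicode to Chinese converter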

Table of Contents

  • C#
  • PHP
  • Web API
    • C#
    • PHP

C#

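Store the characters in a single string, ordered by code point. For each Unicode escape in the input, convert its hex digits to an integer, use that number (minus the code point of the first character in the table) as an index into the string, and replace the escape with the character found there.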

// Requires: using System; using System.Text.RegularExpressions;
string ZHTable = "㐀㐁㐂㐃㐄㐅㐆㐇㐈㐉㐊㐋㐌㐍㐎㐏㐐... ...";

private string GetChar(string unicode)
{
    unicode = unicode.Replace("\\u", "");
    var code = Convert.ToInt32(unicode, 16);
    // ZHTable starts at U+3400 (13312 decimal), so subtract that offset to get the index.
    return ZHTable[code - 13312].ToString();
}

public string ToChinese(string text) {
    // Match every \uXXXX escape; the class also accepts upper-case hex digits.
    var matches = Regex.Matches(text, @"\\u[0-9a-fA-F]{4}");

    string result = text;
    foreach (Match m in matches) {
        var key = m.Value;
        result = result.Replace(key, GetChar(key));
    }
    return result;
}
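
The lookup above can also be tried as a self-contained program. The sketch below is not the article's exact code: it assumes the table can be generated from the range 0x3400 to 0x9fa5 instead of being pasted in as a literal, it adds a range check so escapes outside the table are left alone, and the class name UnicodeTableDemo is made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnicodeTableDemo
{
    private const int First = 0x3400; // 13312, first code point in the table
    private const int Last = 0x9fa5;  // last code point named in the article

    // Assumption: the table is generated here rather than stored as a literal string.
    private static readonly string ZHTable =
        new string(Enumerable.Range(First, Last - First + 1).Select(c => (char)c).ToArray());

    public static string ToChinese(string text)
    {
        // Group 1 holds the four hex digits of each \uXXXX escape.
        return Regex.Replace(text, @"\\u([0-9a-fA-F]{4})", m =>
        {
            int code = Convert.ToInt32(m.Groups[1].Value, 16);
            // Leave escapes outside the table untouched instead of throwing.
            if (code < First || code > Last) return m.Value;
            return ZHTable[code - First].ToString();
        });
    }

    public static void Main()
    {
        Console.WriteLine(ToChinese(@"abc \u4e07\u4eba \u0041"));
    }
}
```

Here \u4e07 and \u4eba are replaced by their characters, while \u0041 falls outside the table and stays as written. Using Regex.Replace with a callback replaces each match in one pass instead of calling string.Replace once per match.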

PHP

Store the Unicode data in a dictionary-like array keyed by code point. For each Unicode escape in a string, look up its code point in the array and replace the escape with the matching character.

$unicodedata = [
... ...
0x3437 => '㐷',
0x3438 => '㐸',
0x3439 => '㐹',
0x343a => '㐺',
0x343b => '㐻',
0x343c => '㐼',
... ...
];

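To turn the four hex digits into a number, handle each digit by its position i (0 to 3):

For a digit 0-9, multiply it by 16 to the power of 3-i.

For a letter a-f, subtract 87 from its ASCII code (so a = 10 and f = 15), then multiply by the same power of 16.

Hex letters only run from a to f; for any other character, throw an exception.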

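if($this->is_number($u[$i])){
	$val += $u[$i] * pow(16, 3-$i);
}
elseif($this->is_hexletter($u[$i])){
	// Parentheses are required: without them, 87 * pow(...) is evaluated before the subtraction.
	$val += (ord(strtolower($u[$i])) - 87) * pow(16, 3-$i);
}
else{
	throw new InvalidArgumentException('Invalid hex digit: '.$u[$i]);
}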
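
As a worked example of this arithmetic, 4e07 is 4 × 4096 + 14 × 256 + 0 × 16 + 7 = 19975. The same digit-by-digit loop is sketched below in C# for comparison; ParseHex4 and HexDemo are hypothetical names, and the sketch assumes the input is exactly the four hex digits with the \u prefix already removed.

```csharp
using System;

public static class HexDemo
{
    // Hypothetical helper mirroring the fragment above: parses exactly four hex digits.
    public static int ParseHex4(string u)
    {
        if (u.Length != 4) throw new ArgumentException("Expected four hex digits.");
        int val = 0;
        for (int i = 0; i < 4; i++)
        {
            char c = char.ToLowerInvariant(u[i]);
            int digit;
            if (c >= '0' && c <= '9') digit = c - '0';
            else if (c >= 'a' && c <= 'f') digit = c - 87; // 'a' is ASCII 97, so 97 - 87 = 10
            else throw new ArgumentException("Not a hex digit: " + u[i]);
            val += digit * (1 << (4 * (3 - i))); // 1 << (4 * k) equals 16 to the power k
        }
        return val;
    }

    public static void Main()
    {
        Console.WriteLine(ParseHex4("4e07")); // 19975
    }
}
```

In practice Convert.ToInt32(u, 16), as used in the C# section, does the same job; the manual loop is only useful for seeing how each position contributes.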

Match every Unicode escape, which starts with \u followed by four characters that are digits or the letters a to f, then replace each match with its final character value.

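// '\\\\' in a single-quoted PHP string is the regex \\, which matches one literal backslash.
// Group 1 captures "uXXXX" without the backslash, the form the loop below works with.
$patten = '/\\\\(u[\da-f]{4})/i';
preg_match_all($patten, $text, $matches);

foreach($matches[1] as $m){
	$char = $this->get_char($m);
	$text = str_replace('\\'.$m, $char, $text);
}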

Web API

For a web API, performance is best when the request is a JSON array of Unicode escapes and the response is a JSON dictionary that maps each escape to its character. The front end can then replace the escapes in an ajax callback.

C#

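// Requires: using System; using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
// GetChineseWord is not shown in this article; it is presumably a table lookup like GetChar above.
public Dictionary<string,string> ToChinese(string[] unicodes) {
    Dictionary<string, string> dict = new Dictionary<string, string>();

    foreach (var uc in unicodes) {
        // Anchor the pattern so each item must be exactly one \uXXXX escape.
        if (!Regex.IsMatch(uc, @"^\\u[0-9a-fA-F]{4}$"))
            throw new ArgumentException("The array contains an item that is not a \\uXXXX escape.");

        // The indexer tolerates repeated escapes; Add would throw on a duplicate key.
        dict[uc] = GetChineseWord(uc).ToString();
    }
    return dict;
}
//best feed with minimum size: @"\u4e07\u9526\u534e\u4eba"
public string ToChineseAPI(string unicodejson) {
    var matches = Regex.Matches(unicodejson, @"\\u[0-9a-fA-F]{4}").Cast<Match>().Select(x => x.Value);

    var dict = ToChinese(matches.ToArray());
    // Double the backslash in each key and quote each value so the result is valid JSON.
    var entries = dict.Select(d => string.Format("\"{0}\": \"{1}\"", d.Key.Replace("\\", "\\\\"), d.Value));
    return "{" + string.Join(",", entries) + "}";
}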
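
On the consuming side, the returned dictionary is applied to the original text. The article suggests doing this in the browser in an ajax callback; the sketch below shows the same step in C# only to stay in one language. ApplyMapDemo and Apply are hypothetical names, and the dictionary is filled in by hand where a real consumer would parse the API response.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ApplyMapDemo
{
    // Replaces each \uXXXX escape found in text with its entry in map, if there is one.
    public static string Apply(string text, IDictionary<string, string> map)
    {
        return Regex.Replace(text, @"\\u[0-9a-fA-F]{4}", m =>
        {
            string value;
            // Escapes missing from the map are left unchanged.
            return map.TryGetValue(m.Value, out value) ? value : m.Value;
        });
    }

    public static void Main()
    {
        // Stand-in for the parsed API response.
        var map = new Dictionary<string, string>
        {
            { @"\u4e07", "\u4e07" },
            { @"\u4eba", "\u4eba" }
        };
        Console.WriteLine(Apply(@"\u4e07\u4eba \u9526", map));
    }
}
```

The first two escapes are replaced from the dictionary, and \u9526 is left as written because the sample map has no entry for it.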

PHP

function convertjson($text){
    // Match a literal backslash, then capture "uXXXX" (group 1) without it.
    $patten = '/\\\\(u[\da-f]{4})/i';
    preg_match_all($patten, $text, $matches);

    $arr = array();
    foreach($matches[1] as $m){
        $arr['\\'.$m] = $this->get_char($m);
    }

    return json_encode($arr, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
}

Author: IT Team

Tags: Language
Last updated: 2020-09-18
