Thread: Character Set Problems when importing Chinese UTF-8 contacts

  1. #1
    hkphooey is offline Sugar Community Member
    Join Date
    Jan 2007
    Posts
    94

    Character Set Problems when importing Chinese UTF-8 contacts

    Hi,

    I'm having some problems importing Chinese characters. I've set all the language preferences to UTF-8, i.e. in conf.php and /includes/language/en_us.lang.php.

    I can paste Chinese characters into any of the text fields relating to a contact, and they are stored accurately in the database. They are redisplayed in the Detail view without any problems.

    Now, we have a large file of contacts that we wish to import. The Chinese names display fine in Excel. We export to CSV. If we open the CSV file in a Unicode-aware editor, the file type is confirmed to be UTF-8. So far so good.
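
The editor check described above can also be done from a script. This is a minimal sketch in Python, not part of SugarCRM; the function names `looks_like_utf8` and `has_non_ascii` and the sample data are made up for illustration. It only confirms that the bytes decode cleanly as UTF-8 and that the file really contains non-ASCII data.

```python
import codecs


def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (a leading BOM is allowed)."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte is outside the 7-bit ASCII range."""
    return any(b > 0x7F for b in data)


# Hypothetical two-line CSV standing in for the exported contacts file.
sample = "last_name\n王澤豪\n".encode("utf-8")
print(looks_like_utf8(sample), has_non_ascii(sample))
```

For a real export, the bytes would come from something like `open(path, "rb").read()`; a file that passes both checks is at least not already damaged before the import starts.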

    When we go through the contact import process, the file is somehow converted back to ASCII, and in the import preview the Chinese names are displayed as ASCII, e.g. 王澤豪. I've checked the source, and the character encoding of the page is still UTF-8, so I gather that the file gets garbled during the import process. Not surprisingly, when the final import is performed, the ASCII characters are put into the database rather than the UTF-8 characters.
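
The kind of garbling described here is what happens when UTF-8 bytes are read as a single-byte character set. A small Python sketch of the effect follows. Latin-1 is used as a stand-in for the single-byte charset, which is an assumption: the thread does not say which charset the importer actually applied.

```python
name = "王澤豪"

# Each of these characters takes three bytes in UTF-8.
utf8_bytes = name.encode("utf-8")

# Reading those bytes as a single-byte charset (Latin-1 assumed here)
# turns every byte into its own character, so 3 names become 9 symbols.
garbled = utf8_bytes.decode("latin-1")
print(len(name), len(utf8_bytes), len(garbled))

# The damage is reversible as long as no byte was dropped or replaced.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == name)
```

If the preview shows roughly three odd characters for every Chinese character, this sort of misread is a likely cause, and the bytes in the file itself may still be fine.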

    I have retried with the data converted from UTF-8 characters into HTML entities. Still the same problem, although obviously instead of the ASCII I get the HTML entity code, e.g. & # 567878 ;
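
For reference, the entity conversion tried above can be sketched in Python as below. It uses only the standard library and is not how SugarCRM handles entities. Because numeric entities are plain ASCII, an ASCII import passes them through untouched, which would explain why the literal entity text is what ends up stored.

```python
import html

name = "王澤豪"

# Replace every non-ASCII character with a decimal numeric character reference.
entities = name.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entities)

# The result is pure ASCII, so an ASCII-only pipeline stores it literally.
print(entities.isascii())

# A browser (or html.unescape) turns the references back into characters.
print(html.unescape(entities) == name)
```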

    Any ideas how to get around this? Or where to start hacking in the code to fix the import process?
    Last edited by hkphooey; 2007-02-15 at 06:35 AM.

  2. #2
    hkphooey is offline Sugar Community Member

    Re: Character Set Problems when importing Chinese UTF-8 contacts

    I've started a Wiki page for Unicode UTF-8 issues, to gather them all together into one place.

    http://www.sugarcrm.com/wiki/index.p...acter_Set_Tips

  3. #3
    hkphooey is offline Sugar Community Member

    Re: Character Set Problems when importing Chinese UTF-8 contacts

    Should I raise this as a bug?

  4. #4
    hkphooey is offline Sugar Community Member

    Re: Character Set Problems when importing Chinese UTF-8 contacts

    The latest inspiration came when I read that PHP doesn't handle UTF-8 natively. I changed a few of the mbstring settings in my php.ini file and restarted Apache. mbstring stands for multibyte strings, so it looked promising. Afterwards, my phpinfo output looked like this:


    mbstring

    Multibyte Support                     enabled
    Japanese support                      enabled
    Simplified chinese support            enabled
    Traditional chinese support           enabled
    Korean support                        enabled
    Russian support                       enabled
    HTTP input encoding translation       enabled
    Multibyte (japanese) regex support    enabled

    Directive                        Local Value   Master Value
    mbstring.detect_order            no value      no value
    mbstring.encoding_translation    On            On
    mbstring.func_overload           0             0
    mbstring.http_input              UTF-8         UTF-8
    mbstring.http_output             UTF-8         UTF-8
    mbstring.internal_encoding       UTF-8         UTF-8
    mbstring.language                Neutral       Neutral
    mbstring.substitute_character    no value      no value

    For the record, this didn't work. Can anyone confirm they are having problems importing Unicode / UTF-8 characters in any other language? Maybe it's just an issue with Chinese/multibyte characters, rather than an issue with Unicode in general.

  5. #5
    hkphooey is offline Sugar Community Member

    Re: Character Set Problems when importing Chinese UTF-8 contacts

    Time to eat crow pie. I thought I'd set all the options to import and export UTF-8 globally. However, it seems that the admin user I was logged in as was not using the global settings and was importing as ASCII. I'm not sure how that happened, as I'd checked it several times. Anyway, under My Account, in the top toolbar, I changed it back to UTF-8 and everything works fine now. Damn, I feel stupid.
    Last edited by hkphooey; 2007-03-29 at 01:40 AM. Reason: Wrong
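
The behaviour described in this post, a per-user preference taking priority over the global default, can be pictured with the short Python sketch below. The key name `import_export_charset`, the dictionaries and the function are all hypothetical; this is not SugarCRM's actual code or its setting names.

```python
# Hypothetical site-wide defaults, standing in for the global settings.
GLOBAL_DEFAULTS = {"import_export_charset": "UTF-8"}


def effective_charset(user_prefs: dict, defaults: dict = GLOBAL_DEFAULTS) -> str:
    """Return the user's own charset if one is set, otherwise the global default."""
    return user_prefs.get("import_export_charset") or defaults["import_export_charset"]


# A user whose own preference overrides the global UTF-8 default.
admin_prefs = {"import_export_charset": "ASCII"}
# A user with no personal preference falls through to the global value.
other_prefs = {}

print(effective_charset(admin_prefs))
print(effective_charset(other_prefs))
```

With a lookup like this, checking only the global value is not enough: the account doing the import has to be checked as well.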
