Colliding Collations in MySQL

Every once in a while I write a MySQL query that fails because of differing collations.  Here’s how to resolve those issues.

First, a little background.

In MySQL, each server, database, table and column can have its own character set and collation. The MySQL 5.0 manual does a terrific job of explaining character sets and collations, so we’ll let them handle that part: http://dev.mysql.com/doc/refman/5.0/en/charset-general.html.

You can see what character sets and collations your server has like so:

mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| binary   | Binary pseudo charset       | binary              |      1 |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+

mysql> show collation like 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1  |  5 |         | Yes      |       1 |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci  | latin1  | 15 |         | Yes      |       1 |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 |
| latin1_bin        | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci | latin1  | 48 |         | Yes      |       1 |
| latin1_general_cs | latin1  | 49 |         | Yes      |       1 |
| latin1_spanish_ci | latin1  | 94 |         | Yes      |       1 |
+-------------------+---------+----+---------+----------+---------+

FYI, the _ci, _cs and _bin endings indicate “case insensitive”, “case sensitive” and “binary”, respectively.  Binary is, perforce, case sensitive.

To see the default settings for your server, check out the variables:

mysql> show variables like 'char%';
+--------------------------+------------------------------------------------------+
| Variable_name            | Value                                                |
+--------------------------+------------------------------------------------------+
| character_set_client     | latin1                                               |
| character_set_connection | latin1                                               |
| character_set_database   | latin1                                               |
| character_set_filesystem | binary                                               |
| character_set_results    | latin1                                               |
| character_set_server     | latin1                                               |
| character_set_system     | utf8                                                 |
| character_sets_dir       | /Applications/xampp/xamppfiles/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
8 rows in set (0.01 sec)

mysql> show variables like 'coll%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

So, you can specify the character set and collation right down to the column.  A collation is specific to a given character set, so you can’t compare two fields from different character sets, or from different collations.  In fact, that’s where we started: MySQL rapped my knuckles for trying to compare fields with different collations.

Let’s set up a collation collision for demonstration purposes.

drop table if exists a;
create table a (
	flda1 varchar(255)
) engine=myisam collate=latin1_swedish_ci;

drop table if exists b;
create table b (
	fldb1 varchar(50) not null
) engine=myisam collate=latin1_general_ci;

insert into a values ('aaa');
insert into b values ('bbb');

So if we did a select comparing the two fields, we’d get:

mysql> select * from a, b where a.flda1 = b.fldb1;
ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '='
mysql>

Great. Now, to fix it, we want to set the collation of table b to latin1_swedish_ci.  You’ll note that getting a “show create table b”, produces the following DDL script:

CREATE TABLE `b` (
  `fldb1` varchar(50) collate latin1_general_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci

Clearly we’ll need to change both the table and the field’s collations:

alter table b
character set latin1 collate latin1_swedish_ci,
modify column fldb1 varchar(50) character set latin1 collate latin1_swedish_ci NOT NULL

You can see the syntax for alter at http://dev.mysql.com/doc/refman/5.0/en/alter-table.html.  After which we get a create script of:

CREATE TABLE `b` (
  `fldb1` varchar(50) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

And now, when we execute the select script:

mysql> select * from a, b where a.flda1 = b.fldb1;
Empty set (0.20 sec)

Our error message is gone and the script actually executes.  Our mission is complete.  It’s Miller time.

8 thoughts on “Colliding Collations in MySQL

  1. Thanks man. I thought this was the problem and chaged the coalition using both phpmyadmin and workbenches options yet it didnt work out. this sql did it. great job.

  2. What do you do if it’s a literal being given the wrong collation type? (esp when you’ve only set defaults to latin1 for server and database?)

  3. Hi,

    I wonder if you can help?
    I’m having an issue trying to set the server character set to Latin1 and the collation to latin1_swedish_ci on windows 8 with MySQL.
    By default the installer is choosing utf8 and when I change it, the server goes into a start-stop bounce indefinitely.
    Any ideas?

    Cheers …J

    • Hi Jay. Sorry I didn’t get back to you sooner. Also, I don’t have any experience running MySQL on Windows after XP. I’d make sure I had the latest version of MySQL and Windows patches, of course. If it was still an issue, I might look at changing one, or more, of the underlying assumptions: Does it *have* to be Windows? What about trying MariaDB?

      Whatever you decide to do, I’d be interested in hearing how you resolved this.

      Best of luck.

Leave a Reply

Your email address will not be published. Required fields are marked *