Skip to Content

Mysql - Query with trailing spaces

Posted on

WHAT

Today I am confused when I run this query:

[9] pry(main)> User.where(username: "TuanT")
  User Load (0.6ms)  SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL AND `users`.`username` = 'TuanT'
=> [#<User:0x000000010cadae00
  id: 12,
  username: "TuanT   ",
  password: "$2a$10$/S.28EMKpBYw84OVUQeFquuhJ2xH/zsuIQaph434iKIKA5/tg/SNS",
  created_at: Tue, 30 Nov 2021 04:08:29.000000000 UTC +00:00,
  updated_at: Wed, 16 Nov 2022 06:35:03.000000000 UTC +00:00,
  deleted_at: nil>]
[10] pry(main)>

As you can see, the generated query is exactly what I want:

SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL AND `users`.`username` = 'TuanT'

However, the user who has username = "TuanT " (with 3 spaces) is returned.

WHY

After some searching, I found interesting things.

MYSQL collations have a pad attribute, which has a value of PAD SPACE or NO PAD.

For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings. NO PAD collations treat trailing spaces as significant in comparisons, like any other character. PAD SPACE collations treat trailing spaces as insignificant in comparisons; strings are compared without regard to trailing spaces  https://dev.mysql.com/doc/refman/8.0/en/char.html

mysql> SELECT COLLATION_NAME,
              PAD_ATTRIBUTE
       FROM INFORMATION_SCHEMA.COLLATIONS
       WHERE CHARACTER_SET_NAME = 'utf8mb4'
       AND COLLATION_NAME in ('utf8mb4_unicode_ci','utf8mb4_0900_as_ci');

+--------------------+---------------+
| COLLATION_NAME     | PAD_ATTRIBUTE |
+--------------------+---------------+
| utf8mb4_0900_as_ci | NO PAD        |
| utf8mb4_unicode_ci | PAD SPACE     |
+--------------------+---------------+

My database is utf8mb4_unicode_ci, so PAD ATTRIBUTE is PAD SPACE => It treats trailing spaces as insignificant in comparisons.

It’s reason why an user who hash username = "TuanT " is returned, although search query is username = "TuanT"

HOW

Solution 1

  • Change to a new collation, which has pad attribute NO PAD.
mysql> SELECT 'a' = 'a ';
+------------+
| 'a' = 'a ' |
+------------+
|          1 |
+------------+
mysql> SET NAMES 'utf8mb4' COLLATE 'utf8mb4_0900_ai_ci';
mysql> SELECT 'a' = 'a ';
+------------+
| 'a' = 'a ' |
+------------+
|          0 |
+------------+

Solution 2

  • Use BINARY for query.
[10] pry(main)> User.where("username = BINARY ?", "TuanT")
  User Load (6.0ms)  SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username = BINARY 'TuanT')
=> []
[11] pry(main)> User.where("username = BINARY ?", "TuanT   ")
  User Load (1.1ms)  SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username = BINARY 'TuanT   ')
=> [#<User:0x000000010cb7aba8
  id: 12,
  username: "TuanT   ",
  password: "$2a$10$/S.28EMKpBYw84OVUQeFquuhJ2xH/zsuIQaph434iKIKA5/tg/SNS",
  created_at: Tue, 30 Nov 2021 04:08:29.000000000 UTC +00:00,
  updated_at: Wed, 16 Nov 2022 06:35:03.000000000 UTC +00:00,
  deleted_at: nil]
[12] pry(main)>

Reference

comments powered by Disqus