Mysql - Query with trailing spaces
WHAT
Today I am confused when I run this query:
[9] pry(main)> User.where(username: "TuanT")
User Load (0.6ms) SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL AND `users`.`username` = 'TuanT'
=> [#<User:0x000000010cadae00
id: 12,
username: "TuanT ",
password: "$2a$10$/S.28EMKpBYw84OVUQeFquuhJ2xH/zsuIQaph434iKIKA5/tg/SNS",
created_at: Tue, 30 Nov 2021 04:08:29.000000000 UTC +00:00,
updated_at: Wed, 16 Nov 2022 06:35:03.000000000 UTC +00:00,
deleted_at: nil>]
[10] pry(main)>
As you can see, the generated query is exactly what I want:
SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL AND `users`.`username` = 'TuanT'
However, the user who has username = "TuanT "
(with 3 spaces) is returned.
WHY
After some searching, I found interesting things.
MYSQL collations have a pad attribute, which has a value of PAD SPACE or NO PAD.
For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings. NO PAD collations treat trailing spaces as significant in comparisons, like any other character. PAD SPACE collations treat trailing spaces as insignificant in comparisons; strings are compared without regard to trailing spaces https://dev.mysql.com/doc/refman/8.0/en/char.html
mysql> SELECT COLLATION_NAME,
PAD_ATTRIBUTE
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4'
AND COLLATION_NAME in ('utf8mb4_unicode_ci','utf8mb4_0900_as_ci');
+--------------------+---------------+
| COLLATION_NAME | PAD_ATTRIBUTE |
+--------------------+---------------+
| utf8mb4_0900_as_ci | NO PAD |
| utf8mb4_unicode_ci | PAD SPACE |
+--------------------+---------------+
My database is utf8mb4_unicode_ci
, so PAD ATTRIBUTE is PAD SPACE
=> It treats trailing spaces as insignificant in comparisons.
It’s reason why an user who hash username = "TuanT "
is returned, although search query is username = "TuanT"
HOW
Solution 1
- Change to a new collation, which has pad attribute
NO PAD
.
mysql> SELECT 'a' = 'a ';
+------------+
| 'a' = 'a ' |
+------------+
| 1 |
+------------+
mysql> SET NAMES 'utf8mb4' COLLATE 'utf8mb4_0900_ai_ci';
mysql> SELECT 'a' = 'a ';
+------------+
| 'a' = 'a ' |
+------------+
| 0 |
+------------+
Solution 2
- Use
BINARY
for query.
[10] pry(main)> User.where("username = BINARY ?", "TuanT")
User Load (6.0ms) SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username = BINARY 'TuanT')
=> []
[11] pry(main)> User.where("username = BINARY ?", "TuanT ")
User Load (1.1ms) SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL AND (username = BINARY 'TuanT ')
=> [#<User:0x000000010cb7aba8
id: 12,
username: "TuanT ",
password: "$2a$10$/S.28EMKpBYw84OVUQeFquuhJ2xH/zsuIQaph434iKIKA5/tg/SNS",
created_at: Tue, 30 Nov 2021 04:08:29.000000000 UTC +00:00,
updated_at: Wed, 16 Nov 2022 06:35:03.000000000 UTC +00:00,
deleted_at: nil]
[12] pry(main)>