URL, URN, URI and URI.escape in Ruby
1. URI, URL & URN
- URL: tells you how (scheme) and where (domain, port, path, query string, fragment ID, ...) to get something.
- URN: simply a unique name.
- A URI is either a URN or a URL => a URN defines an item’s identity, while a URL provides a method for finding it.
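As a rough illustration of the "how" and "where" parts, the sketch below splits a URL into its components with Ruby's standard uri library. The URL and the ISBN-style URN are made-up examples, not taken from any real service.

```ruby
require "uri"

# Hypothetical URL, chosen only so that every component is present.
url = URI.parse("https://example.com:8443/docs/guide?lang=ruby#install")

puts url.scheme    # "https"        -> how to get it
puts url.host      # "example.com"  -> where: domain
puts url.port      # 8443           -> where: port
puts url.path      # "/docs/guide"  -> where: path
puts url.query     # "lang=ruby"    -> where: query string
puts url.fragment  # "install"      -> where: fragment ID

# A URN only names something; it has no host or path to fetch from.
urn = URI.parse("urn:isbn:0451450523")
puts urn.scheme        # "urn"
puts urn.opaque        # "isbn:0451450523"
puts urn.host.inspect  # nil
```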
2. Why do we need to encode?
URLs can only contain a limited set of characters from the standard 128-character ASCII set. Characters outside that set, and reserved characters that are used as plain data rather than as delimiters, must be percent-encoded.
This means that we need to encode these characters before putting them into a URL. Special characters such as &, space and ! need to be escaped when they appear in a URL, otherwise the URL may be interpreted in unexpected ways.
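A small sketch of what can go wrong, assuming a hypothetical search parameter named q whose value contains an ampersand. It uses URI.decode_www_form and URI.encode_www_form_component from the standard library to show how a server-side parser would read the query string in each case.

```ruby
require "uri"

raw = "tom&jerry" # user input that contains a reserved character

# Unescaped: the "&" is read as a separator, so the value is cut in two.
unescaped_pairs = URI.decode_www_form("q=#{raw}")
puts unescaped_pairs.inspect # [["q", "tom"], ["jerry", ""]]

# Escaped: the "&" becomes %26 and survives the round trip.
encoded = URI.encode_www_form_component(raw)
puts encoded # tom%26jerry

escaped_pairs = URI.decode_www_form("q=#{encoded}")
puts escaped_pairs.inspect # [["q", "tom&jerry"]]
```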
3. encode vs encode_component
encode_component should be used to encode a URI component: a string that is supposed to be part of a URL.
encode should be used to encode a URI or an existing URL.
=> If you have a complete URL, use encode. But if you have only a part of a URL, use encode_component.
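The difference can be sketched with Ruby's standard library, using URI.encode_www_form_component as a stand-in for a component encoder (that mapping is an assumption made here, and the URL is hypothetical). Applying a component encoder to a complete URL also escapes the delimiters, so it should only be applied to the part.

```ruby
require "uri"

base = "https://example.com/search?q="
term = "ruby & rails" # the part supplied by the user

# Wrong: component-encoding the complete URL escapes ":", "/", "?" and "=".
mangled = URI.encode_www_form_component(base + term)
puts mangled # https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Druby+%26+rails

# Right: encode only the component, then append it to the URL.
encoded_term = URI.encode_www_form_component(term)
full_url = base + encoded_term
puts full_url # https://example.com/search?q=ruby+%26+rails
```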
4. warning: URI.escape is obsolete
Actually, URI.escape (and its alias URI.encode) has been marked as obsolete for over 10 years now. Ruby 2.7.0 now shows a warning by default; earlier Ruby versions only show it when the script is run in verbose mode.
The reasons why this method is obsolete:
* A URI consists of many components (like path or query), and we don’t want to escape them all in the same way. (For example, a # character is fine at the end of a URI, where it introduces the fragment, but when the same # is part of the user’s input (such as a search query), we want to encode it to ensure correct interpretation.)
* URI.escape uses gsub on the whole string and doesn’t differentiate between distinct components => We should use another method.
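The # case can be sketched as follows, assuming a hypothetical search for the term c#sharp. Left unescaped, everything after the # is parsed as the fragment; once the term is encoded as a component, it stays inside the query.

```ruby
require "uri"

term = "c#sharp" # user input that happens to contain "#"

# Unescaped: the parser treats "#" as the start of the fragment.
plain = URI.parse("https://example.com/search?q=#{term}")
puts plain.query    # q=c
puts plain.fragment # sharp

# Escaped as a component: "#" becomes %23 and stays in the query.
escaped = URI.parse("https://example.com/search?q=#{URI.encode_www_form_component(term)}")
puts escaped.query            # q=c%23sharp
puts escaped.fragment.inspect # nil
```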
URI.escape was removed in Ruby 3.0 (PR).
5. URI.escape replacements
There are many ‘recommendations’ for a URI.escape replacement: ERB::Util.url_encode, CGI.escape, URI.encode_www_form, WEBrick::HTTPUtils.escape_form, WEBrick::HTTPUtils.escape.
However, after researching, I think that they don’t have the same behavior as URI.escape. In the end, I decided to use Addressable::URI.
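To give a feel for how the candidates differ, the sketch below runs three of the standard-library methods on the same made-up input. It only covers methods that ship with Ruby; the WEBrick helpers and Addressable::URI come from separate gems and are left out here.

```ruby
require "uri"
require "cgi"
require "erb"

value = "a b&c/d" # hypothetical input with a space, "&" and "/"

# CGI.escape: form-style encoding, space becomes "+".
cgi_result = CGI.escape(value)
puts cgi_result # a+b%26c%2Fd

# ERB::Util.url_encode: space becomes "%20" instead of "+".
erb_result = ERB::Util.url_encode(value)
puts erb_result # a%20b%26c%2Fd

# URI.encode_www_form: builds a whole "key=value" query string.
form_result = URI.encode_www_form(q: value)
puts form_result # q=a+b%26c%2Fd
```

All three escape the / as well, which is the behavior you want for a single component but not for a complete URL, so none of them is a drop-in replacement for a method that was meant to take a whole URL.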