URL, URN, URI and URI.escape in Ruby

1. URI, URL & URN

  • URL: tells you how (scheme) and where (domain, port, path, query string, fragment ID, ...) to get something.
  • URN: simply a unique name.
  • URI: a URN, a URL, or both. A URN defines an item’s identity, while a URL provides a method for finding it.
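Ruby's standard `uri` library makes the URL parts listed above visible. The sketch below parses a made-up URL on example.com and a sample ISBN URN (both are illustrative values). `URI.parse` has no dedicated URN class, so the URN comes back as a generic URI whose name sits in `opaque`, with no host to fetch it from.

```ruby
require "uri"

# A URL: the scheme says *how*, the remaining parts say *where*.
url = URI.parse("https://example.com:8080/search?q=ruby#results")
p url.scheme    # "https"        (how)
p url.host      # "example.com"  (where: domain)
p url.port      # 8080           (where: port)
p url.path      # "/search"      (where: path)
p url.query     # "q=ruby"       (query string)
p url.fragment  # "results"      (fragment ID)

# A URN: only a name, with no location attached.
urn = URI.parse("urn:isbn:0451450523")
p urn.scheme    # "urn"
p urn.opaque    # "isbn:0451450523"
p urn.host      # nil
```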

2. Why do we need to encode?

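URLs can only contain characters from a limited subset of the 128-character US-ASCII set. Anything outside that subset (a space, or a non-ASCII character such as é) must be percent-encoded, that is, replaced by % followed by the two hex digits of each of its bytes. Reserved characters such as & or # must also be encoded when they appear as data rather than as delimiters.

This means we need to encode these characters before putting them into a URL. Special characters such as &, space, or ! need to be escaped when they appear in a URL; otherwise they may be misinterpreted (for example, an unescaped & inside a query value starts a new parameter).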
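To see why this matters, the sketch below (using a made-up example.com address) gives Ruby's parser a URL containing a raw space, which it rejects, and then the same URL with the space percent-encoded as `%20`. The last line shows a non-ASCII character turned into the percent-encoded bytes of its UTF-8 form.

```ruby
require "uri"

raw = "https://example.com/my file.txt"

# A literal space is not a legal URL character, so parsing fails.
rejected = false
begin
  URI.parse(raw)
rescue URI::InvalidURIError => e
  rejected = true
  puts "rejected: #{e.message}"
end

# Percent-encoding the space (%20) makes the same address valid.
encoded = URI.parse("https://example.com/my%20file.txt")
p encoded.path  # "/my%20file.txt"

# "é" is two bytes in UTF-8 (C3 A9), so it becomes two percent-escapes.
p URI.encode_www_form_component("café")  # "caf%C3%A9"
```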

3. encode vs encode_component

encode_component should be used to encode a URI component: a string that is meant to be one part of a URL (for example, a single query parameter value). encode should be used to encode a whole URI or an existing URL.

=> If you have a complete URL, use encode. If you only have part of a URL, use encode_component.
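In Ruby's standard library, a rough stand-in for the two cases is `URI::RFC2396_Parser#escape` for a whole URL (it is the escaper the old `URI.escape` delegated to, and it leaves reserved characters such as `/`, `?` and `=` alone) and `URI.encode_www_form_component` for a single component (it escapes those reserved characters too, and writes spaces as `+`). Treat this pairing as an illustration, not an exact mapping; the path below is a made-up example.

```ruby
require "uri"

# Roughly the escaping the old URI.encode/URI.escape performed: only characters
# that are never legal in a URI get encoded; delimiters (/ ? = :) are kept.
whole_url_escaper = URI::RFC2396_Parser.new

full = whole_url_escaper.escape("https://example.com/some path/index.html?lang=en")
p full       # "https://example.com/some%20path/index.html?lang=en"

# Component encoding also escapes the delimiters, so the result is only safe
# as ONE piece of a URL (e.g. a single query value), never as the whole URL.
component = URI.encode_www_form_component("some path/index.html?lang=en")
p component  # "some+path%2Findex.html%3Flang%3Den"
```

Running the component encoder over a complete URL would also turn `https://` into `https%3A%2F%2F`, which is why it is meant for parts only. On Ruby 3.2 and later, `URI.encode_uri_component` is a variant that writes spaces as `%20` instead of `+`.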

4. warning: URI.escape is obsolete

URI.escape (and its alias URI.encode) has been marked as obsolete for over 10 years. Ruby 2.7.0 prints the warning by default; older Ruby versions only show it when the script runs in verbose mode (for example, with ruby -w).

The method is obsolete because:

  • A URI consists of many components (such as the path or the query), and they should not all be escaped the same way. For example, a # character is fine when it marks the fragment at the end of a URI, but when the same # is part of the user’s input (in a search query), we want to encode it so it is interpreted correctly.
  • URI.escape runs gsub over the whole string and doesn’t distinguish between components.

=> We should use another method.
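The sketch below illustrates the problem with a hypothetical search URL on example.com. Escaping the whole string at once (again using `URI::RFC2396_Parser#escape` as a stand-in for `URI.escape`) leaves the `&` in the user's input untouched, so it is read as a parameter separator. Encoding only the user's value with `URI.encode_www_form_component` before building the URL keeps it intact.

```ruby
require "uri"

user_input = "rock & roll"  # what the user typed into a search box

# Whole-string escaping: spaces become %20, but "&" is a legal delimiter, so it stays.
naive = URI::RFC2396_Parser.new.escape("https://example.com/search?q=#{user_input}")
p naive  # "https://example.com/search?q=rock%20&%20roll"
p URI.decode_www_form(URI.parse(naive).query)
# [["q", "rock "], [" roll", ""]]  <- the query was split in two

# Component escaping: only the user's value is encoded, including its "&".
safe = "https://example.com/search?q=#{URI.encode_www_form_component(user_input)}"
p safe   # "https://example.com/search?q=rock+%26+roll"
p URI.decode_www_form(URI.parse(safe).query)
# [["q", "rock & roll"]]
```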

  • URI.escape was removed in Ruby 3.0 (PR).

5. URI.escape replacements

There are many ‘recommendations’ for replacing URI.escape: ERB::Util.url_encode, CGI.escape, URI.encode_www_form, WEBrick::HTTPUtils.escape_form, WEBrick::HTTPUtils.escape.

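However, after some research, I found that none of them behaves exactly like URI.escape: several of them encode every reserved character (so they suit a single component rather than a whole URL), and some write a space as + instead of %20. In the end, I decided to use Addressable::URI (from the addressable gem).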
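To make the difference concrete, the sketch below runs one made-up URL through `URI::RFC2396_Parser#escape` (standing in for the removed `URI.escape`) and through two of the standard-library suggestions, plus `URI.encode_www_form` on key/value pairs. Only stdlib calls are used; Addressable is a third-party gem and is not shown.

```ruby
require "uri"
require "erb"

input = "https://example.com/a b?x=1&y=2"

# What URI.escape used to do: encode the space, keep the URL structure.
legacy = URI::RFC2396_Parser.new.escape(input)
p legacy   # "https://example.com/a%20b?x=1&y=2"

# ERB::Util.url_encode escapes every reserved character (space -> %20).
erb_out = ERB::Util.url_encode(input)
p erb_out  # "https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2"

# URI.encode_www_form_component does the same, but writes a space as "+".
www = URI.encode_www_form_component(input)
p www      # "https%3A%2F%2Fexample.com%2Fa+b%3Fx%3D1%26y%3D2"

# URI.encode_www_form builds a query string from key/value pairs instead.
form = URI.encode_www_form([["x", "1"], ["y", "a b"]])
p form     # "x=1&y=a+b"
```

Both single-string calls treat their input as one component, which is why neither can be dropped in for URI.escape on a full URL.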
