Create a URL-safe string in MySQL

Given that my native language uses some funny chars like ‘ą’ or ‘ć’ and there are multiple chars that are not safe when placed in the URL, from time to time, there is a need to create a URL-safe string.

Almost every programming language uses its own method to provide such a safe string. The problem is that various applications use my database, and I wanted to have such a utility function in the database itself.

I decided to go the easy way: leave safe characters such as letters and numbers, replace some of the other characters (for instance: ‘ą’ to ‘a’), and change all others to ‘_’. As a result, I created this function:

DROP FUNCTION IF EXISTS `create_safe_string`;

DELIMITER //

CREATE FUNCTION `create_safe_string`(input TEXT)
  RETURNS TEXT
  BEGIN
    DECLARE output TEXT DEFAULT '';
    DECLARE position INT DEFAULT 1;
    DECLARE this_letter VARCHAR(1) DEFAULT '';
    WHILE position < (LENGTH(input) + 1) DO
      SET this_letter = lower(SUBSTRING(input, position, 1));
      CASE 
        -- change "unsafe" chars to the safe ones
        WHEN this_letter = 'ą' THEN SET output = CONCAT(output, 'a');
        WHEN this_letter = 'ć' THEN SET output = CONCAT(output, 'c');
        WHEN this_letter = 'ę' THEN SET output = CONCAT(output, 'e');
        WHEN this_letter = 'ł' THEN SET output = CONCAT(output, 'l');
        WHEN this_letter = 'ń' THEN SET output = CONCAT(output, 'n');
        WHEN this_letter = 'ó' THEN SET output = CONCAT(output, 'o');
        WHEN this_letter = 'ś' THEN SET output = CONCAT(output, 's');
        WHEN this_letter = 'ż' THEN SET output = CONCAT(output, 'z');
        WHEN this_letter = 'ź' THEN SET output = CONCAT(output, 'z');
        -- these letters and numbers should remain unchanged
        WHEN this_letter IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z') THEN SET output = CONCAT(output, this_letter);
        -- everything else should be changed to _
        ELSE 
          -- if the last char is already _ then skip this step
          IF SUBSTRING(output, -1) <> '_'
            THEN SET output = CONCAT(output, '_');
          END IF;
      END CASE;
      SET position = position + 1;
    END WHILE;
    
    -- remove _ on the end and beginning of the output string
    IF SUBSTRING(output, -1) = '_'
    	THEN SET output = SUBSTRING(output, 1, LENGTH(output)-1);
    END IF;
    IF SUBSTRING(output, 1, 1) = '_'
    	THEN SET output = SUBSTRING(output, 2, LENGTH(output));
    END IF;
    
    RETURN output;
  END //

DELIMITER ;

You probably spot two additional things: If there is already ‘_’ placed in the output string as the last character, it is not repeated if there is a new occurrence. I’m also removing this character from the beginning and the end of the output string. This way, the text produced by the function looks better.

In fact, most of the browsers right now is handling the native characters just right. The “unsafe” characters replacement in the function above may not be needed in your case. I had to change them this way because of the methods used by the applications. They tend to remove my native characters from the string.

Leave a Reply

Your email address will not be published. Required fields are marked *