Saturday, October 08, 2005

ColdFusion function using RegEx (Regular Expression) to check for special characters in a URL

Goal:
Create a ColdFusion function that uses a regular expression (RegEx) to verify that a string (URL) does not contain special characters.

Story:
I came across a situation where we allowed our clients to enter URL variables in our system. The specification for URLs (RFC 1738, Dec. '94) poses a problem in that it limits the characters allowed in URLs to a subset of the US-ASCII character set, so we cannot have special characters in the URL. Brian Wilson gives a simple explanation of this issue at the following site (http://www.blooberry.com/indexdot/html/topics/urlencoding.htm).

The special characters are as follows:

  • Dollar ("$")
  • Ampersand ("&")
  • Plus ("+")
  • Comma (",")
  • Forward slash/Virgule ("/")
  • Colon (":")
  • Semi-colon (";")
  • Question mark ("?")
  • 'At' symbol ("@")
  • Single Quotation mark ("'")
  • Double Quotation mark (""")
  • 'Less Than' symbol ("<")
  • 'Greater Than' symbol (">")
  • 'Pound' character ("#")
  • Percent character ("%")
  • Left Curly Brace ("{")
  • Right Curly Brace ("}")
  • Vertical Bar/Pipe ("|")
  • Backslash ("\")
  • Caret ("^")
  • Tilde ("~")
  • Left Square Bracket ("[")
  • Right Square Bracket ("]")
  • Grave Accent ("`")

Result:
I wrote a ColdFusion function using a regular expression (RegEx) that performs the following tasks:

  • Returns an error flag if a special character is used in the string.
  • Returns the first special character found, so that it is easy to identify.

Code:

<!--- Function Begin --->
<cffunction name="CheckSpecialChars" output="false" returntype="struct">
<cfargument name="PageName" type="string" required="yes">

<!--- Local variables; older ColdFusion versions require var declarations first --->
<cfset var regex = ''>
<cfset var r = 0>
<cfset var retSpecialCharsCheck = StructNew()>

<!--- Default return values: no error, no special character --->
<cfset retSpecialCharsCheck.SpecialCharsError = 0>
<cfset retSpecialCharsCheck.SpecialChar = ''>

<!--- Regular Expression: a character class needs no "|" separators. --->
<!--- '' and ## are ColdFusion escapes for ' and #; \\ matches a backslash. --->
<!--- Besides the characters listed above, it also matches "=", "." and space. --->
<cfset regex = '[$&+,/:;=?@ "''<>##.%{}|\\^~\[\]`]'>

<!--- Find the position of the first special character (0 if there is none) --->
<cfset r = refind(regex, arguments.PageName)>
<cfif r>
<cfset retSpecialCharsCheck.SpecialCharsError = 1>
<cfif mid(arguments.PageName, r, 1) eq " ">
<cfset retSpecialCharsCheck.SpecialChar = '[space]'>
<cfelse>
<cfset retSpecialCharsCheck.SpecialChar = mid(arguments.PageName, r, 1)>
</cfif>
</cfif>

<!--- Returns: 1) Special Char Error 2) Special Char --->
<cfreturn retSpecialCharsCheck>
</cffunction>
<!--- Function End --->
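
Usage:
The following is a minimal sketch of how the function might be called, assuming it is defined on the same page or in an included template. The variable name check and the sample value "my page?" are made up for illustration only. Because the function reports the first special character it finds, and the space comes before the question mark, the SpecialChar value here should be [space].

```cfml
<!--- Hypothetical input, chosen only for illustration --->
<cfset check = CheckSpecialChars("my page?")>

<cfif check.SpecialCharsError>
<!--- HTMLEditFormat escapes characters such as < and > before display --->
<cfoutput>Special character found: #HTMLEditFormat(check.SpecialChar)#</cfoutput>
<cfelse>
<cfoutput>The page name does not contain special characters.</cfoutput>
</cfif>
```

Only the first offending character is reported, so a string with several special characters has to be corrected and checked again until SpecialCharsError comes back as 0.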

Online Reference:
A good reference on URL encoding and special characters
http://www.blooberry.com/indexdot/html/topics/urlencoding.htm

RFC 1738, the URL specification
http://www.rfc-editor.org/rfc/rfc1738.txt