User-facing fields in WordPress, such as input elements, textarea elements, or any other field in which a user can supply their own values, should always be a target of sanitization.

Sanitizing URLs in WordPress with Its API

Fortunately, the WordPress API provides a number of functions to help with this. Depending on your use case, you may need to do one of the following:

Those are all well and good, but you can also sanitize the data using functions provided by PHP itself.

Sure, sometimes regular expressions are the way to go, but other times you may want to use facilities that are built into the language and are easier to read and follow.

When writing my own code (and when reviewing others' code), I try to keep that in mind. With that said, here's a process that may make your work with URLs in WordPress easier.

Sanitizing URLs in WordPress

If you’re not up for the full discussion about this, you can skip down to the heading at the bottom of the post that outlines the code and how to use it in your work.

Let's say that you have a free-form input element and you want to allow the user to provide a URL that will eventually be rendered on the front-end, perhaps in a link, in some type of schema, or simply as-is.

It’s possible to be really aggressive and simply do something like “If this isn’t a valid URL, then don’t save it.”

To be clear, by a valid URL I don't mean one that never returns a 404 (a page that exists today may be gone years from now). I'm defining a valid URL as one that's properly formatted and doesn't include any extraneous information.
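PHP's FILTER_VALIDATE_URL lines up well with this definition, since it looks only at the form of the string and never makes a network request. The following is a minimal sketch rather than code from this post; the helper name is_well_formed_url() and the sample URLs are hypothetical.

```php
<?php
/**
 * Hypothetical helper: true if $candidate is a syntactically well-formed URL.
 * FILTER_VALIDATE_URL inspects the string only; it never fetches the page,
 * so a well-formed URL that would return a 404 still passes.
 */
function is_well_formed_url(string $candidate): bool
{
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

$samples = [
    'https://example.com/a-page-that-may-not-exist', // well-formed, even if it 404s
    'example.com/about',                             // no scheme
    'https://example.com/my page',                   // contains a space
];

foreach ($samples as $sample) {
    printf("%-48s %s\n", $sample, is_well_formed_url($sample) ? 'valid' : 'invalid');
}
```

Only the first sample is reported as valid: the second has no scheme, and the space in the third is a character that isn't allowed in a URL.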

For the sake of this simple example, say you're offering a text field whose value will eventually be saved to the post metadata table. Further, you're going to strip out anything that's illegal in a URL and keep only what forms a valid URL.

To do that, I find the following PHP functions to be most useful: stripslashes, strip_tags, and filter_var.

That last one can sound a little confusing because it's predicated on the idea that you understand what filters actually are. In PHP, filters can be broken down into two use cases:

  1. Validation
  2. Sanitization

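They are essentially pre-built ways to process data. A validation filter checks whether a value is the type of information we're looking for (a URL, an email address, an integer) and returns false if it isn't; a sanitization filter instead returns a copy of the value with disallowed characters removed or encoded.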
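To make the distinction concrete, here is a small sketch contrasting the two kinds of filter PHP provides for URLs: FILTER_VALIDATE_URL answers "is this a URL?" and returns either the unchanged string or false, while FILTER_SANITIZE_URL returns a copy with every character that isn't allowed in a URL removed. The input string is an illustrative assumption.

```php
<?php
$input = 'https://example.com/my page';

// Validation filter: returns the original string if it is a well-formed URL,
// or false if it is not. The space makes this input fail.
$validated = filter_var($input, FILTER_VALIDATE_URL);

// Sanitization filter: removes characters that are not permitted in a URL
// (here, the space) and returns the cleaned-up string.
$sanitized = filter_var($input, FILTER_SANITIZE_URL);

var_dump($validated); // bool(false)
var_dump($sanitized); // string(26) "https://example.com/mypage"

// The sanitized result is well-formed, so it now passes validation.
var_dump(filter_var($sanitized, FILTER_VALIDATE_URL));
```

Sanitizing alone doesn't guarantee a valid URL, though. FILTER_SANITIZE_URL only removes characters, so a value like "not a url" becomes "notaurl", which still fails validation.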

Putting It To Work

With that in mind, here’s how you might try to save information to the database without any type of sanitization (which is a bad thing):

With the code above, the user can literally enter anything into the field and have it saved to the database. This is why sanitization is important. Without it, the user can wreak havoc on the user experience or the entire WordPress installation.

So what does it look like to apply the functions from above when sanitizing URLs? Generally speaking, it looks like this:

First, the input is run through the PHP filter that validates URLs. If the string passed to the filter function isn't a well-formed URL, the function returns false.

If, on the other hand, it is a valid URL, we can strip any backslashes that aren't needed. In the words of the PHP manual, stripslashes "un-quotes a quoted string." More precisely, it removes each backslash that escapes the character after it, so \' becomes ' and a doubled backslash (\\) becomes a single one. Clear? 🙂
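Here is a minimal illustration of what stripslashes does to escaped input. In WordPress this matters because core adds slashes to request data such as $_POST, so a value typed as it's can arrive as it\'s. The sample URLs are assumptions chosen for illustration.

```php
<?php
// Double-quoted PHP strings: "\\" is one literal backslash in the value.
$escaped_quote     = "https://example.com/?q=it\\'s"; // value: https://example.com/?q=it\'s
$escaped_backslash = "https://example.com/a\\\\b";    // value: https://example.com/a\\b

// A backslash before a character is removed: \' becomes '.
echo stripslashes($escaped_quote), PHP_EOL;     // https://example.com/?q=it's

// A doubled backslash collapses to a single backslash: \\ becomes \.
echo stripslashes($escaped_backslash), PHP_EOL; // https://example.com/a\b
```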

Finally, we strip the tags because we only want the URL itself. We don't want markup, tags, or anything else that could sabotage the data being written to the database. This means that if you run something like:

https://tommcfarlin.com/<script type="text/javascript">alert(\'hello world!\');</script>

You will be left with:

https://tommcfarlin.com/alert('hello world!');
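The strip_tags step can be checked on its own. One detail worth knowing: strip_tags removes the tags themselves but keeps the text between them, which is why the contents of the script element survive as plain text. This sketch repeats the example above on example.com, written as a single-quoted PHP literal so that \' is just an apostrophe in the value.

```php
<?php
// Illustrative input: the same shape as the example above, on example.com.
$input = 'https://example.com/<script type="text/javascript">alert(\'hello world!\');</script>';

// strip_tags() removes the <script> and </script> tags but keeps the text
// between them, so the JavaScript survives only as inert plain text.
$stripped = strip_tags($input);

echo $stripped, PHP_EOL; // https://example.com/alert('hello world!');
```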

So, putting all of the above code together, this string:

https://tommcfarlin.com/<script type="text/javascript">alert(\'hello world!\');</script>

will result in the following output:

https://tommcfarlin.com/alert('hello world!');

This obviously isn't a valid URL, but it's clean, safe, and leaves you free to perform any further work needed to confirm that the URL is safe for the user.
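Putting the three steps into one function might look like the following sketch. The function name sanitize_submitted_url() and the example.com inputs are hypothetical, and the order (validate, then stripslashes, then strip_tags) follows the description above; it isn't the original code from this post. One caveat: FILTER_VALIDATE_URL rejects any string containing characters that aren't allowed in a URL, such as spaces, so with this sketch an input like the one above, which contains spaces, is rejected at the first step.

```php
<?php
/**
 * Hypothetical helper sketching the three-step process described above.
 *
 * @param string $input Raw value submitted by the user.
 * @return string|false The cleaned value, or false if it isn't a well-formed URL.
 */
function sanitize_submitted_url(string $input)
{
    // Step 1: reject anything that isn't a well-formed URL.
    $url = filter_var($input, FILTER_VALIDATE_URL);
    if ($url === false) {
        return false;
    }

    // Step 2: remove escaping backslashes (\' becomes ', \\ becomes \).
    $url = stripslashes($url);

    // Step 3: remove any HTML tags, keeping only the text between them.
    return strip_tags($url);
}

var_dump(sanitize_submitted_url('https://example.com/<b>docs</b>')); // string(24) "https://example.com/docs"
var_dump(sanitize_submitted_url("https://example.com/?q=it\\'s"));   // string(27) "https://example.com/?q=it's"
var_dump(sanitize_submitted_url('https://example.com/my page'));     // bool(false)
```

In WordPress, you would then pass the result to something like update_post_meta() only when it isn't false.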