A Quick Tip To Extract Data with RegEx

In a recent project, one of the features that I was working on required that the application make periodic calls to a third-party API in order to retrieve a piece of information to be used later throughout the application.

This piece of data changes over time (though the intervals are irregular), and the endpoint to which the application connects doesn’t return XML, JSON, or any other standard format. Instead, it returns a string of mixed HTML and JavaScript.

The key piece of information is stored in a JavaScript variable, so it’s easy to get the proverbial bearings from the API’s response. Grabbing the unique data itself, though, takes some work: it has to be extracted with a regular expression.

Getting Unique IDs with a RegEx

The short version is that we can use preg_match and preg_replace to get the information that we need. How we actually do it depends on the structure of the information that’s returned to us.

Given that the API returns a string of mixed HTML and JavaScript, and that the piece of information we need is always assigned to a JavaScript variable, we can assume that the response will include an assignment to that variable.
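As a purely hypothetical illustration, a response of this kind might resemble the string below. The variable name `uniqueId`, its value, the `initWidget` call, and the surrounding markup are all assumptions made up for this sketch; they are not the actual API's output.

```php
<?php
// Hypothetical API response: a single string of mixed HTML and JavaScript.
// `uniqueId`, its value, and `initWidget` are invented for illustration only.
$response = <<<'HTML'
<div id="widget"></div>
<script type="text/javascript">
    var uniqueId = "a1B2-c3D4/e5";
    initWidget(uniqueId);
</script>
HTML;
```

The important part is only that the value we want sits on the right-hand side of a known variable name.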

From here, we can set up a regular expression that captures whatever follows that specific variable name.
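A minimal sketch of that step with preg_match might look like this, assuming the variable is named `uniqueId` (a hypothetical name) and that its assignment ends with a semicolon. The pattern is one possible choice, not necessarily the one the real response requires.

```php
<?php
// Hypothetical response; `uniqueId` and its value are assumptions for this sketch.
$response = '<script>var uniqueId = "a1B2-c3D4/e5"; initWidget(uniqueId);</script>';

// Match the variable name, an equals sign with optional whitespace around it,
// then capture everything up to and including the next semicolon.
$pattern = '/uniqueId\s*=\s*([^;]*;)/';

$rawValue = null;
if (preg_match($pattern, $response, $matches) === 1) {
    // $matches[0] is the full match; $matches[1] is the captured group.
    $rawValue = $matches[1];
}
```

At this point `$rawValue` still carries the quotes and the trailing semicolon, and it stays `null` if the variable isn't found in the response.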

This will grab the value of the variable. After this, we need to remove all non-alphanumeric characters. This includes anything like quotes, semicolons, slashes, and so on. We only need the data that’s composed of letters and numbers.
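The cleanup can be sketched with preg_replace and a negated character class. The raw value here is the same hypothetical one used for illustration, not real API data.

```php
<?php
// Hypothetical raw capture, including quotes, a dash, a slash, and a semicolon.
$rawValue = '"a1B2-c3D4/e5";';

// Replace every character that is not an ASCII letter or digit with nothing.
$cleanId = preg_replace('/[^A-Za-z0-9]/', '', $rawValue);

echo $cleanId; // a1B2c3D4e5
```

Note that this assumes the separators inside the value aren't meaningful. If the real identifier can legitimately contain dashes or underscores, the character class would need to allow them.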

This particular example may be very, very niche and not something that you’re likely to encounter. But if you do, here’s something for reference. If nothing else, it’s something that I can look back on and refer to should I hit a scenario like this again.
