One of the problems that comes with importing CSV files into WordPress isn’t the process that comes with actually doing it but that comes with the work that happens while processing a large dataset.
That is, on staging and production servers, we are subjected to the constraints of the PHP environment which may end up resulting in PHP timeouts. In order to help combat this, there are a number of things that we can do the first of which is to split up our input file into smaller files (which I discussed in the previous post).
Obviously, the idea behind this particular strategy is to create smaller files that we can import one at a time that won’t result in a PHP timeout issue (since smaller files clearly result in less data to import). Though this comes with another challenge: giving the user visual cues that something happening while they wait for the import process to complete.
There are a few strategies for doing this, but I’ll cover the process that I’ve used at a high-level and at which I’ve found to work well given lots of information and constraints above.
Giving Visual Cues in Importing CSV Files in WordPress
Whenever we talk about showing the user visual cues that something is happening during a process, this can refer to something as simple as showing a spinning wheel (not that we necessarily like that particular strategy), showing a string of increasing decimals much like you’d see in a terminal, a progress bar, or something else.
Regardless, whatever you end up using it comes down to the following process:
- Making asynchronous requests to the server to kick off the process
- Saving the total number of files to be processed
- Saving the total number of files that have been processed
- Making an asynchronous request to the server to get the percentage of completion
Relatively straightforward, isn’t it? Of course, this all depends on how you want to do this. When it comes to working with WordPress, there are two ways in which you can deal with Ajax:
- There’s the standard Ajax API
- There’s the Heartbeat API
Personally, I’m a fan of the first option. Even though we’re going to be polling the server at an interval during the import process and the Heartbeat API is for polling the server at a certain interval, the first option gives a little more control on the client side in terms of setting up our timers, intervals, and when and how we can terminate them based on the state of the import.
So assuming that we’re going to go with the standard Ajax API, here’s how you can manage importing the various files that were creating in using the strategy in the previous article.
1. The Total Number of Files
During the process of breaking up the files, I’ve found it useful to setup a temporary option that includes the total number of files that I created.
Momentarily, we’ll see how this is useful when setting up an Ajax request. But for now, just make sure that you store a value somewhere that you can easily retrieve via Ajax.
2. Start The Machine
In order to begin processing the smaller CSV files, you need to be able to read all of the files in the directory to which you’ve written them, and you need to be able to do this asynchronously.
Again, I’m using the standard WordPress Ajax API so my server side code looks something like this:
As noted above, I’ve only given comments and steps for how to approach this as your mileage and implementation will likely vary based on how you implement the various algorithms. Regardless, the general idea is the same:
- Read the most recent file
- Import it
- Delete it
- Update the number of files by one
We do the last step so that we can calculate the progress in another Ajax request.
3. Making Progress
In order to determine how much progress your script has made, you simply need to take the number of parsed files and divide it by the number of total files then multiple it by 100. If you’re interested in being as precise as possible, this is all you need to do; otherwise, you can just use floor() or ceil() in order to get round numbers to return to the client.
But again, this requires a few things:
- A server-side hook for returning the percentage
- A server-side value for indicating the process is complete
- A client-side timer to poll the server, update progress, and stop when the process has complete
First, here’s how to calculate the percentage of completion on the server-side (as well as define the hook):
The code comments and the code should be pretty clear, but to make sure that nothing’s missing, notice that the we first make sure that the acme-parsed-files
value exists each time this is checked. This is because that somewhere in your code – most likely when the number of parsed files equals the total number of files – you’re going to want to delete the option from the database to clean up after yourself.
What About JavaScript?
In order to keep these posts as lean as I can (I know – they’re not exactly lean, anyway), I’m trying to break down each part of the process into smaller posts so that each article covers one idea.
In this particular article, the primary idea is keeping track of the files you’ve created and the files you’ve parsed.
In the next article, we’ll look at some JavaScript for how to actually communicate with the hooks we’ve defined above, as well as how to setup timers and clear the timers when the process is done.
And for those who are familiar with jQuery Progressbar, we’ll eventually take a look at that, too.
Hmm, weird, seems like my comment on last article does not go through.
I’ve been looking a way to update the AJAX progress dynamically myself from my last project, but the huddle is, I don’t have any value to calculate the progress from. Is there any good solution to get the progress percentage from a long PHP process? I eventually settled with using XHR.onprogress event, but the solution is not quite elegant.
Secondly, what’s your opinion on the performance impact of using update_option and get_option to check the AJAX progress, since in the end of the day, it will still involved with interaction with the database.
Thanks.
Just had not had time to approve and respond to it yet :).
Without knowing more info, it’s hard to say. If the process doesn’t render output, then it’s just a matter of having to wait so you could display some type of loading indicator (like a spinner) just to give some type of feedback, but if it’s possible to poll the server and check some type of value (be it file output, execution time, etc.) then that’d be another way to help calculate it.
Great question.
In this post, I’ve used the options table which – like you’ve noticed – is likely the most expensive route since it’s hitting the database. I don’t mind doing it this way for smaller operations or for examples like this because the values it’s reading are so small.
Another route would be to place it into session variables, transients, or some other type of caching mechanism so that it’ll hit that instead.
In the post, I’m not trying to say that
get_option
is the way. It’s just a way. More technical things can be done, for sure!Hi,
Thanks for your posts. Always a pleasure to read technical posts.
It is better to use nonces when doing AJAX. Moreover there is no checking system in this code for importing/exporting data. I think it’s missing here. What do you think about this?