TL;DR: I listen to quite a few podcasts and want to make sure that I’m backing them up for posterity. There are apps for this, yes, but I’m a fan of reading how other developers work on their projects. I’ve never written about building something from the ground up. Writing about it is personally edifying, and it helps others who are building things, too.
This is the first post in a series in which I will talk about building a small application (that will eventually be a WordPress plugin) for backing up podcasts as provided by an XML export from Overcast.
This may be a painful read for some who are experienced developers (so maybe don’t read it). Or maybe it won’t be. But one of the things I recently heard, in a podcast no less, sums up both the point of this series and the project I’m working on:
[the podcaster] simply created a type of show he wanted to hear and hoped others shared his taste and a similar desire.
I, like many of you, am a fan of podcasts and spend a significant amount of time listening to them each week. My favorite podcast application is Overcast, which provides an XML export of all of the podcasts to which I’m subscribed and the episodes of each podcast.
For a long time (as in over a year which, given last year, has felt like a long time), I’ve wanted to work on an application for backing up my podcasts with the ultimate goal of turning it into a WordPress plugin.
But then I had the idea that maybe I’d start from scratch. And I don’t mean “software developer scratch.” I mean starting from nothing.
- No web server,
- No database,
- No libraries,
- Just an IDE and PHP,
- And then I’ll go from there.
Like anyone, I have limited time to work on stuff like this, but this is something I want to build for myself. And given that I like reading about other people’s experiences doing this kind of stuff, I’m going to share the process from beginning to end.
I hope to document all of the little problems, frustrations, idiosyncrasies, good ideas, bad ideas, and random things that come up throughout the process of putting this thing together.
So if that sounds like something interesting, then feel free to continue reading.
Building Backcast, Let’s Get Started
Naming is Hard (Don’t Spend Too Much Time)
I’d like to have a name that’s kind of clever, that’s related to what I’m doing, but I don’t want to spend too much time coming up with anything.
Since Overcast is my favorite podcast app, why not make a play on its name? I want to back things up from Overcast. Is backcast a word? Apparently, “backcasting” is:
Backcasting is a planning method that starts with defining a desirable future and then works backwards to identify policies and programs that will connect that specified future to the present. (Backcasting, Wikipedia)
Okay, but what about “backcast?”
a short backward and often upward swing of a fishing rod, its line, and its lure in preparation for the cast that immediately follows. (Backcast, Dictionary.com)
Alright, cool. So it’s a word. It doesn’t have anything to do with what this app will do, but then again, Overcast is a play on broadcast, and its name has more to do with weather than audio.
Good enough. Backcast it is.
I hope backcast is available as a repository name on GitHub. After logging in and checking, it is.
So I’ll create the repository and clone a copy of it to my local machine.
I’ll also want to start a develop branch off of which I can work on features, so I’ll do that with a quick set of commands:
$ git checkout -b develop && git push -u origin develop
Writing Code, But Wait!
I’d usually set up Composer, various settings, and extensions at this point. That’s not the goal, though. The goal is to write a little bit of code, get something working, and then go from there.
So no more waiting.
The first thing I’m going to need is an actual export of the data to which I subscribe. To do this, I’ll hop over to the Overcast website and log in.
Then I’ll go to my Account page to see what options there are for exporting data.
Given that XML is a standard and I’m comfortable working with XML in PHP, I’m going to go with that. (Is there a third-party library beyond SimpleXML for this? I can’t recall, so I’ll ask a couple of friends while I’m at it, but there doesn’t seem to be one.)
I think exploring the All data option would be fun, but I don’t know of a library for parsing it at the moment, I’m not interested in writing one, and I want to get from nothing to backing things up as soon as I can.
Plus, with a standardized format it’ll be easier to work with on other platforms if I ever want to do that. Then again, right now, I just want to get it working on the command-line and then eventually build it out as a WordPress plugin (possibly as something that’d play nicely with a headless installation – or both – who knows at this point).
1. Set Up the Initial Script
First, I need to make sure I can retrieve a copy of the XML file that Overcast uses and go from there. And though I’m not going to share my feeds, hopefully the content here will be enough for you to follow along with (or easy enough for you to clone and/or eventually download for your own use).
So how about this: Let’s store the URL to the feed in a constant (I don’t really like constants but, again, write, ship, iterate). I want to create a new branch for this work so I’ll run this in the terminal:
$ git checkout -b feature/download-xml-export
And now I’ll finally write the first lines of code.
First, I’ll need to define the constant that will reference the URL for grabbing the XML file. I’m unsure how this will work regarding authentication, but I’ll cross that bridge when I come to it. The code for this portion will be on GitHub, but the gist of it looks like this:
#!/usr/local/bin/php
<?php

define('XML_EXPORT_URL', 'https://overcast.fm/account/export_opml');
echo XML_EXPORT_URL;
Next, I’ll change the permissions on the file to make it executable:
$ chmod +x backcast.php
Then I’ll attempt to run it from the terminal:
$ ./backcast.php
I see the output I expect, which is the value of the constant. So next up is actually downloading the export file.
2. Download the XML Export
Because this URL is related to my account, there has to be some type of authentication around it. So let’s see what happens when I attempt to download it just using wget before I start writing code.
And sure enough, it saves a generic page called export_opml to the working directory, which is essentially just what you’d see if you visited the URL without being logged in.
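Since the download doesn’t fail outright, it helps to check whether what came back is actually an export. Here’s a minimal sketch, assuming only that a real export has opml as its root element (as the OPML format requires) and that anything else, such as the page served to logged-out visitors, should be rejected. The function name looks_like_opml is hypothetical; it isn’t part of Overcast or PHP.

```php
<?php

/**
 * Hypothetical helper: returns true if a downloaded body looks like an OPML
 * document rather than an HTML page.
 */
function looks_like_opml( string $body ): bool {
    // Collect libxml errors internally instead of emitting warnings for
    // input that isn't well-formed XML (most HTML pages).
    $previous = libxml_use_internal_errors( true );
    $xml      = simplexml_load_string( $body );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    // Compare against false explicitly: an empty SimpleXMLElement is falsy.
    return false !== $xml && 'opml' === $xml->getName();
}
```

Running it against the export_opml file wget saved (via file_get_contents) should return false for the logged-out page.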
Searching around shows that other developers have done similar work, but their tools expect you to download the export file first. I don’t want to do that, and there’s no public Overcast API.
Looking at the login code on the Overcast website, it appears the app uses Apple’s CloudKit framework to handle login via email and password as well as via Sign in with Apple.
After inspecting the request in Chrome’s Network panel, I found it’s possible to do what I need by using the cookie the site generates, but that defeats the purpose since it still requires logging in via the website.
curl 'https://overcast.fm/account/export_opml' \
  -H 'Connection: keep-alive' \
  -H 'DNT: 1' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'Sec-Fetch-Site: none' \
  -H 'Sec-Fetch-Mode: navigate' \
  -H 'Sec-Fetch-Dest: document' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Cookie:o=[redacted]' \
  --compressed
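For reference, the heart of that request (sending the session cookie) can be expressed in PHP with a stream context. This is only a sketch: it assumes the o cookie value (redacted in the request) is copied from a logged-in browser session, so it has the same drawback noted above of still requiring a login via the website. The names export_context and fetch_export are hypothetical.

```php
<?php

/**
 * Hypothetical helper: builds a stream context that sends the Overcast
 * session cookie (named "o" in the request copied from Chrome).
 *
 * @return resource
 */
function export_context( string $cookie ) {
    return stream_context_create(
        array(
            'http' => array(
                'method' => 'GET',
                'header' => "Cookie: o={$cookie}\r\n",
            ),
        )
    );
}

/**
 * Hypothetical helper: downloads the export, returning null on failure.
 */
function fetch_export( string $url, string $cookie ): ?string {
    // Requires allow_url_fopen to be enabled for http:// and https:// URLs.
    $body = file_get_contents( $url, false, export_context( $cookie ) );

    return false === $body ? null : $body;
}
```

Called as fetch_export( 'https://overcast.fm/account/export_opml', $cookie ), it returns the response body, or null if the request fails.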
At this point, I can go down the rabbit hole of working through this, or I can manually download the file and get to parsing it. Parsing is ultimately the point of the application, so I’ll start there and come back to authentication later.
So I’m going to download the file via the web interface and drop it into the project directory.
3. Opening the OPML File
Before doing anything else, I’m going to add a .gitignore file and make sure the OPML file isn’t committed, as I don’t want others to have access to my feeds. After all, the goal is for each person to download their own feeds.
$ touch .gitignore
And then I’ll add a line to it that ignores the OPML file (something like *.opml).
From there, I need to add the .gitignore file to the repository. I like to use git commit (rather than git commit -m) so that I can add more detail to my commits than just a simple subject line. It’s a low-effort habit, and I tip my hat to a colleague, Sal, for urging engineers to do it more.
Now I run git commit and push the file to GitHub. With that taken care of, I’ll get to work on opening the file.
PHP offers a number of ways to read files (file is the one that comes to mind first), so I want to use the function that’s best suited for parsing XML.
After a quick search, I find the following:
- file: Reads entire file contents into an array of lines.
- file_get_contents: Reads entire file contents into a string.
- fopen: Opens a file handle that can be manipulated with other library functions, but does no reading or writing itself.
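To see the difference between the three, here’s a minimal sketch that reads the same file each way. The helper read_three_ways is hypothetical and written only for illustration; point it at any file, such as the downloaded OPML export.

```php
<?php

/**
 * Hypothetical helper: reads the same file with file(), file_get_contents(),
 * and fopen()/fgets() to show what each one gives back.
 */
function read_three_ways( string $path ): array {
    // file(): an array with one element per line; FILE_IGNORE_NEW_LINES drops the "\n".
    $lines = file( $path, FILE_IGNORE_NEW_LINES );

    // file_get_contents(): the entire file as a single string.
    $contents = file_get_contents( $path );

    // fopen(): only a handle; reading is done with other functions such as fgets().
    $handle = fopen( $path, 'r' );
    if ( false === $handle ) {
        throw new RuntimeException( "Unable to open {$path}" );
    }
    $first_line = fgets( $handle );
    fclose( $handle );

    return array(
        'file'              => $lines,
        'file_get_contents' => $contents,
        'fopen'             => $first_line,
    );
}
```

Any of these could feed an XML parser, but for XML specifically, SimpleXML can also read the file itself.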
But wait: I’m going to use SimpleXML, so I should look into simplexml_load_file, which does the following:
Interprets an XML file into an object (PHP Manual)
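This will do for now, so I’m going to roll with it until I hit a problem I didn’t foresee. So I update the script to look like the following:
#!/usr/local/bin/php
<?php

// Path to the OPML file downloaded manually from the Overcast website.
define('XML_EXPORT_URL', './overcast.opml');

// simplexml_load_file() returns false if the file can't be read or parsed.
$xml = simplexml_load_file( XML_EXPORT_URL );
if ( false === $xml ) {
    exit( 'Unable to load ' . XML_EXPORT_URL . PHP_EOL );
}

var_dump( $xml );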
I run this on the command line and see exactly what I hoped to see: an object representing all of the podcasts to which I subscribe, neatly organized with attributes for the name, title, text, URL, etc.
This gives me what I need not only to build out the list but also to begin diving into each of the podcast feeds and downloading the back catalog of episodes. Further, it gives me what I need to track when things were last updated.
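As a first step toward building the list, here’s a sketch that pulls each subscription’s title and feed URL out of that object. It assumes the export follows the usual OPML convention of one outline element per feed with type="rss" plus title and xmlUrl attributes; the XPath query searches at any depth, so it doesn’t depend on how Overcast nests its outlines. The name list_podcasts is hypothetical.

```php
<?php

/**
 * Hypothetical helper: returns the title and feed URL of every subscription
 * in an OPML document loaded with SimpleXML.
 */
function list_podcasts( SimpleXMLElement $opml ): array {
    $podcasts = array();

    // Assumes each feed is an <outline type="rss"> element; "//" matches at any depth.
    $outlines = $opml->xpath( '//outline[@type="rss"]' ) ?: array();

    foreach ( $outlines as $outline ) {
        $podcasts[] = array(
            'title'  => (string) $outline['title'],
            'xmlUrl' => (string) $outline['xmlUrl'],
        );
    }

    return $podcasts;
}
```

Passing it the $xml object from the script returns a plain array, which is easier to loop over than the raw SimpleXML object.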
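Diving into each feed could look something like the following sketch, which is an assumption about a future step rather than code from the project. It assumes each feed is a standard podcast RSS 2.0 document in which every item carries its audio file in an enclosure element and its publication date in pubDate (an RFC 2822 date, which strtotime() understands). The name list_episodes is hypothetical, and it takes the feed’s XML as a string so that fetching and parsing stay separate.

```php
<?php

/**
 * Hypothetical helper: returns the episodes in a podcast RSS feed that
 * have an audio enclosure, along with their publication timestamps.
 */
function list_episodes( string $feed_xml ): array {
    $feed = simplexml_load_string( $feed_xml );
    if ( false === $feed || ! isset( $feed->channel ) ) {
        return array();
    }

    $episodes = array();
    foreach ( $feed->channel->item as $item ) {
        // The audio file lives in <enclosure url="...">; skip items without one.
        if ( ! isset( $item->enclosure ) ) {
            continue;
        }
        $url = (string) $item->enclosure['url'];
        if ( '' === $url ) {
            continue;
        }

        // strtotime() returns false if it can't parse the date.
        $published = strtotime( (string) $item->pubDate );

        $episodes[] = array(
            'title'     => (string) $item->title,
            'url'       => $url,
            'published' => false === $published ? null : $published,
        );
    }

    return $episodes;
}
```

In practice, the string could come from file_get_contents() on each feed’s xmlUrl, and the published timestamps are one way to track when a feed last changed.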
Until Part 2
Given the time it’s taken to research, write code, write this post, and so on, I’m going to have to pause it here and come back.
Remember to follow me on Twitter, Instagram, and GitHub, as well as this blog, of course, to track the progress. I’ll continue to do what I can to document this thing: warts, scattered thoughts, mistakes, and more.
- I should set up GrumPHP to help automate the code quality.
- Is it too late to start adding unit tests? Probably not, but it would have been nice to start with them. Then again, this stage is basically a prototype.
- Is it silly to think about sharing some of this on Instagram Stories (or IGTV or whatever people are watching now)? Or maybe YouTube? Does it even make sense to do this on those platforms instead of podcasting about the process?
- It probably would be a good idea to talk about actual planning of software projects using a proper process rather than a notebook (that includes a picture of Michael Scott, no less) and a pen but I’ve wanted to build this for so long I just want to write, ship, and iterate.
- I know lots of people start this kind of stuff up with domains and social accounts and all that but I’m just going to use this blog, Instagram, Twitter, and all the pre-existing stuff I already have.
- I need to remember that there are times when I delete or unsubscribe from a podcast. Do I want to back those up? Did I delete them because I finished them or because I didn’t like them? I’ll come back to this later.