Artificial Intelligence will inevitably transform the face of every modern industry. And podcasting is no exception. In fact, big changes are already well underway. New AI podcast tools are revolutionising the way some producers edit (and in some cases, even create) content.
So what can these AI podcast tools do? How do they work? And what implications will this tech have for the podcasting industry as it becomes more widely available?
AI Podcast Transcription
AI in podcasting isn’t anything new. Automatic transcription has been available for several years. For example, the Podcast.co platform has an AI tool that allows you to generate full episode transcripts with a single click.
Such features have been made possible thanks to developments in machine learning and natural language processing (NLP). And while speech recognition tech isn’t 100% accurate yet, it continues to improve by the day.
Having easy access to transcripts benefits podcasters in a number of ways. They are particularly handy for quickly reviewing episode content and drafting accurate, time-stamped show notes. And with a bit of editing, automatic transcripts can also be turned into long-form blog posts, helping to funnel SEO traffic back to the original episode recordings.
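As a rough illustration of the show-notes idea, here is a minimal Python sketch. It assumes the transcript has already been boiled down to a list of (start time in seconds, short summary) pairs. That input format and the function names `format_timestamp` and `show_notes` are hypothetical, and not taken from Podcast.co or any other transcription tool.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions of a second are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def show_notes(segments):
    """Turn (start_seconds, summary) pairs into one time-stamped line per segment."""
    return "\n".join(
        f"{format_timestamp(start)} {summary}" for start, summary in segments
    )


# Hypothetical segments picked out of an episode transcript.
notes = show_notes([(0, "Intro"), (75, "Guest introduction"), (3725.4, "Wrap-up")])
print(notes)
```

In practice the summaries would still need a human pass, but the timestamps come straight from the transcript.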
Transcripts aside though, there haven’t been many other widely-used applications for AI in podcasting. But that’s now changing thanks to a new generation of AI podcast tools being developed by companies like Descript.
Revolutionising Audio Editing Through AI Voice Doubles
Multimedia editing and transcription provider Descript recently announced a redesigned version of its audio editing software designed specifically for podcasters.
Descript Podcast Studio incorporates lots of forward-thinking features and tools. Notably, it lets you edit an audio file as if you were editing a Word document. Here’s how it works:
Descript turns your audio into text, broken up by who’s speaking, and it then lets you manipulate those audio files as if you were editing a text version of the script in a word processor. Delete a sentence or two, and Descript will automatically shorten the file to make the recording sound smooth and natural.
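To make the idea concrete, here is a minimal Python sketch of text-based audio editing. It assumes the transcript gives each word a start and end time in seconds. It is not Descript's implementation, and the `Word` class and `kept_segments` function are hypothetical names used only for illustration. Deleting words from the text produces a list of audio ranges to keep, which an editor could then join together.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the recording
    end: float    # seconds into the recording


def kept_segments(words, deleted):
    """Return the (start, end) audio ranges left after deleting words by index."""
    segments = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if segments and (i - 1) not in deleted:
            # The previous word was kept too, so extend the current range.
            segments[-1] = (segments[-1][0], word.end)
        else:
            # First kept word, or first word after a deletion: start a new range.
            segments.append((word.start, word.end))
    return segments


# Hypothetical word timings for a four-word clip.
words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.8),
    Word("welcome", 0.8, 1.4),
    Word("back", 1.4, 1.8),
]

# Deleting "um" (index 1) leaves two ranges to splice together.
print(kept_segments(words, {1}))  # [(0.0, 0.3), (0.8, 1.8)]
```

A real editor would also need to smooth the joins, for example with short crossfades, so the cut sounds natural.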
While this feature is no doubt extremely useful, it was another of Descript’s tools that generated the most interest from podcasters following the announcement, mainly because it has the potential to completely overhaul the traditional editing process.
Descript’s Overdub tool allows you to create an AI podcast voice double that can be used to overdub flubbed words or phrases and can even generate entirely new sentences on its own - all in your voice. Impressive stuff!
Podcasters can train the Overdub machine learning algorithms by reading a series of randomly generated sentences out loud. Once enough input data has been gathered, the voice double is ready to use whenever needed.
Ultimately, this will make it easier than ever before for podcasters to make corrections to their content. Most producers would love the ability to create flawless audio without all the tedium of having to cut and paste from loads of different audio files.
Compared to AI podcast tools like transcription, this represents a huge leap forward. For this reason, the tech is likely to become commonplace within podcasting as its cost comes down over the coming years.
But one of the main concerns raised by AI software like Overdub relates to the issue of audio deepfakes.
The Deepfake Threat
Deepfakes are fake videos or audio recordings that look and sound just like the real thing.
Once the preserve of Hollywood special effects studios and intelligence agencies like the CIA, deepfake software is now available to pretty much anyone, who can use it to create convincing fake audio clips in their spare time.
For example, Dessa, a machine learning startup, managed to clone the voice of Joe Rogan. Here are a few samples of the fake Joe Rogan talking about absurdities like sponsoring a hockey team made up of chimps or being a medical expert after hooking up his brain to the internet.
Joe Rogan was a pretty easy target to mimic. He has recorded nearly 1,300 episodes of his podcast, with each one lasting at least a couple of hours, so there’s a lot of audio to use as training data.
But Descript have proven you can create a convincing AI podcast voice double with much less input than that. And this is where the issue lies. It’s very easy to imagine how this technology could be used in malicious and corrosive ways.
Entire election processes could be disrupted by someone dropping a fake audio recording of one of the candidates days before voting starts. Or widespread panic could be created by a fake emergency alert warning an attack was imminent. Fake news is already causing societal problems, and deepfakes could take them to a whole new level.
This issue is something that Descript have obviously thought long and hard about. They’ve built safeguards to prevent their technology from being used in harmful ways. Overdub can only be done for your own voice, and only after going through the live data gathering process. So it can’t be used to create convincing audio deepfakes of other people.
But other companies making similar products in the future may not share Descript’s socially responsible stance. Indeed, it seems highly likely that fake podcast audio will soon begin to proliferate all over the internet.
Many podcasters hold authority and influence, and this is something malicious actors will want to hijack to manipulate opinion and cause confusion. So if you do happen to stumble across an audio clip of your favourite podcast host saying some bizarre or outrageous things, always keep in mind that it might not actually be them!
If this bleak scenario becomes reality, it will create a demand for products and solutions that can distinguish real content from deepfakes. And a new game of technological cat and mouse will be born.
The Podcast That AI Built
Moving beyond simply enhancing the podcast editing process, others are working to fully outsource podcast content creation to AI.
PhD student James Ryan is working to build a procedurally generated podcast called Sheldon County. The idea is that the podcast will never sound the same twice.
Every time someone listens, they’ll begin by typing a random number into a website. This will set in motion a series of calculations that will create characters, relationships, jealousies, betrayals, and maybe even a murder or two.
These plot points are then turned into a text narrative, read aloud by a voice synthesizer, and zipped up into an audio file. That’s the goal anyway. But it’ll take a few years for these kinds of AI-created podcasts to reach maturity.
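The seed-driven approach described above can be sketched in a few lines of Python. This is a toy illustration only, not Ryan's system: the name list, the event list, and the `generate_story` function are all hypothetical, and the voice synthesis step is left out. It assumes that every creative choice is drawn from a random number generator seeded with the listener's number, so the same number always yields the same story and a different number will usually yield a different one.

```python
import random

# Hypothetical building blocks for a tiny story world.
NAMES = ["Ada", "Silas", "Mabel", "Otis", "Clara"]
EVENTS = ["befriends", "envies", "betrays", "falls out with"]


def generate_story(seed, num_events=3):
    """Build a short text narrative in which every choice derives from the seed."""
    rng = random.Random(seed)  # private generator: same seed, same story
    cast = rng.sample(NAMES, 3)  # pick three distinct characters
    lines = []
    for _ in range(num_events):
        subject, other = rng.sample(cast, 2)  # two different cast members
        lines.append(f"{subject} {rng.choice(EVENTS)} {other}.")
    return " ".join(lines)


# The listener's number acts as the seed.
print(generate_story(1887))
```

The resulting text would then be handed to a speech synthesizer to produce the audio.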
Even so, what Ryan has managed to create so far is impressive.
Rudimentary as it is, this proof of concept offers a little preview of the future. AI-generated podcasts like this will continue to improve over the coming years. But whether they will actually catch on remains to be seen.
The Future of AI Podcast Tools
It’s hard to imagine that vast numbers of people will enjoy tuning in to podcasts created by machines. The experience may feel rather soulless. Storytelling is a uniquely human thing, and it seems likely it’ll stay that way.
If that’s the case, then the technology will simply become an interesting novelty. But it’ll be an impressive podcasting advancement nonetheless.