Audiobook Narration and AI

As with everything AI touches, the ability to mimic human-sounding voices has come to the forefront, and audiobook narrators now feel threatened too. I understand being afraid of something new and misunderstood, but I see it mostly as an extension of our existing abilities and a way of doing more with less.

First of all, I’m sad that the first thing I see in that article is “circling the wagons”. You can’t avoid new technology. It’ll come no matter what. Legislation can restrict research and development, but unless the technology is immediately or excessively harmful (like nuclear weapons or engineered viruses), a lot of people will work towards it. As a person affected, what you really should do is find out how to adapt and where you fit in the new order.

Secondly, if we move towards an industry in which voices are largely AI generated, the voice templates will still have to come from somewhere. Those providing the templates will still be a part of the process. They will be licensing their vocal likeness, as one does with a visual likeness. It will be up to them and their agents to make sure the contract benefits them and that they have some right of refusal on projects that might not align with their views. And those around the narration will still be required. A director will need to make sure the reading is what the client wants, an audio editor will be needed to make sure it flows or to add effects or music, etcetera, etcetera.

I see it as a lot like using any other asset in game creation, in both video games and tabletop games, where I have some experience. As a creator trying to save money, I would definitely go to the asset store or find some stock art to put in my product. Both in tabletop publishing and on Steam, I’ve seen many products reusing art and, yes, it paints the product in a specific kind of light, but I can’t really fault someone for being short on means. Of course, were it possible, I would love to commission bespoke art for my product to make it unique and special. The nuances and the input I could have on it would make it better in every way, but not everyone can afford it.

The way I see it working is that narrators would license their voice phonemes to the voice generation technology company. Narrators would receive a flat fee up front for participating. A book publisher who can’t afford to arrange for a bespoke reading would contract the voice generation company. The AI voice should cost less than hiring a narrator in person, but said narrator should get a cut of each contract and, because the voice would be used in bulk, it would add up to a larger sum over time. Because the reading is AI generated, the audiobook will be worth less and be sold for less than one made with a bespoke narrator. The end-of-the-line customer would be made to understand that an audiobook voice is AI generated rather than specifically narrated, and it will be up to them whether they are willing to accept it, like buying a soft cover over a hard cover.

In the end, I think that just as ‘zero budget’, one-person productions on YouTube can exist next to hundred-million-dollar Hollywood productions, fully voiced audio productions can exist next to AI-generated synthetic audiobooks.

