on AI in audiobooks

audiobooks design software thoughts voiceover

I have been avoiding writing about AI (Artificial Intelligence) and Machine Learning (ML) in audiobooks so far, but with the changes currently underway, not talking about it is no longer an option.

We’ve seen it happen in movies, so we have expected it to come to audiobooks for quite some time: deepfakes of actors who have passed away remind us that there is still an uncanny valley when they appear alongside living actors on the screen.

In audiobooks, this would mean that some tech giant with deep pockets would likely do the following:

  1. Figure out which narrators can have a major positive impact on the sales of a book.
  2. Categorize these top narrators into “types.”
  3. Base these types upon race, gender, age, tone, genre, etc.
  4. Hire at least one narrator from each type to come in and narrate while the company not only records their audio but also analyses everything else going on, from pulse rate to body heat to breathing.
  5. Use this data as the basis of an AI/ML model to narrate audiobooks.
  6. Profit.

In addition to the tech giants doing this, there are smaller, more nimble startups that have done a lot to build more flexible artificial datasets. Each of these smaller companies guards its own “secret sauce” very closely, because it knows that if it comes up with something truly innovative, one of the big companies will buy it out.

As recently as early January, I thought we still had about 2-3 years until the tech would be ready to bring AI audiobooks to market. Then, Apple dropped its bomb, and since then, the landscape changes daily.

AI Cannot Act.

To be clear: AI cannot act. These AI/ML models are not acting. They are word (and emotion) prediction engines. They are making their best guess as to what needs to come next, and they do make mistakes.

AI audiobooks will never be acted. That was never their goal. They are not going after the core audiobook listener who connects with the emotion, the thrill of a story well told. Their goal has only ever been to solve a simple monetary problem.

It can cost about 1/10 as much to generate an AI audiobook as it does to produce one with a human narrator. This of course means that a publisher with this mindset can produce 10 times as many titles for the same budget. And, since more than 90% of the books published every year still do not ever become audiobooks, this casts a much wider net without negatively impacting the bottom line.

How do I know this?

About ten years ago, when I brought my audiobook-centric business to a business incubator/accelerator, all of the investors (the money people) wanted me to give up my idea of using tech to facilitate humans telling stories and focus on automatically generating the audiobooks instead.

I turned down a lot of capital investment because I did not want to be part of that future. Yet, here we are…

What do we do now?

To quote Douglas Adams, “Don’t Panic!”

Panic, anger, fear, hatred — all of these will work against us as we try to figure out how best to move forward.

And I don’t think there is much value in taking a Luddite stance against the technology, since historically that has never worked out well for the group doing the protesting.

Rule #1: Adapt

As I said in the post Approaching an AI Inflection Point, the only actors from earlier periods of technological change whose names we still know are the ones who were able to adapt to the changing world presented to them.

When Charlie Chaplin understood the change that the “Talkies” would bring, he knew that the time of The Little Tramp was at an end. The character needed to be a universal everyman, and Chaplin’s heavy British accent would make being that character in a Talkie impossible without destroying his connection to a global audience.

Mae West transitioned from the vaudeville and burlesque stages to become the highest-grossing movie actress of the decade. Transition is possible. Adaptation is possible. But holding back the march of technology, not so much.

Rule #2: See rule #1

At this point, there really is no rule #2. We are too early in the state of change for anyone to predict where this will all end. Now is the time to explore how each of us might adapt the art of what we do into other venues, other media. Now is a time of experimentation. Now is a time to take some artistic risks, and to learn.
