The 2020s have become the age of AI. It seems as if new and exciting AI-powered programs are released daily.
Some of the most exciting developments have come in the AI voice generation space. There are dozens of options that all promise to be the best AI text to speech generator (or at least the best free AI voice generator available).
But which are actually worth using?
Well, you can only truly find out through testing! So that's exactly what we've done.
We purchased each of the 10 tools for this demo and ran them through the same audio script test. We also dug deep into each of them to see what they do well, where they fall short, and what sets them apart.
I promise you, no one else has bought each of these plans and tested them to give you the same holistic comparison we put together in this article!
This article is focused on using AI text-to-speech generation to create content. We aren't reviewing tools like Speechify that are used to read PDFs or other text out loud for you as a productivity hack.
What We're Looking for in AI TTS
There are a number of features that we're looking for.
- Realism. The less robotic sounding, the better. We want smooth pacing, correct inflections and tones that match the subject matter.
- Audio editing and correction. We want to be able to adjust the speed of speech and correct pieces of audio we don't like. Ideally, we can quickly regenerate clips that we don't like. Some tools offer a lot in this area, while others keep users closer to the original outputs.
- Voice creation. Some tools can generate completely original AI voices that aren't used by anyone but the creator. It's very cool if you want something that is extremely personal for your brand.
- Voice options. How many voices are there to choose from and how many different "actors" are available? We want to know which genders, ages, accents, and styles we have available as well as things like special reading styles. Some have things like "narrator," "whispering," or "promotional" voice styles.
- Price. Pricing can vary dramatically. Some tools are pay-as-you-go, while others are on a subscription billing program. We know that many creators just want something quick and free, so we paid special attention to the best AI text to voice generators with free plans. Most of the tools had some form of free plans, but most didn't come with a commercial license or the ability to download the files until you paid.
- Output limits. We wanted to know how much text we can input and generate at once per payment cycle.
- Commercial rights. This is a no-brainer, but we need to legally be able to use our audio however we want. Do NOT download and use any audio unless you're certain that you have commercial rights to use it.
- Voice cloning. Also known as "deep fakes," this feature lets you create versions of your own voice or the voices of others. It is not available in every tool, and the output quality varies as well.
- Stand out features. We wanted to know if there were any features that were exclusive to certain tools. There were, and we made note of them.
How We Tested
We ran the same text to speech requests through each tool, so we could compare the results side by side.
We wanted to test different pronunciations, pacing, and inflections.
We used a sentence that is difficult for AI to create.
"The quick brown fox jumped over the lazy dog."
We also created variations of this to test exclamations and questions.
"Did the quick brown fox jump over the lazy dog?"
And "Wow! The quick brown fox jumped over the lazy dog! Oh my gosh!"
Next, we tested to see if the tools could get abbreviations correct. We gave them the following sentence: “R.S.V.P. for my A.I. webinar A.S.A.P.”
Finally, we wanted to test some words that are commonly mispronounced. With the help of ChatGPT, we compiled a sentence that includes 10 of these difficult words. The sentence is “The mischievous entrepreneur, an epitome of rural quinoa farming, cultivated a chimera of caramel anemones near a nuclear plant, sparking hyperbolic rumors.”
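The test sentences above can be collected into one small script so that every tool receives exactly the same prompts in the same order. The snippet below is only a sketch of how that might be organized. The names `TEST_PROMPTS`, `build_script`, and `character_count` are our own illustrations and are not part of any tool reviewed here.

```python
# Hypothetical helper for organizing the test prompts; not tied to any tool.
TEST_PROMPTS = {
    "statement": "The quick brown fox jumped over the lazy dog.",
    "question": "Did the quick brown fox jump over the lazy dog?",
    "exclamation": "Wow! The quick brown fox jumped over the lazy dog! Oh my gosh!",
    "abbreviations": "R.S.V.P. for my A.I. webinar A.S.A.P.",
    "mispronounced": (
        "The mischievous entrepreneur, an epitome of rural quinoa farming, "
        "cultivated a chimera of caramel anemones near a nuclear plant, "
        "sparking hyperbolic rumors."
    ),
}


def build_script(prompts):
    """Join the prompts into one script, one prompt per line, in insertion order."""
    return "\n".join(prompts.values())


def character_count(prompts):
    """Total characters across all prompts, useful against character-based limits."""
    return sum(len(text) for text in prompts.values())


if __name__ == "__main__":
    print(build_script(TEST_PROMPTS))
    print("Characters:", character_count(TEST_PROMPTS))
```

Keeping the prompts in one place makes it easy to paste the identical script into each tool and to check it against a plan's character or word limit before generating.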
Other Testing Notes
A couple of other notes about how we tested...
#1 We chose male voices with American accents, converting English text to English voiceovers
English was by far the most common language supported by all tools we found.
We also did our best to put together a quick pricing and feature overview of each of the 10 programs we included in this test, which we provide below.
We found that the male and female versions were typically the same quality. For brevity, we had to choose one, and we went with male and a standard American accent.
#2 We didn't edit the outputs
We wanted to see which tools provided the best output without a bunch of work. We note which tools can provide better output with a little tinkering.
That feature is important since no automatic generation has been perfect.
#3 All exports were done in .wav files if possible and with the highest quality possible.
Using .wav is standard for adding audio to videos. The most common use for text to voice seems to be for video voiceovers, so we felt this was the best choice. A couple of tools only exported in .mp3.
All AI Text to Speech Tools Reviewed
Here is the complete list, in no particular order.
Synthesys is more than just an audio program, offering AI-based tools for text-to-image conversion and other multimedia applications. With a wide range of unique tones to choose from, Synthesys provides users with flexibility in crafting their audio content.
Synthesys Pricing: Synthesys has several pricing plans available to meet different user needs:
- Audio Synthesys: $27/month
- Human Studio Synthesys: $36/month
- Audio & Human Studio Synthesys: $52/month
Synthesys Free Plan: Information regarding a free version of Synthesys was not found.
Synthesys API: Synthesys plans to offer an API in the future.
Languages and Voices: Synthesys supports 140 languages and offers 254 voices, giving users a variety of options to choose from.
Voice Cloning Available: No, voice cloning is not available on Synthesys.
Unique Features:
- Offers several AI-based tools, including text-to-image conversion
- Provides a wide range of unique tones such as Newscast, Angry, Cheerful, Sad, Excited, Friendly, Terrified, Shouting, Unfriendly, Whispering, and Hopeful
Ideal User: Synthesys is suitable for users who require a variety of AI-based multimedia tools and want the flexibility to choose from a range of unique tones for their audio content.
Learn more at Synthesys.io.
Murf is an all-in-one tool that is best for teams that want to create faceless videos.
Murf Pricing: It offers a basic plan for $19/month, which includes half of the available voices and limited features. The pro plan, at $26/month, includes all available voices, 48 hours of transcription per year, and AI voice changer access. The enterprise plan, at $59/month, is the best value for teams with more than one member, as it includes unlimited voices and 5 users.
Murf Free Plan: Murf offers a free plan with no downloads and 10 minutes of voice generation.
Murf API: Murf does have an API and customization options.
Murf Trust Pilot Rating: Murf has an average Trust Pilot rating of 2.9 from 8 reviews. This is somewhat concerning when compared to other tools on this list that have much higher ratings and many more reviews in total. Murf also hosts audio and offers deep faking/voice building capabilities.
Languages and Voices: Murf supports 20 languages and offers 120 voices.
Voice Cloning Available: Yes. A voice changer is also available.
Unique Features: Murf was the only tool that included access to a library of stock images, videos, and audio that can be used commercially.
Ideal User: Anyone who is focused on creating video output with audio laid over the top.
Learn more at Murf.ai.
Listnr is an AI text-to-speech tool that offers a range of affordable pricing plans, catering to individuals, startups, and agencies. With a wide selection of voices and languages, Listnr is suitable for users who need a cost-effective solution for their voiceover needs.
Listnr Pricing: Listnr has multiple pricing plans to accommodate various user requirements:
- Individual plan: $19/month
- Solo plan: $39/month
- Startup plan: $59/month
- Agency plan: $199/month
Additionally, Listnr offers a one-time payment option for 5,000 words at just $9.
Listnr Free Plan: The free plan provides 1,000 words per month, allowing users to try the platform without committing to a paid plan.
Listnr API: Yes, Listnr offers an API for developers who want to incorporate its text-to-speech features into their applications.
Listnr Trust Pilot Rating: Listnr has a strong Trust Pilot rating of 4.7 from 128 reviews, indicating high user satisfaction and reliability.
Languages and Voices: Listnr supports 75 languages and offers over 600 voices, providing users with a diverse range of options.
Voice Cloning Available: No, voice cloning is not available on Listnr.
Unique Features: Listnr offers short free outputs for users who want to test the platform before subscribing to a paid plan.
Ideal User: Listnr is an excellent choice for users seeking an affordable AI text-to-speech tool with a variety of pricing plans to suit different needs.
Learn more at Listnr.tech.
Play.HT is an AI text-to-speech tool specifically designed for bloggers who want to embed audio versions of their content and podcasters seeking an all-in-one solution for hosting their audio and reviewing analytics. With a vast array of languages and voices, Play.HT provides an excellent platform for content creators.
Play.HT Pricing: Play.HT offers various pricing plans to suit different needs:
- Professional plan: $29.95/month
- Premium plan: $49.50/month
- Custom pricing for Enterprise plan
Play.HT Free Plan: The free plan includes 5,000 words, allowing users to test the platform before committing to a paid plan.
Play.HT API: Yes, Play.HT offers an API for developers who want to integrate its text-to-speech capabilities into their applications.
Play.HT Trust Pilot Rating: Play.HT has a solid Trust Pilot rating of 4.1 from 73 reviews, reflecting good user satisfaction and reliability.
Languages and Voices: Play.HT supports an impressive 120+ languages (the company claims to support every language in the world) and offers 917 voices for users to choose from.
Voice Cloning Available: Yes, voice cloning is available on Play.HT.
Unique Features:
- Voice redo button allows users to quickly generate variations without manual tweaking
- Built for seamless integration with blogs
- Converts articles to audio using its WordPress plugin, enabling audio versions of written content
- Offers .wav file output
- Zapier integration coming soon
- The only platform that shares analytics for audio and podcast hosting
Ideal User: Play.HT is perfect for bloggers who want to add audio versions of their content and podcasters seeking a comprehensive solution for hosting their audio and accessing analytics.
Learn more at Play.ht.
Genny (Formerly Lovo)
Genny, previously known as Lovo, is a versatile AI text-to-speech tool designed for users who desire a high degree of control over their audio and are willing to navigate a slight learning curve. Genny offers a comprehensive range of customization options and is comparable to Murf in video creation capabilities.
Genny Pricing: Genny offers a $27/month plan, which includes AI-generated voiceovers. They also provide a more comprehensive plan at $52/month that includes AI-generated video avatars in addition to audio voiceovers.
Genny Free Plan: Genny's free plan offers 20 minutes of voice generation, giving users a chance to test the platform before committing to a paid plan.
Genny API: Yes, Genny has an API for developers who want to integrate its text-to-speech capabilities into their applications.
Genny Trust Pilot Rating: Genny has an impressive Trust Pilot rating of 4.3, based on 64 reviews, reflecting its reliability and user satisfaction.
Languages and Voices: Genny supports over 100 languages and boasts a collection of more than 400 voices, offering a diverse range of options for users.
Voice Cloning Available: Yes, Genny provides voice cloning capabilities for users who want to replicate specific voices.
Drawbacks: The addition of emotions to the voices can sometimes result in unusable outputs, and there is a bit of a learning curve for new users.
Unique Features: Genny stands out for its highly customizable voices and its similarity to Murf in video creation features. It also supports the .wav output format for audio files.
Ideal User: Genny is perfect for individuals who want extensive control over audio customization and are willing to tackle a slight learning curve.
Learn more at Lovo.Ai.
Resemble.AI is an AI text-to-speech tool that offers a pay-as-you-go pricing model, making it a great option for users who prefer flexibility in their payment plans. With excellent customization features, Resemble.AI caters to users who want control over their audio content.
Resemble.AI Pricing: Resemble.AI operates on a pay-as-you-go pricing model, charging $0.006 per second. Pro plan pricing is not publicly available, and users need to schedule a demo to access it.
Resemble.AI Free Plan: 300 second trial with no credit card required.
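Per-second pricing can be hard to compare with monthly plans, so it helps to convert it into the cost of a typical clip. At $0.006 per second, one minute of audio costs about $0.36, an hour costs about $21.60, and the 300-second trial covers roughly $1.80 worth of audio. The sketch below simply does that arithmetic; the function name `audio_cost` is ours and is not part of Resemble.AI's product.

```python
# Rate taken from the pricing above; the function name is illustrative only.
RATE_PER_SECOND = 0.006  # USD per second of generated audio


def audio_cost(seconds, rate=RATE_PER_SECOND):
    """Return the pay-as-you-go cost in USD, rounded to cents."""
    return round(seconds * rate, 2)


if __name__ == "__main__":
    for label, seconds in [("1 minute", 60), ("10 minutes", 600), ("1 hour", 3600)]:
        print(f"{label}: ${audio_cost(seconds):.2f}")
```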
Resemble.AI API: Yes, Resemble.AI offers an API for developers who want to integrate its text-to-speech capabilities into their applications.
Languages and Voices: Resemble.AI supports at least 35 languages, but information regarding the number of voices available could not be found.
Voice Cloning Available: Yes, voice cloning is available on Resemble.AI.
Drawbacks:
- Longest time to complete an export among the listed tools
- Pro plan features require signing up for a 30-minute demo
Ideal User: Resemble.AI is best suited for users who want a pay-as-you-go pricing model and require excellent customization features for their audio content.
Learn more at Resemble.Ai.
BigSpeak AI
BigSpeak AI is an AI text-to-speech tool that offers a great free plan, providing 8,000 characters per month for text-to-speech conversion and 60 minutes of AI audio transcription. However, it appears to be using Listnr's API, and its customization features and language support are limited.
BigSpeak AI Pricing: BigSpeak AI has two pricing plans:
- Free plan
- Premium plan: $49/month
BigSpeak AI Free Plan: The free plan offers 8,000 characters per month for text-to-speech and 60 minutes of AI audio transcription, making it an attractive option for users who want to test the platform without any financial commitment.
BigSpeak AI API: No, BigSpeak AI does not offer an API.
Languages and Voices: Information about the number of languages and voices supported by BigSpeak AI could not be found.
Voice Cloning Available: Yes, voice cloning is available on BigSpeak AI.
BigSpeak AI Trust Pilot Rating: BigSpeak AI has an average Trust Pilot rating of 4 stars from 3 reviews, indicating satisfactory user experience.
Drawbacks:
- MP3-only export option
- No API available
- Appears to be using Listnr's API, with identical voices
Ideal User: BigSpeak AI may be suitable for users who want to test a text-to-speech platform without any financial commitment, but its limitations make it less recommended for more demanding use cases.
Learn more at BigSpeak.Ai.
Blakify is an AI text-to-speech tool that offers a larger than average selection of voices. While its pricing may not be as competitive as other tools on this list, it does provide users with full commercial rights, unlimited storage, and various plan options.
Blakify Pricing: Blakify has three pricing plans to choose from:
- Lite: $29.99/month (1 million characters limit)
- Pro: $59.99/month (5 million characters limit)
- Elite: $99.99/month (unlimited characters)
All plans include full commercial rights, unlimited storage, standard and neural voices, and the ability to cancel at any time.
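One way to compare the capped plans is by what a million characters costs if you use the whole allowance. On that basis, Lite works out to $29.99 per million characters and Pro to about $12.00 per million. The snippet below is a small sketch of that calculation. The plan figures come from the list above, while the `PLANS` table and the `cost_per_million` helper are our own illustration.

```python
# Plan figures copied from the list above; Elite is omitted because it is unlimited.
PLANS = {
    "Lite": (29.99, 1_000_000),  # (USD per month, character limit)
    "Pro": (59.99, 5_000_000),
}


def cost_per_million(price, char_limit):
    """USD per one million characters, assuming the full allowance is used."""
    return round(price / char_limit * 1_000_000, 2)


if __name__ == "__main__":
    for name, (price, limit) in PLANS.items():
        print(f"{name}: ${cost_per_million(price, limit):.2f} per million characters")
```

This only describes the best case. If you use a fraction of the monthly allowance, the effective price per character is higher.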
Blakify Free Plan: Blakify offers a free 3-day demo trial when a card is added, allowing users to try the platform before committing to a paid plan.
Blakify API: No, Blakify does not offer an API.
Languages and Voices: Blakify supports 65 languages and offers more than 700 voices, providing users with a wide range of options to choose from.
Voice Cloning Available: No, voice cloning is not available on Blakify.
Blakify Trust Pilot Rating: There are no Trust Pilot reviews available for Blakify at the moment.
Drawbacks:
- MP3-only export option
- Pricing may not be as competitive as other tools on this list
Unique Features: Blakify offers a larger than average selection of voices for users to choose from.
Ideal User: Blakify is a suitable choice for users who value a wide range of voice options and are willing to pay a premium for the additional features provided by the platform.
Learn more at Blakify.com.
WellsaidLabs is an AI text-to-speech tool that offers a free plan but with limitations on file downloads. With higher pricing compared to other tools on this list and a good level of customization, WellsaidLabs may not be the ideal choice for users seeking more features and better value for their money.
WellsaidLabs Pricing: WellsaidLabs offers three pricing plans:
WellsaidLabs Free Plan: WellsaidLabs does offer a free plan, but users cannot download files, limiting its usefulness.
WellsaidLabs API: Yes, WellsaidLabs offers an API for developers who want to integrate its text-to-speech capabilities into their applications.
Languages and Voices: WellsaidLabs supports 15 languages and offers 100 voices for users to choose from.
Voice Cloning Available: Yes, voice cloning is available on WellsaidLabs.
WellsaidLabs Trust Pilot Rating: WellsaidLabs has a below-average Trust Pilot rating of 2.7 from 7 reviews, suggesting room for improvement in user satisfaction.
Drawbacks:
- Too expensive for the features and quality offered
- Free plan does not allow file downloads
Ideal User: WellsaidLabs may not be the ideal choice for users seeking better value and more features in their AI text-to-speech tool.
Learn more at WellSaidLabs.com.
ElevenLabs is an AI text-to-speech tool that offers a fantastic free plan with up to 10,000 characters per month but without commercial rights. With excellent customization features and affordable pricing, ElevenLabs is ideal for users who want to create unique voices from scratch or generate celebrity deep fakes.
ElevenLabs Pricing: There are five plans available at $5, $22, $99, $330, or a custom price for Enterprise.
The plans offer different features, such as character limits, custom voices, commercial licenses, instant voice cloning, and access to APIs. The Creator and Enterprise plans are intended for content creators and businesses that require more advanced features and support.
ElevenLabs Free Plan: The free plan allows up to 10,000 characters per month, but users cannot use the generated audio for commercial purposes.
ElevenLabs API: Yes, ElevenLabs offers an API for developers who want to integrate its text-to-speech capabilities into their applications.
Languages and Voices: Information about the number of languages and pre-built voices supported by ElevenLabs could not be found.
Voice Cloning Available: Yes, voice cloning is available on ElevenLabs.
ElevenLabs Trust Pilot Rating: ElevenLabs has an average Trust Pilot rating of 3.6 from 2 reviews, which is too small a sample to say much about user satisfaction.
Drawbacks:
- Very few pre-built voices
- Still in beta, so some features for controlling sound and pace are not yet available
Unique Features: ElevenLabs allows users to create completely unique custom voices and offers extremely high-quality celebrity deep fakes.
Ideal User: ElevenLabs is perfect for users who want to build their own unique voices from scratch or create high-quality celebrity deep fakes, as well as those who require excellent customization features at an affordable price.
Learn more at elevenlabs.io.
What Do We Recommend?
As you can likely tell from the fact that all the audio in these experiments was hosted with it, we use and recommend Play.ht to creators who are very serious about the quality of their voice overs.
However, it isn't the cheapest option.
Text to speech has been around since as early as 1968, but the technology has advanced significantly since then.
The latest AI text-to-speech technology can do much more than simply say the words we tell it to. Now, it can recognize speech patterns, interpret them and create coherent responses.
It is not the same as regular text to speech tools because it can create its own creative answers from the input we provide it.
AI technology uses something called phonemes to form words and sentences, and it relies on intonation, accentuation, and pronunciation to create coherent and meaningful conversations in tones and accents that seem close to human.
AI text-to-speech uses machine learning to learn and adapt as it receives new data. As the database of real voice actors expands, the technology can create new speech that is unique to the brand that uses it.
There are many commercial uses of AI text-to-voice software, some of which are:
- Entertainment industry: AI can create voice-overs for movies, TV shows, and commercials. Also, YouTubers can use it to create faceless YouTube channels or just add a narrator to their videos.
- Call centers and customer service: Although it's not ideal, we can use AI to provide automated responses to general customer inquiries.
- Video games: A much more affordable voice-over solution for characters and in-game narration.
- Education: AI voice overs can be used to provide audio content for online courses and textbooks.
- Repurposing online articles into audio.
- Podcasts and audiobooks: AI text-to-voice software can be used to create audio content for podcasts and audiobooks.
- Navigation systems: Although not many people are making their own navigation assistants, this is a classic use case for AI text-to-speech.
Overall, AI text-to-voice software has a wide range of commercial applications, and its use is growing rapidly in various industries.
Human narrators absolutely have a place in 2023 and beyond. Yes, they will be ceding more and more market share each year to these new AI programs, but many people still prefer to pay the premium price of hiring a professional narrator.
Although AI text-to-voice tools are powerful and can produce lifelike results, they still require time and effort, especially if you're creating long-form narration such as an audiobook.
Viva la human narrator!
You might have noticed I mentioned which programs have APIs. An API (application programming interface) lets other developers call the company's text-to-speech service from their own projects. Essentially, if you wanted to make a software product that integrated text-to-voice functionality in the same way as, say, Play.HT, you could do that. It's not cheap, and there are usually pretty long lists of what you can and can't do, but APIs are very exciting if you have a good idea for how to use them.
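To give a rough feel for what calling a text-to-speech API involves, here is a minimal sketch that builds (but does not send) a web request asking for audio. Everything specific in it is an assumption for illustration: the URL, the JSON field names, the voice name, and the `build_tts_request` helper are placeholders, not the real interface of Play.HT or any other tool in this article. Each provider documents its own endpoints and authentication.

```python
import json
import urllib.request

# Everything below is hypothetical: the URL, field names, and header are
# placeholders, not the interface of any tool reviewed in this article.
API_URL = "https://api.example.com/v1/tts"


def build_tts_request(text, voice, api_key):
    """Build (but do not send) a POST request asking for speech audio."""
    payload = json.dumps({"text": text, "voice": voice, "format": "wav"})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("Hello there.", "narrator-male-us", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # Sending it would look like: urllib.request.urlopen(req).read()
```

In a real integration, the response would typically contain the audio file or a link to it, and the provider's terms would spell out the usage limits mentioned above.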
You'll notice a few omissions from this list that might have been on similar ones across the internet.
Descript wasn't included because it just isn't a good solution for straight text to audio. It's fantastic for changing your own voice and "overdubbing," but it is not a valid solution for straight text to voice. I'm somewhat surprised that so many people seem to promote it as if it is good for text to audio on its own. It is a great tool, and I recommend it for other things, but not for this.
A couple of other tools that I have seen mentioned but didn't add were Respeaker and Respeecher. These tools both required setting up demos to use. Sorry, but no.