Disclaimer: Some people have hearing loss and may find that one specific TTS voice suits their unique needs better. The following post is a general commentary about my experience with text-to-speech technology.
🌟 Hello Fellow Apple Users! 🌟
I've been navigating the world of text-to-speech for over 35 years and recently had an epiphany I just had to share with you all. After using the ChatGPT iPhone app, I found myself wondering: Is this the most realistic voice I've ever encountered, or does the content make it feel that way?
💡 My hypothesis? It's a beautiful symphony of both. 🎵
Advanced TTS algorithms have come a long way, and the voice in this app captures the subtleties of human speech like never before. On the other hand, the compelling and thoughtful content adds a layer of realism that goes beyond mere sound. It's as if technology and substance have joined hands to dance, creating an experience that feels truly magical.
✨ So, let's open up the conversation! What do you think of this voice? Have you felt the magic too? I'm eager to hear your thoughts! 💬
Comments
ElevenLabs
Honestly, I still feel like ElevenLabs is a lot more realistic than ChatGPT's TTS.
Charlotte
I agree the ChatGPT voices are pretty impressive, and they would have been mind-blowing if it weren’t for ElevenLabs. If you would like to get an idea of how good ElevenLabs voices can be, try the app Dolores. In there are voice demos for maybe 15 female voices. Some of them basically sound real, or at least real enough that if you aren’t listening closely, you might not notice they are completely generated. Using those voices gets crazy expensive so I wouldn’t recommend it, but the demos give a good idea, and I think they are generous enough to give you something like 1000 characters before they would like you to start re-mortgaging your home. Very generous indeed. lol. Anyway, the voices are great. I’d be interested to hear what you think.
How did you do that?
Sorry for the stupid question, but I don't know how to test the synthetic voices of GPT. I didn't even know they had their own voices.
Charlotte.
Hey, yep. I know what you mean about the ChatGPT voices. Sky especially really is great, and just once or twice it kind of did feel like I was talking with a human.
Ricardo.
The very last element on the screen is "switch to voice mode". Tap that and you’ll be talking with ChatGPT. I think you might need to subscribe to Plus though.
Haha
They’ll never turn me into paper clips. If they try, I’ll release my grey goo and escape in the confusion over dystopian nightmares.
I Don't Mind Human-Like Voices ...
What I don't want to hear is a voice so lifelike that we really think it is a person. That is, of course, personal preference. I don't want to ever forget that I'm talking to, as one of my favorite machines would say, an "overweight glob of grease". Couldn't resist the Star Wars reference.
Some Thoughts
Hey, this is a great topic. I am of the opinion that OpenAI's voice tech is really nothing to write home about. When trying it, I was frequently taken out of the experience by unnatural pausing and weird intonation and pitch problems as the language model was catching up with the TTS.
Eleven has their own set of issues. The voices frequently tend to hallucinate, that is, synthesize info that was not actually there. I suspect this happens because of the architecture: they're probably using a transformer under the hood, which is similar to how most GPT models work. Many deep learning based TTS systems use some variation of a spectrogram generator and vocoder to produce speech, and don't have as many of these issues.
Over the past year or so I have gotten more involved in creating custom TTS voices and I'm working with several people on optimizing these for screen reader usage. Everything is open source, but as of now only runs on Windows with NVDA. Here are some links if you would like to learn more or try this out for yourself. Piper, the framework powering this: https://github.com/rhasspy/piper
The Piper NVDA AddOn: https://github.com/mush42/piper-nvda
Synthesized breathing is creepy enough.
Of course, I am talking about my true love, Alex. The first time I heard him take a breath, I got chills. I cannot even imagine talking to something with a voice so realistic and natural-sounding that I do not know I am talking with an AI or whatever.
I want to be blown away.
I want to have a chat with a machine that can actually trick you into thinking it's human. It would be fascinating.
I tried the Piper TTS and while I wouldn't use it, it's interesting.
Don't try typing with some of the UK voices though. Have you ever wanted h to sound like faaa? Well now you can! With the new, I forgot the name, voice!
This stuff is interesting, but I think I'd prefer Eloquence as it can use proper intonation. Once another synth can do that, I'll be interested. I know eSpeak can, but I don't like the voices.
I'll be using IBM for as long as it's available.
I have never felt an…
I have never felt an emotional connection with a text-to-speech synthesized voice, not even the adorable sounding ones. There's a couple of Microsoft voices in Edge that sound adorable. I don't remember the name of one of them, but she sounds like a young child. But it still sounds wrong. You can still tell they aren't people. Maybe it would be different if you didn't know beforehand that a voice is synthesized. For example, if someone were to tell you, have a listen to this voice, and then tell me if it sounds human or not, then maybe you might have some sort of reaction. But I wouldn't say a synthesized voice resonates with me. That's probably taking it a bit far.
Same.
I'm sure my friend Amin would choose to have sex with some of the voices if he could. But I'm good.
There is an old thread...
There's an old thread on AppleVis where several of us were debating the Kindle app on iOS with VoiceOver vs using Alexa's built-in Kindle Assistive Reader. Personally, I really enjoy a good Ebook via the Alexa app on my iPhone with the assistive reader going. Lets me still "use" my phone for various things while the book is going, etc. Just a personal preference. 🤷🏻‍♂️
There are chatbots as well, some of those are just text, and it makes you wonder just who or what you are texting with.
I have never heard a synthesized voice and thought it sounded real. However, there was a news interview some years ago where they interviewed Susan Bennett, the woman who was the voice model for "Samantha/Siri".
For those who are not familiar, Siri, back in its beginnings, used an older version of Samantha, and it was a lot more monotone than it is nowadays.
Listening to the interview, especially when Ms. Bennett purposefully went monotone, I had a hard time believing the news interviewer wasn't just playing around with Siri.
This interview took place in October of 2013, a month after the launch of the iPhone 5S, the first iPhone to use a fingerprint ID.
CNN Interview with Susan Bennett.
For me, it's more about how…
For me, it's more about how something is written than a synthesized voice.
Obviously, there is a limit to what one can tolerate; audio description from a certain streaming service comes to mind.
Don't get me wrong, I've heard some well-done TTS described content, but not when it's compressed like that.
Most of it is about how something is written, not because of the synthesizer someone might choose.
RE: Boys
I get what you're saying. The fact is if I'm interacting with a computer, I want to know that I am interacting with a computer. If a voice is too lifelike, I find it disturbing that we may not be able to tell the difference.
Where is the voice?
I can't find the voice on the ChatGPT iPhone app. Am I missing something?
And I much prefer AI that speaks to me. I find it easier to consume information through audio anyway.
Thanks for your assistance!
David
Same here
When I asked ChatGPT itself, it first told me things like that there was a gear icon in the top right corner, which I had to tap to go to the app's settings and turn on the Text-to-Speech switch, none of which I found anywhere. Then I requested a more screen-reader-optimized description, to which it responded that it did not have the answer, as it was only trained on data valid up to January 2022. When it told me that the TTS feature had not yet been implemented at that time, I had to ask how it had then known how to enable TTS and been able to provide all those instructions, to which it could only respond by apologizing. Then we had a long chat where I kept on refuting all the other nonsense it said. I might share the link to that chat.
Each AI Chat seems to Have the same problem
I have a keyboard called the Hexgears X5 Mechanical Keyboard that you are supposed to be able to connect with your computer using a 2.4 GHz USB dongle. However, my home computer would not recognize the connection. I asked Google's AI called Bard, the Bing AI, the ChatGPT AI, and one called DeepAI how to fix it. They all seem to draw from the same well of information, as each one inevitably gave me different instructions each time I asked. Several times they told me to do things using key combinations with keys that literally do not exist on this keyboard. Rather than questioning their parentage, I simply shrugged and moved on to the next unhelpful bot, but it gets very annoying when they give you bad information time after time. I had hoped they would be able to retain information from when I corrected them about non-existent keys, but no luck.
I love ElevenLabs, but some ChatGPT Plus voices are superior
ElevenLabs
ElevenLabs is amazing and I've created some pretty incredible things with it. Here's an example of something I just whipped up to demonstrate. This is Stephen Fry reading this fun little poem.
The problem with ElevenLabs is that it's intended for long-form text. It really struggles with shorter conversational text, which is what is often needed by ChatGPT. Additionally, unless you stick with the default settings, which make the voice sound really boring and bland, you really need to give everything a listen and regenerate audio to get the right sound. It also takes significantly longer to generate the recording than ChatGPT's TTS engine does. I'm not trying to say ElevenLabs sucks. I'm really not. I loooooove how amazingly realistic this Stephen Fry voice sounds. I've made other amazing voices that, at least to me, sound equally mind-blowing. I also love how it can speak other languages and sound really natural doing it.
ChatGPT
ChatGPT's TTS engine is amazing for entirely different reasons, but it's held back by the voices provided. There's only one voice I like out of the provided ones, and that's Ember, but Ember only works well for specific personality types / characters. Sky is okay, but something about it rubs me the wrong way. Just about all the voices sound like they were recorded by people using gaming headsets, and as somebody mentioned earlier, they tend to make up text... typically in the form of "um..." and "uh..." That being said, Ember, at least, sounds very real to me and is able to nail the conversational role ChatGPT needs it to play. Once they can find better voice actors that can play different roles and accents, this engine will shine. It's fast, it's high quality, and it knows how to use inflection to make voices sound real. Oh yeah, it can also speak other languages really, really well, though it does still have a slight American accent.
When I get home from class tonight, I'll make a quick demo that I can share on here, since I know most people don't have a Plus subscription.
That demonstration was staggering.
To borrow a phrase from Singer Girl: the subject line says it all. Wow, I can’t help thinking of Oliver, who was looking for a way of consuming poetry only a few days ago. Having Stephen Fry read it all wouldn’t be a bad start.
Wow!
That was impressive.
Also, Netscape?! 😹😹
ElevenLabs is pricy!
Yeah. ElevenLabs is far from cheap. The subscription I'm on gives me 100,000 characters per month, which ElevenLabs claims is about two hours of audio. This costs $20/month here in the US.
Here's their pricing page. They do have a cheaper plan for $5 that allows for 30,000 characters per month. I think everybody should grab that one for at least a month, because you get 80% off your first month, so you end up paying only $1. Using this thing is so much fun and you can clone just about any voice with three to five minutes of audio.
I'm not home from class yet, but I should be in an hour or two, so I'll get that ChatGPT voice chat demo recorded soon!
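For anyone weighing the two plans, the figures quoted above can be turned into a quick back-of-the-envelope comparison. This is only a sketch: the plan labels `cheaper` and `current` are made up for illustration, the numbers are the ones from this post rather than anything checked against the pricing page, and the audio estimate assumes the "about two hours per 100,000 characters" claim scales linearly.

```python
# Figures as quoted in the post; plan labels are hypothetical, not official names.
PLANS = {
    "cheaper": {"usd_per_month": 5.0, "characters": 30_000},
    "current": {"usd_per_month": 20.0, "characters": 100_000},
}

FIRST_MONTH_DISCOUNT = 0.80    # "80% off your first month", as quoted
HOURS_PER_100K_CHARS = 2.0     # "about two hours of audio", as quoted


def cost_per_1k_chars(plan):
    """Regular monthly price divided across the character allowance."""
    return plan["usd_per_month"] / plan["characters"] * 1000


def first_month_price(plan):
    """Price of the first month after the quoted discount."""
    return plan["usd_per_month"] * (1 - FIRST_MONTH_DISCOUNT)


def estimated_minutes(characters):
    """Rough audio length, assuming the quoted ratio scales linearly."""
    return characters / 100_000 * HOURS_PER_100K_CHARS * 60


if __name__ == "__main__":
    for label, plan in PLANS.items():
        print(
            label,
            f"${cost_per_1k_chars(plan):.3f} per 1,000 characters,",
            f"about {estimated_minutes(plan['characters']):.0f} minutes of audio",
        )
    print(f"First month on the cheaper plan: ${first_month_price(PLANS['cheaper']):.2f}")
```

By these numbers the cheaper plan also works out slightly cheaper per character (roughly 17 cents versus 20 cents per 1,000 characters), and its 30,000 characters would come to about 36 minutes of audio.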
Demo
Got to admit, that reading was spectacular. I'm not a fan of the breathing in computer voices--sounds off and fake, and I'm not sure how that would sound at the rate I run my screen readers, but it was pleasurable to listen to.
On the other hand, I've heard living person recordings of poetry and literature that had crazy sounding breaths because they were dubbed together, probably to chop out mistakes. I also had a professor in a poetry writing course who paused at every comma for longer than a full stop or a line. It drove everyone in the class crazy. I hope Oliver finds a solution like this demo.
🐻
ChatGPT Voice Demo
Hi all. As promised, here is the ChatGPT voice demo. Since my original demo was Christmas themed, I figured I just had to make this one Christmas themed as well. Sorry to those of you who don't celebrate Christmas! :)