Hello guys!
Just wanted to provide a quick update about ChatGPT's Advanced Voice Mode, since it seems to have been kind of forgotten and OpenAI was too quiet for a while.
So I saw a post yesterday on Instagram saying that they were finally, and I say finally, rolling out Advanced Voice Mode for Plus and Team users. However, the post also mentioned that it was not going to be available in the United States, United Kingdom, Switzerland, and other countries that I can't remember at the moment of writing this post.
So I felt very disappointed once more with OpenAI. But I went to my app, updated it, and now I have access to Advanced Voice Mode!
So I did not understand why they said it was not going to be available in the United States. But I'm very happy, and now I have access to Advanced Voice Mode. So check your devices if you're still a Plus subscriber, because you might have access now.
It was a little tricky, because at first it gave me an introduction saying Advanced Voice Mode was here and had me select a voice, and then when I continued, it just carried on with the regular voice mode that we all know. So I closed the app and reopened it, and there it was.
But it is limited: it gives you about two hours of talking time, or something like that, to be able to use it.
I think it's really cool. I can see that the communication is definitely more fluid. But to be honest, I don't know why we waited so long for it. I feel it's a very useful thing, but I don't see all the hype that was behind it.
Of course, this is my personal opinion, and I was very excited to have it. Maybe I just need to play with it more, or maybe it's just the fact that we had waited so long for it. It is still very limited, and while you can do some of the things that were shown in the video, it's nowhere near all the stuff they showed in the regular keynote presentation, or in the accessibility video from Andy.
But nevertheless, I am happy to try it, and I just wanted to make this post to make you guys aware and see if anybody else has had the chance to update and try it out.
Comments
the post I saw said area in…
The post I saw said areas in the EU, not the US.
Advanced Voice Mode in the United States
Here I share the official post that I saw yesterday from OpenAI's official account, where it says it will not be available in the whole United States; yet I was able to get access, so I was very confused.
https://www.instagram.com/reel/DATv3qQyKfc/?igsh=MXg3cHB6eDdxZGZqdQ==
I'm using a VPN to access it…
I'm using a VPN to access it in the States. I'm in the UK.
I do find it a bit frustrating that it's a low-sample-rate voice. Pi AI has a much cleaner, fresher voice going on.
No video or screen sharing!
I finally got it, but all it does is talk! It can't see anything, which feels a bit disappointing. It hasn't even tried to be charming! As they jokingly say on The Vergecast: if it doesn't make an effort to impress you, is it really AI?
Yeah, the vision version…
Yeah, the vision version that was demoed is a different update which we're still waiting for.
It does also seem dialed back; it can't sing, for example, or tell people's voices apart. It is certainly faster and more expressive, but not quite the product we saw in March, or whenever it was.
Advanced voice mode quality worse than before
The voice with Advanced Voice Mode is significantly worse than before, for the following reasons.
I have used the Juniper voice, so all my comparisons, now and before, are with Juniper.
First, fidelity regression. The Advanced Voice Mode Juniper voice's fidelity is much poorer and sounds like a very lossy format, compared to how crisp and clear Juniper used to sound before, with much better fidelity.
Second, tonal quality regression. Juniper's voice is now huskier, with a noticeable hint of an English or Australian accent. Either way, the tonal quality, which is different from fidelity, has degraded, and Juniper is now much less pleasing to the ear.
Third, Juniper speaks much slower. The earliest Juniper voice had just the right speech rate, and this new one speaks much slower. There is also no live video feature like the one in OpenAI's spring presentation, nor is this advanced mode remotely close to the giggling, flirtatious, very expressive, lifelike voice demonstrated in spring. That particular live video and live microphone input, with the giggling, flirtatious voice from the spring presentation, is expected to reach all ChatGPT Plus subscribers, including in the US, by the end of fall at the latest, which would be December 21.
Quick Reminder of Forum Guidelines
Hi all, in light of a couple of comments, I wanted to take a moment to remind everyone of our posting guidelines. In particular, please remember that AppleVis caters to people of all ages and ability levels; thus, we ask that people carefully consider their word choice and not use language that could be considered offensive or discuss topics which may not be appropriate for all audiences.
I wonder if the voice…
I wonder if the voice quality is degraded based on demand. I think contention is probably quite high, new toy and all that.
I've not noticed the voice quality getting worse; it's always been lo-fi on iPhone but seems much better on Mac.
I might get better performance when it comes to the UK, which I hope is soon. I'm paying the same as US subscribers, so I should get the same product. Scaling that much computing power can be difficult, though.
Voice?
Hi guys, lucky you... You were complaining about its voice, but here in Italy, at least for me, there is nothing new in the app. Lucky you guys!
Loving it.
It does seem that with increased demand we don't get as much access to it. I think this will change over the next week or two, once the hype has worn off. I absolutely love the voices and don't notice anything low quality about them: no breaking up or robotic-sounding voices. Juniper is as good as ever, but right now I go back and forth between Maple and Sol. Even without the advanced mode, when my time has run out, Sol is so much nicer to use with basic voice mode.
Multimodality?
Anyone getting the multimodal features, aka the vision capabilities, etc.?
No, they've not been…
No, they've not been released, nor have there been any further details on when they will be.
Chat GPT steps down the…
ChatGPT steps down the sample rate when voice mode opens. You can hear it if you have a screen reader running: it becomes something more like a telephone signal, losing the top end. I'm guessing it is an 8 kHz or 16 kHz sample rate. It's okay, it's just not very good when you compare it to other AI voices. I guess it is to minimise bandwidth.
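For anyone curious why a lower sample rate reads as "telephone": a signal sampled at rate fs can only carry frequencies below fs/2 (the Nyquist limit). Here's a tiny Python sketch; the 8 kHz and 16 kHz figures are just my guesses from above, not anything OpenAI has confirmed.

# Why a low sample rate loses the "top end": nothing above half
# the sampling rate (the Nyquist limit) can be represented.
# The first two rates are guesses, not confirmed OpenAI figures.
RATES_HZ = {
    "telephone-quality guess": 8_000,
    "wideband guess": 16_000,
    "CD quality": 44_100,
}

for name, fs in RATES_HZ.items():
    print(f"{name}: sampled at {fs} Hz -> top end capped at {fs / 2:.0f} Hz")

# Sibilants like "s" and "t" sit roughly in the 4-10 kHz range,
# so an 8 kHz stream (4 kHz cap) sounds dull and phone-like
# even when the underlying voice is good.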
Re: Multimodality
As Ollie says, no word on this at all yet. I wonder if they have found it hard to implement or to scale. No competitor has released anything similar either, so it seems to be a tough one to get working.
Meta mentioned something…
Meta mentioned something about it coming to the Ray-Ban glasses later this year, though I'm not sure what form that would take. I think it basically just takes a picture every second and then processes it in the usual way. In theory, unless you make an enquiry about an image at the time, the frames remain unused until processed; i.e., it will capture around 60 images per minute, but if you stay silent nothing is processed, and only when there is a query are the images sent for processing... At least, that is how I assume it works. I have noticed image processing is quite slow on the standard GPT, though, taking up to twenty seconds, which will, of course, not be fast enough for live commentary.
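Purely to illustrate my guess above (this is not Meta's actual code or API, and the function names are made up), the "capture constantly, only process on query" pattern might look roughly like this in Python, with the camera and the vision model stubbed out:

import time
from collections import deque

# Hypothetical sketch of the "capture constantly, process on demand"
# pattern guessed at above. Frame capture and the vision-model call
# are stand-ins; nothing here reflects Meta's real implementation.
FRAME_BUFFER = deque(maxlen=60)  # keep roughly the last minute at 1 fps

def capture_frame() -> bytes:
    """Stand-in for grabbing a JPEG from the glasses' camera."""
    return b"<jpeg bytes>"

def describe_frames(frames: list[bytes], question: str) -> str:
    """Stand-in for a (slow) round trip to a vision-language model."""
    return f"(model answer to {question!r} based on {len(frames)} frames)"

def camera_loop(seconds: int) -> None:
    # Frames pile up cheaply in memory; no model call happens yet.
    for _ in range(seconds):
        FRAME_BUFFER.append(capture_frame())
        time.sleep(1)

def on_user_query(question: str) -> str:
    # Only now do the buffered frames get sent for processing.
    return describe_frames(list(FRAME_BUFFER), question)

camera_loop(3)
print(on_user_query("What am I looking at?"))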
Here is an article…
Here is an article mentioning the live video AI coming to Meta AI later this year:
https://techcrunch.com/2024/09/25/meta-updates-ray-ban-smart-glasses-with-real-time-ai-video-reminders-and-qr-code-scanning/
Again, I don't know how well it will work for us, considering the current version isn't very good at reading things or giving the specific details we actually need. It's a shame, as a fairly simple tweak, telling Llama that the user is blind and to adjust descriptions accordingly, seems like it could be fairly easy.
Not anywhere near an audio engineer, but my voice quality is quite clear
As someone has said above, the distortion when starting a chat is something I hear quite often from people using ChatGPT free. On mine, the pro version, it is quite clear and comparable to any of the voices you would hear when using the Reader app, for example. I would upload an example of me using it so you could hear the voice quality, but I don't think it's possible to upload a clip here.
New Voices
I'm getting a message in my ChatGPT app that says it's starting to roll out to all users, but I currently don't have access. In particular, I'm interested in the new voices, but even those have not come to my app. Very disappointing! I keep uninstalling and reinstalling with no luck.
the voices are compressed on…
The voices are compressed on Plus too. If you try Pi AI, or even just listen to the clarity of Siri, it is far higher than that of ChatGPT. It's just narrow bandwidth; you can hear VoiceOver stepping down too. I think it might also be down to the microphone listening at the same time, as something similar happens when recording in other apps. Maybe it's just using one speaker too.
So what's the advantage of being Plus
Guys, I have another post asking this, but at this moment, what's the benefit of having the premium subscription over the free one?
Thanks
Meta
If I'm not wrong, Zuck did show off a bit of multimodality in his live demo, didn't he? The capabilities of the AI in the Meta glasses will most likely get a boost once the newer Llama models are onboarded. And as it is, I don't know if Meta would specifically limit the responses of the AI in the glasses, given that the Llama models are open source anyway.
Karina, I think...
You might get more goes of all the models. I know I just used o1-preview; I'm not sure if you get that free. Also, I think AVM is only for Plus at the moment. I hope there is more I've forgotten.