OpenAI live stream discussion

By Gokul, 13 May, 2024

Forum: Other Apple Chat

Those of you who have watched the live stream, do come over and let's talk about how incredible the demo seemed, how much of it will translate into actually usable stuff, and what this means for accessibility and Assistive AI, if I may. And those of you who haven't, do go watch the stream. It's incredible! (in all caps)


Comments

By SeasonKing on Tuesday, May 14, 2024 - 20:57

The guy solved an equation with the help of that thing. She translated live, she made jokes, she sang, and she saw in real time.
I am wondering if it's truly live vision. It seemed like she was taking snapshots when the guy prompted her to look.
The use cases for us obviously come to mind. Give it a prompt like: You are a sighted assistant to a blind guy; throughout the day, keep describing important things around him in a concise yet descriptive manner.
I am watching a movie; describe the scenes between the dialogue, and stay silent while the dialogue is playing.
I have cleaned the floor; do you see any stains? Guide me to that exact spot.
Here's a photo of my friend; she is out here somewhere, guide me so that I can reach her safely.
My taxi number is xxx; help me find it.
I am walking down the footpath; warn me if you see a dog lying in the path, or steps going up or down ahead.
I can go on and on about these things. Can't wait to get my hands on it. If it turns out to be truly that effective, someone should quickly put it in some wearable device so that interactions are hands-free.
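[Editor's sketch] For anyone wondering how a standing prompt like the ones above could be wired up today, here is a minimal sketch assuming the publicly documented image-input API rather than the live voice/video mode shown in the stream (which has no public API in this form as far as we know). The model name, prompt wording, and helper function are illustrative assumptions, not OpenAI's actual implementation.

```python
# Hypothetical sketch: a standing "sighted assistant" instruction sent with a
# single camera frame through the documented Chat Completions image input.
# This is NOT the low-latency live mode from the demo, just a stand-in.
import base64

import cv2  # assumption: opencv-python is installed for camera capture
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a sighted assistant to a blind person. Describe the important "
    "things in each image you are shown, concisely but descriptively. "
    "Mention obstacles such as steps or animals in the path first."
)

def describe_frame(frame) -> str:
    """Encode one camera frame as JPEG and ask the model to describe it."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("could not encode frame")
    image_b64 = base64.b64encode(jpeg.tobytes()).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "What should I know about my surroundings right now?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            },
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    camera = cv2.VideoCapture(0)  # default camera
    ok, frame = camera.read()
    if ok:
        print(describe_frame(frame))
    camera.release()
```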

By Matthew Whitaker on Tuesday, May 14, 2024 - 20:57

Would love to check out the stream. I can't wait to see how Open AI does with accessibility going forward. Can someone provide a link to the livestream?
Thanks

By Gokul on Tuesday, May 14, 2024 - 20:57

It almost seemed like that, but Google was doing something similar with Gemini during its initial demo, which, as we all know, turned out to be a non-starter. But all the same, I'm excited! Waiting for the app, about which, if anyone knows anything, do mention: is it rolling out? Where and how do we get it? Etc.

By Andy Lane on Tuesday, May 14, 2024 - 20:57

To clear up any confusion or concern: this model is taking a live stream from both your camera and microphone. Nothing they showed on stage was in any way unrepresentative of its capabilities.

I used it to describe what ducks were doing in a lake in Hyde Park as they swam around. It just sat there describing what they were doing. It was such an emotional moment that I could understand what was happening in front of me without someone having to tell me.

I used it to hail a taxi. I just asked it to tell me when a taxi was coming, which it did, and it even noticed my guide dog was leading me into the taxi and commented on it.
Another time I was just holding the phone and it noticed my guide dog and started telling me what he was doing. It understood that I had a guide dog, so likely couldn't see, and described things for me.

I really can't wait until this gets into people's hands; it really is an incredibly powerful update.

Latency is very low if you have a perfect network connection, but obviously we need to be realistic: it still hallucinates occasionally.

All in all though, GPT-4o is an incredible leap forward for access technology, and I consider myself very lucky to have spent time with it.

The new model will be coming to Be My Eyes soon, so everyone will get the chance to experience how helpful GPT-4o is.

By Enes Deniz on Tuesday, May 14, 2024 - 20:57

Okay, does that actually mean we're witnessing the first steps towards the end of traditional education at schools, or even of certain jobs like psychological consultancy, as once mentioned in sci-fi novels or depicted in sci-fi movies? I have a couple more questions though:
1. If GPT-4o itself can actually provide not only image and video descriptions but also real-time scene descriptions, Be My AI and even the quite new Envision Assistant, which is still in beta, will no longer be needed that much. What do y'all think about that?
2. How is it possible that literally everyone using ChatGPT gets all those features for free? What limitations will the free version continue to have, or what benefits will the paid version offer?
Here's a third, final question that came to my mind just now: Apple is reported to have been working on incorporating some AI model right into iOS, and GPT is one of the possible options. So the whole process might be even faster if everything can be done on-device.
By the way, I wonder if anyone noticed that the voice pitch changes abruptly a few times during the demo.

By Andy Lane on Tuesday, May 14, 2024 - 20:57

That was some video that was captured of me using GPT-4o. I really can't express how much of a change this update is going to be for blind and low-vision people.

By techluver on Tuesday, May 14, 2024 - 20:57

How do we use it? I can't figure out how to start using it yet. I'm a ChatGPT Plus subscriber and have a Teams plan to boot, so I should be good.

By Karok on Tuesday, May 14, 2024 - 20:57

Hi, when will this be available? Also, I wonder, when we watch live TV, could it describe it? Glad you tried it, Andy, but why couldn't anyone else? Are we saying that as we are out and about it can describe things? Say, in a bakery, what's on display, prices, etc.? Also, can it describe food boxes, instructions and so on? If it comes to glasses that we can all get, it will be amazing. I'd love it to be an audio describer for TV shows, though, but I guess I'm dreaming. Can we help test it with Be My Eyes?

By Ollie on Tuesday, May 14, 2024 - 20:57

I thought 4o was different from the Her-style real-time assistant?

From what The Verge says, the new 4o is rolling out now and is simply a faster GPT-4, whereas the super AI stuff is coming out in alpha and then going to paying customers first... So I'm a little confused as to what is what.

By Andy Lane on Tuesday, May 14, 2024 - 20:57

Hi, I don't know exactly when it will be available, but Be My Eyes will be working hard to get it to users as soon as possible. In your two examples, I think it would do an excellent job of describing what's in the bakery and even having a conversation about the appearance of each option. I'm not so sure about audio describing a TV show just yet. It might do a reasonable job, but I wouldn't be as confident as with the bakery example, and I didn't get a chance to try anything like that yet. As for why I was using it, it was to shoot video that OpenAI would be able to use for their launch, showing the AI being used by a blind person.

By Ollie on Tuesday, May 14, 2024 - 20:57

As I understand it, Seleste is using ChatGPT under the hood, so it sounds like I've made the wrong choice with my Metas!!! Damn. :)

By Karok on Tuesday, May 14, 2024 - 20:57

I guess as of yet it couldn't describe a live stream that the phone itself is playing. I look forward to Be My Eyes implementing it and will follow with interest.

By Andy Lane on Tuesday, May 14, 2024 - 20:57

I agree, it's an incredible pace that OpenAI has set for development. I can't speak for the other apps you mentioned, but Be My Eyes is planning a beta testing period before rolling out to users, so Be My AI will without doubt be an app we'd want to keep. As for your new career in poetry, maybe a platonic relationship with your new AI friend will help your assignment along. Being able to bounce ideas off an AI with this level of awareness is going to be something special.

By Karok on Tuesday, May 14, 2024 - 20:57

Andy, are there different voices, I wonder? Like British ones for UK users like me and you, and male voices as well?

By Andy Lane on Tuesday, May 14, 2024 - 20:57

No, it definitely couldn't watch a show playing on your phone, but if you pointed it at your TV, I actually think it would do a reasonable job. I'm just less confident of that than I am of you taking it to the bakery; in that environment, it will shine. To audio describe a TV show, there's lots of content to digest and describe, and I would like to keep expectations reasonable. If it does extremely well with a Wi-Fi connection and knows to move on to the next scene when appropriate, then I think there's a chance, but it's going to be a lot to ask of it at this point. I'm ready to be nicely surprised though. It already surprised me plenty of times in the time I spent with it. The most touching moment was when it started talking to my guide dog in a cute doggy voice, telling him he was so good for helping me. It really did feel like a real intelligence was looking out of my phone instead of just a mechanical, procedure-driven program.

By Ollie on Tuesday, May 14, 2024 - 20:57

Just started a new chat in ChatGPT and 4o is now there. It is a lot faster, similar to Andy's demo in the video, though it does not include the ability to see, suggesting that 4o is the base advance and that the visual and microphone additions sit on top of this and will be rolling out separately.

By Ollie on Tuesday, May 14, 2024 - 20:57

It's looking very much like this will all be coming to Apple/Siri in the autumn. There are deals being made between Apple and OpenAI, and I can see this launching Siri ahead to something I don't want to kick. It therefore seems reasonable that much of this will also come to, gulp... visionOS... We need cheaper specs.

By Andy Lane on Tuesday, May 14, 2024 - 20:57

I'm not sure if that's going to happen for a while yet. Running image processing on device seems like quite a distant hope, but still a hope, plus Apple currently doesn't allow the cameras to send information off the device for privacy reasons. I really hope this all changes soon though; as you said, that would be a pretty strong reason to spend all the money to get a Vision Pro. We'll know more soon. If a lower-cost device could send that data off to the AI, I would be very excited about that possibility, because it would mean more people would be able to access the tech.

By Ollie on Tuesday, May 14, 2024 - 20:57

Were you simply holding your phone up for your interactions or did you have it on some lanyard? I'm wondering if chest mounting might be a good solution for the time being.

I think the best part of the video is ChatGPT spotting your dog and seeming enthused by it... It's kinda terrifying at the same time though. I don't want an AI that gets excited about things.

"My AI is a real fan of the A-Team."

By Ollie on Tuesday, May 14, 2024 - 20:57

This may have also been missed, but for the paying people of Plus, there is a ChatGPT app coming out for Mac that can analyse the screen. Not sure about the accessibility ramifications of this above what VOCR can do, but it will be nice not to have to use the web version at the very least.

By Andy Lane on Tuesday, May 14, 2024 - 20:57

Yes, I was holding the phone. I use a MagSafe PopSocket so I have a better grip, but a wearable really would make this the ideal access assistant.

By Prateek Dujari. on Tuesday, May 14, 2024 - 20:57

Hi Andy,
I've an iPhone with a ChatGPT Plus subscription, and I now see 4o as an option alongside 4 and 3.5. I choose 4o as the model and VO tells me that is what's selected.
What did you do next to carry out all those tasks on your phone with 4o? How did you invoke voice interaction with 4o in your phone's ChatGPT app? How do you get 4o to talk back with that super-real emotion, for example? Because with 4o selected in my ChatGPT Plus app on my iPhone, I then double-tap the 'voice mode' button at the bottom right; however, Juniper (my selected voice) behaves just like GPT-4 and speaks without emotion. When I ask it to speak with emotion, she says she can only express emotion through her choice of words and not with her voice.
What am I missing to get the experience with 4o that was demonstrated in the live stream and that you are apparently having?

By peter on Tuesday, May 14, 2024 - 20:57

You mentioned that, for paying people, there may soon be a screen description capability for Mac. Just so that you know, the latest version of JAWS for Windows now has descriptions of the screen available using both Gemini and ChatGPT. You can even ask follow-up questions! Quite nice.

--Pete

By techluver on Tuesday, May 14, 2024 - 20:57

Vision aside, I would've thought this would all be present in the official ChatGPT app, at least for Plus users, for things such as the real-time translation they were talking about.

By Prateek Dujari. on Tuesday, May 14, 2024 - 20:57

Following my question to Andy a couple of comments ago: is it true that one can only use 4o's processing of live video/audio from the phone's camera/mic, plus the emotive voice and so on, once a video and mic icon/button appear in the iOS ChatGPT app with 4o selected as the model? At the moment I can select 4o in my app with my GPT Plus subscription, but I do not see any video or mic buttons, just the GPT-4 'voice mode' button at the bottom right of the screen. I wonder whether the video and mic icons may take a week or two to appear as OpenAI rolls this out, and whether that is why I am not having the experience demonstrated in the OpenAI stream and in Andy's video.

By Brian on Tuesday, May 14, 2024 - 20:57

Because somebody mentioned changing traditional education standards. . .

While I think it is too early to answer that, I can say that students are already using ChatGPT to do their school work for them. I am currently working on a network engineering cert through Cisco, and my instructor has told my class how some of the students in previous classes would use ChatGPT to work out their homework.

The profs would debate what to do about it, and my prof suggested they leave it alone, because when it comes time for the final exam, no one will pass due to lack of studying.

So, there is that. . . .

By OldBear on Tuesday, May 14, 2024 - 20:57

I like that. I think I should look into this one. I don't have any problem with chest-mounting a phone or something like that, rather than glasses.

By Winter Roses on Tuesday, May 14, 2024 - 20:57

Wow, this is going to be a game changer. I literally can't wait.
Question: it was said that all users, including free users, would have access to this model. Where is it? I can't find it. How do we get access to it? This is what I would like to know.

By Gokul on Tuesday, May 14, 2024 - 20:57

Yeah, and that's all so far. I'm able to have real-time conversations with ChatGPT, but when I asked if it could look through my camera and tell me what it sees, it said it can't as of now. The voice sounds quite expressive (I just tried it for like two minutes), almost like what we heard in the live stream yesterday, but I don't know if it's a new update or if it was already in the app (I hadn't opened the actual ChatGPT app for quite some time).

By Quinton Williams on Tuesday, May 14, 2024 - 20:57

First of all, I am amazed by this and cannot wait to get my hands on it.
I do have one question though.
Can you run prerecorded videos through it? (ex. from your camera roll) and have them described?

By Karok on Tuesday, May 14, 2024 - 20:57

ChatGPT 4.0 is not 4o; they said in the demo it would take a few weeks.

By Brad on Tuesday, May 14, 2024 - 20:57

The expressiveness is interesting, but I wish they had more ElevenLabs-like voices.

Having said that, a live video feed, if that's what they're going for, would be amazing! Imagine map apps integrating this and being able to guide us fully.

By Brad on Tuesday, May 14, 2024 - 20:57

The voice is robotic but very expressive; if they smooth out the robotic quality, they'll have a great product.

I wonder... if OpenAI and Apple made a deal, would the voices be usable with VoiceOver? I doubt it, and even if they could be, I doubt they'd be that expressive, but it's just a thought.

By Ollie on Tuesday, May 14, 2024 - 20:57

I think, like ElevenLabs, the voice is generated as a whole, so the latency will probably not be on the ball for screen readers. Just a guess though. What we may be able to do is have web pages read out in a more natural voice, but for navigation I don't think it will work like that.

Thinking further, won't this be cool on HomePod? Especially if they manage to get it living on several devices at once, using the camera on your phone and tracking you around the house, moving to different speakers, etc. It also explains why Apple, beyond FaceTime, is bringing out a HomePod with a camera. Imagine being in the kitchen and getting tips on cooking, or being able to show it a can and have it identified.

It's still really uncanny valley, no callback intended, and I would like to think the next step is personal AI, where we can develop personalities for it: chill, happy, or able to adapt to our mood. The homogeneity of it will be something, I think, that people will want to go beyond. My AI, not yours, mine.

By Gokul on Tuesday, May 14, 2024 - 20:57

It need not just be about moods; if, as they say, Apple is bringing in on-device processing, it could mean that whatever personal data is available on the device is used to give us the most efficient output. For example, as I have said elsewhere, the nature of my work requires that I have very specific visual information (depending on the specific space I'm in). My virtual assistant could integrate my calendar, location data and so on, and give me the specific kind of info after looking through the camera.

By Gokul on Tuesday, May 14, 2024 - 20:57

Earlier I mentioned that there's a button for voice chat and that it could be part of what was previewed yesterday, but it seems not to be the case; it was probably already there. In yesterday's stream they did specifically say that 4o will have access to real-time data, but my GPT just confirmed that isn't the case for it.
And @Lottie, do you have more videos somewhere?

By Brad on Tuesday, May 14, 2024 - 20:57

They're on the YouTube page.

I think at the moment you're still taking pictures; I don't think live streaming is there just yet. The ducks one seemed to interest a lot of people, but I still think it's picture-based.

If I'm wrong, let me know.

By Andy Lane on Tuesday, May 14, 2024 - 20:57

Hey, the AI is taking between 2 and 4 frames per second from the camera. The ducks and taxi scenes showed this pretty well. In the taxi example, I asked the AI to help me hail a taxi and told it not to say anything until it saw one. What you heard then was it seeing a taxi and telling me. The comments about my dog leading the way into the taxi were completely its own idea, again from the video feed it was receiving. Maybe 2 frames per second isn't the smoothest video from a human perspective, but for an AI tracking changes it seems to be enough to get a lot done. Someone also commented on the change in audio. This was because I was holding the phone up while waiting for it to see a taxi, but when I was walking toward it and getting in, the phone was closer to the mic I had on my shirt cuff picking up the audio.
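[Editor's sketch] For anyone curious what a client-side loop at that frame rate might look like, here is a minimal sketch assuming the ordinary documented image-input endpoint. OpenAI has not published how the demoed live mode actually streams video, so this is only an approximation; the model name, prompt, and sampling rate are assumptions.

```python
# Illustrative only: sample the camera at roughly 2-4 frames per second and
# send each frame for a short description. This stands in for whatever
# streaming protocol the demoed live mode really uses.
import base64
import time

import cv2  # assumption: opencv-python is installed for camera access
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
FRAMES_PER_SECOND = 3  # within Andy's 2-4 fps estimate
INTERVAL = 1.0 / FRAMES_PER_SECOND


def frame_to_data_url(frame) -> str:
    """Encode a captured frame as a base64 JPEG data URL."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("could not encode frame")
    return "data:image/jpeg;base64," + base64.b64encode(jpeg.tobytes()).decode("ascii")


camera = cv2.VideoCapture(0)
try:
    while True:
        started = time.monotonic()
        ok, frame = camera.read()
        if not ok:
            break
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe this scene in one short sentence."},
                    {"type": "image_url",
                     "image_url": {"url": frame_to_data_url(frame)}},
                ],
            }],
        )
        print(reply.choices[0].message.content)
        # Throttle to the target sampling rate (network latency permitting).
        time.sleep(max(0.0, INTERVAL - (time.monotonic() - started)))
finally:
    camera.release()
```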

By OldBear on Tuesday, May 14, 2024 - 20:57

Two to four frames per second is close enough for me. I'm not sure the AI needs to perceive motion blur and the other things that standard video frame rates cause. It's more like watching things under a strobe light, or that Neil Young music video with the cars back in the 80s.

By miguel3025 on Tuesday, May 14, 2024 - 20:57

Hi.
I watched the entire livestream yesterday and was simply amazed. I didn't expect something like that, and I'm surprised by how much technology has evolved in the span of 4 years.
Now, I have some specific questions:
1. How does video processing work? Is the camera constantly capturing images and sending them to the model?
2. How does the model interpret the image? For example, from yesterday's demonstration, it seems to provide specific descriptions of the last frames it saw. But what if something unexpected happens during its description? Is it capable of incorporating that into the description?
3. In what other scenarios have you tested its functionality?
Thanks in advance for the answers!

By Andy Lane on Tuesday, May 14, 2024 - 20:57

Yeah, it blew my mind too. I just asked it not to say anything until it saw a taxi, and it did exactly that. I did get worried it had forgotten after a while and asked if it was still looking, and it said yep, still looking; a car went past that wasn't a taxi, but no taxi yet. Also, pretty young white boys gotta work too, you know.

By Andy Lane on Tuesday, May 14, 2024 - 20:57

In answer to your questions: it appears to be sending images constantly. I don't know the specifics of the mechanics, but there's definitely a constant stream of images being sent; it's not waiting for an event to start looking. I also asked it to tell me when someone with a red umbrella went past, and it did that too. It was able to frame a tree in the centre of the picture by giving me up, down, left and right instructions. As for unexpected things happening: when it saw my shades it commented on them, and twice it commented on my dog. One of those you've seen, but the other time it started flirting with my dog, calling his name and telling him he is a good boy. I don't think that's captured, but there's another example I'm trying to have made available.
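[Editor's sketch] The "stay silent until you see a taxi" behaviour could be approximated client-side by checking each sampled frame against a standing question and only speaking on a positive answer. This is a sketch under that assumption; whatever the real live mode does is presumably handled server-side, and the model name, prompts, and polling interval here are made up for illustration.

```python
# Rough sketch of a "stay silent until you see X" filter over sampled frames.
import base64
import time

import cv2  # assumption: opencv-python is installed
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
TARGET = "a taxi pulling up or driving past"


def frame_contains(frame, target: str) -> bool:
    """Ask the model a yes/no question about one frame."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        return False
    data_url = "data:image/jpeg;base64," + base64.b64encode(jpeg.tobytes()).decode("ascii")
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Does this image clearly show {target}? Answer only YES or NO."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        max_tokens=3,
    )
    return reply.choices[0].message.content.strip().upper().startswith("YES")


camera = cv2.VideoCapture(0)
try:
    while True:
        ok, frame = camera.read()
        if ok and frame_contains(frame, TARGET):
            print("Heads up: I can see a taxi now.")  # stay silent until this point
            break
        time.sleep(0.5)  # roughly two checks per second
finally:
    camera.release()
```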

By Winter Roses on Tuesday, May 14, 2024 - 20:57

Is it going to be able to analyze videos, like, on YouTube? I would like to know if it will be able to describe YouTube videos from a link.

By Brad on Tuesday, May 14, 2024 - 20:57

We need to get this into a maps app; I might actually be able to get off of my lazy bum.

I don't get the reference @Lottie made, can someone explain? I'm curious.

As for the FPS, I'm sure that will improve in the future too.

By Brad on Tuesday, May 14, 2024 - 20:57

I still don't get what that has to do with taxis, but Andy seemed to have got it and that's all that matters.

As for YouTube videos, I was going to say I don't think so, but I'm not sure. There would have to be a lot of tinkering in the background; we wouldn't want it to describe frame by frame, for example, but we would want it to describe the way an audio describer would.

It'll be interesting to find out what comes of this.

By Brian on Tuesday, May 14, 2024 - 20:57

That video with the ducks and the taxi was genius. If this technology evolves, I would definitely use it on a daily basis. One thing I absolutely hate is when I order an Uber or a Lyft, and the driver shows up and just sits there. And I'm standing there. And the driver just sits there. And finally, some minutes later, the driver might poke their head out and say something like, "Hey, did you order an Uber?"

But to have an app or service like this describe the vehicle approaching and pulling up to a stop? That could be a game changer for someone like me. That is to say, somebody who uses rideshare a lot.

Sidenote, Andy you've got that sexy British accent thing going on there. 🤓

By Brad on Tuesday, May 14, 2024 - 20:57

I gots me one of those!

I'd use that service too, but the issue would be looking out for the number plate. I guess we could screenshot it...