Those of you who have watched the live stream, do come over and let's talk about how incredible the demo seemed, how much of it would translate into actually usable stuff, and, if I may, what this means for accessibility and assistive AI. And those of you who haven't, do go watch the stream. It's INCREDIBLE!
Comments
AI in professional settings
While there may be situations where the carefully-considered, supervised use of AI can be beneficial for accessibility in a professional setting, AI alone is not, in my opinion, up to the task of meaningfully addressing unemployment among blind and low vision people for several reasons.
One is that AI models can and do hallucinate, meaning that, at least for me, I am not comfortable making professional decisions based on AI-generated descriptions of required visual information that I otherwise can't access and verify. In addition, I would not want prospective employers to get the impression that AI tools are a replacement for presenting information in a natively accessible manner, for the reason mentioned above. In fact, I believe that if such attitudes took hold among prospective employers, colleagues, and clients of blind and low vision professionals, it could have the opposite effect, reducing access to employment and further exacerbating existing disparities. I know that I wouldn't want people's impressions of my qualifications to be tainted by their feelings toward AI or the limitations of such technologies.
Finally, the sharing of confidential data with an AI model, even if the developer's privacy policy states that it is not retained, could be potentially problematic in certain industries, like education, healthcare, and law.
Re: Release pushed back and AI in professional settings
Ollie, what source do you have on that?
Tyler, I agree with you on this one. There might be a few situations where AI could be useful, but generally speaking I don't see it having an impact, at least not in the short term. When AI becomes more of an integrated part of business applications and companies find safe ways to utilize it, then it might be different.
Next Few Months for ChatGPT Plus
https://help.openai.com/en/articles/8400625-voice-chat-faq
The important part is:
"GPT-4o real-time voice and vision will be rolling out to a limited Alpha for ChatGPT Plus users in a few weeks. It will be widely available for ChatGPT Plus users over the coming months."
To be fair, I don't think this is a delay; people just heard ChatGPT Plus and ignored the limited alpha. It's a long way out.
Patience vs Perseverance
Hey Ollie,
I've been saying the same thing for the past several days, though maybe not so bluntly. As for productivity in an educational/professional environment, I am all for it. Yet, I do not believe for one minute that AI is quite there. That does not mean I doubt it will ever reach that milestone; rather, I think everyone needs to realize, and understand, a fundamental fact: this is all essentially new technology.
We all want this to be a huge success, and we all have our reasons/needs/desires as to why, but consider that this may not reach any of your goals for months, perhaps even years from now. Everything we have read or listened to concerning AI and LLMs is, for lack of a better description, glorified media hype, designed to get everyone excited about a "what if" scenario.
So, praise and plead, insist and demand, and discuss and promote "your reasons" why AI will be all that it can be.
...but stop demanding a release date on something that is nowhere near ready to make all of your hopes and dreams come true.
Glorified media hype-pothesis
I have a bit of a disagreement there. Let me just illustrate it with a use case. Some 3 months back, if you gave me a worksheet with 500 rows of data and a tonne of graphs/charts/pivot tables on it, as a blind person, in most cases I wouldn't have been able to make much sense of it. If I were rather brilliant with numbers, and if I had a thing for visual representations of numbers/data, I would have had to struggle through the process, taking 10 times the time taken by my sighted counterpart.
Today, I can input the same data in any of its forms into GPT-4o or Claude Opus and get any kind of analysis/insight in almost the same time as my above-mentioned sighted counterpart. I'm only limited by the kind of questions I can think of asking. What is more, with 4o, I can turn the same worksheet data into any visual representation I can imagine, such as graphs/charts/whatever, with quite a high degree of fidelity. I could then use the same representations to make compelling PPTs, again using AI, again with a reasonable degree of fidelity. The most I'd need to do is ask someone sighted to take a look, because of the "trust but verify" principle. And all this I can do in almost the same time as my sighted counterpart employing the same technology.
Did I think that I had the potential to be this productive 3 months ago? No. I'm looking forward to a time when AI can take map data as input and represent it to me in a form other than audio. And it ain't that difficult.
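For anyone curious what that chart-analysis step looks like outside the chat apps, here is a minimal sketch against the OpenAI API. It assumes the official Python SDK and an API key in the environment; the file name and prompt are placeholders for illustration, not anything from my actual workflow:

```python
# Minimal sketch: asking GPT-4o to describe a chart image via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment; "sales_chart.png" is a
# placeholder for a chart exported (or photographed) from the worksheet.
import base64
from openai import OpenAI

client = OpenAI()

# The vision-capable endpoint accepts images as base64 data URLs.
with open("sales_chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this chart for a blind analyst: "
                         "axes, overall trends, and any outliers."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The same loop works for pivot tables or dashboards: capture the visual, send it with a pointed question, then verify anything decision-critical with a sighted colleague, per "trust but verify".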
Blunt is sometimes the most…
Blunt is sometimes the most expedient mode of communication. :)
@Brian
I'm not sure AI will make any of my hopes and dreams come true; that usually involves unlimited wealth and magical powers...
One thing I wonder is how chatting with AI will affect how people interact with actual people. I suspect you can be extremely rude and verbally cruel to the AI and it will continue to be happy-sounding and cheerful. I know some people who are already on the edge of being that way to actual people. Will the AI give them a swear jar to dump their abuse into, or will it just get them used to ripping up the frail social contract that keeps us all from going at each other's throats?
That's great, Gokul
Do (you) have access to GPT-4o? If I were to present (you) with said example of a chart/graph/etc., could (you), with accuracy, give me the relevant information I may happen to need, at any given time?
Can (you) do this now?
Somehow, you have managed to both make my point, and yet miss it entirely. 🤷
I said this already, that we all (myself included) have ideas of what (we) will do with the technology once it is widely distributed, but for now all (we) can do is debate the (im)possibilities based on conjecture and content.
My point was not about never achieving independence and success in our daily and/or professional lives. My point was, as I said before, to take a step back and just... breathe.
This technology is coming. This is a fact, based on everything that has so far been reported/announced. However, impatiently posting demands to have access to 4o right now is not going to make it actually get into your hands any faster. This is also a fact.
Please consider that.
How will they select?
I wonder how they will select alpha testers. Surely blind people's needs will be more specific and targeted toward the visual elements than sighted people's?
Will
They aren't going to be able to differentiate between blind and sighted people.
What I'm wanting to find out is when Be My Eyes gets access to the API.
I'm excited to watch this develop and improve while I wait for access since there will most certainly be videos about this as it slowly begins to roll out.
Contact them, state your…
Contact them, state your case, but I imagine it will either be random or preselected on some internal basis.
I disagree with Brian: stop breathing entirely and dream on, kids... Dream on.
Brian
In answer to your questions: yes, yes, and yes. Let me explain myself: as a paid user of ChatGPT Plus, I have access to some of the features of 4o, which includes the improved data analysis/interpretation capabilities. Even before this, with GPT-3.5, you could do a lot of this, but with 4o it's become much more accurate, capable, and stable, to the degree where I can confidently say I'm "number-independent" in my workplace, so to speak (do note that I deal with charts, graphs, dashboards, etc. on a daily basis).
I can do it now or anytime, because I have access to my account both through my PC and through my phone. What's better, I can snap a picture of any data (including drawings) on the phone and share it with the app.
Well, I wouldn't say I could give you 100% accurate analysis if you were to present me with a chart/graph/worksheet, because it'd be arrogant of even the best data analyst out there to make such a claim under perfect conditions, and I am nowhere near that. Like I said, I am always limited by the kind of questions I can ask and the kinds of things I can imagine. But having said that, yes, I can make an attempt within reasonable human limits.
Months away, I guess
If the product with visuals isn't out for months, so let's say the autumn or end of this year, I doubt we'll see it in Be My Eyes until it's fully released. It seems nobody but them knows the answer to that question, as to when we can try it, but I doubt it'll be before the GPT app itself, to be honest. Bit of a let-down though, showing that off and, in my view, presenting it as a ready-to-test product, only to hear wait 3, 6, or 9-plus months. To be fair, by then people may just see it as, oh look, the image features are here, and forget the hype. We won't know how good all this is until real-world testing, and they are constantly refining the model; obviously, through user testing it will get better each passing day. But I doubt we can even try it on external apps, in my view, until it is actually released by OpenAI themselves.
I'm not bothered.
They had to take down a voice and might be getting sued. They're good, but arrogance will be anyone's downfall.
Re: months away i guess
Exactly.
I wonder what the ripple…
I wonder what the ripple effect will be, and what Apple, who pride themselves on being beyond reproach to the point of nausea in the Cook era, will make of this, considering their rumoured partnership for iOS 18.
iPhone 4o
I imagine a world with iOS 18 and Siri 4o.
In some famous celebrity's voice, of course. Can't wait to check out the "new" voices in iOS 18. 😝
Blah blah
OpenAI is going to be fine. The drama between Scarlett Johansson and OpenAI is pointless. I've done the comparison side by side, and they didn't use her voice, just someone else who sounded similar.
The issue will be if it is…
The issue will be if it is found that OpenAI purposely imitated SJ. There is some precedent from the 80s with Ford, where a voice was impersonated for the gain of the company. The whole thing, including Sam Altman's 'Her' tweet, indicates that they were using Scarlett Johansson's character, Samantha, from the movie as a template, going as far as to make it sound like her. Whatever happens, it's going to drag on, and the SJ-sounding voice, which was sadly the best of the bunch, is gone.
OpenAI have also just signed a deal to train on a huge news group owned by Rupert Murdoch. Basically, OpenAI is going to be trained on Fox News and the like, which doesn't sound like a good idea to me. A thin-on-facts, high-on-outrage AI doesn't sound very pleasant. They should be training on outlets widely held to be centrist, leaning neither left nor right, or balance the training set by adding an equally far-left news group too.
Basically, they're not quite the shiny saviours of humanity they've been billing themselves as. This, along with the intentional misdirection about what is coming and when, paints them in a rather different light.
Not saviors, no
They're just a big tech company that has grown from almost unknown to one of the biggest players in a really short timespan; of course there will be setbacks and bad decisions along the way, like for all other companies (the whole Sam Altman circus in November and now the Sky voice situation, for example). I don't see them as better or worse than many others in that regard. Ollie, what do you mean by "the intentional misdirection of what is coming and when"? Feels like I have missed something there.
Regardless of how I feel about the company, the fact is that they currently have the best technology for AI description of images. As the other big players (Google, Meta, Anthropic, maybe Microsoft too) release better models, that advantage might disappear, but for now, if we want good image descriptions, OpenAI's models are the ones to go to (seeing the performance of Gemini and Claude in JAWS just reinforces that sentiment). Personally, I hope that more models will reach the quality of OpenAI's; the more alternatives we have, the better, and it is not a good thing being "locked" to just one.
If it is SJ
If the voice is SJ, so what? At least she's in the limelight. In any case, it will be based off a voice of some sort.
Big tech company
Exactly! The point that we most often miss in such discussions is the fact that AI, LLMs, whatever, are just tools. And whoever makes them doesn't do so out of any altruistic intentions; they need to be conscious of the economics to stay afloat. Expecting that they will, on their own, adhere only to ethical practices is just hoping for a false paradise. Therefore it becomes the conscious duty of the end user to be aware of that fact, while deriving the maximum possible benefit out of whatever is available out there.
An app that guides you to take decent photos
The app lets not only visually impaired but also sighted users take good selfies using the rear camera, with the screen facing away from the user, so that the captured photos are of higher resolution. It detects faces and various other objects, and provides spoken prompts to position the device or whatever is (intended to be) captured; a rough sketch of that detect-and-prompt pattern follows after the link and note below. Here's the link:
https://apps.apple.com/tr/app/eyesense/id1353368137
Note that the app is maintained by a company based in Türkiye, and does not appear to support any languages other than Turkish as of yet, though contacting the developer for better localization/language support is always an option. Thing is, you can just have the sentences translated or memorize them as they are, as there are only a few of them.
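For the curious, the guidance pattern such apps use is simple to sketch. This is not EyeSense's actual code, just an illustrative Python example using OpenCV's bundled Haar cascade and the pyttsx3 speech library; the thresholds and hint phrases are made up:

```python
# Illustrative sketch of the detect-and-prompt loop: find a face, compare its
# position to the frame center, and speak a positioning hint.
import cv2
import pyttsx3

engine = pyttsx3.init()
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # a phone app would use the rear camera instead

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    h, w = gray.shape
    if len(faces) == 0:
        hint = "No face detected. Move the camera slowly."
    else:
        x, y, fw, fh = faces[0]
        cx = x + fw / 2  # horizontal center of the detected face
        # Pan the camera toward the face to bring it into the middle third.
        if cx < w / 3:
            hint = "Move the camera left."
        elif cx > 2 * w / 3:
            hint = "Move the camera right."
        else:
            hint = "Face centered. Hold still."
    engine.say(hint)
    engine.runAndWait()
```

A real app would debounce the speech, handle multiple faces, and cover other objects, but the detect-compare-speak loop is the core idea.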
The misdirection aspect is…
The misdirection aspect is the general consensus about when the vision and audio aspects were coming to the platform. I think a lot of people subscribed to the app on the basis of what they saw at the demo last week. The language they used was rather hedged and, I think, misleading, though that's for the court of the internet to decide.
Am I not correct in thinking that OpenAI is a nonprofit entity? Though, I have to admit I'm not sure what the fiscal duties of such organisations are.
Re: Ollie
Yes, I can kind of see why people think that. For me it was pretty clear that what they showed was going to take some time to get out, with all this talk of "alpha testers" and so on.
No, they are not a nonprofit, rather a "capped-profit" organization as I understand it (they did, however, start as a nonprofit), so while their company structure is not like other big tech's, they more or less operate under the same conditions (and this is one of the criticisms they have faced: that they have become focused on the commercial side, which was not the intent when they started).
Ah, hence my comment about…
Ah, hence my comment about them thinking themselves 'good' no longer stands.
I'm expecting to see other companies catch up soon, and with various different approaches. Apple's will be interesting, if it ever gets into it. Though they are built to make money, one of their key USPs is their wholesomeness, at least in comparison to the other titans. It will be a narrow channel to navigate: building a functional AI that isn't too tied down by safety rails while still promoting liberalism, etc. It seems that, eventually, AI will have its own political flavour too. If only we could have our own AIs and have them trained on what we want... though that would lead to the echo chamber effect too.
I turn my back for two weeks...
And all this happens - bit of a whirlwind it seems.
The demos are incredibly exciting. While away, I was trying to find a good way to get a sense of where I was. The Meta Ray-Bans were OK at feeding me piecemeal details, and PiccyBot was good at giving me a lot of detail after the event, but it was all a long, long way from the ducks video and having something actually appreciate what was in front of me.
The one thing I have very mixed feelings about is the personality. I'm not sure I've ever felt the need to become best friends with my computer before. I guess I will enjoy using it, but I wonder about the social impact of being able to basically create your own perfect personality who will happily flirt with you at every opportunity, laugh awkwardly at your unfunny jokes, and generally just act like you are the best person in the world. I wonder what impact this will have on relations with actual people. A little like learning everything about the opposite sex from pornography: it's not going to end in very healthy relationships.
I was really interested to hear it interpret graphs. This is something I struggle with at work - I just ask a sighted colleague to help me see what is there. I guess I can probably do this already, but it was pretty amazing to hear it working like that.
I was listening to a Verge podcast on this last night, and they were saying that they thought the AI wasn't so much getting smarter as getting more convincing. It has a certain Weird Science feeling about it: nerdy computer programmers creating their own perfect woman in a lab. Also interesting to hear their take on Google's move to AI and its shift away from providing you access to other people's information to basically telling you what to think.
Anyway, I will definitely give it a go when it is available for free. If it were usable from my Meta Ray-Bans, I would be subscribing already.
Don't talk to me like I'm your AI...
@mr grieves, as I posted before, I'm more concerned with the so-called AI personality taking certain people's foulness with a seemingly happy attitude, followed by those certain people getting used to treating actual people, such as me, the same way they interact with the AI. We haven't seen any videos showing how the AI responds to... a manipulative sociopath, for example, or how the AI responds to constantly being told it is useless/worthless/stupid/under-programmed.
I remember back when I first started using Siri, I kept thanking it after its responses without thinking about what I was doing. Then someone told me that was silly, so I stopped. It did say "you're welcome," though.
Cool demo of voice and vision functions of GPT-4o
Hello,
Just found this video on YouTube:
https://www.youtube.com/watch?v=VnHrr1v0GEM
That's Ember
That voice is already in the ChatGPT app for iOS, the same as Sky was. But of course in the video, Ember is a lot more expressive. I actually like that voice. I imagine they won't bring Sky back, though. I like the way it told the user which route to take on the Underground.
Now this, on a smart glass...
Could literally be the game-changer in terms of accessibility. Yes, I know, it's dangerous to depend on AI while travelling; you could lose your internet connection anytime. But think of navigating safe indoor spaces like airports, railway stations, shopping malls...
The next will be a manic…
The next will be a manic pixie dream girl.
Where are all the sexy boy AIs for those that way inclined? I'm waiting by the phone for Altman to give me a call.
That's just dark.
That's just dark.
Anyone heard Sky reading long texts?
Oh, I wish I could get her to read all my books. It's the ultimate TTS I've always dreamed of.