The ChatGPT app has found a place in my iPhone's dock. I use it for many things, both serious and fun. Part of me is convinced that it's going to turn into Marvin the Paranoid Android from The Hitchhiker’s Guide to the Galaxy. There it is, brain the size of a planet, and I'm constantly asking it to answer very simple or repetitive queries. Yet, it always remains eager to assist with any question.
AI is a vast topic, and Morgan has already written an excellent post about it for AppleVis. Here, I want to explore how AI models have the potential to increase accessibility, both now and in the future. I'm calling it an exploration because I'm still discovering and experimenting with the capabilities of AI models. It's also a chance for you to explore with me, to tell me in the comments how you're using these models, what has worked well or hasn't, and your hopes and fears for the future of AI accessibility. I'll be mostly talking about ChatGPT, because that's what I'm most familiar with, but feel free to discuss other models and AI apps in the comments.
Describing Images, Real or Imagined
This is probably the most obvious way AI is currently being used to increase accessibility. Many of you will already be using Be My AI, the feature of the Be My Eyes app that provides AI-generated image descriptions. These descriptions are generated by GPT-4, the model powering the paid version of ChatGPT. Be My AI is a versatile and flexible tool. Its ability to answer follow-up questions is extremely useful for getting detailed descriptions of the aspects of an image you're most interested in, or requesting more information and context to help you understand its meaning.
The capabilities of models like GPT-4 extend beyond merely describing existing images. They are also invaluable for generating new ideas for visual content. For instance, if you are a content creator who has been blind since birth or for a long time, it might be difficult to generate ideas for the visual aspects of your projects when you haven't had recent exposure to visual content. You could try asking ChatGPT to generate textual descriptions of possible images or designs. I recently tried asking it to generate some logo ideas for a project, and the ideas it generated have helped me start thinking about possible designs. It won't entirely replace sighted assistance with design, but it might allow you to have a little more input into the process.
Text-Based Educational Content
When you're trying to learn, whether in a formal educational setting or for your own curiosity, it can be hard to find resources that don't depend on visuals in some subjects. What if you want to learn about the visual arts? What if you're trying to grasp mathematical concepts or discover more about science subjects normally taught through images, such as astronomy? ChatGPT can explain just about any subject and can customise its explanations to your needs. If you need explanations of visual concepts, ChatGPT can provide them.
As with any source, think critically about any information an AI provides. ChatGPT has an immense amount of knowledge on just about any topic you might be curious about, but it does have biases, doesn't know about very recent events, and won't always have detailed information about very obscure or highly specialised topics. However, every source of information has its limitations and weaknesses, and ChatGPT's ability to engage in conversation and tailor its responses makes it an excellent tool for getting exactly the information you need, explained in a way that'll make sense to you.
Accessible Games
Most readers will already be aware of the limited range of accessible games on Apple devices and other platforms compared to what’s available for sighted people. ChatGPT doesn't entirely solve that problem but can generate an endless supply of accessible and customisable games and fun activities, from trivia games to text adventures. If you're asking the model to generate any kind of story or fictional world, it'll probably work better if you specify what kind of scenario you want. The old principle "garbage in, garbage out" applies. If you give it a generic prompt, you'll get a generic response. If you craft your prompt thoughtfully, the model will be able to build on your idea. Alternatively, you can create fictional worlds in collaboration with it, for example, engaging in role-play or building a story together, taking turns to write one sentence at a time.
You could also try games and fun experiments that test the capabilities of the AI by giving it guessing games or seeing how it responds to different prompts. One that I've tried is to ask ChatGPT to make guesses about events from after its training data was last updated. Without browsing the web, it won't know about events after that point. Begin by asking it when its training data was last updated. I'm suggesting you ask ChatGPT directly, rather than giving the date here, because it might have changed by the time you read this, and because it'll be different depending on whether you're using GPT-3.5, which is the free model, or GPT-4, the model that's only available to paying subscribers. Next, try asking it to make guesses about events that have happened since then, whether major world events, news about your favourite band or TV show, or anything else.
There are lots of games to try, so keep experimenting and let us know what works, or doesn't work, in the comments.
The Future
I've been describing what AI models and apps like ChatGPT can do now, but AI has the potential to make even more accessibility improvements in the future. Apple is reportedly planning to add AI capabilities to Siri. When Siri was launched in 2011, I remember being amazed that you could ask "Do I need an umbrella?" and it would understand that you were asking whether it was going to rain. Now, in 2024, when we have services like ChatGPT, and when Siri has seemingly regressed, Apple's offering seems inadequate. With other companies launching AI devices like the Humane AI Pin and the Rabbit r1, Apple will need to catch up quickly.
I recently told Siri, "I'm not your friend anymore because ChatGPT is better." In response, Siri launched the ChatGPT app. I can't argue with that. I told it I like ChatGPT, so it gave me an opportunity to interact with ChatGPT. Yet it highlights Siri's limited ability to understand language and context. It picked up on the fact that my query included the name of an app on my device, and launched that app, but didn't understand the rest of what I said.
A new and improved Siri, which utilises the latest advancements in AI, could be very powerful if Apple gets it right. If AI language models become more integral to the way we interact with our devices, this could make our screen readers much easier to customise. When we want to customise existing screen readers like VoiceOver, we have to navigate a complex selection of settings. Sometimes, the available settings don't allow us to customise in quite the way we want. I hope that in the near future, we'll be able to tell our devices, in natural language, exactly how we want them to behave at any given time. I'd like to be able to tell VoiceOver exactly what information I want it to read and exactly what I want it to skip.
I hope we'll soon be able to use AI to interact with technology in more accessible ways. If I need to use an inaccessible app or website, I'd like to be able to ask my AI assistant to interact with it on my behalf, explaining to it exactly what I need it to do. AI like this is already being developed, although it remains to be seen whether anything of this sort will be available on Apple platforms. Apple might be reluctant to allow Siri, or third-party AI assistants, to interact with apps and websites in the way this would require.
There's also the potential for AI to make tasks easier and more blind-friendly, even when the usual ways of doing those tasks aren't entirely inaccessible. Whenever I'm writing a document, I format it with Markdown because I find it much easier than formatting with a traditional word processor. When I've finished writing, I convert my Markdown to the type of file I need. Markdown can't do everything, and for a long time, I've been looking for an app that can convert my Markdown to a Word document or PDF file while allowing me to specify exactly how the converted document should be formatted. I hope that this, too, will be something AI assistants will be able to do in the future. While it's possible for blind people to format documents with a word processor, having the ability to control document formatting through text-based interactions would be much easier and would help to reduce formatting mistakes.
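For anyone who wants to experiment with the text-based half of this workflow today, converters like pandoc can already apply the styles from a prepared template when turning Markdown into a Word document. Here's a minimal sketch in Python, assuming pandoc is installed and that styles.docx is a hypothetical template you've set up in advance with your preferred fonts and heading styles:

```python
import subprocess

# Convert a Markdown draft to Word, borrowing every font, heading
# style, and spacing rule from a template document. "styles.docx" is
# a hypothetical template you would prepare once in advance.
subprocess.run(
    ["pandoc", "draft.md",
     "--reference-doc=styles.docx",  # take formatting from the template
     "-o", "draft.docx"],
    check=True,  # raise an error if the conversion fails
)
```

What this can't do is respond to natural-language instructions like "make the headings larger", which is exactly the gap I'd like an AI assistant to fill.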
This blog post has only scratched the surface, so please share your thoughts in the comments. Let us know how AI has made a positive difference in your life, ways your experiences with AI have been less positive, or your hopes and fears for the future of AI accessibility.
Comments
Image Recognition is Amazing...
Lysette, thanks for this wonderful post. I have indeed used apps such as Be My Eyes and Seeing AI. My experiences have been hit or miss, but the fact that we can now get this type of assistance with things is amazing. I've had these apps tell me the contents of bottles and cans from my refrigerator, as well as items from my freezer. One neighbor in particular has been very diligent and helpful in guiding my hand to the exact location of the bar codes. I think there are two reasons for my hit-or-miss experiences: my aim and the fact that I'm still on the iPhone 7. Not that the iPhone 7 has been bad by any means; in fact, it's been quite the opposite for me. But I just need to upgrade. I can't wait to check out the native detection features on my new phone. I do, however, have some skepticism regarding AI helping out with writing. I've attempted to compose emails before and constantly been interrupted by the word-prediction feature. There doesn't seem to be a way to turn it off either, unless I have missed something. Perhaps it will work better with Braille. I have just started using my NLS eReader as a Braille display alongside speech output provided by the wonderful screen reader known as VoiceOver. I got one of the HumanWare devices in April of last year, and thus far am impressed with it. So those are my thoughts as of today.
Very good post. Looking forward to the future.
I love the idea of living in possibility. What we've been able to do versus where we were five years ago, 10 years ago, 20 years ago is amazing. I hope that AI will let us be part of the world in a much more integrated way. As far as Siri goes, it can't get any worse :-). I was asking it for sports scores last night and it couldn't even do that. And I know that in the past, it was able to give me scores and statistics for specific sports and teams. Not only has Siri not kept up, but it's actually regressed and gotten worse. Another example is that I used to be able to ask it to open specific settings, like message settings, phone settings or Safari settings. Now it just opens up the general settings page. In all honesty, I'm getting fed up with VoiceOver, and even though Android isn't quite as usable right now, I'm thinking that it's getting better. And I have no doubt that eventually Microsoft will release a mobile operating system and Surface phones. I'd actually much rather use JAWS or NVDA than VoiceOver. Copilot on Windows 11 has actually been a huge asset. I'm eagerly anticipating Windows 12 because that's going to have much, much more usable integration with AI.
As I’ve mentioned in past posts, I think the Vision Pro could be a great first step towards making the lives of blind people much more in step with sighted people. On the phone side, I really would like iOS 18 to show me that Siri can be much more usable. I would love to be able to tell my phone the changes I want when editing a document, instead of going through clunky touch gestures that have become more and more buggy. I’d love to be able to tell Siri to find specific things in a message thread instead of having to swipe around and being knocked out of focus. I’d really love to tell Siri to enable something without it forcibly pushing me onto a screen and making me hunt for it. I’d really love to dictate something and have it be 99% correct instead of the usual 90%. And I know that seems like it’s not that much, but when you’re talking about dictation, one percent is massive. When VoiceOver originally came out, it was a godsend. It was immensely important and worthwhile. As time has gone on, it’s become more and more tedious to use. Don’t get me wrong: I’m very happy to have the ability to use it. Without it, my iPhone would essentially be a paperweight. But I think it’s time to move to whatever the next stage of accessibility is. And I think artificial intelligence is a huge part of that.
Formatting documents
Let me first say that I enjoyed this optimistic vision for how AI can make our lives easier. We're already using AI in many ways, and the applications of AI technology will only increase in breadth and depth.
Your description of formatting documents using Markdown caught my attention. Let's not forget, before the introduction of WYSIWYG text editors, plain text markup was the only way to control formatting. It was the Apple Lisa, then the Apple Mac in 1984, that showed us a different, more visual way. Great if you're a sighty, but for blindies, the old text markup was, and is still, a better option.
I haven't tried this, but I wonder if AI could already be used to check the formatting of a document? It seems like ChatGPT ought to be able to answer questions like, "Is this double spaced?" or "What font is this text?"
Thanks, Lysette. Great to see you blogging again.
Formatting and boring stuff
Maybe AI could do some looking for odd spacing and other difficult things to catch with a screen reader. You know, those things that a publisher's reader catches at a glance, then tosses the manuscript file in the computer trash...
I've used the Seeing AI description of scenes to get an idea of whether a picture I've taken is likely to be good enough to show to sighted people. It does OK, but for that I'd probably prefer one of the apps that let you ask specific questions about the picture.
As I stated in a previous post...
Having screenshots described by AI and asking it where to click on the screen to activate a certain option, in situations where I can't use VoiceOver to perform this task, is something I've done more than once. I just turn off VoiceOver, tap that part of the screen, and turn it back on.
Great post
Exciting times ahead.
I'm looking forward to the first wearable that brings all of this together and can be worn for extended use as part of my normal life. Fingers crossed for it to be in the next couple of years and to have a price tag that doesn't break the bank! ☺️
Image Recognition
Ekaj, I actually haven’t tried taking a picture directly with Be My AI. I’ve only ever used it to describe images I found online. As you mentioned, the need to aim the camera properly makes that more difficult.
I hope that soon, we’ll be able to interact with AI in the same way we interact with human assistants from Be My Eyes or Aira. I’d like to be able to connect to an AI assistant, tell it by voice what I want it to do, and have it process my video feed and guide me to where I need to place my phone’s camera, then tell me the information I need. Some of the needed components already exist; AI can already provide information about images, as we all know, and ChatGPT has a voice chat feature. The challenge would be getting it to process live video rather than still images. More speed would also be helpful; ChatGPT’s voice chat currently works by transcribing your speech, generating its response to what was transcribed, and then having the voice read the response, which means there’s a delay between when you finish speaking and when ChatGPT starts replying.
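To make the source of that delay concrete, here's a rough sketch of those three stages as they might look with OpenAI's Python library. The file names, models and voice here are illustrative assumptions, not what the ChatGPT app actually does internally; the point is simply that each stage has to finish before the next can begin:

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key in the OPENAI_API_KEY environment variable

# Stage 1: transcribe the recorded question; nothing else can start yet.
with open("question.m4a", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# Stage 2: generate a text reply to the transcription.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Stage 3: only now can the reply be turned back into audible speech.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.stream_to_file("answer.mp3")  # play this file to hear the answer
```

A truly conversational assistant would need to overlap these stages, or process speech and video natively, rather than running them strictly one after another.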
In any case, as fantastic as Be My AI is now, it would be even better if it could provide guidance on camera placement. If what I’ve just described couldn’t work, perhaps they could incorporate something like the tools that already exist in apps like Seeing AI. This way, the app could tell the user when a document or a face was in focus, possibly after having the user specify what they’re trying to take a picture of, and then once the picture had been taken, the image could be sent to GPT-4 for a description.
Siri
Ash Rein, I recently asked Siri to find local taxi companies, which was definitely something it could do in the past, and it told me it couldn’t help with that. I even tried saying I was drunk and needed to get home (not true, but I’m sure it used to be programmed to respond to that by offering to call a taxi). On this occasion, it just told me to be careful.
Formatting
Paul and OldBear, I just tried asking ChatGPT about the formatting of a Word document, and it seems it only has access to the text and can’t provide details about the font. However, when I pasted this post into ChatGPT, it was able to describe the Markdown formatting in detail. It shouldn’t have any problems correcting Markdown errors.
Paul, your comment reminds me of something I’ve been thinking about, which is the cyclical nature of tech. In a way, having access to AI language models is a bit like going back to the DOS era. Current AI is much more sophisticated, of course, and much easier to use, since we can tell it what we want in natural language rather than memorising commands. The similarity is that both rely on text-based interaction, making them more accessible to blind users.
surprised this post hasn't got more traction
I first saw this, then forgot. Sorry, this'll be a long one. I should preface it by saying I don't want absolutely everything to be voice activated. I still want to control things with physical buttons, ovens and microwaves in particular.

Firstly, Lysette, that 'web agent' thing sounds amazing. I've been wanting something like that for years: the ability to book flights; the ability to fill in PDF, Word and some online forms which aren't designed with screen readers in mind; and the ability to click on that button I know is there but can't get to with JAWS or whatever. I'd bite someone's hand off to get an AI assistant like that.

I really haven't got much time for Alexa. Some assistant, as far as I'm concerned. I know it can control your thermostat, lights and some smart ovens, which is good, and you can set it up to turn your heating on and off at certain times, but that's all it can do which is any use. I haven't used my Alexa for ages because it can't do anything particularly useful. Things like ChatGPT and Be My AI blow Alexa out of the park as far as research and looking things up goes.

AI is already amazing at identifying text in foreign languages. I've copied and pasted screenshots of stuff into Be My AI and it has given me the text perfectly, no typos or missing accents. One page contained text rendered as an image: weather terminology structured as a table, with the English terms on the left and the Norwegian on the right. Before Be My AI, I would have had either to scan this with OCR, or do a Google Books search for the content on the page. Sometimes that works if there's a preview of the book; it worked for that particular book, but that's not always the case. When Be My AI has given me text in a language I don't know well, I've googled it to check, and the spelling is perfect according to Google, so I know it's right. This is like nothing I've seen before.

The old days of Kurzweil and FineReader on Windows are well and truly gone. Even OCR on Windows 11 doesn't come close. I haven't properly bothered with scanned PDFs on Windows for years because I know I'll always get less than perfect results. But now, with Be My AI, what a difference. The difference between me scanning a PDF on Windows with JAWS and NVDA OCR versus Be My AI is unbelievable. It's like night and day: no messy formatting to contend with or tonnes of typos.

If subtitles are hardcoded into YouTube videos, I have to pause the content and then OCR it if I want to know what they're saying, and the OCR isn't perfect even when I select the correct language. But now, with Aiko, I can do a screen recording of the video I'm watching, then send that video file to Aiko for transcription. I first tried Aiko with English podcasts and was surprised at the accuracy; it hardly made any mistakes and was practically word perfect. I knew it could support lots of other languages, but I was dubious about the accuracy with them. I tried it recently with a Norwegian video from YouTube, one with hardcoded subtitles, and it came up with something more than legible; it transcribed pretty accurately, even though the speakers were speaking quite fast. There certainly wasn't any slow BBC equivalent going on. I can also pause a YouTube video and take a screenshot, share it with Be My AI, and it'll describe what's happening in the video.
Truly amazing, since I'm interested in creating YouTube content and I want to know what type of things the people I follow put in their videos. And Paul, if you take a screenshot of a webpage or Word document and send it to Be My AI, it'll describe the formatting for you, especially if you ask it questions. It gets the formatting right, but not colours. I just put text in red and it said it was in blue. Apple's image recognition does this on webpages with text rendered as images: it described something I was browsing as black with a pink background, and another thing as black with a green background. I also took a screenshot of a form with no audio CAPTCHA on my Windows machine. I sent it to Be My AI and Seeing AI and they both gave me the wrong result, even with full screen mode enabled in Brave. However, when I tried this with VoiceOver Recognition, Apple correctly identified the CAPTCHA right from within the email I sent. And before anyone says it, I didn't put my email and password in before I sent my screenshots across! And yes, I want a video AI assistant like Aira too.
You didn't add the table.
BeMyAI is amazing.
So far I've just got it to describe pictures to me. I wasn't aware it could do more than that, and now that I know, I'm even more excited for the future.
don't want to get done for copyright
Subject says it all. I don't fancy getting done for posting images of copyrighted material online. Taking screenshots of educational material is legal for personal use, though. If you try it out on any image PDF, it works. You'll have to go page by page, and it's a slow process, but if you really need something, it's worth it. I tried the first few pages of a PDF I'd previously tried to scan with JAWS OCR on Windows. The Windows scan was a complete mess, and some of the text came out hardly readable, but when I scanned the first few pages with Be My AI, the difference was unbelievable. I could read everything clearly. I've never seen anything like it with these types of documents.
Ah ok.
Personally I'd not care, but that's why you're not me :)
I think this app is amazing! I was able to scan my tablets for my cough/cold this morning and find out how many I needed to take and when. Before, I'd try Seeing AI and, if I was lucky, I might be able to get that info; now I'm almost guaranteed it, and that makes such a huge difference.
How to get Be my AI
I've downloaded Be My Eyes, but have no idea what to do to get Be My AI. I've never had to use the Be My Eyes app before.
Once you've signed up.
It should be the second tab?
I don't think you have to be on the beta to use it.
Thank you!
Lysette,
Thank you so much for your recent piece about ChatGPT and AI. I agree that this new world of artificial intelligence is going to be transformative. And, it absolutely is a wonderful learning tool for the blind. I have no vision at all, but my wife and I bought a new car last year. While we were on a vacation in January, I decided to learn everything I could about car brakes. A very long time ago, I used to drive, and now that I owned a hybrid vehicle, I really wanted to know how brakes worked in the past and how they function now. But I wanted all this information in blind-friendly text. And, it was great! I enjoy a subscription to ChatGPT and use it, to be quite honest, all the darn time. For me, it is all about learning something new. Sometimes it is braking systems, sometimes it is about the known atmospheres of planets. And, I just make sure that ChatGPT knows that I depend on descriptive text.
I enjoyed reading your blog and really appreciate your relaxing writing style. Thanks for a job well done!
Warm wishes from Texas,
Morgan
Friendly Reminder
A lot of people either fail to realize or forget that AI has been in our lives for a long time. Think about when you play video games, both retro and current generations; when you use computers and smartphones; when you use virtual assistants such as Siri or Google Assistant; when you interact with chatbots on business sites such as Amazon. Just think about any other activity that involves machines being fed data and applying that data. The key word nowadays is generative AI. Just a little reminder.
Real-Time AI for People who are Blind
When I think about AI for those of us who don't see, I think about what some of the real-time advantages will be at school and work.
BeMyAI is just a start. When we have glasses that really take advantage of AI, we'll simply look at a PowerPoint, or even a TV program, and the AI will generate audio descriptions in real time. No picture taking. It will know what you're looking for and give it to you. Can we really replace audio-described content? I don't know, but I suspect the answer is yes!
PowerPoints on a screen at a conference or in a class? No need to wait for the usually inaccessible version. Through your AI glasses you'll get it all in real time, along with everyone else, unless that guy next to you just won't shut up so you can hear.
Menu on the wall at the take-out counter? No problem. Tell your AI to look for a menu and it will!
I'm particularly interested in concepts like the Seleste glasses. Why? Because they are going to charge a monthly fee rather than a one-off price. Companies that sell for a fixed amount simply won't have the money to keep up with the ongoing research costs necessary to make the changes that will be coming fast and furious as this new technology advances.
For those of us who can't afford the monthly charges, just show your VR Counselor or VI teacher how the glasses make you more employable and make classrooms more accessible and less frustrating, and they'll hopefully pay for them.
It's truly going to be a new era. Yes, there will be hiccups along the way, but what an exciting time to be alive!
Seleste glasses.
They do seem very interesting, but a bit slow at the moment.
Having said that, they're very cheap if you have the money: they want $100 for the glasses and you get a free month, then it's $50 a month. That's not a bad deal at all.
If they could put some kind of map function into these I'd be very interested.
I don't know if they deliver to the UK at the moment, but I'll check back in a couple of months to see what they can do.
Their YouTube channel.
If anyone is interested, here's Seleste's YouTube channel: https://www.youtube.com/@SelesteCo
I think these guys might be the first to actually make a live-feed AI for the blind, and if that's true, I'm probably going to buy it ASAP.
Oh, they've not shown that yet, but I've asked on their channel, and I'd not be surprised if that's what they're aiming for.
I don't get it.
What was that email meant to show?
I'm looking forward to seeing how far Seleste goes.
I don't care much about what the thing looks like as long as it isn't too bulky and does the job.
Ah, ok.
It's not that your thoughts aren't welcome here; it's just that if you go against what the majority are saying, you're going to get pushback. That will happen in any community.
I don't care what the thing looks like as long as it's not super bulky. These glasses shouldn't be, because according to the website they're thin.
I won't buy them yet, but I might sometime this year, depending on what features are added.