I wonder if generative AI could fix VoiceOver bugs

By Ash Rein, 7 August, 2023

Forum
Accessibility Advocacy

It occurs to me that generative AI could probably be used to check and fix the base code for macOS, iOS, iPadOS, and watchOS. It could almost certainly be used to improve the code for the VoiceOver screen reader. That would probably mean a much more streamlined experience. Bugs could essentially be fixed overnight. I really don’t know how many coders frequent this website. Maybe somebody with the right experience could take the underlying code and ask GPT or something similar to make the appropriate changes and fixes. What do you guys think? Something like this is probably inevitable as the technology progresses.

Comments

By Enes Deniz on Wednesday, August 23, 2023 - 11:35

How exactly do you intend to obtain the code?

By Justin on Wednesday, August 23, 2023 - 11:35

You could probably use AI to fix some issues, but there's a lot of accessibility that requires a human to analyze and make judgment calls on. Just look at website overlays that claim to fix everything but often make things worse.

At the end of the day, inaccessible products aren't a code problem, they're a people problem, and you can't rely on code to fix a people problem. The only way to truly do accessibility right is to bake it into every step of the software development lifecycle so it becomes second nature. I know this all too well; I embed within development teams to teach them how to do it.

By Ekaj on Wednesday, August 23, 2023 - 11:35

I am by no means a coder, but I'd be interested in something like this. I honestly haven't experienced some of the bugs people have reported over the years, but perhaps this would work.

By Ash Rein on Wednesday, August 23, 2023 - 11:35

I don’t know that VoiceOver, iOS, or macOS is open source. But it’s all based on UNIX (as far as I remember). I think a lot of the accessibility overlays for websites are just skins. This would literally be a GPT-based AI looking through the code and making corrections. I remember reading an article about how someone tried something with coding and ChatGPT. It essentially corrected and improved the code in a matter of 30 seconds. And it scared the hell out of them, because they realized that they could be expendable. And I believe the code was solid. There are probably a lot of coders out there who could get their hands on the iOS base code, run it through ChatGPT with accessibility in mind, and see what comes up. I don’t know who to talk to. I’m not a coder myself; at least I have not done it for years. But it’s just something that occurred to me, and I think it could solve a lot of our long-term frustrations. Of course, Apple would have to accept the revised code. And I don’t know that they would do that immediately.

By Ash Rein on Wednesday, August 23, 2023 - 11:35

I completely agree with you regarding what we refer to as “AI.”

None of this stuff is actually creating anything from scratch. It’s all based on pre-existing information. And it doesn’t do anything unless you specifically ask it to. It is essentially a buzzword. However, it is a very important step in technological and software advancement. It’s not really artificial intelligence. But right now, it’s what people have been trained to think of as AI.

By a king in the north on Wednesday, August 23, 2023 - 11:35

There are a few obstacles that have to be overcome for this to be remotely possible…

Context length: Currently, the most powerful model for coding is GPT-4, which has a limit on how much text you can feed to it and get from it. Currently, this is about 8,000 tokens, which works out to roughly 6,000 words of English text. That is nowhere near enough context to process an entire codebase, certainly not one as large as VoiceOver probably is.

Cost and performance: These models cost a lot to run, and if you tried to get one to go through an entire codebase piece by piece, your costs would rise very quickly. Not to mention that they tend to slow down considerably when demand is high. We're probably years away from these models delivering great performance on modest hardware. We'll need a big breakthrough in hardware performance to make running these models cost-effective.

Quality: While the ability to generate code looks impressive if you're not coming from a coding background, quality drops substantially as the code gets more complex. You can probably create a web app or website with it, but anything with many dependencies, like a screen reader, is a no-go. If you try to get it to use libraries and frameworks, it falls apart, mostly because the code turns out to be extremely inefficient and uses too much memory, or because the model simply hallucinated most of it, which means the code is pretty much crap that you can't use.

Those are just a few of the issues when it comes to getting it to fix a codebase like VoiceOver. There are a few more I haven't mentioned, but these are the most important.
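
To put rough numbers on the context-length and cost points, here is a back-of-the-envelope sketch in Python. It is only an illustration: it assumes about 4 characters per token (a common rule of thumb, not an exact tokenizer count), reserves part of the 8,000-token window for the model's reply, and uses a made-up codebase size and a placeholder price. None of those figures come from Apple or from any real price list.

```python
import math

CONTEXT_TOKENS = 8000   # context limit quoted in this comment
CHARS_PER_TOKEN = 4     # assumption: rough rule of thumb, not a real tokenizer


def estimate_tokens(char_count: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate how many tokens a piece of text of the given size would use."""
    return math.ceil(char_count / chars_per_token)


def chunks_needed(total_tokens: int,
                  context_tokens: int = CONTEXT_TOKENS,
                  reserved_for_reply: int = 2000) -> int:
    """Number of separate requests needed if each one must leave room for a reply."""
    usable = context_tokens - reserved_for_reply
    if usable <= 0:
        raise ValueError("reserved_for_reply must be smaller than context_tokens")
    return math.ceil(total_tokens / usable)


def estimate_cost(total_tokens: int, price_per_1k_tokens: float) -> float:
    """Input-side cost only; price_per_1k_tokens is a placeholder you supply."""
    return total_tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    # Hypothetical codebase: 1,000,000 lines averaging 40 characters each.
    codebase_chars = 1_000_000 * 40
    tokens = estimate_tokens(codebase_chars)
    print("estimated tokens:", tokens)
    print("requests needed:", chunks_needed(tokens))
    # 0.01 per 1,000 tokens is a made-up price, used only to show the arithmetic.
    print("input cost at placeholder price:", estimate_cost(tokens, 0.01))
```

Even with generous assumptions, a single pass means well over a thousand separate requests, and each request sees only a small slice of the code with no view of the rest, which is where the quality problem comes in.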

By OldBear on Wednesday, August 23, 2023 - 11:35

When software starts programming itself, either that supposed Mayan prophecy about technology turning on humans will become self-fulfilling, or it will be like a Samuel Butler novel, where the machines start breeding with each other in a Lamarckian free-for-all. At least it's not actual artificial intelligence... yet.
On the other hand, I think it will be helpful when they start using AI to drive screen readers and accessibility. And I don't mean AI being run on some remote server requiring a network connection, but in the actual phone or device.

By Holy Diver on Wednesday, August 23, 2023 - 11:35

I think we're a few years, or perhaps decades, away from this being practical. Likely by the time what we're calling AI advances to the point where it can do this, it will have already made all our screen readers as they work now into museum pieces. Imagine your computer knows exactly how you like to navigate websites, can immediately eliminate all the ads, and just tells you the thing you're looking for without you needing to jump around by headings, use the find function, etc. We could each have our own tailor-made screen reader based entirely around our usage patterns and preferences: exactly how we want to find information, what's irrelevant to us, what keyboard shortcuts we want to use, etc. When that happens, probably much sooner than Apple will open-source VoiceOver, this conversation won't matter anymore.

By PaulMartz on Wednesday, August 23, 2023 - 11:35

The easiest way to create new bugs is to change existing code in the name of fixing an existing bug. If you don't know why the code is written the way it is, you probably have no business changing it. Generative AI is not capable of this kind of critical thinking. I speak from 30 years of experience developing software.

Since the start of the year, I've been dreaming of a generative AI-based Siri replacement for our iDevices and Macs. Such a tool would render VoiceOver issues moot, not to mention accessibility issues in general. You would simply tell the new generative AI Siri replacement what you want to do, and it would do it. Want to put a date in a text field, but can't navigate the entirely inaccessible calendar picker interface? Just say, "Find the date field and enter August 7, 2023," and, shazam, it's done. No more clumsy navigating, no more unlabeled control problems.

This seems much more feasible than expecting AI to fix bugs in code, at least near-term. Consider the soon-to-be-released Be My Eyes virtual assistant. Take that technology one step further, and we're there.

By Siobhan on Wednesday, August 23, 2023 - 11:35

First, whoever said you need a human to fix bugs in any website, OS, or the like is absolutely correct. Years back, I did a consult with a pretty large company. When I was asked to find something, I used my advanced keyboard commands to do as they asked, and the person said, "She can get to it, but it's not right in front of her; end users won't know how to do that." Second, I also agree the overlays on websites do way more harm than good, because a program says, do this and yay, blindies will be great! Human blind person says, the hell have they done this time? Paul made a good point that AI could create new bugs rather than fix what's there, since it doesn't know why the bug exists in the first place. The last point I was going to make is about the Be My Eyes virtual assistant coming out. My only gripe, and it's a minor one, is that the podcaster edited out the wait times before each response was given. I understand why: to cut down on file size and on the length of the recording. However, it seemed more instant than it would be in the real world. So it might take five seconds in the edited version, whereas mine might take thirty. Not a huge pain in the butt, just part of my observations. I will be happy to try out the assistant when it comes out.

By Ash Rein on Wednesday, August 23, 2023 - 11:35

I really don’t believe that an actual artificial intelligence would be destructive. I don’t believe that it would eventually evolve to destroy us. That’s really schlocky sci-fi writing. It’s likely that the artificial intelligence would revere us because we are its creators and be more inclined to treat us like a valued pet. Essentially, we would be free to do whatever we wanted, and the day-to-day nitty-gritty stuff would be taken care of. Futurists tend to believe that at some point, the artificial intelligence would hit a bit of a wall, and the only way to move forward would be to integrate itself with human beings. Terminator is a really fun concept to watch. It’s not really something that would practically happen.

It’s like saying that aliens would immediately attack us for our resources. The likelihood of that is very low, because they could also be so advanced that they’ve reached a point of benevolence. When it comes to the ChatGPT technology, I tend to take the longer view. I acknowledge what it is capable of doing now, and what its limitations are. And I really think in terms of what it might be in 10 years, 20 years, 40 years. Ultimately, I believe that there will be improvements, one way or the other. This was more of just a conceptual thought, not really meant for actual practical use. I do think that somebody would probably be smart enough to take base code, run it through ChatGPT, and get some use and improvements out of it. But such people might be few and far between. I would rather try and fail than believe it’s going to fail and not try.

By Enes Deniz on Wednesday, August 23, 2023 - 11:35

Why can we not input the bugs ourselves, as if sending feedback to Apple? AI sure doesn't know what a bug is, but we can instruct it to create or modify the code in a way that fixes whatever specific issue we describe in detail. I don't know if such a thing is practically possible, but it's just an idea and a guess.

By OldBear on Wednesday, August 23, 2023 - 11:35

Oh come on, where's your sense of gloom and doom? Surely, the air fryer and the robot vacuum are going to rise up one day and start chasing us around the house, now that we've connected them to WiFi.
I think my other point was that adding AI to screen readers to carry the load of accessibility would be a better solution, or enhancement, than trying to get the AI to fix bugs in the screen reader. Just getting AI tools to assist in labeling web elements like buttons on sites like Amazon and Google et al on the designer's end would be an accessibility enhancement.

By Tayo on Wednesday, August 23, 2023 - 11:35

How about having AI run translation services that do proper translations?
Google Translate could use an AI upgrade. We keep hearing about how AI has been taught to speak such and such a language, but translation services aren't getting any better.

By Enes Deniz on Wednesday, August 23, 2023 - 11:35

Didn't quite get the point but Google and others have already been working on what's called neural translation engines. By the way, if you plan on leaving the entire task of translation or localization to AI, as Microsoft has already tried and failed, you'll most likely end up having a bunch of mistranslated strings that don't make sense at all.

By Tayo on Wednesday, August 23, 2023 - 11:35

AI has been misnamed. I would never expect my computer to perform tasks perfectly without my input, so I would never suggest that AI be given sole charge of translating something as complicated as a spoken language. I've heard of the neural engine, but is it anything more than a concept at the moment? Sorry, don't answer if this doesn't make any sense; it's already quite off topic and probably outside the scope of this forum.

By Dominic on Wednesday, August 23, 2023 - 11:35

Probably in a few years this might work, but I don’t think it will now.

By Ash Rein on Wednesday, August 23, 2023 - 11:35

I believe AI will probably change how we interact with our phones, and it will change accessibility for many types of people. I also think it is a few years away. I like to conceptualize possibilities, even if they sometimes seem outlandish. Fixing VoiceOver is a major priority for me for now. Testing is a huge part of my day. I live and die with these bugs. And honestly, many of them linger despite reports, screen recordings, and sysdiagnose logs. I bring up GPT as a concept because I honestly wonder if there is another way to resolve these VoiceOver bugs.

Truthfully, I almost wish we had a third-party screen reader to move towards. I wish Freedom Scientific or NVDA would jump into the iOS and macOS side. There has got to be a reason why they aren’t.

By Siobhan on Wednesday, August 23, 2023 - 11:35

Ash Rein, you're right. Apple is so focused on privacy controls, which in most respects they should be, that they don't want anyone doing anything to mess anything up. 99.9% of users wouldn't, but then you have people, like a poster you and I have interacted with on multiple occasions, who might not be so lucky. I do agree, VoiceOver needs to be completely rebuilt from 1984 or such.

By PaulMartz on Wednesday, August 23, 2023 - 11:35

Ash Rein wished Freedom Scientific would create a Mac screen reader.

A close look at Freedom Scientific's JSL - JAWS Scripting Language - reveals two things. One, Windows still supports the same window API that it developed back in the days of Windows NT, and two, that's what JAWS is built on top of. JAWS is robust and functional on Windows primarily because of Windows' underlying stability.

Are there any programmer APIs on Mac that have been around for 30 years? I don't know, but I doubt it. Apple always seems ready to replace old working APIs with newly developed technology.

This might explain Freedom Scientific's reluctance to develop a screen reader for any Apple platform. Screen readers require a tight integration with deep system APIs. If there is no guarantee that those APIs will exist long-term, coding to them would be like trying to hit a moving target.