So we had a few posts about an article, since gone. I actually chased down the article and read it. Here's the link.
https://aaif.io/blog/native-speakers-why-ais-most-powerful-users-are-blind/
First, this is the most interesting thing I've seen about AI, so let me summarize. Essentially the argument is that since AI describes things with language, and takes language as input, if you get the accessibility issues correct now, while we're defining the standards, it's all accessible. On the one hand, duh! Of course if you build accessibility in from the beginning, you've got accessibility. But the argument is sort of interesting.
Basically the argument is that things like accessibility trees for web pages are language, exactly the sort of thing that AI agents want as input. I have no idea what MCP is or how it's not accessible now, but they seem to feel that if we have accessible MCP standards, we get all kinds of accessibility for AI built-in, from the ground up, because it's already using language, just like a screen reader.
So where does slop come in? Those now deleted posts kept using a term that baffled me, "semantic surface". That's what made me look up the article, I wanted to see if it actually defined it or not. Here's the thing. I think AI made it up. Because "semantic surface" doesn't actually occur in this article. They do talk about a "tool surface", and I confess, I have no better understanding of what that means than I do of "semantic surface". But at least it's a term that's actually used in the article.
While I'm at it, let me address an argument from the latest deleted post. Their argument is wrong, did they bring "the murder weapon" themselves? No, and here's why. The contention is that the article says we should keep screen readers, but some other blind dude said screen readers are bolted on accessibility after the fact, always playing catch up. So we're defeating the whole purpose, the argument goes, by keeping these old dumb screen readers around, that are always behind, assuming the wonderland materializes, it hasn't yet.
Here's why it's wrong. Screen readers are playing catch up because, say a new program comes out. If it does something non-standard, somebody has to write code to translate that into language. Say somebody's website has a new way to specify links. Well, then the screen reader has to be told, basically, this is a link, this part tells you it's a link, that part tells you where the link goes, and so on, whatever you'd want to know about links.
But if everything becomes language to make AI agents happy, then the screen reader already *has* those descriptions. That's literally the point of the article, why it thinks that if we get this right, there's the promise of more accessibility. If, the article's argument goes, we get AI accessible and AI is using language and screen readers are using language, then there you go, everything's using language and you can communicate.
In the old DOS days, and this might still go on for all I know, menus were often done with highlighting via color changes. So you knew you picked "open" because "open" was in a different color from the other options. Well, you have to tell the screen reader, if this option is in a different color or has a different border or is blinking or whatever, it's probably the one that's currently selected.
Contrast this with something like ARIA on the web, where you have a button, and it has roles/states. There, you get it telling you, "I'm selected". I mean, ideally, when people code correctly. Nobody has to guess. That's the argument, AI agents talk like this too. Relevant quote: "Google has said as much outright: its own guidance for building agent-friendly websites points developers to the accessibility tree – the semantic layer assistive technology has always used – and describes it, for an AI agent, as a “high-fidelity map” of the page’s interactive elements that strips away the visual noise of CSS. The same artifact built for blind users is the one the world’s largest browser vendor now tells developers to expose for agents."
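To make the contrast concrete, here's a rough Python sketch. It isn't from the article and it isn't how any real screen reader works; MenuItem, guess_selected and read_selected are made-up names, purely for illustration. The first function guesses the selection from color, the way the old DOS approach had to. The second just reads a state the control reports about itself, which is the ARIA idea.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MenuItem:
    label: str
    color: str              # what a DOS-style menu shows on screen
    selected: bool = False  # what an ARIA-style control states outright


def guess_selected(items):
    """Old approach: assume the one item with a unique color is selected."""
    counts = Counter(item.color for item in items)
    odd = [item for item in items if counts[item.color] == 1]
    # If no item, or more than one item, has a unique color, the guess fails.
    return odd[0].label if len(odd) == 1 else None


def read_selected(items):
    """Explicit approach: the control reports its own state, no guessing."""
    for item in items:
        if item.selected:
            return item.label
    return None


menu = [
    MenuItem("New", "white"),
    MenuItem("Open", "yellow", selected=True),
    MenuItem("Save", "white"),
]
print(guess_selected(menu))
print(read_selected(menu))
```

With this sample menu both functions land on the same item. Give every item its own color, though, and the guess has nothing to go on, while the explicit state still works.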
This article isn't saying that screen readers are outdated bullshit tech we should abandon because reasons. It's literally saying that what screen readers are doing today with things like accessibility trees is great for AI agents, so if we make sure the AI agents are accessible, we can take advantage of this, presumably in both directions. Screen readers already give accessibility trees and provide a good model for creating new things accessible to AI, and the AI should be able to give back new accessibility trees for screen readers. Again consider a button on a website. If we have to tell AI it's selected, not use colors or borders or whatever, then we can equally tell a screen reader the same thing, and the other way round, naturally.
It's not that complicated. You don't need to invent things like "semantic surface". Hell, you don't even need to use "tool surface", but you do you, article. Mind you, this argument has issues. Notably, a lot of blind people seem pretty excited, understandably, about things like AI image description. Well, if it can do that, why not hand it the same old website, highlighting and colors and all, and let it figure things out? In other words, the article assumes that descriptive language is the best way to feed things to AI agents, and it will continue to be. But why shouldn't we assume that people might think that feeding it visual things like images and highlighting and all is better, since that's how it will learn about those things?
Anyway, because I'm a language nerd, I thought the actual argument about how accessibility trees are language and AI agents also want language as input, and that potentially means more accessibility for us was pretty interesting. And I managed, hopefully, to describe what's going on without a lot of fancy complicated words that obscure what I'm saying or even change what the article's saying. Apologies if I got anything wrong, because I didn't use any AI, just my dumb brain. Trust me, it has all the issues. I just woke up and I'm gonna need a nap now, or at least more coffee. This is too much thinking for me.
Comments
Slop
It's all pretty much AI slop. Sentences with just a few words. Or couple. One. Breathless. Excited. Bullcrap. Anyway I wish people would write first and then ask AI to proofread, but that's not the world we live in. Like I know this person can write. They just choose to breathlessly spin the wheel and accept whatever superficially intellectual words come out. This is why when I write my thoughts, in a blog or forum post, I don't use AI.
The language stuff is interesting.
But also interfaces. People dig voice interfaces, not just the blind, but it seems kind of weird to me. People are all, I want to talk to my computer, and have it talk back! I mean, that's great, until you're all "OK there's a list of fifty things and I have no idea what's in that list, start reading". OK, then how do you pick? You have to remember the list enough to go, that thing there. There are just things that voice interfaces aren't that good at.
Plus, and we deal with this all the time, consider noise. You're out somewhere, and you need to check something on your phone. If it's noisy enough, it's hard to hear, even with headphones. Other people have also commented that even though you can do things with Siri, they don't necessarily want to be talking at their phone, depending on where they are. So you have this push for voice-based interfaces to AI and Amazon and things that may or may not be AI.
That's cool and all, but it's not like you necessarily want everything to be just that exclusively. To expand a bit, there's also a reason things are done visually, e.g. diagrams, charts, and so on. I know my wife used to do a lot of stuff with numbers, and she'd be telling me stuff, and it was like, "this thing had a 3% growth rate but a 2% decline of the category", and there were just nested percentages and all. I couldn't follow her stuff from one end of a sentence to another half the time, but then, to be fair, I'm not great at math. But she'd be describing existing charts in the text of a report.
So I mean, I'm not sure language solves everything here, as such. There are still difficulties. But I think it's pretty interesting that the way screen readers do things can actually feed into making AI better, in theory anyway, if this article is to be believed. But also that it could mean improvements for us. That's the part that really stood out to me, when I went and read it for myself. It's really a two-way street here, not AI upending/revolutionizing everything because pre-AI everything was old and dumb.
Really, it reminds me of audio books, and this is exactly why I agree with the general design folks, accessible design is good design, and this is actually what the article is talking about, things like accessibility trees are good accessible design for us, and they're good design for AI too. You look at audio books, they came about so we could have access to books, but they're hugely popular now, for everybody.
That doesn't mean everything we need or do will have that happen, braille is a good example of something that's pretty exclusive to us, but it does mean that in a lot of cases, accessible design is good design, generally speaking. The more we can build that in from the start, the better. And this is happening outside of AI too, has been for quite a while, hence things like Voiceover and Talkback on phones.
Interesting note, on this podcast about Voiceover.
https://www.youtube.com/watch?v=X3JMknrBXDc
The guy who started it talks about how the iPod shuffle, I think, ended up borrowing the Voiceover interface and how somebody's sighted kid used Voiceover in school to goof off on his phone. So again, accessible design is useful outside of just being accessible.
I teach and I hate AI
I encounter AI daily. I work in academia and I have to deal with this frequently. I am even nostalgic for my students' natural mistakes.
Funny that the US constitution and biblical texts are identified as AI writing by detectors like TurnItIn.
https://medium.com/@michellehwd/ai-wrote-the-us-constitution-says-ai-content-detector-f24681fdc75f
Let me be very clear.
I'm not attacking anybody. So let me talk about slop for a bit.
By inventing terms, "semantic surface", AI is actually making everything harder to understand, IMO. First, that term isn't in the article. Second, the thrust was AI is going to change everything, because if this "semantic surface" happens, all visual stuff and hence screen readers will be pointless. But that's actually not what the article is saying at all.
The article is saying that screen readers are doing things in a way that's actually really useful for AI agents, and that if we get agent accessibility correct, this can feed back into screen readers and give us more accessibility, because AI agents and screen readers both use language to describe things, and they'll be speaking the same language, essentially.
I don't care if somebody uses AI. I don't care if they use it to entirely write something, proofread, construct arguments, whatever. What I care about is the end result. Is it good communication? In this case I don't think it is, see above. It's causing more problems than it solves. It's making bad arguments and producing bad writing, by which I mean, it's saying things that aren't a correct understanding or even summary of the article in question.
I use three AI things, that I know of. Schwung, which is an addon for Ableton Move, Super Liam, recoded largely with AI, and Logic Pro's stem separation feature, which I think claims to be AI-powered, but googling doesn't turn up a definitive answer. Schwung in particular gives the Move a built-in screen reader, you don't have to have another device with you so you can connect to a web page to use the screen reader, which is the way Ableton has it set up by default. So I mean, a person decided to add that feature, but they used AI, so fine, I think it's fair to say AI is giving me more accessibility, in that case.
What I'm getting at here is that I'm trying to talk about AI, how we use it, what it can and can't do, and so on. Also about the article in question, what it says or doesn't say, and what it could mean. I'm not trying to say anything, good or bad, about people. I mean I am commenting because that's where I found out about all of this, and those were the arguments being made that I think are a problem. But I'm not saying anybody's a bad writer or shouldn't use AI or that AI in and of itself is the problem or whatever. I don't think that's productive, or particularly interesting or helpful.
But I think these arguments do matter. As an example, see my previous comment about voice-based interfaces. Suppose we wave a magic wand and get rid of screen readers entirely. Now, everything is you talking to machines and the machines talking back to you. I think there would be issues there. While that might be an unlikely possibility, that was kind of the argument being made, screen readers are an old interface we'd abandon once the AI magic happened and we got the amazingly accessible future. That doesn't mean it would have to be entirely voice-based, naturally, but a lot of AI seems to be doing that and as I said in the previous comment, it seems fairly popular among both the blind and sighted. So it's a distinct possibility as a primary interface.
Re: MCP
The only hit I got on the acronym was, "Model Context Protocol".
This was from Google's AI search results. 🤷
Give us the information underneath the GUI
I think this comes back to something pretty simple: stop making us fight the GUI if the app already knows what is going on underneath.
This is the same problem we’ve had with custom controls for years. The app knows what the control is. It knows if something is selected. It knows if something is a button, a list, a tree view, or whatever else. But if all it gives the screen reader is “button, button, button”, then we’re stuck guessing.
That is why this AI agent stuff is interesting to me. Not because I want AI to replace screen readers. I don’t. And not because I think everything should become voice-only. I actually like being able to talk to a machine, but not all the time. Sometimes it is late at night. Sometimes you are in public. Sometimes it is too noisy. And some people cannot speak to a machine at all, or cannot hear it speak back.
So the win would be choice.
Use the GUI if that works for you. Use a screen reader if that works for you. Use Siri, shortcuts, an AI assistant, or whatever else. But underneath, the app should expose the actual information and actions properly.
If something is selected, tell us it is selected. If an action is available, expose the action. Don’t make a screen reader or an AI agent guess from colours, borders, icons, and unlabeled controls.
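Here is a rough Python sketch of what "expose it properly" could look like. It is only an illustration, not any real accessibility API: Control, announce and agent_view are hypothetical names. The idea is that one honest description of a control can feed both a screen reader and an AI agent.

```python
from dataclasses import dataclass, field


@dataclass
class Control:
    role: str                                    # e.g. "button", "list", "tree view"
    name: str                                    # the label, if the app provides one
    states: list = field(default_factory=list)   # e.g. ["selected"]
    actions: list = field(default_factory=list)  # e.g. ["press"]


def announce(control):
    """What a screen reader might speak: name, role, then any states."""
    parts = [control.name, control.role] + control.states
    return ", ".join(part for part in parts if part)


def agent_view(control):
    """What an AI agent might receive: the same facts as plain data."""
    return {
        "role": control.role,
        "name": control.name,
        "states": list(control.states),
        "actions": list(control.actions),
    }


unlabeled = Control(role="button", name="")
labeled = Control(role="button", name="Add to cart",
                  states=["selected"], actions=["press"])

print(announce(unlabeled))
print(announce(labeled))
print(agent_view(labeled))
```

The unlabeled control comes out as just "button", which is the guessing game. The labeled one gives its name, role and state, and the agent gets exactly the same facts plus the action it is allowed to take.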
It would also stop some of the silly workaround stuff. If an app is not accessible, people end up trying to build add-ons, scripts, or unofficial versions just to do what the official app should have done in the first place. Then the company says you are not allowed to do that. Well, fine, but then give us the information properly.
I shouldn’t have to vibe code my way out of every little accessibility problem.
I’m also not sold on using cloud AI for sensitive things. I understand why someone might want AI to read a medication label, but personally I would not trust it for medication, banking, credit cards, or anything like that. Maybe when more of this runs privately on-device, but not yet.
So yes, build the useful structure. Just don’t turn it into another hype train.
An old term in the back of my mind
Screen Scraping. I remember that being an issue with screen readers, that they had to resort to getting information from the video output or something because nothing was exposed to them by apps or the operating system.
I don't know anything about coding, so the big, dumb questions that pop into my mind are: Is AI going to become the new screen scraping for screen readers because developers find it a burden to bolt on the accessibility gobbledygook? Or is AI going to make it easier or even automatic for developers to expose the app gobbledygook to the screen reader?
The article's argument.
The article's claim is that you won't need to use anything remotely like screen scraping because the AI speaks language and so does the screen reader. So you'll get the accessible information. But as Cool Turk points out, there's sort of an issue.
Suppose Awesome Shopping Cart Inc. has a web API that lets you do everything. It has a website and a mobile app, but they're both inaccessible. OK, well, we do get all of the underlying stuff through the API, but *somebody* has to build an interface to that API. In other words, somebody has to write an accessible website and/or mobile app based on what the API gives you access to.
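As a tiny illustration of that "somebody has to build it" step, here's a Python sketch. Everything in it is assumed: the company is made up, so the response format, the field names and the cart_as_text function are all hypothetical. It just shows that even with the data in hand, someone still has to decide how to present it.

```python
import json

# Hypothetical API response; the field names are invented for this sketch.
RESPONSE = """
{"items": [
    {"name": "Coffee", "quantity": 2, "unit_price_cents": 899},
    {"name": "Mug", "quantity": 1, "unit_price_cents": 1250}
]}
"""


def money(cents):
    """Format a whole number of cents as a plain decimal string."""
    return f"{cents // 100}.{cents % 100:02d}"


def cart_as_text(raw):
    """Turn the raw cart data into lines a screen reader can read in order."""
    cart = json.loads(raw)
    lines = []
    total = 0
    for item in cart["items"]:
        line_total = item["quantity"] * item["unit_price_cents"]
        total += line_total
        lines.append(
            f'{item["name"]}, quantity {item["quantity"]}, {money(line_total)}'
        )
    lines.append(f"Total: {money(total)}")
    return lines


for line in cart_as_text(RESPONSE):
    print(line)
```

That's the easy part, and it's still work: reading order, wording and what to leave out are all choices a person (or an AI somebody has instructed) has to make.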
Are we going to see the same thing with AI? Presumably the idea is you either talk or type at the AI, or we build AI into screen readers. If, so says the article, you make the process accessible, in other words if the AI tools are accessible, then we get to build whatever we want, assuming the AI can do it. So you're either building temporary things on the fly, or building a permanent AI app/addon for your screen reader.
On the one hand, great, it gives us more options, always assuming it works. OTOH, you get to try and figure out how to get a machine to do what you want, again. Or somebody does, if the AI results are sharable. While I'm *always* a fan of more options, I think there are a lot of considerations that tend to get ignored. Again, Cool Turk points out some people might not be able to talk to an AI, or might not want to, for various reasons. That covers typing as well.
There are a lot of non-AI supposedly non-coding solutions, you can build an app, and you don't need to program! Except you do. It's just not super complicated programming. Instead of writing a whole program, maybe you write a string or two saying, connect this as a data source and pull data from there. AI will make that easier, in theory. But promises aren't actuality, which is largely my point with AI, there's a lot of hype. I absolutely agree with this article, let's get the standards accessible. AI isn't going away, it *is* making improvements in accessibility, issues and possible issues notwithstanding, so let's figure out how to improve the accessibility of the whole thing, and keep it that way.
And to return to my original point, the language thing is pretty fascinating. A technology developed for and by us, over years, is actually kind of just what AI needs. And it's what AI can give back as output, if we get it right. So in theory, that means increased accessibility. But let's be clear. You, or somebody, will still have to figure stuff out. AI will help, that's the idea, but it's not going to do stuff by itself.