[Work In Progress]: Vosh - a third-party screen-reader for the Macintosh

By João Santos, 13 November, 2023

After getting fed up with Apple's general neglect of macOS accessibility, and having wanted to work on something meaningful for quite some time, I decided to attempt something that for some reason nobody seems to have tried before: write a completely new screen-reader for that platform. This isn't an easy task, not only because of the amount of work required to even get close to matching a mature screen-reader in terms of functionality, but also because Apple's documentation for more obscure system services is nigh on non-existent. Despite that, and since I've already overcome a lot of hurdles that I thought would be show stoppers, after a single week of work I already have something to show, albeit in a very embryonic stage of development. The idea is to gauge the interest of the community in a project similar to NVDA for the Mac, to be worked on in the coming years and to which other people can contribute.

The project is called Vosh, which is a contraction of Vision and Macintosh, though the name isn't set in stone yet, so if you wish to suggest something different feel free to do so. The code for the project isn't available yet since I haven't even done my first local commit, as I'm still learning the ins and outs of Apple's consumer side of the accessibility framework and thus will likely end up refactoring the whole thing. Once I feel comfortable with the structure of the code I will post it to my personal GitHub profile for everyone to see and modify as they please, and will also start accepting code contributions. At the moment the only thing this project does is allow navigating the accessibility tree of every app, reading everything and moving the keyboard focus as it attempts to read the contents of every accessibility element. There's a lot to do to even get close to matching the level of sophistication of VoiceOver, which employs a number of hacks to make things feel as smooth as possible to the end-user. One such hack is the flattening of the accessibility tree: what VoiceOver presents tends to be very shallow, particularly in Safari, where the underlying accessibility tree can actually be quite deep. It is my intention to make navigating with Vosh as close as possible to navigating with NVDA. For that reason I will try my best to copy its behavior and laptop keyboard commands as much as possible so that Windows NVDA users can feel more at home while using a Mac.
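To illustrate what flattening an accessibility tree means, here is a minimal, platform-neutral sketch in Python. It is not Vosh's code (which is written in Swift and not yet published) and it does not touch the real AX API. The `Node` class, the `flatten` function, and the rule "skip containers that have no label" are all assumptions made for the example, since the criteria for flattening have not been decided.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for an accessibility element."""
    role: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> Iterator[Node]:
    """Yield labeled elements in depth-first order, skipping unlabeled containers."""
    if node.label:
        yield node
    for child in node.children:
        yield from flatten(child)


# A deep tree, similar in spirit to what a web page can expose.
page = Node("group", children=[
    Node("group", children=[
        Node("heading", "Front page"),
        Node("group", children=[Node("link", "Sign in")]),
    ]),
    Node("button", "Search"),
])

reading_order = [f"{n.role}: {n.label}" for n in flatten(page)]
print(reading_order)
```

With a flat list like `reading_order`, moving to the next or previous item is a simple index change, instead of repeatedly entering and leaving nested groups.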

Before posting this I also posted a demo video to YouTube where I show that it is already possible to do some very basic navigation of websites like Reddit using Vosh, though very inconveniently, because I have implemented neither semantic navigation nor the aforementioned accessibility tree flattening hack employed by VoiceOver. I apologize in advance for my diction since, while I write English every day, I rarely speak the language, and it is also not my native language, so if you don't understand something I say feel free to ask here. I also apologize for the quality of the video, as the original file was almost a gigabyte in size so I kind of over-reduced the resolution, resulting in a huge degradation in image quality that I wasn't quite aware of until someone sighted actually watched it.

Comments

Developer of year

You deserve an award for your bravery against Apple 🍎

Absolutely interesting

I assume that this is an open source project which we all can contribute to.
Also, I don't know which language you are building this one in, but taking some inspiration from NVDA would be a good thing :-)

What a good idea

I really think this is a good project. Will it only be available for Apple Silicon devices though?

Code

The code isn't available yet because I haven't even made a single commit, but yes, it will be open-source once I refactor the project to abstract as much of the poorly documented consumer-side accessibility framework as possible.

The project is currently fully written in Swift and targets both Intel and Apple Silicon Macs running Ventura or later (though I might relax this restriction if there's demand for it). I intend to make it scriptable with AppleScript, Apple's flavor of JavaScript, and maybe Python or Lua. I am not too familiar with NVDA add-on development, nor am I sure whether it's feasible to make it add-on compatible, but this is something to think about later once I get the core of the project working properly.

Oh

What’s you make it? Is there any way you could make the code bypass the restriction on older systems? I have a 2017 MacBook Air running macOS Monterey and really want to test.

Easy there Dominic ;)

He's barely scratched the surface of this whole thing. Haven't watched the vid yet, will do so later. I wish you luck, as Apple's quite tight-lipped about their code. No joke, I knew someone who tried to publish JFW source code; he didn't get far, as I reported him to a good friend who's a tech guy, or was at the time. I hope this pans out. I also wish Apple would really go back to the drawing board and stop releasing new shiny things year after year with no hope of even doing any work on their own voice, Alex, which I still like. I don't care for the multitude of screen voices that are shipped, but that's me. Good luck.

Very Interesting...

Hello. I am just seeing this now, and it sounds intriguing. Although I've always liked VoiceOver, I'm perhaps starting to see where people are coming from. Safari in particular, has become a bit problematic as has Mail in some aspects. I've been thinking to myself recently how frustrating it must be for beginners who are running or who are perhaps going to upgrade to Sonoma. I think a sister of mine might be in this situation. She just got her second MacBook, and is currently running Ventura but received an update which actually says "upgrade now". I'm thinking this might be Sonoma, but am going to work with her again over the Thanksgiving break. But I think there's still hope for Apple. I myself haven't had quite as many problems as others. That said, I would very much like to try out your offering and wish you nothing but the best with this project.

excellent work

I haven't watched the demo video yet but based on what you've described, this is amazing and a much needed thing for the community.
It seems like paid/built-in screen readers tend to stagnate (JAWS is a prime example of this), and it takes upstart devs like you to give them some competition. It will be very interesting to see how this develops, and I am very excited to follow this project as it goes along. Keep up the good work.

Definitely cool

I'd love to see this! The more screen readers written and controlled by blind people, the better.

Best of luck

Hello,
I wish you good luck with this, and I hope you manage to reach your goals. I will certainly do my best to test it and report any problems.

I also want to make sure you keep translations in mind, and this is probably one area I could contribute to, translating the screen reader, once we reach that point, of course. But, since you're also a non-native English speaker from what I understand, I guess you already understand the importance of this.

Once again, good luck and looking forward to seeing the progress you make!

I am so stoked for this!

Just finished the YouTube video. The first thing I noticed was, that I did not at any time hear "Safari Not Responding". Seriously though, and I realize that what you have is a project in its infancy, but what you "do" have shows potential.

A lot of people have been asking, likely for years, for NVDA to come to Apple. NVDA is written in Python, and while that is certainly doable, I applaud you for your choice in programming language.

Best of luck in this project, and I am eager to see how it evolves over the next few years. 👍🏼😎

Witnessing yet another thing I used to think was so unlikely...

Okay, I'm a Windows user but just wanted to congratulate you. I thought of some names like Applecable/Applecation, implying that it makes it possible to use your Apple computer, or Magictosh, which is a weird name made out of phonetic similarities between the first few sounds of "Macintosh" and the word "magic". Anyway, I only have one point that makes me unsure: how will this screen reader receive regular updates? Apple releases OS updates more often, and these updates often contain bugs unknown to us until we perform tests and actually encounter them. The stability of Windows, as far as I know, is something that you don't have on macOS, so you'll have to keep up with the pace as you adapt to all the changes that Apple makes as new updates roll out.

the beauty of open source

Re: how will it receive updates:
VoiceOver is closed source. That means if Apple decides they aren't going to fix something, there's not much we can do about it.
That's what is so nice about Vosh. It looks like Vosh will be open source.
Therefore, if something is found, then it comes down to convincing a dev to fix it. Anyone will be able to view, modify, fork, etc. the code. So this project will live a life of its own, and the bugs that creep in will eventually be fixed by community contributions. If someone wants the bug fixed badly enough, it is doable by any dev, and not bottlenecked by Apple.

Very cool!

I haven’t had near the VO problems experienced by many, but Safari and Mail do sometimes stop responding.
So, I have a few questions. Once you fix the flattening issue, will you be able to navigate with just arrows like NVDA, or will you still have to do a lot of interacting? Will it work with any voice we want to use that is downloadable in macOS? I would love to see the NVDA team get involved, and for this to just become NVDA for Mac. One thing I request… Please, please keep this working on Intel. Apple is doing some pretty cool stuff with their new chips, but many of us don’t have the funds to upgrade, or have other reasons we are sticking with Intel. If this works like what you are saying, it could breathe new life into older Macs, much like the Jishuo screen reader did for older and lower-specced Android devices. In my brief time using Android, I found that screen reader was a lot better than TalkBack, but that was a few years ago. Will we have a key to toggle it on or off? Will it work in login screens?
I like VoiceOver a lot, but having options is always a plus. Lastly, would this stay just on macOS, or would it also be brought over to iPad, iPhone, etc.? I think the Mac is where it is most needed, but as it expands, some may wish for continuity and to have the same screen reader and settings carry over between devices.
I think you have a good thing going here. Keep up the good work.

I am happy to donate to the…

I am happy to donate to the developer of this app. I know this is very demanding work, and I would be really happy to donate something as a thank-you message.

amazing work!

Amazing work! This is indeed similar to using NVDA on Windows. I'm willing to test and provide feedback. Yeah, as others have stated, Safari and Mail have some issues, but with the introduction of this third-party screen reader, I can see it as an option. I have Sonoma, so I'm guessing it will work with it as well. It uses Samantha as the default in the video demo; however, with Eloquence now being an option for some, I assume that will work as well. Hats off to you, and do keep us posted.

At Justin Harris

First, I don't think that I can completely get rid of the not responding problem, since ultimately that's a problem on the app's side. Currently Vosh just exits with an error message when an app does not respond for a while, a consequence of the naive architecture that I implemented, which is why it needs refactoring. One thing I can do, which may or may not slow down the screen-reader when a new window opens, is to cache the entire accessibility element tree and apply changes in response to accessibility notifications. In theory this should alleviate the not responding problem a lot, though it needs to be implemented and tested.
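The caching idea described above can be sketched in a platform-neutral way. The following Python model is only an illustration under stated assumptions: the `TreeCache` class, its integer element identifiers, and the notification names `created`, `destroyed`, and `title_changed` are invented for the example and do not correspond to the real AX notification constants or to Vosh's code.

```python
from typing import Dict, List, Optional


class CachedElement:
    """Local copy of one element; all names here are hypothetical."""

    def __init__(self, ident: int, label: str, parent: Optional[int] = None):
        self.ident = ident
        self.label = label
        self.parent = parent
        self.children: List[int] = []


class TreeCache:
    """Keeps a snapshot of a window's element tree and patches it on notifications."""

    def __init__(self) -> None:
        self.elements: Dict[int, CachedElement] = {}

    def add(self, ident: int, label: str, parent: Optional[int] = None) -> None:
        self.elements[ident] = CachedElement(ident, label, parent)
        if parent is not None:
            self.elements[parent].children.append(ident)

    def remove(self, ident: int) -> None:
        element = self.elements.pop(ident)
        for child in list(element.children):
            self.remove(child)
        # Detach from the parent only if the parent is still cached.
        if element.parent in self.elements:
            self.elements[element.parent].children.remove(ident)

    def handle(self, kind: str, ident: int, **payload) -> None:
        """Apply one notification to the cache instead of re-reading the app."""
        if kind == "created":
            self.add(ident, payload["label"], payload.get("parent"))
        elif kind == "destroyed":
            self.remove(ident)
        elif kind == "title_changed":
            self.elements[ident].label = payload["label"]

    def labels(self, ident: int) -> List[str]:
        """Read labels depth-first from the cache; the app is never queried."""
        element = self.elements[ident]
        result = [element.label]
        for child in element.children:
            result.extend(self.labels(child))
        return result


cache = TreeCache()
cache.add(1, "Window")
cache.add(2, "Toolbar", parent=1)
cache.add(3, "Back", parent=2)
cache.add(4, "Content", parent=1)

cache.handle("title_changed", 3, label="Forward")
cache.handle("created", 5, label="Reload", parent=2)
after_update = cache.labels(1)

cache.handle("destroyed", 2)
after_removal = cache.labels(1)
print(after_update, after_removal)
```

The trade-off is the one mentioned above: building the snapshot costs time when a window opens, while later reads never wait on the application.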

Second, I cannot yet answer whether it will be possible to do caret browsing as smoothly on Safari as on Windows even after flattening the accessibility element tree, but it is my intention to make that a reality. VoiceOver attempts to do caret browsing but sometimes for some reason it gets stuck in an element, and since I haven't investigated why yet, I cannot tell whether I'll be able to do better.

Third, Vosh already works in log-in windows when it's running, which is a consequence of giving it permission to act as an accessibility agent in System Settings, meaning that I didn't have to write any code to make that happen. Currently it's just a windowless application that's marked as an agent so it doesn't appear in the Dock or in the task switcher, and the only user interface that it has is an icon in the menu extras bar which, when clicked, pops up a menu with an Exit option. At the moment it doesn't respond to any key combinations when stopped, but by the time I make it ready for distribution I will distribute an agent daemon along with it that will listen for a specific key combination.

Fourth, given the fact that iOS is such a closed, walled-garden environment, at least until Apple is forced by legislation to change their stance about the openness of iOS, it won't be possible to do anything about VoiceOver on that platform.

Just Curious

Hi,
I'm just curious which APIs you're using for this. How do you get the accessibility UI, and how is your program allowed to control the application? Which API allows your program to move the keyboard focus, detect which element it is currently on, and read what information that element contains?

APIs

The consumer-side accessibility API consists of all the Carbon-era CoreFoundation functions and data-types starting with AX in the ApplicationServices framework, though unfortunately there's very little documentation for it, and the documentation that exists is quite incomplete, requiring reading through the C headers distributed with Xcode to get the whole picture. The input event tapping is done using CGEvent from Quartz in the CoreGraphics framework, and CapsLock management is done using IOKit's Human Interface Device (HID) API, for which there is zero documentation, so you definitely have to read the C header files to understand it.

Will it read notifications from apps such as Messages?

Will it read notifications from apps such as Messages? Also, is it possible to use keyboard navigation just like you could with VoiceOver in apps such as Safari, Mail, etc.?

Notifications, navigation, and voices

As for notifications, at the moment I don't think they are being read since I have restricted Vosh to only work with regular apps for my own sanity while debugging, though this restriction is arbitrary and I will make it work for everything once the core is stable enough.

As for navigating applications, that can already be done as shown in the video, since the ability to move to the next, previous, parent, and child elements is already implemented. However, since I haven't decided on the criteria for any kind of accessibility element tree flattening yet, navigating this way is quite tedious. You can always use the shortcuts provided by the apps themselves to move the keyboard focus, which in turn makes Vosh move the accessibility cursor.

As for voices, which I forgot to mention in a previous comment, I cannot guarantee full access since in some versions of MacOS Apple only allows the public speech APIs to access the enhanced voices. I think that this limitation has been lifted in Sonoma, since as you can hear from the video I'm using the Samantha compact voice which is the default, but am not entirely sure.

I'll donate to this.

I'm a windows user but think this is amazing!

I truly didn't think it could be done, I'm so glad to be proven wrong.

I would test, if it's made available

I would test, if it's made available. I can also test out the voices such as Eloquence and other default synthesizers to make sure it works properly.

I'd test as well

I'd gladly test this as well. Glad to know this is possible.

great progress

I am overjoyed that Vosh is starting to become a reality, in its early stages. VoiceOver, bloated as it is, has critical issues, and the video proves that Safari not responding is a problem exclusive to VoiceOver. I am ecstatic that you are pursuing this badly needed idea of Vosh, and hopefully, when Apple gets wind of this project, they will start fixing bugs and they will plainly realize that open-source options exist, and that if they don't fix their bugs, VoiceOver will be abandoned by the blind in favor of Vosh. Excellent work! I am excited to beta test Vosh, and I'm sure, in time, VoiceOver will become substandard, as Vosh keeps growing.

I am a beta tester, and find Apple seem to fix the bugs reported, but break fixes between new betas. So Vosh will eventually be stable and rock solid, like NVDA. Let's blow VoiceOver into the abyss!

Thank You!

This is great! VoiceOver has become very unstable and counterintuitive, and having an always-updatable open-source screen reader would be amazing! Are there any plans to implement image recognition features (like those in VOCR) into the screen reader? If so, I would definitely switch from VoiceOver. I am happy to beta-test.

a good start

I'm impressed with what you have done so far, it may have some good chances in future. I also would be willing to help you test stuff if needed.

Any plans for bringing this to iOS as well?

As the subject states: are there any plans to bring this to iOS as well?

Oof

Bringing it to iOS might be tricky. I mean, you could only do this via the App Store, and you'd have to pay $99 for the developer account, and there's a high likelihood that the iOS and Mac versions would have to be separate, maybe. Or then again, depending on the coding language, it might very well run on both platforms from the start.

would love to collaborate

I’ve been building a suite of frameworks for building a screen reader for macOS in Swift for about 6 years. It’s not particularly feature rich, but it’s sturdy, production quality code I’d love to share https://github.com/rustle/SpeakUp Shoot me an email at doug@getitdownonpaper.com if you’re interested.

I would also be happy to…

I would also be happy to beta test.
Another feature request is language switching and the ability to adjust pitch, rate, and all those important things that go into making our voices sound the way we want.

@Doug

Hey Doug, thanks for sharing! I was finding it hard to believe that no one else had attempted to make a screen-reader for this platform before, but unfortunately my searches turned up nothing, so I incorrectly assumed that I was the first.

I skimmed through your ScreenReader Swift package, which I might end up using since it looks a lot better than my current code and is something that I was planning on doing in the future myself once I was done building a mental representation of the AX framework, and have a few questions:

1. Are you pre-caching all elements in a window and then updating them based on notifications, or are you querying the AX API as needed?
2. If you're pre-caching, which is something that I'd like to attempt myself to try alleviating the dreaded busy / not responding problem with VoiceOver, or if you have tried that before, does it slow down screen-reader responsiveness in new windows noticeably?
3. Are you running AX API requests in a dedicated thread per application, which is something that I'd like to try to prevent a slow responding application from delaying the screen-reader due to the AX API calls blocking execution, or are you just using Swift's structured concurrency?
4. Would you mind documenting the ScreenReader Swift package to make it easier for actual consumption?

Thanks!
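The "dedicated thread per application" idea from the questions above can be modeled without any macOS API. The sketch below is Python rather than Swift, and everything in it is an assumption for illustration: the `AppChannel` class, the 0.1 second timeout, and the `slow_query` and `fast_query` functions that stand in for blocking AX calls. Each application gets its own single worker thread, and the caller gives up waiting after the timeout instead of blocking.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class AppChannel:
    """One worker thread per application (hypothetical model, not the AX API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._worker = ThreadPoolExecutor(max_workers=1, thread_name_prefix=name)

    def request(self, query, timeout: float = 0.1):
        """Run a blocking query on this app's thread; return None if it is too slow."""
        future = self._worker.submit(query)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            return None  # the caller can announce "not responding" and move on


def slow_query() -> str:
    time.sleep(1.0)  # stands in for a call into an unresponsive app
    return "late answer"


def fast_query() -> str:
    return "Inbox, 3 unread"


busy = AppChannel("busy-app")
healthy = AppChannel("healthy-app")

started = time.monotonic()
slow_answer = busy.request(slow_query)      # gives up after about 0.1 s
fast_answer = healthy.request(fast_query)   # unaffected by the busy app
elapsed = time.monotonic() - started
print(slow_answer, fast_answer)
```

One limitation of this simple design is that later requests to the busy application queue up behind the stuck call on its single worker, so a real implementation would also need a policy for dropping or coalescing stale requests.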

I unfortunately cannot contribute with code but

If you need sounds for UI, actions, start up, quitting etc, I can give a hand.

@Igna Triay

I'll definitely need some help with audio icons in the future, since that's an area in which I'm severely lacking in both experience and creativity.

You are fulfilling a dream

Really, this is great! I would love to be able to leave Windows and be completely in the Apple ecosystem. I'm a web developer and have never been able to match the productivity I have on Windows on Mac. If it's anything like NVDA, a lot of interesting things could be done!

This sounds fantastic

Really excited to find out where this goes. I'd love an alternative to VoiceOver.

I hadn't thought this was possible on the Mac. I wonder why no one has done this before. Is it because Apple is forever breaking things under the hood? I seem to remember hearing a Freedom Scientific interview saying that they had a version for Mac many years ago but it didn't turn out to be viable and would never do it again. I may have dreamt this!

Thank you so much for sharing what you have done so far and I hope to be able to try it out someday soon!

This sounds really interesting

I actually really like this idea and hopefully in the future this idea can come to light fully. Good luck

JUST AMAZING!

Wow! It's amazing to witness the birth of something this ground-breaking.
Please, please please keep up the brilliant work!
You have my full support!

@João

1. There's not a lot of caching right now but it's something that I considered in the design and should be doable.
2. I have messed with lots of different caching implementations. You can get a decent amount done by building the tree lazily during navigation and caching with rebuilds based on notifications. You'll still need lots of tinkering for more complex UI that changes a lot to work well.
3. It's not a dedicated thread per application per se, but to your point about handling unresponsive apps, it builds on top of Swift's actor model and could be customized as far as the desired behavior when an API call doesn't return promptly.
4. I'll see what I can do to improve documentation now that someone is actually looking at it. Don't forget to poke around the AX and AccessibilityElement packages that support ScreenReader, there's a lot going on down there.
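The lazy approach described in point 2 can be sketched as follows. This is a Python model under stated assumptions, not code from the ScreenReader package: the `LazyTree` class, the string element names, and the `fetch` callback that stands in for a query to the application are all hypothetical. Children are fetched only when navigation first reaches an element, and a change notification invalidates the affected subtree so that it is rebuilt on the next visit.

```python
from typing import Callable, Dict, List


class LazyTree:
    """Fetches children on first visit and caches them until invalidated."""

    def __init__(self, fetch_children: Callable[[str], List[str]]) -> None:
        self._fetch = fetch_children
        self._children: Dict[str, List[str]] = {}

    def children(self, element: str) -> List[str]:
        if element not in self._children:
            self._children[element] = self._fetch(element)
        return self._children[element]

    def invalidate(self, element: str) -> None:
        """Drop the cached subtree below an element, e.g. after a change notification."""
        for child in self._children.pop(element, []):
            self.invalidate(child)


# Hypothetical application state and a fetch function that records each query.
ui = {"window": ["toolbar", "list"], "toolbar": ["back"], "list": ["row 1"]}
calls: List[str] = []


def fetch(element: str) -> List[str]:
    calls.append(element)
    return list(ui.get(element, []))


tree = LazyTree(fetch)
tree.children("window")
tree.children("list")
tree.children("list")          # served from the cache, no new query
calls_before_change = list(calls)

ui["list"] = ["row 1", "row 2"]  # the app changes its UI...
tree.invalidate("list")          # ...and a notification invalidates that subtree
rows = tree.children("list")
print(calls_before_change, calls, rows)
```

Note that the toolbar is never queried because navigation never visits it, which is the main saving compared with caching the whole window up front.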

I am beyond excited for this

A few days ago, out of frustration, I wrote an email to Tim Cook's public email address to implore any of Apple's senior team to take a long hard look at macOS accessibility and to clarify their commitment by actions, not just words. I received a fairly generic-looking response from Apple's accessibility team, which shows the email did get passed around internally at least. But I cannot tell you how happy I am that someone is working on an alternative solution. I really don't want to leave the Mac as a platform, as I can't get on board with modern Windows, and Linux is too far behind on the desktop. I also rely on Logic. It will be a game changer for the Mac to have a screen reader alternative, especially an open-source one that can be regularly fixed, improved and updated by its real users. I've wanted to try myself but, as you rightly say, Apple's documentation is poor to say the least. I'm a dev, though my Swift is basic, but I'd be glad to contribute in any way I can. Can't wait to test this for myself and wish you every luck and success with the project, as well as offering heartfelt gratitude for your work thus far.

This is amazing idea and it has so much potential!

I’ve been using VoiceOver for years. I know the ins and outs of it, and I’m very used to VoiceOver.
I’ve used other screen readers such as NVDA and JAWS, but I’ve always preferred VoiceOver.
I’d be willing to test and provide feedback. Couple of questions though.
VoiceOver has something called positional audio, where sounds are played in 3-D according to where they appear on the screen. I have an idea that maybe you could do this, but with the Apple spatial audio that is used on AirPods. You could give users a more three-dimensional representation of where things are on the screen, which would be very interesting to see.
Also, I am very used to VoiceOver navigational cues, and was wondering if they would be available with the screen reader? I have a JAWS sound scheme which is based on VoiceOver sounds, and I was wondering if that would be possible here as well?
I thought this day would never happen, but it has. I’m super stoked for the future of this project.

Spatial Audio, Audio Icons, and Audio Ducking

Regarding audio icons, that's something I want myself, and will be counting on the community to provide those assets in the future.

Regarding spatial audio, I'll have to research that, but I think it's possible and not too hard as it seems to be supported by AVFoundation and there's documentation.

Another audio-related thing that I'll have to research is audio ducking, and this one will be hard since my preliminary search found nothing relevant online so I guess I'll have to dive into CoreAudio's C headers to attempt to figure it out.

I can also provide support, as a sound designer.

I might also be able to provide sound effects. You might even consider adding multiple sound themes/schemes with the option to install them as part of the installer itself or download them later by browsing a menu and previewing samples.

epic stuff

While I'm sort of OK with VO for the moment, I am annoyed with a lot of things. These include the bugs on the web where it keeps jumping around (funnily enough, mostly on Apple's own developer website) and in apps such as Discord and Visual Studio Code, which I assume is because of Electron, which is also web. Most of all, there is automatic language switching with Arabic, which I reported like 4 times with nothing to show for it, so I really hope this would fix that.

Also, I would donate a monthly payment if one is available, because it would be really amazing if it went well.

I'm also a developer, so I might be able to contribute, even if checking C headers of all things makes me freak out a bit.

Linux on MacOS

O.T., I know, but has anybody tried installing a Linux distro for enhanced accessibility?
https://youtu.be/ZFx6R26aRHw?si=l_0CwpRMRqS5fRR4

Another vote of support

You are a valiant developer. I know nothing about coding, but would be happy to donate should the opportunity become available.

Awesome!

I have long awaited the arrival of a community-created and maintained screen reader for MacOS! I agree that Apple's support of their accessibility tools, particularly the relationship between Safari and VoiceOver, has languished and would be excited to see a potentially more user friendly and robust pairing.

I'd love to contribute to this project, both from a programming perspective and with testing feedback.