Hello applevis community,
I saw this very interesting article on a blind-related mailing list and thought it would be a good idea to repost it here. I am just a CS student for now, so I can't pretend to understand any of this, but there are people here much more clever than I am... Wishing you a good read!
https://blog.monotonous.org/2026/01/12/macos-accessibility-with-pyax/
By TheBlindGuy07, 22 January, 2026
Forum
App Development and Programming
Comments
Not much of a breakthrough
I don't like to be a source of negativity, but what that does is exactly what the Element module of Vosh does and has been doing for years, nothing more and nothing less. The only difference is that Vosh is 100% Swift and thus its Element module provides a pure async and safe Swift interface, whereas that's a Python API. In both cases Apple's public and completely neglected accessibility consumer interface is used, whereas, from my observations a few days ago, Apple itself may actually be using a different interface, likely from a private framework called CoreAccessibility. Chrome doesn't suffer from this problem because it runs on Blink, which is a fork of WebKit, so it benefits from Apple's accessibility implementation.
Powerful tool
Sure, the Element module provides similar abstraction, but the handy thing here seems to be the CLI which makes this a powerful tool for deep dives into what is going on under the hood.
Nevertheless, I don't understand why this has been implemented in Python instead of making it a Swift library and then building a client around that.
Key combinations
I did actually provide key combinations to do some of this stuff in Vosh for debugging purposes. One of them just dumps the system-wide element, another dumps the whole element hierarchy of an application, and the last one dumps the hierarchy of the element currently in focus. The last dump also includes all of the parents of the focused element, but not any siblings of the focused element or any of its parents.
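To make the difference between those dumps concrete, here is a minimal sketch in Python. It is not Vosh's code and it does not touch the real macOS accessibility API: the `Element` class, the `dump_subtree` and `dump_focused` helpers, and the sample tree are all hypothetical, made up purely for illustration. It only shows the shape of the third dump, where the focused element gets its ancestors and its own descendants but no siblings at any level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Element:
    """Hypothetical stand-in for an accessibility element (not a real AX object)."""
    role: str
    title: str = ""
    children: List["Element"] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False)

    def add(self, child: "Element") -> "Element":
        """Attach a child and remember its parent."""
        child.parent = self
        self.children.append(child)
        return child


def describe(element: Element, depth: int) -> str:
    """One indented line per element."""
    return "  " * depth + f"{element.role} {element.title!r}"


def dump_subtree(element: Element, depth: int = 0) -> List[str]:
    """Dump an element and everything below it (the 'whole application' style dump)."""
    lines = [describe(element, depth)]
    for child in element.children:
        lines.extend(dump_subtree(child, depth + 1))
    return lines


def dump_focused(focused: Element) -> List[str]:
    """Dump the ancestors of the focused element, then its own subtree.

    Siblings of the focused element and of its ancestors are skipped.
    """
    chain = []
    node = focused.parent
    while node is not None:
        chain.append(node)
        node = node.parent
    chain.reverse()  # root first
    lines = [describe(ancestor, depth) for depth, ancestor in enumerate(chain)]
    lines.extend(dump_subtree(focused, depth=len(chain)))
    return lines


# Made-up hierarchy for the example.
app = Element("AXApplication", "Demo")
window = app.add(Element("AXWindow", "Main"))
toolbar = window.add(Element("AXToolbar"))
group = window.add(Element("AXGroup", "Form"))
text_field = group.add(Element("AXTextField", "Name"))
button = group.add(Element("AXButton", "OK"))

if __name__ == "__main__":
    print("\n".join(dump_focused(group)))
```

With the group as the focused element, the toolbar (a sibling of the group) is left out, while `dump_subtree(app)` would list all six elements.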
In my last job, a coworker built a proper user interface around the aforementioned Element module, though aimed mostly at sighted users. Essentially you could simply point the mouse at anything on the screen and press a global key combination to capture all the relevant accessibility information.
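For anyone wondering how "point at something and get its element" can work in principle, here is a rough Python sketch of a hit test over a tree of elements with frames. Everything in it (`Node`, `contains`, `element_at`, the sample window) is hypothetical and simplified; it is not the coworker's tool. On macOS the public accessibility API has `AXUIElementCopyElementAtPosition` for this kind of lookup, but whether that tool used it is an assumption on my part.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Frame = Tuple[float, float, float, float]  # x, y, width, height


@dataclass
class Node:
    """Hypothetical element with a screen frame (not a real AX object)."""
    role: str
    frame: Frame
    children: List["Node"] = field(default_factory=list)


def contains(frame: Frame, x: float, y: float) -> bool:
    """True if the point lies inside the frame."""
    fx, fy, fw, fh = frame
    return fx <= x < fx + fw and fy <= y < fy + fh


def element_at(root: Node, x: float, y: float) -> Optional[Node]:
    """Return the deepest element whose frame contains the point, or None."""
    if not contains(root.frame, x, y):
        return None
    # Assumption: later children sit on top, so check them first.
    for child in reversed(root.children):
        hit = element_at(child, x, y)
        if hit is not None:
            return hit
    return root


# Made-up window for the example.
window = Node("AXWindow", (0, 0, 800, 600), [
    Node("AXButton", (10, 10, 100, 30)),
    Node("AXTextField", (10, 50, 200, 30)),
])

if __name__ == "__main__":
    hit = element_at(window, 20, 20)
    print(hit.role if hit else "nothing under the pointer")
```

A point inside the button's frame resolves to the button, a point elsewhere in the window falls back to the window itself, and a point outside the window gives `None`.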