New Computer Use Feature in VOCR v3.0.0-beta.4

By Chi Kim, 9 May, 2026

Forum
macOS and Mac Apps

Hi All,

VOCR v3.0.0-beta.4 is out with an exciting new feature called Computer Use.
You can now ask AI to control apps using mouse and keyboard commands.
Check out the demo where I put VOCR through a series of different UI tasks.
https://github.com/user-attachments/assets/c465d6e8-236c-4a93-980b-ef237f5c87ef
If you already have VOCR installed, you can update it by simply running Check for Updates.
If you are a new user, you can download the latest VOCR here: https://chigkim.github.io/VOCR/
You can read the release notes to learn more about all the changes: https://github.com/chigkim/VOCR/releases/
You will need an API key from OpenAI or Claude to use this feature. In my testing, Gemini and local models are not quite there yet.
It's not perfect, but I was able to perform simple tasks that were not accessible with VoiceOver.
I would love to hear what kinds of things you are able to do with it.
Hope you enjoy!

One more thing: I created a computer use add-on for NVDA users as well. To download, the NVDA addon, go to the releases page and search for Assets. https://github.com/chigkim/NVDAComputerUse/releases

Options

Comments

By Zach M on Saturday, May 9, 2026 - 22:05

This is super awesome! I just watched the demo, and I am thoroughly impressed.

By Dave Nason on Sunday, May 10, 2026 - 10:24

Member of the AppleVis Editorial Team

Hi,
Thanks for sharing the demo. I wonder, can you use an API key from Mistral AI?
Dave

By Chi Kim on Sunday, May 10, 2026 - 15:36

You can use it as long as it has an API endpoint compatible with OpenAI Chat Completions. Gemini, Claude, OpenRouter, and even local engines like Ollama and llama.cpp support this.
That said, you need a model that at least matches the quality of GPT-5.4 or Claude Sonnet 4.6. In my testing, even Gemini 3.1 struggled at times.

By Mlth on Sunday, May 10, 2026 - 16:21

This is very cool! Thank you for building it!

By Ashley on Sunday, May 10, 2026 - 18:46

Just brilliant, and works perfectly so far. actually a lot faster than computer use directly through Codex

By Ines on Saturday, May 16, 2026 - 12:15

This add-on for the app is truly amazing. Thanks to it, I was able to click something in the Pages app to perform a task that’s impossible to do using only a screen reader. My friend used it to solve a text CAPTCHA and passed an exam with a perfect score using this add-on. Of course, it’s not perfect, because it’s still limited by the capabilities of artificial intelligence itself, but even at this stage, it’s incredibly helpful.
Oh, and my friend also used this add-on to order himself a pizza and choose the right toppings. That wasn’t possible using only a screen reader. It really does help. It’s no longer just a toy β€” it can genuinely make a difference.

By Matthew Whitaker on Sunday, May 17, 2026 - 02:18

Wow. Great job on this.