Hi All,
VOCR v3.0.0-beta.4 is out with an exciting new feature called Computer Use.
You can now ask AI to control apps using mouse and keyboard commands.
Check out the demo where I put VOCR through a series of different UI tasks.
https://github.com/user-attachments/assets/c465d6e8-236c-4a93-980b-ef237f5c87ef
If you already have VOCR installed, you can update it by simply running Check for Updates.
If you are a new user, you can download the latest VOCR here: https://chigkim.github.io/VOCR/
You can read the release notes to learn more about all the changes: https://github.com/chigkim/VOCR/releases/
You will need an API key from OpenAI or Claude to use this feature. In my testing, Gemini and local models are not quite there yet.
It's not perfect, but I was able to perform simple tasks that were not accessible with VoiceOver.
I would love to hear what kinds of things you are able to do with it.
Hope you enjoy!
One more thing: I created a computer use add-on for NVDA users as well. To download, the NVDA addon, go to the releases page and search for Assets. https://github.com/chigkim/NVDAComputerUse/releases
Comments
actually, cool!
This is super awesome! I just watched the demo, and I am thoroughly impressed.
How about Mistral AI?
Hi,
Thanks for sharing the demo. I wonder, can you use an API key from Mistral AI?
Dave
Works With the OpenAI Chat Completions API
You can use it as long as it has an API endpoint compatible with OpenAI Chat Completions. Gemini, Claude, OpenRouter, and even local engines like Ollama and llama.cpp support this.
That said, you need a model that at least matches the quality of GPT-5.4 or Claude Sonnet 4.6. In my testing, even Gemini 3.1 struggled at times.
Very cool!
This is very cool! Thank you for building it!
Just brilliant
Just brilliant, and works perfectly so far. actually a lot faster than computer use directly through Codex
Thank you so much for what you're doing.
This add-on for the app is truly amazing. Thanks to it, I was able to click something in the Pages app to perform a task thatβs impossible to do using only a screen reader. My friend used it to solve a text CAPTCHA and passed an exam with a perfect score using this add-on. Of course, itβs not perfect, because itβs still limited by the capabilities of artificial intelligence itself, but even at this stage, itβs incredibly helpful.
Oh, and my friend also used this add-on to order himself a pizza and choose the right toppings. That wasnβt possible using only a screen reader. It really does help. Itβs no longer just a toy β it can genuinely make a difference.
love it
Wow. Great job on this.