[New Add-on] Vision Assistant Pro: Your Interactive AI Copilot for NVDA (Powered by Gemini)

By mahmood, 2 December, 2025


Hello AppleVis community,

I am Mahmood, and I am excited to share Vision Assistant Pro, a new open-source add-on for NVDA. This tool is designed to bring the intelligence of Google Gemini directly into your screen reader to solve digital challenges that usually require sighted assistance.

It is completely free to use (with your own API key) and focuses on interactivity rather than just static descriptions.

🌟 Key Features:

  • 👁️ Interactive Vision (Object & Full Screen): Unlike standard OCR that just reads text, this feature lets you "see" and "ask."

    • Object Vision: Take a snapshot of the specific control (icon, button, image) under your navigator cursor.
    • Full Screen Vision: Scan the entire screen layout.
    • The Best Part: After the initial description, you can chat with the AI. You are not limited to one description; you can ask follow-up questions like "Is there a save icon?", "Describe the chart in detail," or "What color is the button?"
  • 🧠 Smart Translator (Auto-Swap): Instantly translates selected text. It creates a seamless bilingual experience by automatically detecting languages. If the source matches your target language, it intelligently swaps them.

  • 🎙️ Smart Dictation: A powerful voice typing tool. It doesn't just transcribe; it listens, fixes your grammar, removes filler words (ums and ahs), adds punctuation, and types the polished text directly into your active window.

  • 🔓 CAPTCHA Solver: Struggling with visual codes? Press a shortcut, and the AI will solve the math or read the characters and automatically type the result for you.

  • 📄 Document QA: Have a PDF, TIFF, or text file? You can "chat" with your documents. Ask the AI to summarize them, extract specific data, or explain complex sections.
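
The auto-swap behaviour described for the Smart Translator above can be pictured with a small sketch. This is not the add-on's actual code: the function name `pick_target_language`, the `fallback` parameter, and the language codes are assumptions for illustration, and language detection itself is assumed to happen elsewhere (for example, in the model).

```python
def pick_target_language(detected: str, target: str, fallback: str) -> str:
    """Return the language to translate into (hypothetical helper).

    If the selected text is already in the user's target language,
    swap to the fallback language; otherwise keep the target.
    """
    if detected.strip().lower() == target.strip().lower():
        return fallback
    return target


# Text detected as Turkish, target is English: translate into English.
print(pick_target_language("tr", "en", "tr"))
# Text detected as English, target is English: swap and translate into Turkish.
print(pick_target_language("en", "en", "tr"))
```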

🛠️ Requirement: You need a free Google Gemini API Key to run this add-on.

📥 Download & Installation: You can download the add-on directly from GitHub:

Download Vision Assistant Pro v2.0 (Direct Link)

Just open the downloaded file and confirm the installation in NVDA.

🔗 Project Source Code: https://github.com/mahmoodhozhabri/VisionAssistantPro

I developed this to help our community become more independent. I would love to hear your feedback and suggestions.

Best regards, Mahmood


Comments

By Umut KORKMAZ on Tuesday, December 2, 2025 - 12:17

First of all, thank you. The gestures you have defined may conflict with the gestures of other add-ons. Example: NVDA+Shift+T=Instant Translate. Therefore: The ability to change add-on actions from within Input Gestures could be added. The add-on should be translatable into other languages. I would like to translate it into Turkish. Thank you again.

By Ashley on Tuesday, December 2, 2025 - 12:34

Looks great, I will have to give this a try. One suggestion would be an 'agent mode', where you allow the AI to click and interact in windows on your behalf according to your instructions.

By mahmood on Tuesday, December 2, 2025 - 12:38

Hello, you're welcome. Thank you for your kind suggestion. Both will definitely be added in the next version, which will be released soon.

By mahmood on Tuesday, December 2, 2025 - 12:39

Hello, yes I had this idea, but because a window is open for user instructions, the click goes elsewhere! I will try again. I hope others will also join this open-source project.

By mahmood on Tuesday, December 2, 2025 - 12:58

Hello community,

I am happy to announce that Vision Assistant Pro version 1.5 is now available! Based on your feedback, this update focuses heavily on stability and making the AI interactions more fluid.

✨ What's New in v1.5:

  • 💬 Interactive Refine Dialog: When you use the "Refine Text" menu (to Explain, Summarize, or Fix Grammar), the result now opens in a chat window. This means you are no longer limited to a static result; you can ask follow-up questions about the selected text directly!

  • 🌍 Huge Language Expansion: Added support for over 20 new languages, including Turkish, Czech, Danish, Finnish, Greek, Hebrew, Korean, Thai, and more.

  • ⌨️ Input Gestures Support: You can now customize all keyboard shortcuts. Just go to NVDA Menu > Preferences > Input Gestures and look for the "Vision Assistant" category.

🛠️ Critical Fixes:

  • Stability: Fixed COMError crashes that occurred when selecting text in specific apps like Firefox and Microsoft Word.
  • Connection: Added automatic retries for server errors (HTTP 503) to make the add-on more reliable during busy times.
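
For readers curious how automatic retries on HTTP 503 usually work, here is a minimal sketch. It is not taken from the add-on: `call_with_retries`, its parameters, and the exponential delay schedule are assumptions, and `send` stands in for whatever function actually performs the request and returns a status code and body.

```python
import time
from typing import Callable, Tuple


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() again while the server answers 503 (hypothetical helper).

    Waits base_delay, then 2x, then 4x ... between attempts and gives up
    after max_attempts calls, returning the last response as-is.
    """
    status, body = send()
    attempt = 1
    while status == 503 and attempt < max_attempts:
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
        attempt += 1
    return status, body
```

Passing `sleep` as a parameter is only a convenience so the delays can be replaced with a no-op when trying the sketch out.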

📥 Download Update: You can download the new version directly from GitHub: Download Vision Assistant Pro v1.5

Simply install the new file over the old one and restart NVDA.

As always, I appreciate your feedback and bug reports!

Best, Mahmood

By mahmood on Tuesday, December 2, 2025 - 14:03

Done. You can download and use the new version.

By Umut KORKMAZ on Tuesday, December 2, 2025 - 14:32

The new version is nice. Thanks.

By mahmood on Tuesday, December 2, 2025 - 18:09

You're welcome

By britechguy on Tuesday, December 2, 2025 - 20:49

This add-on sounds very, very useful indeed, yet I have seen not one word of discussion about it on the NVDA Add-On Developer's group in advance of its release nor on the NVDA Screen Reader Group on Groups.io or the NVDA Screen Reader Discussion on Google Groups. All three of these groups are "your target demographic" and it would make a lot more sense to announce the debut of it on the two NVDA user groups, in particular. So would getting it into the NVDA Add-On Store.

By Umut KORKMAZ on Tuesday, December 2, 2025 - 20:54

I also subscribe to NVDA groups. Since e-mails come from many groups, sometimes some messages may be overlooked.

By mahmood on Tuesday, December 2, 2025 - 21:18

Hello. Yes, you are correct. I am not a member of any of these groups because I mostly program for myself. I have registered the add-on in the NVDA add-on store, and it will likely be released soon. The reason I published it here was that I use iOS, and I have been active on this site. I saw that this site also has a section related to Windows, so I decided to publish it here in addition to the NVDA add-on store.

By Guilherme on Tuesday, December 2, 2025 - 21:32

Hello, I haven’t had time to test it yet, but I found the idea of the add-on very good. Let me just ask you a question: is the Brazilian Portuguese language available in this addon?

By Brandon X on Tuesday, December 2, 2025 - 22:47

I’m not quite sure if I’m doing something wrong when using the smart dictation feature. Every time I try to use it, it says it’s listening, but then it says "no speech recognized" instead of typing what I am trying to dictate. I believe I set up everything correctly within the settings. Any help would be very helpful! Thank you so far for creating a pretty cool add-on!

By britechguy on Wednesday, December 3, 2025 - 00:47

You really need to market your work to the demographic it's meant to serve, and in venues where the major pools of users are. The only reason I brought this up is because an Apple-centric site is not one where the majority of NVDA users will be looking for NVDA news.

Please do take the time to join the previously noted groups and let people know about your work. It will almost certainly be embraced with open arms, and far more quickly, if you do so.

By mahmood on Wednesday, December 3, 2025 - 06:08

Hello, yes Portuguese is on the list. Test it and let us know if you have any problems.

By mahmood on Wednesday, December 3, 2025 - 06:09

I checked and there is no problem! Maybe the problem is with the microphone settings in Windows!

By mahmood on Wednesday, December 3, 2025 - 06:11

Yes, you are right. I also published the add-on in the NVDA add-on store but I am not very comfortable with groups. Thanks again.

By mahmood on Wednesday, December 3, 2025 - 06:12

You can introduce the add-on in groups if you like. The add-on is open source, and everyone can see its source and get it from GitHub or soon from the NVDA add-on store.

By mahmood on Wednesday, December 3, 2025 - 07:52

Hello community,

I am thrilled to announce the release of Vision Assistant Pro version 2.0. This is our biggest update yet, transforming the add-on from a simple tool into a smart, learning assistant.

Based on your requests, I have added an auto-updater so you never miss future improvements, and a "Context Memory" feature that makes chatting with the AI feel natural.

🚀 Top New Features:

  • 🔄 Built-in Auto-Updater: No need to manually check GitHub anymore! The add-on now automatically checks for updates. You can also force a check anytime by pressing NVDA+Shift+U.

  • 🧠 Conversation Memory: The AI now remembers what you are talking about.

    • Example: You translate a text. You can then simply ask "Summarize it" or "Extract dates" without re-selecting the text. The context is preserved in "Refine Text", "Vision", and "Document QA" dialogs.
  • ⚡ Smart Translation Cache: Translations are now instant! If you translate a sentence you've seen before, the add-on retrieves it from memory instantly without using your API quota or internet connection.

  • 📋 Clipboard Translation: New shortcut: NVDA+Shift+Y. Useful for web browsers or apps where selecting text is difficult. Just copy content and press the shortcut to translate immediately.
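
The translation cache mentioned in the list above can be sketched roughly as follows. This is only an illustration of the general idea: the class name `TranslationCache`, the in-memory dictionary, and the `translate` callback are assumptions, not the add-on's real implementation.

```python
from typing import Callable, Dict, Tuple


class TranslationCache:
    """Remember finished translations so repeats need no API call (sketch)."""

    def __init__(self, translate: Callable[[str, str], str]) -> None:
        self._translate = translate  # the function that really calls the API
        self._store: Dict[Tuple[str, str], str] = {}
        self.misses = 0  # how many times the API had to be called

    def get(self, text: str, target: str) -> str:
        # The same sentence in a different target language is a different entry.
        key = (text.strip(), target)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._translate(text, target)
        return self._store[key]
```

With a cache like this, only the first request for a given sentence and target language reaches the network; later requests are answered from the dictionary.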

🛠️ Improvements & Fixes:

  • Strict Language Output: I have optimized the prompts to ensure the AI speaks ONLY in your target language (no more accidental English intros like "Here is the text...").
  • Stability: Fixed a crash bug caused by special characters (like braces {}).
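
One common way braces can crash a Python program is when user text ends up inside a `str.format` template. Whether that was the cause here is a guess; the thread only says that special characters such as {} triggered a crash. The function names and prompt wording below are hypothetical.

```python
def build_prompt_unsafe(selected_text: str, target: str) -> str:
    # Bug pattern: the user's text becomes part of the template itself,
    # so any "{" or "}" in it is parsed as a placeholder.
    template = "Translate into {target}: " + selected_text
    return template.format(target=target)


def build_prompt_safe(selected_text: str, target: str) -> str:
    # The user's text is passed as a value and is never parsed as a template.
    return "Translate into {target}: {text}".format(target=target, text=selected_text)


text = "if (x) { return 1; }"
try:
    build_prompt_unsafe(text, "Turkish")
except (KeyError, IndexError, ValueError) as error:
    print("unsafe version failed:", type(error).__name__)
print(build_prompt_safe(text, "Turkish"))
```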

📥 How to Update: Since this version introduces the Auto-Updater, you need to install this update manually one last time. Future updates will be automatic!

Download Vision Assistant Pro v2.0

Enjoy the faster and smarter experience!

Best regards, Mahmood

By Fettah Pınar on Wednesday, December 3, 2025 - 14:34

Hello.
Thank you so much for sharing such an amazing, excellent plugin with us and for developing such a beautiful plugin.
I have a suggestion for you regarding the plugin.
Could you add a text-to-speech feature to the plugin using Gemini TTS?
Just like in the Native Speech plugin, we should be able to enter text into a text box or select any file, choose the Gemini model and the voice we want to use or select voices in multi-speaker mode, save voice files in MP3 or other formats, and configure voice settings.
Is it possible to add Gemini TTS integration to the plugin?
Best regards.

By Simone Dal Maso on Wednesday, December 3, 2025 - 14:37

Hi,
Many compliments on this add-on; it’s really well done!
I know very well that a developer can’t possibly keep track of every keyboard shortcut used by every add-on. It’s simply not feasible. Having worked on an add-on myself in the past (Nao), I know you do your best, but you can’t perform miracles.
That said, this point is quite important: you have three commands that conflict with NVDA’s laptop keyboard layout, and one that even conflicts in both layouts. I kindly ask you to change them. Here are the ones involved:
NVDA+Shift+D: this is the audio ducking toggle. It’s extremely useful, and your add-on overrides a core NVDA command in all layouts.
NVDA+Shift+O: reads the navigator object. Used in the laptop layout.
NVDA+Shift+A: Say All with the review cursor.
NVDA+Shift+S: reads the currently selected text in the control or document.
I hope you won’t take this as criticism; you’re most likely using the desktop layout and didn’t notice the conflicts. I understand you chose very convenient shortcuts, but add-ons really shouldn’t override NVDA’s core functions.
Thanks again, and take care.
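
A check for this kind of conflict can be sketched in a few lines. The core gestures below are the four listed in this comment; the add-on side is partly invented, because the thread does not say which add-on command sits on NVDA+Shift+D. The helper names `normalize` and `find_conflicts` are assumptions, and this does not use NVDA's own gesture API.

```python
from typing import Dict, List


def normalize(gesture: str) -> frozenset:
    """Treat 'NVDA+Shift+D' and 'nvda+shift+d' as the same key combination."""
    return frozenset(part.strip().lower() for part in gesture.split("+"))


def find_conflicts(addon: Dict[str, str], core: Dict[str, str]) -> List[str]:
    """List add-on gestures that reuse a key combination bound by NVDA itself."""
    core_by_keys = {normalize(gesture): name for gesture, name in core.items()}
    conflicts = []
    for gesture, action in addon.items():
        keys = normalize(gesture)
        if keys in core_by_keys:
            conflicts.append(f"{gesture}: {action} overrides {core_by_keys[keys]}")
    return conflicts


CORE = {
    "NVDA+Shift+D": "audio ducking toggle",
    "NVDA+Shift+O": "read navigator object (laptop layout)",
    "NVDA+Shift+A": "Say All with review cursor",
    "NVDA+Shift+S": "read selected text",
}
ADDON = {
    "nvda+shift+d": "add-on command A (hypothetical name)",
    "NVDA+Shift+U": "check for updates",
    "NVDA+Shift+Y": "translate clipboard",
}
for line in find_conflicts(ADDON, CORE):
    print(line)
```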

By mahmood on Wednesday, December 3, 2025 - 14:40

Hello. You're welcome. I hope this NVDA add-on is useful for you and all visually impaired individuals. Yes. I will try to include it in the next version so that you can open the text-to-speech section with a shortcut key and convert your text to speech.

By mahmood on Wednesday, December 3, 2025 - 14:45

Hi. Yes, you are absolutely right. I am using the desktop layout. Do you have any key suggestions for this layout? I'm concerned about choosing other keys and potentially conflicting with another layout, like the laptop one.

By mahmood on Wednesday, December 3, 2025 - 14:52

Is it good to use Windows instead of the Shift key?

By Brian on Wednesday, December 3, 2025 - 14:57

For anyone having issues with hotkey conflicts, there is a beautiful little add-on in the NVDA add-on store called "check input gestures". It checks all of your add-ons and compares their associated hotkeys to the default hotkeys of NVDA. If there are any conflicts, a little dialogue will pop up, allowing you to press Enter on any of the hotkeys from the list, which takes you immediately to "input gestures" so that you can reassign them accordingly.

It is efficient, and very handy.

HTH.

By Fettah Pınar on Wednesday, December 3, 2025 - 14:58

Hello again.
Thank you very much for your reply.
I hope this feature arrives as soon as possible.
By the way, I have some comments about the add-on.
I couldn't find some add-on shortcuts in the input gestures, for example, I want to change the shortcut for recognizing the current navigator object, but this shortcut doesn't appear in the input gestures.
Similarly, the shortcuts for some features don't appear in the input gestures.
Another question: will the option to use Gemini 3.0 Pro and other Pro models (alongside Flash models) be added?

By mahmood on Wednesday, December 3, 2025 - 15:00

Should the keys be changed? Considering that the Input gestures in NVDA are changeable?

By Fettah Pınar on Wednesday, December 3, 2025 - 15:06

Yes, I want to change some keys.
For example, I want to change the shortcut for the current navigator object, but I can't.

By mahmood on Wednesday, December 3, 2025 - 15:09

The reason other models were not used is that they have a cost, and those who do not have an API key connected to a billing account may encounter errors and become confused.

By mahmood on Wednesday, December 3, 2025 - 15:10

Missing options will be added soon

By Fettah Pınar on Wednesday, December 3, 2025 - 15:10

The following keyboard shortcuts do not appear in the Input Gestures dialog box and cannot be changed.
NVDA + Shift + 6
NVDA + Shift + V
NVDA + Shift + O
NVDA + Shift + A

By mahmood on Wednesday, December 3, 2025 - 15:30

Hello! I checked, and only the Insert+Shift+V shortcut is missing, which I will add. The rest of the keys are registered.

By mahmood on Wednesday, December 3, 2025 - 15:34

Soon, the Control key will be added to the shortcut keys to resolve the conflict issue. For example: Ctrl + Shift + Insert + T

By Fettah Pınar on Wednesday, December 3, 2025 - 15:41

Hello again.
The interface translations have not been included in version 2.0; the plugin can only be used in English.

By mahmood on Wednesday, December 3, 2025 - 16:41

Hello. Yes, I know. This is an open-source project and I cannot do much alone. I hope other friends will join the project soon and make it more complete.

By mahmood on Wednesday, December 3, 2025 - 17:11

Hello. I checked and Native Speech works well in NVDA. So I don't think there's a need to add TTS.

By mahmood on Wednesday, December 3, 2025 - 20:51

Hi everyone,
I released a quick update (v2.1) to fix keyboard conflicts, especially for those using the NVDA Laptop layout.
Change:
All shortcuts now use NVDA + Control + Shift + [Key].
For example, Smart Translation is now NVDA+Control+Shift+T.
You can download the update from GitHub as usual.
Regards,
Mahmood

By Guilherme on Thursday, December 4, 2025 - 08:50

Hello, congratulations on the add-on, it’s very good! I would like to request something if possible: at my job I often receive photos with text, and many times I only need the text and not the descriptions. Would it be possible for you to add a function to the add-on that performs OCR on the screen and returns only the text? It would help me a lot.

By mahmood on Thursday, December 4, 2025 - 08:56

Hello. You can use custom variables. Please read the add-on guide.

By jay on Thursday, December 4, 2025 - 14:00

Hello,

I've tried this add-on, and I have to say it's very good. However, I'm curious to know: when transcribing an audio file, where does the result go? When I tried doing this, I got "Uploading..." but then nothing indicating the progress of the upload or the results of the transcription. So, I guess I'm a little confused as to where these things are located. Any help would be greatly appreciated.

By mahmood on Thursday, December 4, 2025 - 14:05

Hello. If successful, a box will appear where you can both view the text of the audio file and chat about the audio file. Sometimes you need to use Alt+Tab to focus on the window.

By Guilherme on Thursday, December 4, 2025 - 15:01

I read the add-on guide about custom variables and didn’t understand how to use them. Give me a practical example of how and where to use them, please: let’s suppose I want to take a screenshot and have it return only the text that is written on the screen — how can I do that?

By mahmood on Thursday, December 4, 2025 - 15:41

For example, you can use this command in the custom prompt section of the add-on settings.
My OCR:[file_ocr]

By Guilherme on Thursday, December 4, 2025 - 15:53

Right, but then, which command do I use to activate this prompt?

By mahmood on Thursday, December 4, 2025 - 16:30

Press Ctrl+Shift+Insert+R. Please read the document.

By Fettah Pınar on Thursday, December 4, 2025 - 16:31

Hello.
Yes, Native Speech works well, but since all Gemini features are combined in a single plugin, it would be nice to use the text-to-speech feature through the same plugin instead of using another plugin.
So, I suggest including this functionality in this plugin without having to install the Native Speech plugin, since the Native Speech plugin hasn't been updated in a long time.

By mahmood on Friday, December 5, 2025 - 07:42

Yes. If you are using a VPN to communicate with Gemini, you must also use a VPN to use this add-on. A proxy option has also been included in the settings, which requires you to have a server that forwards your requests to Gemini, eliminating the need for a VPN.

By mahmood on Friday, December 5, 2025 - 13:06

Hello everyone,

I am happy to announce the release of Vision Assistant Pro version 2.5!

I have tried my best to implement many of your suggestions in this update, most notably standardizing the keyboard shortcuts (switching from NVDA+Shift to NVDA+Control+Shift) to prevent conflicts with system commands and the Laptop layout.

A note regarding support and future updates: With this release, the add-on is now stable and robust, covering all the core functionalities intended for daily use. There is no immediate need for major changes.

As a one-person team, I work on this project and address requests strictly based on my available free time. Since I cannot reply to every individual comment here on the forum, please follow these guidelines:

  1. Bug Reports & Features: Please submit them directly on the GitHub Issues page. This ensures they are tracked and not lost in the forum threads.
  2. Open Source: This is an open-source project. I warmly invite other developers in the community to contribute code, fix bugs, or add features.

Thank you for your understanding and your continued support!

Best regards, Mahmood

By Stefan on Friday, December 5, 2025 - 13:13

Hi, this add-on is very useful, and big thanks for the good work. Can you add the Bulgarian language for the translations, AI responses and audio transcriptions? It will allow other Bulgarian blind users to use this add-on in our native language.