[New Add-on] Vision Assistant Pro: Your Interactive AI Copilot for NVDA (Powered by Gemini)

By mahmood, 2 December, 2025

Hello AppleVis community,

I am Mahmood, and I am excited to share Vision Assistant Pro, a new open-source add-on for NVDA. This tool is designed to bring the intelligence of Google Gemini directly into your screen reader to solve digital challenges that usually require sighted assistance.

It is completely free to use (with your own API key) and focuses on interactivity rather than just static descriptions.

🌟 Key Features:

👁️ Interactive Vision (Object & Full Screen): Unlike standard OCR that just reads text, this feature lets you "see" and "ask."
- Object Vision: Take a snapshot of the specific control (icon, button, image) under your navigator cursor.
- Full Screen Vision: Scan the entire screen layout.
- The Best Part: After the initial description, you can chat with the AI. You are not limited to one description; you can ask follow-up questions like "Is there a save icon?", "Describe the chart in detail," or "What color is the button?"
🧠 Smart Translator (Auto-Swap): Instantly translates selected text. It creates a seamless bilingual experience by automatically detecting languages. If the source matches your target language, it intelligently swaps them.
🎙️ Smart Dictation: A powerful voice typing tool. It doesn't just transcribe; it listens, fixes your grammar, removes stutters (ums and ahs), adds punctuation, and types the polished text directly into your active window.
🔓 CAPTCHA Solver: Struggling with visual codes? Press a shortcut, and the AI will solve the math or read the characters and automatically type the result for you.
📄 Document QA: Have a PDF, TIFF, or text file? You can "chat" with your documents. Ask the AI to summarize them, extract specific data, or explain complex sections.
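The auto-swap rule described for the Smart Translator can be sketched in a few lines. This is only an illustration and not the add-on's actual code: the function name, its parameters, and the idea of a configured "fallback" language are assumptions made for the example.

```python
def pick_target_language(detected: str, target: str, fallback: str) -> str:
    """Return the language a piece of text should be translated into.

    Hypothetical helper: `detected` is the language of the selected text,
    `target` is the user's preferred target language, and `fallback` is the
    language to swap to when the text is already in the target language.
    """
    if detected.lower() == target.lower():
        # The text is already in the target language, so swap direction.
        return fallback
    return target
```

For example, with English as the target and Turkish as the fallback, English text would be translated into Turkish, and text in any other language would be translated into English.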

🛠️ Requirement: You need a free Google Gemini API Key to run this add-on.

📥 Download & Installation: You can download the add-on directly from GitHub:

Download Vision Assistant Pro v6.5.1 (Direct Link)

Just open the downloaded file and confirm the installation in NVDA.

🔗 Project Source Code: https://github.com/mahmoodhozhabri/VisionAssistantPro

I developed this to help our community become more independent. I would love to hear your feedback and suggestions.

Best regards, Mahmood

Comments

Some Suggestions.

First of all, thank you. The gestures you have defined may conflict with those of other add-ons; for example, NVDA+Shift+T is already used by Instant Translate. So I have two suggestions: make the add-on's commands reassignable from the Input Gestures dialog, and make the add-on translatable into other languages. I would like to translate it into Turkish. Thank you again.

brilliant work

Looks great, I will have to give this a try. One suggestion would be an "agent mode", where you allow the AI to click and interact in windows on your behalf according to your instructions.

@Umut KORKMAZ

Hello, you're welcome. Thank you for your kind suggestion. Both will definitely be added in the next version, which will be released soon.

@Ashley

Hello, yes I had this idea, but because a window is open for user instructions, the click goes elsewhere! I will try again. I hope others will also join this open-source project.

Update v1.5 Released: Interactive Refine, 20+ New Languages, and

Hello community,

I am happy to announce that Vision Assistant Pro version 1.5 is now available! Based on your feedback, this update focuses heavily on stability and making the AI interactions more fluid.

✨ What's New in v1.5:

💬 Interactive Refine Dialog: When you use the "Refine Text" menu (to Explain, Summarize, or Fix Grammar), the result now opens in a chat window. This means you are no longer limited to a static result; you can ask follow-up questions about the selected text directly!
🌍 Huge Language Expansion: Added support for over 20 new languages, including Turkish, Czech, Danish, Finnish, Greek, Hebrew, Korean, Thai, and more.
⌨️ Input Gestures Support: You can now customize all keyboard shortcuts. Just go to NVDA Menu > Preferences > Input Gestures and look for the "Vision Assistant" category.

🛠️ Critical Fixes:
- Stability: Fixed COMError crashes that occurred when selecting text in specific apps like Firefox and Microsoft Word.
- Connection: Added automatic retries for server errors (HTTP 503) to make the add-on more reliable during busy times.
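For readers curious what automatic retries on HTTP 503 can look like, here is a minimal sketch using only the Python standard library. It is not taken from the add-on: the function name, the retry count, and the doubling delay are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, data=None, retries=3, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, retrying only when the server answers HTTP 503.

    Hypothetical helper: `opener` and `sleep` are injectable so the
    behaviour can be tested without a network connection.
    """
    for attempt in range(retries + 1):
        try:
            with opener(url, data) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on any other status, or once the retries are used up.
            if err.code != 503 or attempt == retries:
                raise
            # Wait longer after each failure: 1s, 2s, 4s, ...
            sleep(base_delay * (2 ** attempt))
```

Retrying only on 503 keeps genuine errors, such as an invalid API key, visible to the user right away instead of hiding them behind a delay.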

📥 Download Update: You can download the new version directly from GitHub: Download Vision Assistant Pro v1.5

Simply install the new file over the old one and restart NVDA.

As always, I appreciate your feedback and bug reports!

Best, Mahmood

@Umut KORKMAZ

Done. You can download and use the new version.

Thanks.

The new version is nice. Thanks.

@Umut KORKMAZ

You're welcome

Why here and not on any of the NVDA Groups?

This add-on sounds very, very useful indeed, yet I have seen not one word of discussion about it on the NVDA Add-On Developer's group in advance of its release nor on the NVDA Screen Reader Group on Groups.io or the NVDA Screen Reader Discussion on Google Groups. All three of these groups are "your target demographic" and it would make a lot more sense to announce the debut of it on the two NVDA user groups, in particular. So would getting it into the NVDA Add-On Store.

I also subscribe to NVDA Groups. :)

I also subscribe to NVDA groups. Since e-mails come from many groups, sometimes some messages may be overlooked.

@britechguy

Hello. Yes, you are correct. I am not a member of any of these groups because I mostly program for myself. I have registered the add-on in the NVDA add-on store, and it will likely be released soon. The reason I published it here was that I use iOS, and I have been active on this site. I saw that this site also has a section related to Windows, so I decided to publish it here in addition to the NVDA add-on store.

Brazilian Portuguese language

Hello, I haven’t had time to test it yet, but I found the idea of the add-on very good. Let me just ask you a question: is the Brazilian Portuguese language available in this addon?

Regarding the smart dictation?

I'm not sure whether I'm doing something wrong with the smart dictation feature. Every time I try to use it, it says it's listening, but then it says no speech was recognized instead of typing what I dictated. I believe I set up everything correctly in the settings. Any help would be appreciated! Thank you for creating a pretty cool add-on!

@mahmood

You really need to market your work to the demographic it's meant to serve, and in venues where the major pools of users are. The only reason I brought this up is because an Apple-centric site is not one where the majority of NVDA users will be looking for NVDA news.

Please do take the time to join the previously noted groups and let people know about your work. It will almost certainly be embraced with open arms, and far more quickly, if you do so.

@Guilherme

Hello, yes Portuguese is on the list. Test it and let us know if you have any problems.

@Brandon X

I checked and there is no problem! Maybe the problem is with the microphone settings in Windows!

@britechguy

Yes, you are right. I also published the add-on in the NVDA add-on store but I am not very comfortable with groups. Thanks again.

@britechguy

You can introduce the add-on in groups if you like. The add-on is open source, and everyone can see its source and get it from GitHub or soon from the NVDA add-on store.

Major Update v2.0 Released: Auto-Updater, Conversation Memory, a

Hello community,

I am thrilled to announce the release of Vision Assistant Pro version 2.0. This is our biggest update yet, transforming the add-on from a simple tool into a smart, learning assistant.

Based on your requests, I have added an auto-updater so you never miss future improvements, and a "Context Memory" feature that makes chatting with the AI feel natural.

🚀 Top New Features:

🔄 Built-in Auto-Updater: No need to manually check GitHub anymore! The add-on now automatically checks for updates. You can also force a check anytime by pressing NVDA+Shift+U.
🧠 Conversation Memory: The AI now remembers what you are talking about.
- Example: You translate a text. You can then simply ask "Summarize it" or "Extract dates" without re-selecting the text. The context is preserved in "Refine Text", "Vision", and "Document QA" dialogs.
⚡ Smart Translation Cache: Translations are now instant! If you translate a sentence you've seen before, the add-on retrieves it from memory instantly without using your API quota or internet connection.
📋 Clipboard Translation: New shortcut: NVDA+Shift+Y. Useful for web browsers or apps where selecting text is difficult. Just copy content and press the shortcut to translate immediately.
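The translation cache described above can be pictured as a small in-memory lookup keyed by the text and the target language. The sketch below is an assumption about the general technique, not the add-on's implementation; the class name and the `translate` callable are hypothetical.

```python
class TranslationCache:
    """Remember translations so repeated requests skip the API call."""

    def __init__(self, translate):
        # `translate` is any callable taking (text, target_lang); in a real
        # add-on it would be the function that calls the translation API.
        self._translate = translate
        self._store = {}

    def get(self, text, target_lang):
        key = (text, target_lang)
        if key not in self._store:
            # Cache miss: call the translator once and keep the result.
            self._store[key] = self._translate(text, target_lang)
        return self._store[key]
```

Keying on both the text and the target language means the same sentence translated into two different languages is stored as two separate entries.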

🛠️ Improvements & Fixes:
- Strict Language Output: I have optimized the prompts to ensure the AI speaks ONLY in your target language (no more accidental English intros like "Here is the text...").
- Stability: Fixed a crash bug caused by special characters (like braces {}).
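The post does not say what caused the brace crash. One common way braces break Python code is when user text ends up inside a format string, so the sketch below shows that pattern and a safer alternative. Both function names and the prompt wording are hypothetical and are not taken from the add-on.

```python
def build_prompt_fragile(text, language):
    # The user text becomes part of the format string, so any braces in it
    # are parsed as replacement fields and can raise KeyError, IndexError,
    # or ValueError.
    return ("Translate into {language}: " + text).format(language=language)


def build_prompt_safe(text, language):
    # The user text is passed as a value; str.format does not re-parse
    # braces inside substituted values.
    return "Translate into {language}: {text}".format(language=language, text=text)
```

With selected text such as `a {b} c`, the fragile version raises an error while the safe version returns the prompt with the braces intact.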

📥 How to Update: Since this version introduces the Auto-Updater, you need to install this update manually one last time. Future updates will be automatic!

Download Vision Assistant Pro v2.0

Enjoy the faster and smarter experience!

Hello, congratulations on the add-on, it’s very good! I would like to request something if possible: at my job I often receive photos with text, and many times I only need the text and not the descriptions. Would it be possible for you to add a function to the add-on that performs OCR on the screen and returns only the text? It would help me a lot.

@Guilherme

Hello. You can use custom variables. Please read the add-on guide.

Where do the audio transcriptions go after being done?

Hello,

I've tried this add-on, and I have to say it's very good. However, I'm curious: when transcribing an audio file, where does the result go? When I tried this, I got "Uploading..." but then nothing indicating the progress of the upload or the results of the transcription. So I'm a little confused as to where these things are located. Any help would be greatly appreciated.

@jay

Hello. If successful, a box will appear where you can both view the text of the audio file and chat about the audio file. Sometimes you need to use Alt+Tab to focus on the window.

about customized prompts

I read the add-on guide about custom variables and didn’t understand how to use them. Give me a practical example of how and where to use them, please: let’s suppose I want to take a screenshot and have it return only the text that is written on the screen — how can I do that?

@Guilherme

For example, you can use this command in the custom prompt section of the add-on settings.
My OCR:[file_ocr]

still about custom prompts

Right, but then, which command do I use to activate this prompt?

@Guilherme

Press Ctrl+Shift+Insert+R. Please read the document.

gemini tts

Hello.
Yes, Native Speech works well, but since all Gemini features are combined in a single plugin, it would be nice to use the text-to-speech feature through the same plugin instead of using another plugin.
So, I suggest including this functionality in this plugin without having to install the Native Speech plugin, since the Native Speech plugin hasn't been updated in a long time.

Do I need a VPN to connect to Google Gemini?

I normally use Gemini through a VPN.
Should I also connect through a VPN for this add-on?

@ming

Yes. If you are using a VPN to communicate with Gemini, you must also use a VPN to use this add-on. A proxy option has also been included in the settings, which requires you to have a server that forwards your requests to Gemini, eliminating the need for a VPN.

Version 2.5 Released: New Shortcuts, Stability, and a Note on Su

Hello everyone,

I am happy to announce the release of Vision Assistant Pro version 2.5!

I have tried my best to implement many of your suggestions in this update, most notably standardizing the keyboard shortcuts (switching from NVDA+Shift to NVDA+Control+Shift) to prevent conflicts with system commands and the Laptop layout.

A note regarding support and future updates: With this release, the add-on is now stable and robust, covering all the core functionalities intended for daily use. There is no immediate need for major changes.

As a one-person team, I work on this project and address requests strictly based on my available free time. Since I cannot reply to every individual comment here on the forum, please follow these guidelines:

Bug Reports & Features: Please submit them directly on the GitHub Issues page. This ensures they are tracked and not lost in the forum threads.
Open Source: This is an open-source project. I warmly invite other developers in the community to contribute code, fix bugs, or add features.

Thank you for your understanding and your continued support!

Best regards, Mahmood

Bulgarian language

Hi, this add-on is very useful, and big thanks for the good work. Can you add Bulgarian for the translations, AI responses, and audio transcriptions? It would allow other blind Bulgarian users to use this add-on in our native language.