[New Add-on] Vision Assistant Pro: Your Interactive AI Copilot for NVDA (Powered by Gemini)

By mahmood, 2 December, 2025

Forum

Windows

Hello AppleVis community,

I am Mahmood, and I am excited to share Vision Assistant Pro, a new open-source add-on for NVDA. This tool is designed to bring the intelligence of Google Gemini directly into your screen reader to solve digital challenges that usually require sighted assistance.

It is completely free to use (with your own API key) and focuses on interactivity rather than just static descriptions.

🌟 Key Features:

👁️ Interactive Vision (Object & Full Screen): Unlike standard OCR that just reads text, this feature lets you "see" and "ask."
- Object Vision: Take a snapshot of the specific control (icon, button, image) under your navigator cursor.
- Full Screen Vision: Scan the entire screen layout.
- The Best Part: After the initial description, you can chat with the AI. You are not limited to one description; you can ask follow-up questions like "Is there a save icon?", "Describe the chart in detail," or "What color is the button?"
🧠 Smart Translator (Auto-Swap): Instantly translates selected text. It creates a seamless bilingual experience by automatically detecting languages. If the source matches your target language, it intelligently swaps them.
🎙️ Smart Dictation: A powerful voice typing tool. It doesn't just transcribe; it listens, fixes your grammar, removes stutters (ums and ahs), adds punctuation, and types the polished text directly into your active window.
🔓 CAPTCHA Solver: Struggling with visual codes? Press a shortcut, and the AI will solve the math or read the characters and automatically type the result for you.
📄 Document QA: Have a PDF, TIFF, or text file? You can "chat" with your documents. Ask the AI to summarize them, extract specific data, or explain complex sections.
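The auto-swap rule described for the Smart Translator can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the idea of a configured source/target pair, and the two-letter language codes are hypothetical and are not taken from the add-on's source code.

```python
def resolve_translation_pair(detected: str, configured_source: str,
                             configured_target: str) -> tuple[str, str]:
    """Return the (source, target) languages to use for one translation.

    Hypothetical helper: if the selected text is already in the configured
    target language, translate it back into the configured source language.
    """
    detected = detected.strip().lower()
    source = configured_source.strip().lower()
    target = configured_target.strip().lower()

    if detected == target:
        # The text is already in the target language, so swap direction.
        return target, source
    # Otherwise translate from whatever was detected into the target.
    return detected, target


# Example with English as the configured source and Persian as the target.
print(resolve_translation_pair("en", "en", "fa"))  # English text goes to Persian
print(resolve_translation_pair("fa", "en", "fa"))  # Persian text goes to English
```

With a rule like this, one shortcut can cover both directions of a bilingual conversation without changing any setting.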

🛠️ Requirement: You need a free Google Gemini API Key to run this add-on.

📥 Download & Installation: You can download the add-on directly from GitHub:

Download Vision Assistant Pro v4.6 (Direct Link)

Just open the downloaded file and confirm the installation in NVDA.

🔗 Project Source Code: https://github.com/mahmoodhozhabri/VisionAssistantPro

I developed this to help our community become more independent. I would love to hear your feedback and suggestions.

Best regards, Mahmood


Comments

Getting it to work

So I just installed the addon from the addons store, pasted in the API key and nothing seems to happen. When I tried to perform OCR on a PDF NVDA just says not connected.
I tried creating a new project in Google AI studio, same result.
Any idea on how to get it working?

@Nut

not connected? There is no such message in the plugin! Please check again.

@Stefan

Hello. Bulgarian has been added to the new version, 2.6. Enjoy.

Need help getting API key

I feel like I missed the first week of A.I. class and showed up on quiz day. When I go to Google Studio and hit the get API key button, it goes to an empty list of keys for imported projects. When I then hit the create API key, a modal pops up requiring a name and imported project--a combo box that's empty, with the "create" button disabled. Tried in Chrome and Firefox. Is this expected, and I have to create a project, or are browser extensions getting in the way, or what?
Thanks!

translate

It translates one sentence at a time and I have to press the key combination every time, which is not convenient. Make it like the nvda translator to turn on and off, instead of pressing a key combination for each sentence.

Re: Translate

Have to agree; real-time translation of this quality is just what the doctor ordered. Question is, is it feasible?

@mantanini

Hello. That's not the case! Unless you are translating in the browser! To do this, place the text in the clipboard and press 'y' instead of 't'.

Congrats, good project!

I would recommend not implementing automatic updates or any update system directly in the add-on, as this is done automatically by the NVDA add-on store, and users can enable or disable automatic updates there.

API key missing

Thanks for this most indispensable add-on. I just used the OCR to describe my full screen and got a very nice description. I tried again a minute later to get another full-screen description, and it says "API key missing." What's up with that? Is there a quota limit on how much you can use the add-on, or am I missing something? By the way, I am using the very latest version of this add-on.
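For anyone trying to tell a key problem apart from a quota problem, here is a rough sketch of how HTTP status codes from an API such as Gemini are commonly interpreted. The helper name and the hint texts are made up for illustration and are not part of Vision Assistant Pro. Nothing in this thread confirms which status, if any, lies behind the add-on's "API key missing" message.

```python
# Hypothetical helper, not taken from the add-on.
# The meanings follow general HTTP conventions: 429 usually signals a rate
# limit or exhausted quota, while 400 and 403 usually point at the request
# or the key itself.
STATUS_HINTS = {
    400: "Bad request: the key may be malformed or the request body invalid.",
    403: "Permission denied: the key may lack access to this model or project.",
    429: "Rate limit or quota exceeded: wait a little, then retry.",
    500: "Server error on the provider's side: retry later.",
    503: "Service overloaded or unavailable: retry later.",
}


def explain_status(status_code: int) -> str:
    """Map an HTTP status code to a short troubleshooting hint."""
    if status_code in STATUS_HINTS:
        return STATUS_HINTS[status_code]
    if 200 <= status_code < 300:
        return "Success: the key and request were accepted."
    return f"Unexpected status {status_code}: check the NVDA log for details."


print(explain_status(429))
```

If a second request a minute after a successful one fails, a 429-style rate limit is one possible explanation, but that is an assumption and not something the author has stated.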

suggestions

Hi Mahmood! I would like to congratulate you on the excellent add-on. Keep up the good work! I have some suggestions that would make things much easier: the possibility of having a screen to place multiple prompts and then being able to switch with a shortcut key. Sometimes, I want to know what's in a photo, then something in a game, and it would be useful to already have these prompts predefined by the user and just keep switching. Another suggestion would be an option so that when it showed the result, it would just speak, without showing a dialogue or window. It's quite useful to know information quickly without having to switch windows. Thank you!

please read

Dear Users,

Please download and install the latest build of the add-on from the project's GitHub page, which is linked in the original post. It appears that a significant portion of users are still running outdated versions.

Due to time constraints, my ability to visit this thread frequently to respond to comments is limited. For any issues or suggestions, please raise them directly in the Issues section of the project's GitHub repository.

Best Regards,

Mahmood

Post updated

Hello everyone. The download link has been updated to version 3.1.0. Enjoy.

Error 400 when transcribing .mp3 audio

Hello, and congratulations on this fantastic add-on. I would like to report that when I transcribe an .mp3 audio file, error 400 is reported, with a request to check my API key, which is of course correct. I tried changing the audio file's extension, for example to .wav, and the transcription works, so I believe it depends on how the .mp3 extension is seen by the API key. I hope you can fix it, thank you very much.

@Maurizio

Hello, yes, this issue exists and will be fixed soon in version 3.5.0.

Huge compliments and a request about video description

First of all, thank you for brilliantly fixing error 400 on .mp3 files. I also found the feature for describing videos from YouTube and Instagram URLs truly wonderful. It would be great if it could also work for X and other platforms, and also for local videos played on one's own PC. I know there could be privacy issues; personally, if asked, I am willing to grant all the necessary permissions. It would be truly wonderful. Endless and sincere compliments, and heartfelt thanks!!

@Maurizio

Thank you so much for your kind words and feedback, Maurizio! I'm glad to hear the MP3 fix and video descriptions are working well for you.

Regarding your suggestions, I'll be adding support for X (Twitter) very soon. As for local video support, I am currently thinking about it and evaluating the best way to implement it.

Regarding the privacy concerns you mentioned, I want to reassure you that since this addon is open source, everything is transparent and there is no need to worry at all. Your trust and support mean a lot to me—thank you again!

Description Video TikTok

Hello again! I will never stop thanking you for this amazing add-on! I also hope you are well. If I am not mistaken, you are in Iran, and I know what is happening there; I sincerely hope that you are as well as possible and that you are safe.
I would kindly ask whether video descriptions could also be implemented for TikTok. It would be truly incredible, given how well the feature works for YouTube, X and Instagram. On Instagram, though, I ran into some download errors; in practice it seemed not to recognize some URLs. In any case, it is a wonderful function. Can it also be implemented for TikTok? Thank you very much and all the best, with great respect and sincerity!

@Maurizio

Hello, thank you for checking on me. I am well, but internet access has become very difficult, and it's unclear if I will be able to access it in the future. I will try my best. I have never worked with TikTok before. Please send a link to a video. If I can and my internet is stable, I will implement it.

updated

You can get the latest version from the original post.

updated

Hello everyone. The new version of the addon has been released. Support for TikTok videos has also been added!

v4.0.3

Hello everyone.
The new version has been released.
You can use it.

Awesome add-on

This is fantastic! I'd like to donate, but I don't use the platform that is provided. Maybe consider Paypal like NVDA uses? I also appreciate that this is fully accessible to braille users!

First thoughts and questions

I finally got around to trying this add-on, and so far it's excellent! It took me a minute or 2 to figure out the API key thing, but once I got that, I've tried a few modes, and am impressed with the speed and quality of the responses so far.

A few Questions

1. As a low vision user, I noticed that when you bring up the Settings screen for VA Pro, I can navigate through it via NVDA and keyboard, but nothing shows up on the screen. It would be helpful if this window showed up visually, like the rest of the windows do when you use the add-on. Not sure why nothing shows up for the Settings screen.

2. Can there be an option for a dark theme for the windows that appear for low vision users, or those who are light sensitive?

3. When I scan a file or document, does it do the whole thing, like multiple pages, or just the first visual page like when doing standard NVDA + R?

4. When scanning a video link, is there a rough video length we should limit it to? Just trying to figure out how long it would take to process a video. I fed it probably a waaay longer video than I should have, and got a key error. Understandable...

5. Is there a keyboard shortcut to bring up a menu of the add-on's functions directly, in case we forget a specific letter? I know you can do this via the add-on submenu, but maybe a direct menu command to bring us right there?

I think this is all I can think of for now, but I will definitely be playing with this more. I like it though. I'd also love to see a PayPal option for donations, as almost everyone has PayPal.

about donate

Hello everyone, I'm glad the addon has been useful to you. I try to always add more features to the addon. Regarding donations, unfortunately, I am in Iran and do not have access to PayPal. If you would like to support the project and do not have the option to support with cryptocurrency, you can purchase an Apple USA gift card and send it to me via private message. Thank you in advance for your support.

@Scott Davert

hi. I'm glad the addon has been useful to you. I try to always add more features to the addon. Regarding donations, unfortunately, I am in Iran and do not have access to PayPal. If you would like to support the project and do not have the option to support with cryptocurrency, you can purchase an Apple USA gift card and send it to me via private message. Thank you in advance for your support.

@Jesse Anderson

Hello to you, I'm glad the addon has been useful for you.
1. I am blind, but I don't think this section relates to the display, because it uses the default NVDA settings.
2. This also seems to use NVDA settings.
3. The choice is entirely yours. Check it out.
4. I think it's about 10 minutes.
5. You can assign any key to any section in Input Gestures.
Regarding donate, if you would like to support, and if you cannot use crypto, you can purchase an American Apple gift card and send it to me privately on Applevis.

Recognizing input from a scanner.

Hi, wondering if this addon could take input from a scanner OCR it and then drop it in a document or speak it out loud? I'm recommending NVDA to more and more clients and some of them use scanners. Thanks.

update

Hello everyone,

Vision Assistant v4.5 is now officially available! This update brings several significant improvements, including the new Advanced Prompt Manager, full proxy support for all API requests, and an enhanced Document Reader experience.

Download Instructions: Please always use the official download link provided in the original post at the top of this thread to ensure you are getting the latest verified version.

Support the Project: If you find this addon useful and would like to support its continued development, please consider making a donation. Due to international banking restrictions in my region, I am unable to use traditional platforms like PayPal. To address this, I have updated the "Donate" dialog within the addon settings with alternative ways to contribute, including Apple Gift Cards (US region) and Cryptocurrency.

Your support is greatly appreciated and helps keep the project evolving. Thank you all for your feedback and for being part of this journey!

mahmood

I cannot express how much I appreciate this add-on, and yourself for creating it. I have literally disabled and/or removed at least 5 separate add-ons and applications that had similar functionality, but individually. For example, I had a translation add-on, an OCR add-on, etc.

I installed the 4.5 update from an NVDA add-on notification, and ironically installed it before seeing this post. Anyways keep up the excellent work. This add-on is even outperforming the latest build of Be My Eyes for PC. Which is a shame because BME used to be amazing.

@Brian

Thank you so much for your incredibly kind words! Your feedback truly made my day and is exactly the kind of encouragement that fuels my passion for this project.

It's fantastic to hear that the plugin has been able to consolidate the functionality of multiple separate tools, like translation and OCR, into one seamless experience for you. That was precisely one of the core goals, and knowing it's making such a significant difference in your daily workflow is incredibly rewarding.

And your comparison to Be My Eyes for PC, especially given its previous reputation, is incredibly high praise. This kind of detailed and enthusiastic feedback is invaluable; it not only validates the hard work and countless hours put into development but also inspires me to continue improving and expanding its capabilities.

If you feel inclined to support the ongoing development of this project and my motivation to keep bringing new features and improvements, a donation would be greatly appreciated. Every contribution, no matter how small, helps sustain the effort and passion behind it.

Thank you once again for your wonderful words and for being such an enthusiastic user!

Update Failed

Hello. I just saw there was an update to this add-on this morning, along with an NVDA update. I was able to install the NVDA update, but every time I try updating the Vision Assistant Pro add-on, I get a download failed error. I will keep checking in and retrying the update. I'm guessing something simple changed with the NVDA update. Just FYI for now.

re: Update Failed

Hello. When you update an addon through NVDA, there is no difference compared to updating it through the addon itself. The update methods for both NVDA and addons are the same. However, it's strange that you receive an error, whereas you should be getting the message "You have the latest version".

Update Fixed

Thanks for the reply. I did just try the update again, and it went through this time. Thanks again.

update

Hello everyone,
Vision Assistant v4.6 is now officially available!
Download Instructions: Please always use the official download link provided in the original post at the top of this thread to ensure you are getting the latest verified version.
Support the Project: If you find this addon useful and would like to support its continued development, please consider making a donation. Due to international banking restrictions in my region, I am unable to use traditional platforms like PayPal. To address this, I have updated the "Donate" dialog within the addon settings with alternative ways to contribute, including Apple Gift Cards (US region) and Cryptocurrency.
Your support is greatly appreciated and helps keep the project evolving. Thank you all for your feedback and for being part of this journey!

IN-DEPTH DESCRIPTION OF IMAGES INCLUDING TEXT

Hello, and once again my most sincere congratulations on this add-on, which is becoming more and more indispensable for my daily work with NVDA. Since it is very important for me to have a complete and detailed description of images, I would like to suggest adding this capability. I know you can use Be My Eyes, for example, or CloudVision, which however seems to have had problems, or even Image Describer, but the latter also has problems with API keys and does not work very well.

Given your skill and the excellent functioning of Vision Assistant Pro, I am sure you will have no difficulty implementing this function, so that both images and the text contained in them can be described. The add-on already does this for the text in images and documents, but I would like a description of the images themselves to be added. It would be wonderful if one could then interact with the AI, as happens with videos and the other functions, to dig deeper with further questions.

For example, I use ImageMagick from the command line to insert my logos in various positions, and I have built many scripts for this. When I have an image and know its entire description, I can have the AI tell me where I can put my 200x200 PNG logo, that is, where the image has free space, so that the logo is clearly visible without overlapping or ruining elements of the image itself.

Furthermore, since your fantastic add-on can already describe the entire screen, I don't think it will be difficult for you to implement this function. I thank you sincerely for your wonderful work and hope you will be able to implement it. Thank you very much again.
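To make the logo workflow above concrete, here is a minimal sketch of how a script might turn a position suggested by the AI into an ImageMagick command. It is not the poster's script: the function name, file names, and default offsets are hypothetical, and it assumes ImageMagick 7, where the `magick` executable accepts `-gravity`, `-geometry` and `-composite`.

```python
# Gravity names accepted by ImageMagick's -gravity option.
VALID_GRAVITIES = {
    "NorthWest", "North", "NorthEast",
    "West", "Center", "East",
    "SouthWest", "South", "SouthEast",
}


def build_logo_command(image_path: str, logo_path: str, output_path: str,
                       gravity: str = "SouthEast",
                       offset_x: int = 10, offset_y: int = 10) -> list[str]:
    """Build an argument list that overlays a logo onto an image.

    Hypothetical helper: 'gravity' names the corner or edge the AI reported
    as free, and the offsets keep the logo away from the border.
    """
    if gravity not in VALID_GRAVITIES:
        raise ValueError(f"Unknown gravity: {gravity}")
    return [
        "magick", image_path, logo_path,
        "-gravity", gravity,
        "-geometry", f"+{offset_x}+{offset_y}",
        "-composite", output_path,
    ]


# Example: the AI reported that the bottom-right corner is empty.
command = build_logo_command("photo.png", "logo_200x200.png", "photo_branded.png")
print(" ".join(command))
# To actually run it (requires ImageMagick to be installed):
# import subprocess; subprocess.run(command, check=True)
```

Building the command as a list instead of a single string avoids quoting problems when file names contain spaces.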