Hello AppleVis community,
I am Mahmood, and I am excited to share Vision Assistant Pro, a new open-source add-on for NVDA. This tool is designed to bring the intelligence of Google Gemini directly into your screen reader to solve digital challenges that usually require sighted assistance.
It is completely free to use (with your own API key) and focuses on interactivity rather than just static descriptions.
🌟 Key Features:
👁️ Interactive Vision (Object & Full Screen): Unlike standard OCR that just reads text, this feature lets you "see" and "ask."
- Object Vision: Take a snapshot of the specific control (icon, button, image) under your navigator cursor.
- Full Screen Vision: Scan the entire screen layout.
- The Best Part: After the initial description, you can chat with the AI. You are not limited to one description; you can ask follow-up questions like "Is there a save icon?", "Describe the chart in detail," or "What color is the button?"
🧠 Smart Translator (Auto-Swap): Instantly translates selected text. It creates a seamless bilingual experience by automatically detecting languages. If the source matches your target language, it intelligently swaps them.
🎙️ Smart Dictation: A powerful voice typing tool. It doesn't just transcribe; it listens, fixes your grammar, removes stutters (ums and ahs), adds punctuation, and types the polished text directly into your active window.
🔓 CAPTCHA Solver: Struggling with visual codes? Press a shortcut, and the AI will solve the math or read the characters and automatically type the result for you.
📄 Document QA: Have a PDF, TIFF, or text file? You can "chat" with your documents. Ask the AI to summarize them, extract specific data, or explain complex sections.
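The auto-swap behaviour of the Smart Translator described above can be sketched in a few lines. This is only an illustration, not the add-on's actual code: the function name `pick_target` and the idea of a separate `fallback_lang` setting are assumptions, and language detection is assumed to happen elsewhere (for example, in the AI's response).

```python
def pick_target(detected_lang: str, target_lang: str, fallback_lang: str) -> str:
    """Choose the language to translate into.

    Hypothetical helper: if the text is already in the user's target
    language, swap to the fallback language instead of translating
    the text into itself.
    """
    if detected_lang.strip().lower() == target_lang.strip().lower():
        return fallback_lang
    return target_lang


# Text detected as English while the target is English: swap to the fallback.
print(pick_target("en", "en", "fa"))
# Text detected as Persian while the target is English: translate normally.
print(pick_target("fa", "en", "fa"))
```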
🛠️ Requirement: You need a free Google Gemini API Key to run this add-on.
📥 Download & Installation: You can download the add-on directly from GitHub:
Download Vision Assistant Pro v2.0 (Direct Link)
Just open the downloaded file and confirm the installation in NVDA.
🔗 Project Source Code: https://github.com/mahmoodhozhabri/VisionAssistantPro
I developed this to help our community become more independent. I would love to hear your feedback and suggestions.
Best regards, Mahmood
Comments
Some Suggestions.
First of all, thank you. The gestures you have defined may conflict with the gestures of other add-ons. Example: NVDA+Shift+T=Instant Translate. Therefore: The ability to change add-on actions from within Input Gestures could be added. The add-on should be translatable into other languages. I would like to translate it into Turkish. Thank you again.
brilliant work
looks great, I will have to give this a try. One suggestion would be 'agent mode', where you allow the AI to click and interact in windows on your behalf according to your instructions.
@Umut KORKMAZ
Hello, you're welcome. Thank you for your kind suggestion. Both will definitely be added in the next version, which will be released soon.
@Ashley
Hello, yes I had this idea, but because a window is open for user instructions, the click goes elsewhere! I will try again. I hope others will also join this open-source project.
Update v1.5 Released: Interactive Refine, 20+ New Languages, and
Hello community,
I am happy to announce that Vision Assistant Pro version 1.5 is now available! Based on your feedback, this update focuses heavily on stability and making the AI interactions more fluid.
✨ What's New in v1.5:
💬 Interactive Refine Dialog: When you use the "Refine Text" menu (to Explain, Summarize, or Fix Grammar), the result now opens in a chat window. This means you are no longer limited to a static result; you can ask follow-up questions about the selected text directly!
🌍 Huge Language Expansion: Added support for over 20 new languages, including Turkish, Czech, Danish, Finnish, Greek, Hebrew, Korean, Thai, and more.
⌨️ Input Gestures Support: You can now customize all keyboard shortcuts. Just go to NVDA Menu > Preferences > Input Gestures and look for the "Vision Assistant" category.
🛠️ Critical Fixes:
- Stability: Fixed COMError crashes that occurred when selecting text in specific apps like Firefox and Microsoft Word.
- Connection: Added automatic retries for server errors (HTTP 503) to make the add-on more reliable during busy times.
📥 Download Update: You can download the new version directly from GitHub: Download Vision Assistant Pro v1.5
Simply install the new file over the old one and restart NVDA.
As always, I appreciate your feedback and bug reports!
Best, Mahmood
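For readers curious about the "automatic retries for server errors (HTTP 503)" mentioned in the v1.5 notes, here is a minimal sketch of the general technique using only the Python standard library. It is an assumption about how such a retry could look, not the add-on's implementation: `fetch_with_retry`, the retry count, and the exponential delay are all hypothetical choices.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, data=None, retries=3, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Send a request and retry only when the server answers HTTP 503.

    Hypothetical helper. `opener` and `sleep` are injectable so the
    function can be exercised without a network connection.
    """
    for attempt in range(retries + 1):
        try:
            with opener(url, data=data) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Any other error, or a 503 on the final attempt, is passed on.
            if err.code != 503 or attempt == retries:
                raise
            # Wait longer after each failed attempt: 1s, 2s, 4s, ...
            sleep(base_delay * (2 ** attempt))
```

Retrying only on 503 keeps genuine problems, such as an invalid API key, visible to the user instead of hiding them behind repeated attempts.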
@Umut KORKMAZ
Done. You can download and use the new version.
Thanks.
The new version is nice. Thanks.
@Umut KORKMAZ
You're welcome
Why here and not on any of the NVDA Groups?
I also subscribe to NVDA Groups. :)
I also subscribe to NVDA groups. Since e-mails come from many groups, sometimes some messages may be overlooked.
@britechguy
Hello. Yes, you are correct. I am not a member of any of these groups because I mostly program for myself. I have registered the add-on in the NVDA add-on store, and it will likely be released soon. The reason I published it here was that I use iOS, and I have been active on this site. I saw that this site also has a section related to Windows, so I decided to publish it here in addition to the NVDA add-on store.
Brazilian Portuguese language
Hello, I haven’t had time to test it yet, but I found the idea of the add-on very good. Let me just ask you a question: is the Brazilian Portuguese language available in this addon?
Regarding the smart dictation?
I’m not quite sure if I’m doing something wrong with the smart dictation feature. Every time I try to use it, it says it’s listening, but then it says no speech was recognized when I try to dictate. I believe I set up everything correctly within the settings. Any help would be appreciated! Thank you for creating a pretty cool add-on!
@mahmood
You really need to market your work to the demographic it's meant to serve, and in venues where the major pools of users are. The only reason I brought this up is because an Apple-centric site is not one where the majority of NVDA users will be looking for NVDA news.
Please do take the time to join the previously noted groups and let people know about your work. It will almost certainly be embraced with open arms, and far more quickly, if you do so.
@Guilherme
Hello, yes Portuguese is on the list. Test it and let us know if you have any problems.
@Brandon X
I checked and there is no problem! Maybe the problem is with the microphone settings in Windows!
@britechguy
Yes, you are right. I also published the add-on in the NVDA add-on store but I am not very comfortable with groups. Thanks again.
@britechguy
You can introduce the add-on in groups if you like. The add-on is open source, and everyone can see its source and get it from GitHub or soon from the NVDA add-on store.
Major Update v2.0 Released: Auto-Updater, Conversation Memory, a
Hello community,
I am thrilled to announce the release of Vision Assistant Pro version 2.0. This is our biggest update yet, transforming the add-on from a simple tool into a smart, learning assistant.
Based on your requests, I have added an auto-updater so you never miss future improvements, and a "Context Memory" feature that makes chatting with the AI feel natural.
🚀 Top New Features:
🔄 Built-in Auto-Updater: No need to manually check GitHub anymore! The add-on now automatically checks for updates. You can also force a check anytime by pressing NVDA+Shift+U.
🧠 Conversation Memory: The AI now remembers what you are talking about.
⚡ Smart Translation Cache: Repeated translations are now instant! If you translate a sentence you have translated before, the add-on retrieves it from memory without using your API quota or internet connection.
📋 Clipboard Translation: New shortcut: NVDA+Shift+Y. Useful for web browsers or apps where selecting text is difficult. Just copy content and press the shortcut to translate immediately.
🛠️ Improvements & Fixes:
- Strict Language Output: I have optimized the prompts to ensure the AI speaks ONLY in your target language (no more accidental English intros like "Here is the text...").
- Stability: Fixed a crash bug caused by special characters (like braces {}).
📥 How to Update: Since this version introduces the Auto-Updater, you need to install this update manually one last time. Future updates will be automatic!
Download Vision Assistant Pro v2.0
Enjoy the faster and smarter experience!
Best regards, Mahmood
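The translation cache from the v2.0 notes above can be pictured as a lookup keyed by the text and the target language. The sketch below is an assumption for illustration only: the class name `TranslationCache` is hypothetical, the cache lives in a plain in-memory dictionary, and whether the add-on stores its cache this way is not stated in the announcement.

```python
class TranslationCache:
    """Remember finished translations so repeats need no API call.

    Hypothetical sketch: `translate` is any callable that takes
    (text, target_lang) and returns the translated string.
    """

    def __init__(self, translate):
        self._translate = translate
        self._store = {}

    def get(self, text, target_lang):
        key = (text, target_lang)
        if key not in self._store:
            # First time this text is seen for this language: do the real work.
            self._store[key] = self._translate(text, target_lang)
        return self._store[key]


# Usage with a stand-in translator that just tags the text.
api_calls = []

def fake_translate(text, target_lang):
    api_calls.append(text)
    return f"[{target_lang}] {text}"

cache = TranslationCache(fake_translate)
cache.get("Hello", "tr")
cache.get("Hello", "tr")  # served from the cache, no second call
print(len(api_calls))
```

Including the target language in the key matters: the same sentence translated into two different languages must not share one cache entry.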
gemini tts
Hello.
Thank you so much for sharing such an amazing, excellent plugin with us and for developing such a beautiful plugin.
I have a suggestion for you regarding the plugin.
Could you add a text-to-speech feature to the plugin using Gemini TTS?
Just like in the Native Speech plugin, we should be able to enter text into a text box or select any file, choose the Gemini model and the voice we want to use or select voices in multi-speaker mode, save voice files in MP3 or other formats, and configure voice settings.
Is it possible to add Gemini TTS integration to the plugin?
Best regards.
You should change 3 key combinations, because they belong to NVDA
Hi,
Many compliments on this add-on; it’s really well done!
I know very well that a developer can’t possibly keep track of every keyboard shortcut used by every add-on. It’s simply not feasible. Having worked on an add-on myself in the past (Nao), I know you do your best, but you can’t perform miracles.
That said, this point is quite important: you have three commands that conflict with NVDA’s laptop keyboard layout, and one that even conflicts in both layouts. I kindly ask you to change them. Here are the ones involved:
nvda+shift+d: this is the audio ducking toggle. It’s extremely useful, and your add-on overrides a core NVDA command in all layouts.
nvda+shift+o: reads the navigator object. Used in the laptop layout.
nvda+shift+a: Say All with the review cursor.
nvda+shift+s: reads the currently selected text in the control or document.
I hope you won’t take this as criticism; you’re most likely using the desktop layout and didn’t notice the conflicts. I understand you chose very convenient shortcuts, but add-ons really shouldn’t override NVDA’s core functions.
Thanks again, and take care.
@Fettah Pınar
Hello. You're welcome. I hope this NVDA add-on is useful for you and all visually impaired individuals. Yes. I will try to include it in the next version so that you can open the text-to-speech section with a shortcut key and convert your text to speech.
@Simone Dal Maso
Hi. Yes, you are absolutely right. I am using the desktop layout. Do you have any key suggestions for this layout? I'm concerned about choosing other keys and potentially conflicting with another layout, like the laptop one.
@Simone Dal Maso
Is it good to use Windows instead of the Shift key?
ProTip Re: Hotkey conflicts
For anyone having issues with hotkey conflicts, there is a beautiful little add-on in the NVDA add-on store called "check input gestures". What this does is check all of your add-ons and compare their associated hotkeys to the default hotkeys of NVDA. If there are any conflicts, a little dialog will pop up, allowing you to press Enter on any of the hotkeys in the list, which takes you immediately to "Input Gestures" so that you can reassign them accordingly.
It is efficient, and very handy.
HTH.
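To make the idea of a conflict check concrete, here is a small plain-Python sketch of comparing add-on hotkeys against core hotkeys. It does not use NVDA's API and is not how the "check input gestures" add-on is implemented; the function names and the sample tables are assumptions, with the core entries taken from what was reported earlier in this thread.

```python
def normalize(gesture):
    """Treat 'NVDA+Shift+D' and 'shift + nvda + d' as the same key combination."""
    return frozenset(part.strip().lower() for part in gesture.split("+"))


def find_conflicts(addon_gestures, core_gestures):
    """Return (gesture, add-on command, core command) for every clash."""
    core = {normalize(g): name for g, name in core_gestures.items()}
    conflicts = []
    for gesture, addon_command in addon_gestures.items():
        key = normalize(gesture)
        if key in core:
            conflicts.append((gesture, addon_command, core[key]))
    return conflicts


# Sample data: core commands as reported in this thread, add-on entries illustrative.
core_table = {
    "NVDA+Shift+D": "toggle audio ducking",
    "NVDA+Shift+S": "read selected text",
}
addon_table = {
    "nvda+shift+d": "hypothetical add-on command",
    "NVDA+Control+Shift+T": "Smart Translation",
}
for clash in find_conflicts(addon_table, core_table):
    print(clash)
```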
gemini tts
Hello again.
Thank you very much for your reply.
I hope this feature arrives as soon as possible.
By the way, I have some comments about the add-on.
I couldn't find some add-on shortcuts in the input gestures, for example, I want to change the shortcut for recognizing the current navigator object, but this shortcut doesn't appear in the input gestures.
Similarly, the shortcuts for some features don't appear in the input gestures.
Another question: will the option to use Gemini 3.0 Pro and other Pro models (alongside Flash models) be added?
about hotkeys
Should the keys be changed? Considering that the Input gestures in NVDA are changeable?
gemini tts
Yes, I want to change some keys.
For example, I want to change the shortcut for the current browser object, but I can't.
@Fettah Pınar
The reason other models were not used is that other models have a cost, and those who do not have an API connected to a billing account may encounter errors and become confused.
@Fettah Pınar
Missing options will be added soon
gemini tts
The following keyboard shortcuts do not appear in the Input Gestures dialog and cannot be changed.
NVDA + Shift + 6
NVDA + Shift + V
NVDA + Shift + O
NVDA + Shift + A
@Fettah Pınar
Hello! I checked and only the insert shift v key is missing, which I will add. The rest of the keys are registered.
about hotkeys
Soon, the Control key will be added to the shortcut keys to resolve the conflict issue. For example: Ctrl + Shift + Insert + T
gemini tts
Hello again.
The interface translations have not been included in version 2.0; the plugin can only be used in English.
@Fettah Pınar
Hello. Yes, I know. This is an open-source project and I cannot do much alone. I hope other friends will join the project soon and make it more complete.
@Fettah Pınar about tts
Hello. I checked and Native Speech works well in NVDA. So I don't think there's a need to add TTS.
v2.1 Update: Fixed Keyboard Conflicts (New Shortcuts)
Hi everyone,
I released a quick update (v2.1) to fix keyboard conflicts, especially for those using the NVDA Laptop layout.
Change:
All shortcuts now use NVDA + Control + Shift + [Key].
For example, Smart Translation is now NVDA+Control+Shift+T.
You can download the update from GitHub as usual.
Regards,
Mahmood
OCR
Hello, congratulations on the add-on, it’s very good! I would like to request something if possible: at my job I often receive photos with text, and many times I only need the text and not the descriptions. Would it be possible for you to add a function to the add-on that performs OCR on the screen and returns only the text? It would help me a lot.
@Guilherme
Hello. You can use custom variables. Please read the add-on guide.
Where do the audio transcriptions go after being done?
Hello,
I've tried this add-on, and I have to say it's very good. However, I'm curious to know: when transcribing an audio file, where does the result go? When I tried doing this, I got "Uploading..." but then nothing indicating the progress of the upload or the results of the transcription. So, I guess I'm a little confused as to where these things are located. Any help would be greatly appreciated.
@jay
Hello. If successful, a box will appear where you can both view the text of the audio file and chat about the audio file. Sometimes you need to use Alt+Tab to focus on the window.
about customized prompts
I read the add-on guide about custom variables and didn’t understand how to use them. Give me a practical example of how and where to use them, please: let’s suppose I want to take a screenshot and have it return only the text that is written on the screen — how can I do that?
@Guilherme
For example, you can use this command in the custom prompt section of the add-on settings.
My OCR:[file_ocr]
still about custom prompts
Right, but then, which command do I use to activate this prompt?
@Guilherme
Press Ctrl+Shift+Insert+R. Please read the document.
gemini tts
Hello.
Yes, Native Speech works well, but since all Gemini features are combined in a single plugin, it would be nice to use the text-to-speech feature through the same plugin instead of using another plugin.
So, I suggest including this functionality in this plugin without having to install the Native Speech plugin, since the Native Speech plugin hasn't been updated in a long time.
Do I need a VPN to connect to Google Gemini?
I used to use Gemini via VPN.
Should I also connect through a VPN for this add-on?
@ming
Yes. If you are using a VPN to communicate with Gemini, you must also use a VPN to use this add-on. A proxy option has also been included in the settings, which requires you to have a server that forwards your requests to Gemini, eliminating the need for a VPN.
Version 2.5 Released: New Shortcuts, Stability, and a Note on Su
Hello everyone,
I am happy to announce the release of Vision Assistant Pro version 2.5!
I have tried my best to implement many of your suggestions in this update, most notably standardizing the keyboard shortcuts (switching from NVDA+Shift to NVDA+Control+Shift) to prevent conflicts with system commands and the Laptop layout.
A note regarding support and future updates: With this release, the add-on is now stable and robust, covering all the core functionalities intended for daily use. There is no immediate need for major changes.
As a one-person team, I work on this project and address requests strictly based on my available free time. Since I cannot reply to every individual comment here on the forum, please follow these guidelines:
Thank you for your understanding and your continued support!
Best regards, Mahmood
Bulgarian language
Hi, this add-on is very useful, and many thanks for the good work. Can you add the Bulgarian language for the translations, AI responses, and audio transcriptions? It will allow other Bulgarian blind users to use this add-on in our native language.