Hello AppleVis community,
I am Mahmood, and I am excited to share Vision Assistant Pro, a new open-source add-on for NVDA. This tool is designed to bring the intelligence of Google Gemini directly into your screen reader to solve digital challenges that usually require sighted assistance.
It is completely free to use (with your own API key) and focuses on interactivity rather than just static descriptions.
Key Features:
Interactive Vision (Object & Full Screen): Unlike standard OCR that just reads text, this feature lets you "see" and "ask."
- Object Vision: Take a snapshot of the specific control (icon, button, image) under your navigator cursor.
- Full Screen Vision: Scan the entire screen layout.
- The Best Part: After the initial description, you can chat with the AI. You are not limited to one description; you can ask follow-up questions like "Is there a save icon?", "Describe the chart in detail," or "What color is the button?"
Smart Translator (Auto-Swap): Instantly translates selected text. It creates a seamless bilingual experience by automatically detecting languages. If the source matches your target language, it intelligently swaps them.
Smart Dictation: A powerful voice typing tool. It doesn't just transcribe; it listens, fixes your grammar, removes stutters (ums and ahs), adds punctuation, and types the polished text directly into your active window.
CAPTCHA Solver: Struggling with visual codes? Press a shortcut, and the AI will solve the math or read the characters and automatically type the result for you.
Document QA: Have a PDF, TIFF, or text file? You can "chat" with your documents. Ask the AI to summarize them, extract specific data, or explain complex sections.
Requirement: You need a free Google Gemini API Key to run this add-on.
Download & Installation: You can download the add-on directly from GitHub:
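The auto-swap rule described for the Smart Translator can be sketched in a few lines. This is only an illustration of the idea, not the add-on's actual code: the function name, its parameters, and the notion of a separate "swap" language are assumptions, and the real add-on may leave language detection entirely to Gemini.

```python
def resolve_translation_target(detected_lang: str, target_lang: str, swap_lang: str) -> str:
    """Pick the language to translate into (hypothetical helper).

    detected_lang: language code detected for the selected text
    target_lang:   the user's configured target language
    swap_lang:     language to use when the text is already in target_lang
    """

    def primary(code: str) -> str:
        # Compare primary subtags only, so "en-US" and "EN" both count as "en".
        return code.strip().lower().replace("_", "-").split("-")[0]

    # Text already in the target language: swap instead of translating into itself.
    if primary(detected_lang) == primary(target_lang):
        return swap_lang
    return target_lang
```

With a target of "en" and a swap language of "fa", German text would be translated into English, while English text would be translated into Persian.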
Download Vision Assistant Pro v4.0.3 (Direct Link)
Just open the downloaded file and confirm the installation in NVDA.
Project Source Code: https://github.com/mahmoodhozhabri/VisionAssistantPro
I developed this to help our community become more independent. I would love to hear your feedback and suggestions.
Best regards, Mahmood
Comments
Getting it to work
So I just installed the addon from the addons store, pasted in the API key and nothing seems to happen. When I tried to perform OCR on a PDF NVDA just says not connected.
I tried creating a new project in Google AI studio, same result.
Any idea on how to get it working?
@Nut
not connected? There is no such message in the plugin! Please check again.
@Stefan
Hello. Bulgarian has been added to the new version, 2.6. Enjoy.
Need help getting API key
I feel like I missed the first week of A.I. class and showed up on quiz day. When I go to Google Studio and hit the get API key button, it goes to an empty list of keys for imported projects. When I then hit the create API key, a modal pops up requiring a name and imported project--a combo box that's empty, with the "create" button disabled. Tried in Chrome and Firefox. Is this expected, and I have to create a project, or are browser extensions getting in the way, or what?
Thanks!
translate
It translates one sentence at a time, and I have to press the key combination every time, which is not convenient. Make it like the NVDA translator, which can be turned on and off, instead of requiring a key combination for each sentence.
Re: Translate
Have to agree; real-time translation of this quality is just what the doctor ordered. Question is, is it feasible?
@mantanini
Hello. That's not the case! Unless you are translating in the browser! To do this, place the text in the clipboard and press 'y' instead of 't'.
Congrats, good project!
I would recommend not implementing automatic updates or any update system directly in the add-on, as this is done automatically by the NVDA add-on store, and users can enable or disable automatic updates there.
API key missing
Thanks for this most indispensable add-on. I just used the OCR to describe my full screen and got a very nice description. I tried again a minute later to get another full-screen description, and it said "API key missing." What's up with that? Is there a quota limit on how much you can use the add-on, or am I missing something? By the way, I am using the very latest version of this add-on.
suggestions
Hi Mahmood! I would like to congratulate you on the excellent add-on. Keep up the good work! I have some suggestions that would make things much easier: the possibility of having a screen to place multiple prompts and then being able to switch with a shortcut key. Sometimes, I want to know what's in a photo, then something in a game, and it would be useful to already have these prompts predefined by the user and just keep switching. Another suggestion would be an option so that when it showed the result, it would just speak, without showing a dialogue or window. It's quite useful to know information quickly without having to switch windows. Thank you!
please read
Dear Users,
Please download and install the latest build of the add-on from the project's GitHub page, which is linked in the original post. It appears that a significant portion of users are still running outdated versions.
Due to time constraints, my ability to visit this thread frequently to respond to comments is limited. For any issues or suggestions, please raise them directly in the Issues section of the project's GitHub repository.
Best Regards,
Mahmood
Post updated
Hello everyone. The download link has been updated to version 3.1.0. Enjoy.
Error 400 when transcribing .mp3 audio
Hello, and congratulations on this fantastic add-on. I want to report that when I transcribe the audio of an .mp3 file, I get error 400, with a request to check my API key, which is of course correct. I tried changing the audio file's extension, for example to .wav, and the transcription works, so I believe it depends on how the .mp3 extension is seen by the API key. I hope you can fix it. Thank you very much.
@Maurizio
Hello, yes, this issue exists and will be fixed soon in version 3.5.0.
Huge compliments and a request about video description
First of all, thank you for brilliantly fixing error 400 on .mp3 files. I also found the feature for describing videos from YouTube and Instagram URLs truly wonderful. It would be great if this could also be done for X and other platforms, and also for local videos played on one's own PC. I know there could be privacy problems; personally, if required, I am willing to grant all the necessary permissions. It would be truly wonderful. Endless and sincere compliments, and heartfelt thanks!
@Maurizio
Thank you so much for your kind words and feedback, Maurizio! I'm glad to hear the MP3 fix and video descriptions are working well for you.
Regarding your suggestions, I'll be adding support for X (Twitter) very soon. As for local video support, I am currently thinking about it and evaluating the best way to implement it.
Regarding the privacy concerns you mentioned, I want to reassure you that since this addon is open source, everything is transparent and there is no need to worry at all. Your trust and support mean a lot to me. Thank you again!
Description Video TikTok
Hello again! I will never stop thanking you for this amazing add-on! I also hope you are well. If I am not mistaken, you are in Iran, and I know what is happening there; I sincerely hope that you are as well as possible and that you are safe.
I would kindly ask whether video descriptions could also be implemented for the TikTok platform. It would be truly incredible, given how well the feature works for YouTube, X, and Instagram, although on Instagram I ran into some download errors; it seemed not to recognize some URLs. In any case, it is a wonderful function. Can it also be implemented for TikTok? Thank you very much and all the best, with great respect and sincerity!
@Maurizio
Hello, thank you for checking on me. I am well, but internet access has become very difficult, and it's unclear if I will be able to access it in the future. I will try my best. I have never worked with TikTok before. Please send a link to a video. If I can and my internet is stable, I will implement it.
updated
You can get the latest version from the original post.
updated
Hello everyone. The new version of the addon has been released. Support for TikTok videos has also been added!
v4.0.3
Hello everyone.
The new version has been released.
You can use it.
Awesome add-on
This is fantastic! I'd like to donate, but I don't use the platform that is provided. Maybe consider PayPal, like NVDA uses? I also appreciate that this is fully accessible to braille users!
First thoughts and questions
I finally got around to trying this add-on, and so far it's excellent! It took me a minute or 2 to figure out the API key thing, but once I got that, I've tried a few modes, and am impressed with the speed and quality of the responses so far.
A few Questions
1. As a low vision user, I noticed that when you bring up the Settings screen for VA Pro, I can navigate through it via NVDA and keyboard, but nothing shows up on the screen. It would be helpful if this window showed up visually, like the rest of the windows do when you use the add-on. Not sure why nothing shows up for the Settings screen.
2. Can there be an option for a dark theme for the windows that appear for low vision users, or those who are light sensitive?
3. When I scan a file or document, does it do the whole thing, like multiple pages, or just the first visual page like when doing standard NVDA + R?
4. When scanning a video link, is there a rough video length we should limit it to? Just trying to figure out how long it would take to process a video. I fed it probably a waaay longer video than I should have, and got a key error. Understandable...
5. Is there a keyboard shortcut to bring up a menu of the add-on's functions directly, in case we forget a specific letter? I know you can do this via the add-on submenu, but maybe a direct menu command to bring us right there?
I think this is all I can think of for now, but I will definitely be playing with this more. I like it though. I'd also love to see a PayPal option for donations, as almost everyone has PayPal.
about donate
Hello everyone, I'm glad the addon has been useful to you. I try to always add more features to the addon. Regarding donations, unfortunately, I am in Iran and do not have access to PayPal. If you would like to support the project and do not have the option to support with cryptocurrency, you can purchase an Apple USA gift card and send it to me via private message. Thank you in advance for your support.
@Scott Davert
hi. I'm glad the addon has been useful to you. I try to always add more features to the addon. Regarding donations, unfortunately, I am in Iran and do not have access to PayPal. If you would like to support the project and do not have the option to support with cryptocurrency, you can purchase an Apple USA gift card and send it to me via private message. Thank you in advance for your support.
@Jesse Anderson
Hello to you, I'm glad the addon has been useful for you.
1. I am blind, but I don't think this section relates to the display, because it uses the default NVDA settings.
2. This also seems to use NVDA settings.
3. The choice is entirely yours. Check it out.
4. I think it's about 10 minutes.
5. You can assign any key to any section in Input Gestures.
Regarding donations, if you would like to support the project and cannot use crypto, you can purchase an American Apple gift card and send it to me privately on AppleVis.