Hi guys,
As many of you know, I am the developer of PiccyBot, which has been received well by the community as a flexible tool to describe videos and images.
As a side project, I started to work on Voistant, an AI voice assistant which should in principle give you complete verbal control over your PC (Windows only for now).
It is still very much in beta, and I am in need of feedback about it. Is this really something that would be useful? There are considerable costs involved in the AI agent calls, the screen captures, etc., so even though it is free to use for now, it will have to be a paid service later on. I need your thoughts about whether it would be worth it, or what improvements you would need to make it worth getting a subscription.
The website https://voistant.com has a link to download the current version. Note that you may have to approve the download as it is not an officially signed application as of yet.
Looking forward to your feedback! Thanks.
Martijn van der Spek
Comments
my thoughts
Hi Martijn,
I just installed your Voistant app. I managed to register, but it didn't seem to hear any of my commands at all. I wasn't given any instructions about how to configure my microphone, so I assume my microphone was working by default? I don't know if it was or wasn't. Usually my microphone just works when I use it for Google Meet or Microsoft Teams, and even if it doesn't work for whatever reason, I can go and configure it. I opened a webpage and told Voistant to solve a Captcha, but it didn't hear anything I said. When I tried speaking after the beep, it kept saying, 'something went wrong'. So it doesn't pick up my microphone at all. I'd love to try it out but can't.
I should point out that after registering, it gave me a brief description of my screen. It told me there were desktop icons, but not what they were. I didn't question it further because I wanted to see if it could solve a Captcha. I'm on Windows 11, the latest version.
I much prefer being able to type to things instead of speaking anyway, but maybe that's a personal preference. Also, when it reads the commands or instructions, I'd love it to use my screen reader voice, and not SAPI 5. I much prefer what Guide AI Assistant for Windows is doing. Yes, it's only text input for the moment, but I like being able to type much more than speaking to things. I don't know if you've seen that, but if your assistant could provide a way to type and speak, and a way to configure the microphone, it could be a promising competitor. Check out Guide.
https://www.guideinteraction.com/
That's the sort of thing I'd use and do use.
A solution like this has to do much more
It can't be just click here and there. I would only be interested in a solution like this if it had the capability to perform complex actions.
For example: navigate to Amazon, find me the highest-rated lavender perfume under $50, and purchase it using my default payment method and address.
Guide AI can do some of this
Hi,
Yes, Guide AI can do some of this, it bought something from Amazon when I was signed in. It clicked on the correct product and the 'buy' button. But at the moment I can't test this Voistant app since it doesn't even pick up my microphone, or hear when I speak to it.
Tara: Not recognising speech
I am not sure where the issue is with your microphone, Tara. Voistant will default to keyboard input if no microphone is found, so it does find it but somehow can't use it. Maybe it's an app privacy setting. Will check it!
Thanks for trying it out!
SeasongKing, these types of commands should already work with Voistant. I tested it yesterday by booking a movie, for example. But it is definitely not without glitches, and it is slow. I need as much feedback as possible on actual use cases so I can improve it.
logging back in
Hi Martijn,
I tried to log back in again, but it said my email address wasn't recognised. Was I supposed to get a confirmation email? I didn't receive one, and I checked my junk folder too. But when I tried registering again, it just logged me in, with the settings I'd saved, like the CTRL key as the interrupt key. The only things I can type in the terminal are 1 or 2, for registering and logging in. There is no edit field visible; I can just type either 1 or 2.
After logging in or registering, the microphone finally works, and it has just now heard me. I asked it to describe my screen, and it described this page. I tried asking it to describe a Captcha, to tell me the letters or numbers, but it said it couldn't solve Captchas or something.
I'd rather have the option of switching between some sort of voice mode and text mode. For example, I want to be able to navigate through my commands and the assistant's answers without it reading everything out in the SAPI 5 voice. If I try to type a question in the terminal window like, 'describe my screen,' I don't know if anything is being typed or not, and when I press Enter nothing happens. NVDA doesn't announce letters like it usually does, and there just seems to be a blank window, not an edit field like NVDA would usually announce. If you haven't tested this with NVDA, I'd strongly recommend you do so, because then you'll know what I'm talking about.
Thanks.
Question
I have a question. Is this a matter of computing power, or is it specifically an issue with the mobile platform? I’m only wondering why you didn’t start with mobile first, since you already have a big fan base here, and then later transition to computer. It seems like most people would be using these tools on their phone rather than on a computer, but I could be totally wrong about that. I’m really wondering why we don’t have these products available yet for the iPhone.
these types of apps are useful
Hi Winter Roses,
I use Guide AI Assistant on Windows, primarily to solve Captchas, and I've used it in the past to get past inaccessible cookie banners, to describe screens, and to help me navigate more difficult websites. So yes, this sort of thing is definitely useful to have on a computer. In fact, I keep Guide around now in case I come up against something inaccessible like a website or app. But this Voistant app can't seem to solve Captchas at the moment. The idea of these types of apps is to use your mouse to click on things that aren't accessible with a screen reader, like buttons and so on. For a better idea of this sort of thing, see the thread about the Guide AI Assistant I created on here a few months back.
https://www.applevis.com/forum/windows/guide-ai-assistant-people-who-are-blind-or-low-vision
As for why this sort of thing isn't available for an iPhone, Apple is funny about letting apps control your phone, share screens, etc. I think Android is probably more open to this sort of thing.
And also for Windows, there's the Viewpoint assistant, an app that brings inaccessible buttons and links into focus so you can click on them with your keyboard.
https://viewpoint.nibblenerds.com/
Going to test
But like Tara mentioned, I'd prefer a way to input text rather than speak to it all the time, for example, in say, an office setting.
Could be useful
I think any user agent that can do things efficiently could be useful, as long as it has a way to type out instructions as well as speak them. I'd pay for a subscription if it worked well.
Updated with choice of input
There is a new update available: Voistant will now ask you whether you prefer microphone or keyboard input. If you don't have a microphone installed, it will use the keyboard by default.
As has been said, developing this for mobile is a different kettle of fish. Apple doesn't allow reading of screens and controlling other apps, and even with Google it will be hard. I am working on a Mac version of Voistant, though, and that is looking positive.
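For anyone curious how that choice might work, here is a minimal sketch of the selection logic described above. It is not Voistant's actual code: the names `InputMode` and `choose_input_mode` are hypothetical, and falling back to the microphone when the answer is unclear is an assumption made only for illustration.

```python
from enum import Enum
from typing import Optional


class InputMode(Enum):
    MICROPHONE = "microphone"
    KEYBOARD = "keyboard"


def choose_input_mode(microphone_available: bool,
                      preference: Optional[str]) -> InputMode:
    """Pick an input mode from the user's answer (hypothetical helper).

    Without a microphone, the keyboard is used regardless of the answer.
    With a microphone, an answer of "keyboard" (or "k") selects the keyboard.
    Any other answer falls back to the microphone, which is an assumption
    of this sketch rather than documented behaviour.
    """
    if not microphone_available:
        return InputMode.KEYBOARD
    if preference is not None and preference.strip().lower() in ("k", "keyboard"):
        return InputMode.KEYBOARD
    return InputMode.MICROPHONE
```

Calling `choose_input_mode(False, "microphone")` yields the keyboard mode, since no microphone is installed, while `choose_input_mode(True, "keyboard")` respects the typed preference.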
Cannot log in again
Hi! I cannot log in again, and I receive an error!
Login
Mahmood, can you register again? I made some adjustments in the latest update.
re: login
After I enter the OTP, I get an error from lib connection, line 198.
When can we expect a Mac version to be available?
Hi, when can we expect a Mac version to be available? I'm a Mac user, and I would like to test it once it becomes available.