Looking for test users for AI-Powered Voice Assistant that empowers computer control for the visually impaired

I just signed up for it

Hi,

I just signed up for it. I cannot wait to test it out. Question: how does this compare to existing features such as Siri and Voice Control? And can you check for updates in the future, once it's released?

Great, thank you JC!

Great questions,
Nexy is similar to Siri in how you communicate with it, but Siri is very limited in its capabilities, whereas Nexy has no such limitations: you can use it in a browser, email, social networks, anywhere, using just voice commands.

Nexy always works in the background and you can use it at any time, like Siri. You can ask "What's on the screen?" or "What is shown in the graph, picture, or table?". You can even ask it to translate or read text, or to find out today's weather or the news. Most importantly, you can ask it to perform an action, for example to press a button, write something, or open an application, as well as more complex commands like "Play music on YouTube".

That is, interaction entirely by voice.

Cool!

Cool! Cannot wait to test it out. Is it always going to be free? And will there be a check for updates option in the Help menu?

Answer the questions

For the first 10 users it will be free forever. And yes, there will be a subscription costing approximately 10-20 dollars per month, and no more than that; we will try to make it as accessible and cheap as possible.
So, congratulations, you are on the list of the first 10 users.
Also, you will always have the newest version because updates will be automatic.

Nice!

Nice! And the subscription is optional, right? Also, are there sound notifications to let you know it heard your commands, just like Siri?

You mentioned that it can be used to press buttons. Can it also be used for clicking on links, such as opening a link to a Zoom meeting? Say, click on OK when the "this meeting is being recorded" dialog box pops up? Another example: let's say that you are composing a message in the Messages app. Can you say "click on record audio" and have it click the button, and when you're done recording your audio message, instead of hunting for the stop button, can you say "click stop" and it'll stop recording? If yes, I could use it in situations like this.

is there a website to download the app?

I forgot to ask, I know I have signed up to get access to the app, but is there a website to download the app directly?

Answer the questions

You applied through the website, right?
1) Yes, there are sound notifications, for example: "I click the stop button!", like this.

Yes, you can ask to attach a microphone button for recording

Yes I did

Yes I did. I applied on the website.

Is there a YouTube channel with demos?

Is there a YouTube channel with demos? I would love to hear demos if any.

Not yet, but we will do it

We don't have one yet, but we will make one. Thank you for the idea.

No problem

No problem. Cannot wait to test the app. Also, is there help available if you ever get stuck?

I will always be in touch,

I will provide my contact details for communication

I can't try this as I'm on Windows but.

I assume you already know that VoiceOver exists?

At the moment it sounds like you're trying to create a screen reader, but through voice.

I'd recommend also adding a text-to-Nexy option; that way, if you don't want to talk, or can't for some reason, you could still use all the functions.

I’ve signed up.

Hi. I’ve signed up, but unfortunately don’t have a Mac. Could I still be added to the list so that I can test if/when a Windows version comes out? Also, if I do make it into the first 10 users, and since you said that the subscription will be free forever for those users, would I be able to get it for free on Windows? If not, that’s perfectly alright.

does it has windows pc version

Does it have a Windows PC version?

Would love

To test it out if the dev has any plans of coming up with a windows version...

Commands

Does it support Intel Macs? And do I need to "guide" it through each step - example Press send on the mail interface to send the message or can I just say something like "Send an email to x with subject test and write this is a test?"

I know of something like this for Windows called Guide.

It's not for me but others seem to enjoy it. You can get a month's free trial, I believe: https://www.guideinteraction.com/

Back to the Mac

Yes you can read about Guide and discuss it on our Windows forum here:
https://www.applevis.com/forum/windows/guide-ai-assistant-people-who-are-blind-or-low-vision

Privacy and security

Hi @Seregawpn. Can you comment at all about privacy and security? With an app like this you are giving over a lot of control and access to your Mac, so trust in the privacy and security is naturally very important.
Many thanks,
Dave

Signed up yesterday

Hey there,
I'm hopeful I can get a slot. I signed up yesterday afternoon.
I have some really interesting and different use cases I want to test out. I won't go into detail until I have the chance to, but I feel like, if successful, it could be a game changer.

@Dave Nason I completely forgot there was another topic.

I won't take the link down but will look next time for a link before posting.

Could I play games with this?

Like civilization let’s say?

Just signed up. I'm an iOS…

Just signed up. I'm an iOS and macOS developer as well and work with AI so hopefully can provide some useful feedback and suggestions.

hmm.

The OP hasn't responded in a couple of days. I've not put my email into this website and would advise others to stay away until this person responds.

Shiny toys are cool, but when there are no responses from the devs of those shiny new toys, I get quite wary.

Sorry for late answer

So, we already have 39 people on the list to test. We will start sending it out this week; I will let you know.

I will answer the other questions a little bit later.

Thanks for letting us know

Thanks for letting us know.

Thanks for the answers.

Sorry for the reply before, it's just that you seemed very quick to answer, and it never crossed my mind that you had testers to see to first.

I'm looking forward to checking out a video or two.

Signed up, but no response

Greetings and salutations, I signed up via the link provided, and I haven’t heard anything back. It’s probably been a couple days, so I don’t know how this is supposed to work, or how you’re in the queue to get selected. Just thought I’d put it out there.

General information

I apologize for not getting in touch often. We have started providing access to one user at a time. We will move step by step: we will try to provide access to 10 people this week, and then we will add more users.
I will periodically get in touch and answer questions.
Thanks to everyone

I'll reply to everyone by email today to connect with you.

For additional information

I will make a video soon, over the weekend, and will also send it by email so that you have a preliminary understanding.

Awesome!

Has anyone been enrolled yet?

This thread is giving me vibes of a thread from some time ago where its original poster was asking people to fill in a form with personal information to join some kind of group conversation on the subject of accessibility with Apple representatives. Nothing ended up coming out of it, and I suspect that the thread was actually deleted since I can't find it in my post history. The fact that the original poster here is now deflecting and deferring responses after being very quick to reply in the beginning, coupled with the fact that they haven't even addressed the privacy question raised by an earlier commenter, makes the whole thing feel rather fishy.

Also, and excuse my negativity, but I think that, if this project actually exists, in its current form it's just a gimmick to attract investment by surfing the AI hype. As a blind Mac power user I don't think there's much that an AI agent can offer me in terms of accessibility. I do not deny that help with navigating inaccessible content would be very useful, but control of my own computer is not something that I will give up easily just to be relegated to the passenger's seat. There's a lot that AI can do not only for us but for humanity in general, and I do work for a company whose founders have been impressing me with lots of good ideas that I and others have been realizing into an actual product, but fortunately, agentic crap does not seem to be in their plans.

If this project really exists and pursues a serious objective, my recommendation is to focus on how to assist us in doing things rather than on how to do things for us, by dropping the agentic crap and focusing on improving the situation with poorly accessible or totally inaccessible content. Even if there was a zero percent chance of the AI hallucinating, and even if it could somehow read my mind and make sense of my ambiguous prompts, I would still prefer to retain control since it's almost always more efficient.

Answers to João Santos's questions

1) I may have missed some questions. If you repeat the ones about confidentiality, I will answer them.
2) The project exists, but not in a public form, because it cannot withstand a heavy load today and we need to make fixes as bugs are detected, which is what we are doing now. That is why I collected emails: to send it out one by one and to watch and control the load level.
3) I understand your caution, and that is why I am ready to answer any questions so that everything is clear. But if you think that it is better for you to manage the computer yourself for some reason, that is your choice.

I am one of the developers and the founder of the company. We do not publish much so that we can move systematically and correctly, since we do not have millions in funding. Any extra publicity at the wrong time, or a premature release of the product, could be dangerous for us today, since this has been a year of hard work to make the current product available to everyone.

We will be able to easily talk about us when there are no bugs and everything works correctly.

Thank you for your feedback, please be patient.
Happy Holidays to all.

updates?

Hi, any updates? I hope it's ready to be tested.

currently fixing some bugs

We are currently fixing some bugs that were identified during testing. It takes a little time; I hope we will start working with you next week.
Thanks for keeping in touch

OK

OK. Also, let me know when the YouTube video is ready. Would love to hear a demo.

no updates?

Hi, no updates? What's going on?

Glad I didn't comment earlier

This is funny because I actually browsed this thread earlier today and was considering commenting on it again, but then decided against it in order for my actions to not be perceived as a form of harassment.

There are some red flags here, so I strongly recommend against filling in their form with personal data and especially installing anything requiring input device access, accessibility privileges, AppleEvents privileges, or access to the camera, microphone, screen capturing, or system audio capturing facilities at the very least until they show an actual working demo.

I forgot about this...

Honestly, it seems like one of those things where a sighted person thought, let's help the poor blind people, and didn't actually ask or hire any blind people to get this off the ground.

If a demo comes out, I'll check it out but they've been silent for a couple of weeks now.

Hi guys, I'm always here!

We have encountered a technical problem, and we are still working on it: when the assistant speaks, it is difficult to interrupt it, or it hears itself. We are trying to do this as well as OpenAI does and to support a real dialogue in which you can calmly interrupt. Previously we used a fairly simple solution that did not fully fix the problem, and during testing the problem appeared again and again. So it now takes time to solve, and it is difficult to say how long it will take. I hope it will not take long, since we are now building noise suppression and echo cancellation that should work at a fairly high level.

Free hint

CoreAudio has a built-in echo cancellation feature that can be enabled using the kAudioDevicePropertyVoiceActivityDetectionEnable property of an audio device object.

The following is a section of a comment in /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/CoreAudio.framework/Headers/AudioHardware.h:

    @constant       kAudioDevicePropertyVoiceActivityDetectionEnable
                        A UInt32 where 0 disables voice activity detection process and non-zero enables it.
                        Voice activity detection can be used with input audio and has echo cancellation.
                        Detection works when a process mute is used, but not with hardware mute.

The above file can also be opened in an editor like TextMate from Terminal in a more portable way using xcrun as follows:

mate `xcrun --show-sdk-path`/System/Library/Frameworks/CoreAudio.framework/Headers/AudioHardware.h

Writing a fast wavelet transform based echo cancellation solution from scratch using the SIMD API from the Swift standard library if the CoreAudio option proves insufficient is not that complicated either, however I recommend against using Apple's BNNS module from the Accelerate framework since it's not really designed to take advantage of CPU cache, plus has no support for the discrete or fast wavelet transforms so you won't be saving yourselves any work. In this case you'd transform the spectrogram produced by the wavelet transforms of the input samples and a bigger chunk of output samples into a set of bezier paths, try to match those paths in both spectrograms, remove the ones that match best from the input, and reconstruct the signal using the inverse fast wavelet transform. In theory this solution provides the best results even in noisy environments.

Another option is to use the fast cosine transform, which I think is available in Accelerate.BNNS, perform the dot product between the last input chunk and a sliding window over the last few hundred milliseconds of output in the frequency domain, subtract from the input the output vector multiplied by the computed dot product where the similarity is greatest, and use the inverse fast cosine transform to reconstruct the new signal. This solution provides good results in theory even with some noise, but small differences in the amplitude of each input frequency resulting from the audio signature of the speakers may result in some output bleeding back in.
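To make the cosine-transform idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not anyone's actual implementation: the transform is a naive O(n^2) orthonormal DCT rather than a fast one, the names `dct`, `idct`, and `cancel_chunk` are made up for this example, and the scaling step divides the dot product by the energy of the matched output window (a least-squares projection), which is one reasonable reading of "multiplied by the computed dot product".

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II, written naively in O(n^2) for clarity."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def idct(spec):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(spec)
    return [
        sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n)
            * spec[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]

def cancel_chunk(mic_chunk, recent_output):
    """Remove the best-matching window of recent_output from mic_chunk.

    Returns (cleaned_chunk, start_index_of_best_window).
    """
    n = len(mic_chunk)
    mic_spec = dct(mic_chunk)
    best = None  # (similarity, gain, output_spectrum, start)
    for start in range(len(recent_output) - n + 1):
        out_spec = dct(recent_output[start:start + n])
        energy = sum(v * v for v in out_spec)
        if energy == 0.0:
            continue  # silent window, nothing to match against
        dot = sum(m * o for m, o in zip(mic_spec, out_spec))
        similarity = dot / math.sqrt(energy)
        if best is None or similarity > best[0]:
            # Assumption: scale by dot / energy (least-squares projection).
            best = (similarity, dot / energy, out_spec, start)
    if best is None:
        return list(mic_chunk), None
    _, gain, out_spec, start = best
    cleaned_spec = [m - gain * o for m, o in zip(mic_spec, out_spec)]
    return idct(cleaned_spec), start

# Synthetic demo: the microphone hears quiet "speech" plus half-volume playback.
rng = random.Random(0)
playback = [rng.uniform(-1.0, 1.0) for _ in range(192)]
speech = [0.05 * math.sin(2 * math.pi * i / 16) for i in range(64)]
mic = [s + 0.5 * p for s, p in zip(speech, playback[40:104])]
cleaned, start = cancel_chunk(mic, playback)
print("best window starts at sample", start)
```

Because this transform is orthonormal, a plain dot product in the frequency domain equals the same dot product in the time domain, so this sketch behaves like a time-domain match. Working per frequency only starts to pay off once each bin gets its own weight or gain, which is where the speaker-signature problem mentioned above would be handled.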

Finally the lamest solution is to just perform a time-domain correlation between the input and output samples and subtract the output from the input where the similarity is highest. This is the easiest option but may not be sufficient because any difference between the output and input signal resulting from the audio signatures of the speakers will cause bleeding and this will also perform very poorly in noisy environments.
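As a rough illustration of this time-domain approach, the following plain-Python sketch searches for the delay at which the played-back signal best lines up with the microphone signal, then subtracts a scaled copy of it. The function names, the `max_lag` parameter, and the least-squares gain are assumptions made for the example; a real implementation would work on streaming buffers and track the delay continuously.

```python
import math
import random

def correlation(mic, ref, lag):
    """Dot product of mic[n] with ref[n - lag] over the overlapping samples."""
    return sum(mic[n] * ref[n - lag] for n in range(lag, len(mic)))

def cancel_echo(mic, ref, max_lag):
    """Subtract the best-aligned, least-squares-scaled copy of ref from mic.

    mic and ref are equal-length sample lists; returns (cleaned, lag, gain).
    """
    # Pick the delay with the highest raw correlation (assumes the echo is
    # not phase-inverted, which keeps the sketch short).
    lag = max(range(max_lag + 1), key=lambda k: correlation(mic, ref, k))
    energy = sum(ref[n - lag] ** 2 for n in range(lag, len(mic)))
    gain = correlation(mic, ref, lag) / energy if energy else 0.0
    cleaned = list(mic)
    for n in range(lag, len(mic)):
        cleaned[n] -= gain * ref[n - lag]
    return cleaned, lag, gain

# Synthetic demo: speech plus an echo of the playback, delayed and attenuated.
rng = random.Random(1)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
speech = [0.1 * math.sin(2 * math.pi * n / 50) for n in range(2000)]
true_lag, true_gain = 37, 0.6
mic = [
    speech[n] + (true_gain * ref[n - true_lag] if n >= true_lag else 0.0)
    for n in range(2000)
]
cleaned, lag, gain = cancel_echo(mic, ref, max_lag=100)
print("estimated lag:", lag, "estimated gain:", round(gain, 3))
```

The demo uses a single clean echo path, which is the ideal case. With a real speaker and room the echo is a filtered copy rather than a scaled one, so a single gain leaves residue, which is the bleeding described above.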

Edited to correct, clarify, and improve my suggestions.

@Seregawpn a question.

Why would I want to use your app when I can use VoiceOver to do the same thing?

I'm on Windows so couldn't use it anyway, but sell it to us: what are the advantages?

Think it's something similar to Voice Control

From what I gather they want to make something similar to Voice Control on steroids, because the idea is that we cannot use computers so need an AI agent to do it for us. They want us to talk to the computer and have it decide on what actions to perform in order to accomplish our goals. If properly implemented this could be situationally useful, however the agentic crap feels like a gimmick to attract investment by surfing the AI hype which is what irks me, as something like this could be much better if it just augmented the functionality provided by VoiceOver by making it possible to navigate totally inaccessible content such as video-games, images, and video, and then there are the privacy concerns because it is very unlikely that they will run a multi-modal large language model locally.

any updates?

Hi, any updates? Have you fixed the bug?

We finished solving the problem!

Hi guys, good news! Today we finished solving the problem - everything works now. All that remains is to adapt it in the application, which will take about a week. In a week, we will start adding new users for testing, in addition to those who have already submitted applications. We will continue with them.
Regarding the methods for solving the complex issue that I saw you provided: your solution was based on Swift. The point is that our project is created in Python, and there is a small difficulty with correct synchronization. Additionally, the solution you proposed is difficult to configure and is mechanical. We decided to go with WebRTC in JavaScript - this is a solution that was created by Google when working with Chrome, and the same solution is used by OpenAI.
I apologize for being out of touch for a long time. I tried to solve the problem as soon as possible and wanted to come back with good news. We managed to do it! Plus, more good news: we are now implementing the MCP (Model Context Protocol) interaction method. This means the speed of interaction with the hardware will be almost instantaneous. For example, if you ask to create a file or find a file on your computer with specific content, it will take about 10 seconds.

that's awesome!

That's awesome! Glad to hear that problem has been resolved. Will there be a YouTube video demonstrating its features? Would love to hear it in action. Looking forward to testing it.

Questionable choices

Regarding the methods for solving the complex issue that I saw you provided: your solution was based on Swift. The point is that our project is created in Python, and there is a small difficulty with correct synchronization. Additionally, the solution you proposed is difficult to configure and is mechanical. We decided to go with WebRTC in JavaScript - this is a solution that was created by Google when working with Chrome, and the same solution is used by OpenAI.

Beyond just enabling voice detection in CoreAudio, which is by far the most straightforward solution of all, my other suggestions were mostly mathematical, involving concepts that are best implemented at lower levels for performance reasons.

Latency in audio is always a problem that you will face, because at a sample rate of 48 kHz, and considering that the speed of sound is 343 m/s at sea level and room temperature (20 °C), changing the relative distance between the microphone and the speakers by just 2 cm is enough to affect the latency by at least one sample. All echo cancellation solutions must therefore be able to dynamically adjust to real-world conditions, which even the lamest solution that I suggested is capable of doing.

As for WebRTC, if that works for you then great I guess, but the solutions that I mentioned are far from being mechanical, and the one involving wavelet transforms is likely what modern implementations actually use since it's quite state of the art signal processing for audio. If by mechanical you mean algorithmic, then in my opinion that's actually a positive point, because machine learning is almost always significantly more resource hungry than algorithmic solutions with well studied optimizations, which is the case of all the suggestions that I made.
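The arithmetic behind that figure can be checked with a few lines of Python. This is only a worked example using the numbers quoted above (48 kHz, 343 m/s, 2 cm); the function names are made up for the illustration.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # air at about 20 degrees Celsius, as quoted above
SAMPLE_RATE_HZ = 48_000         # sample rate quoted above

def metres_per_sample(sample_rate_hz=SAMPLE_RATE_HZ, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance sound travels during one sample period."""
    return speed / sample_rate_hz

def delay_in_samples(distance_m, sample_rate_hz=SAMPLE_RATE_HZ, speed=SPEED_OF_SOUND_M_PER_S):
    """Extra acoustic delay, in samples, caused by distance_m of extra path."""
    return distance_m * sample_rate_hz / speed

print(f"{metres_per_sample() * 1000:.2f} mm per sample")    # 7.15 mm per sample
print(f"{delay_in_samples(0.02):.2f} samples for 2 cm")     # 2.80 samples for 2 cm
```

So sound covers roughly 7 mm per sample at this rate, and a 2 cm change in distance shifts the echo by nearly three samples, comfortably more than the one sample mentioned above.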

Anyway this thread is over two months old at this point and there's not even a single teaser video demonstrating the product under ideal conditions yet.

Any updates?

Hi,

Any updates?
