AI-powered Screen Reader for Mac

By kchro3, 3 January, 2024

Hi folks, I am a (sighted) software engineer, and I am working on a next-generation screen reader for Mac called Typeahead AI. The website isn't up to date because I've changed direction in the last few weeks, but I want to build a "Jarvis"-like experience of telling your computer what to do with simple voice or text commands.

I can share a YouTube video to demo it in action, and I'm curious to get your reactions and feedback to see if this is promising. To describe what's happening, there's a keyboard shortcut to open a chat window, and I typed in a command like "send an invoice from ABC to xyz@gmail.com." Typeahead recognizes that it's on the correct page, downloads the invoice as a PDF, opens the Mail app, and composes a message to the correct recipient and attaches the PDF. The app is also narrating each action so that it's transparent to the user what is happening, and there are buttons and keystrokes to interrupt it.
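For readers curious how step-by-step narration and interruption can fit together, here is a minimal Python sketch. It assumes the command has already been turned into a list of steps. `Step`, `run_plan`, and the step descriptions are hypothetical names for illustration, not Typeahead's actual code, and the real UI actions are stubbed out.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str            # text narrated to the user before acting
    action: Callable[[], None]  # hypothetical UI action; stubbed in this sketch


def run_plan(steps: List[Step], narrate: Callable[[str], None],
             interrupted: threading.Event) -> int:
    """Run steps in order, narrating each one first.

    The interrupt flag is checked before every step, so a keystroke that sets
    it stops the plan at the next step boundary. Returns the number of steps
    that ran.
    """
    done = 0
    for step in steps:
        if interrupted.is_set():
            narrate("Stopped.")
            break
        narrate(step.description)
        step.action()
        done += 1
    return done


# Usage: the second step's action simulates the user pressing the interrupt key.
log: List[str] = []
stop = threading.Event()
plan = [
    Step("Downloading the invoice as a PDF", lambda: None),
    Step("Opening Mail", stop.set),
    Step("Composing a message to the recipient", lambda: None),
]
completed = run_plan(plan, log.append, stop)
print(completed, log)
# 2 ['Downloading the invoice as a PDF', 'Opening Mail', 'Stopped.']
```

In this sketch an interrupt only takes effect between steps; an action already in progress runs to completion.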

Behind the scenes, it's using ChatGPT and Accessibility tools to navigate different websites and apps, and there is a lot of engineering happening to keep it fast and cost-efficient. Today, it can open apps and websites, click on buttons, and fill in text fields, but I've only been working on this for a couple of months, so I'm looking for early feedback.
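One way to keep an LLM-driven agent fast and cheap is to send the model a compact, numbered list of actionable elements instead of the whole accessibility tree. The post doesn't say how Typeahead does this, so the Python below is only an assumption-based illustration. `Element`, `ACTIONABLE_ROLES`, and `summarize` are hypothetical; the role strings mirror real macOS AX role names such as `AXButton`, but nothing here calls the actual Accessibility API.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified stand-in for an accessibility element.
@dataclass
class Element:
    role: str
    title: str = ""
    children: List["Element"] = field(default_factory=list)


# Roles the model is allowed to act on (an assumption for this sketch).
ACTIONABLE_ROLES = {"AXButton", "AXLink", "AXTextField", "AXCheckBox", "AXRadioButton"}


def summarize(root: Element) -> List[str]:
    """Depth-first walk that keeps only actionable elements, numbered so the
    model can refer to them by index."""
    lines: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.role in ACTIONABLE_ROLES:
            lines.append(f"[{len(lines)}] {node.role} {node.title!r}")
        # Push children in reverse so they are visited in on-screen order.
        stack.extend(reversed(node.children))
    return lines


window = Element("AXWindow", "Invoices", [
    Element("AXGroup", "", [
        Element("AXStaticText", "Invoice #42"),
        Element("AXButton", "Download PDF"),
    ]),
    Element("AXLink", "Settings"),
])
print("\n".join(summarize(window)))
# [0] AXButton 'Download PDF'
# [1] AXLink 'Settings'
```

Dropping static text and layout groups keeps the prompt small; the trade-off is that the model loses surrounding context, such as the label text next to an unlabeled button.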

I am thinking of this as a complementary tool to VoiceOver; it is more of a replacement for Mac's Shortcuts, which I find limited and difficult to use.

I am trying to educate myself on the challenges and pain points of VoiceOver, and this community's forum is really amazing in that regard. I've already gained a lot of valuable insights by reading posts on this website, and I hope that you'll indulge me. I really appreciate any comments, and if this is something that interests you, I'd love to chat more about future features.

Comments

By Callum Stoneman on Friday, February 2, 2024 - 02:08

Hi,

First off, thank you so much for your work on this. It sounds very interesting and I can't wait to try it out as soon as I can. I can already think of a process I go through on a regular basis and was wondering if it could be done with TypeAhead. Here is a rough explanation of the steps:

1. Open Mail and find the email from person x with the subject x.
2. Click on the Zoom link in the email.
3. When Safari opens, allow it to launch the Zoom client.
4. In Zoom, join the meeting with video off and the Original Sound option enabled.

Would TypeAhead be able to do all this? Is there anything in particular I'd need to be aware of, permissions I'd need to allow, etc.?

I'm guessing this could be done by recording a "quick action" if it can't do it out of the box; however, the email subject and Zoom link change each time. Would that matter?

It's not a particularly inaccessible process, but using something like TypeAhead could definitely speed it up, especially if it is not restricted by the "Safari not responding" issue.

Again, can't wait to try this out!

By kchro3 on Friday, February 2, 2024 - 02:08

Hi Callum, I think that's a cool use-case. My tip would be to ask Typeahead each step one-by-one, and it will have a better chance of succeeding.

In the latest version, there is a "dictation" mode where you can talk to it, in case typing out each step is tedious! I tried this myself: I said "open Mail and find the email from Jenny with the subject... I don't remember what the subject is but it has to do with a meeting."

It opened Mail, typed in from:Jenny in the search bar, and then clicked on an email with a calendar invite. After that, I said "click on google meet link", and it could open the link.

If you run into any problems or want to talk more, let me know. If you like, I've done onboarding sessions with some folks over Zoom, and I think that's been helpful for people to learn the ropes.

By Callum Stoneman on Friday, February 2, 2024 - 02:08

Just given this a try and I must say, while there are some issues which I totally expected at this stage, this is seriously impressive stuff!

From the finder, I gave it the following prompt:

Send an email to (my-email-address) with the subject "Testing 123" and the message "This is a test of TypeAhead". Check with me first before sending the message.

Sure enough, it opened Mail, clicked the compose button, entered in my email address, the subject and the message body all in the correct fields, told me what it had done and then asked if I was ready to send the message. I said yes, and it came back with an error "no such element" at that point and failed to send the message. However, sending another message saying "Click send" caused it to work.

The "no such element" error followed by a number is something that I seem to be seeing quite frequently. I'm guessing this is because it is unable to analyse the GUI properly?

I tried asking it to help me add this comment, but that really confused it! I asked it to help me add a new comment, and it placed VoiceOver's focus on the add comment link, which was fair enough. Although I didn't need to do it this way, I then told it to "click the new comment link". VO focus jumped to the address bar, and TypeAhead seemed to get stuck in a loop of saying "clicking add new comment link" and "thinking". Maybe it's because that link doesn't go away or take you to a new page? I also tried telling it to focus on the subject field and got the "no such element" error, and using the smart focus feature seemed to put the focus in a random place each time.

Those are just a couple of issues, but when this works, it's pretty mind-blowing! Keep up the great work on this, I'm very excited to see its development.

By kchro3 on Friday, February 2, 2024 - 02:08

Thanks for the feedback, Callum! We're looking into the 'no such element' error and improving web interactions. Your insights are helping us make TypeAhead better. Keep the suggestions coming!

*edit: lol, I was trying to use it to draft a reply while I was doing something else. My workflow just now was to "smart-copy" your comment with Cmd-Option-C, and I wrote "draft a reply". It wrote up a draft, but I thought it was too long, so I typed "shorter please" and it ended up sending the above.

My real thought is that the "no such element" issue has to do with the fact that it takes a snapshot of what's on the page every time it "thinks," and there could be some artifacts of previous snapshots that are causing it to interact with an element from an older snapshot. I'll try to look into it; somehow I thought it was fixed, but I guess not!
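The stale-snapshot theory in this reply can be guarded against by stamping every element handle with the snapshot it came from and refusing to act on handles from an older one. Below is a minimal Python sketch, assuming element handles are (generation, index) pairs; `SnapshotRegistry` and `StaleElementError` are hypothetical names, not Typeahead internals.

```python
from typing import Dict, List, Tuple


class StaleElementError(Exception):
    """Raised when an action targets an element from an older snapshot."""


class SnapshotRegistry:
    """Hypothetical registry: each snapshot gets a new generation number and
    element handles are (generation, index) pairs."""

    def __init__(self) -> None:
        self.generation = 0
        self.elements: Dict[int, str] = {}

    def take_snapshot(self, labels: List[str]) -> List[Tuple[int, int]]:
        # Replace the previous snapshot wholesale so no elements carry over.
        self.generation += 1
        self.elements = dict(enumerate(labels))
        return [(self.generation, i) for i in self.elements]

    def resolve(self, handle: Tuple[int, int]) -> str:
        generation, index = handle
        if generation != self.generation:
            raise StaleElementError(
                f"handle is from snapshot {generation}, current is {self.generation}"
            )
        return self.elements[index]


# Usage: a handle taken before the last "think" is rejected instead of
# silently acting on whatever now sits at that position.
reg = SnapshotRegistry()
old = reg.take_snapshot(["Compose", "Send"])
new = reg.take_snapshot(["Send", "Discard"])
print(reg.resolve(new[0]))  # Send
try:
    reg.resolve(old[1])
except StaleElementError as err:
    print("Take a fresh snapshot:", err)
```

On a `StaleElementError`, an agent built this way would re-snapshot and look the element up again (for example by role and title) rather than reusing the old index.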

By Brad on Friday, February 2, 2024 - 02:08

It's a shame I can't try this myself. I know the kind of comments I gave you before, but I'm very impressed that you've changed the name and that people are able to write using such natural language.

I have a question: in the future, are you hoping to make it so you can type something like "scan (inaccessible app screen name) and tell me what clickable elements are on the screen"? That would honestly be amazing!

It would be even better if it could stay focused on that part of the screen and update it if something got clicked, like how NVDA's OCR does it, but better.
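The "update it if something got clicked" idea boils down to scanning the screen again after an action and reporting what changed. Here is a minimal Python sketch, assuming each scan (from OCR or the accessibility tree) yields a list of element labels; `diff_snapshots` is a hypothetical helper, not part of Typeahead or NVDA.

```python
from typing import List, Tuple


def diff_snapshots(before: List[str], after: List[str]) -> Tuple[List[str], List[str]]:
    """Return (appeared, disappeared) element labels between two scans,
    keeping the order in which they occur on screen."""
    before_set, after_set = set(before), set(after)
    appeared = [label for label in after if label not in before_set]
    disappeared = [label for label in before if label not in after_set]
    return appeared, disappeared


before = ["Play", "Settings", "Library"]
after = ["Pause", "Settings", "Library", "Now Playing"]
appeared, disappeared = diff_snapshots(before, after)
print("New:", appeared)      # New: ['Pause', 'Now Playing']
print("Gone:", disappeared)  # Gone: ['Play']
```

Because labels are compared as plain strings, two identical labels (say, two "OK" buttons) are treated as one; a sturdier version would need stable element identities.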

No, we'd not be able to play games with it, but if this were doable, a huge number of apps would be open to Mac users that weren't before.

Another thing: I'm lazy and use a program called Everything on Windows. It allows me to go to a directory and find anything in it; for example, I can type .mp3 and it will find all files with that extension. Could I do a similar thing in your app?

I think that would be very useful if you have tonnes of mp3 files you'd like deleted, or renamed even! The more I think about this the more ways I can think of using it.
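For the Everything-style request, the underlying operation is a recursive search by file extension. This Python sketch uses only the standard library's `pathlib`; `find_by_extension` is a hypothetical helper name, and the sketch says nothing about how Typeahead would expose such a command.

```python
from pathlib import Path
from typing import List


def find_by_extension(root: str, ext: str) -> List[Path]:
    """Return every file under root whose extension matches ext, ignoring case.

    ext may be given with or without the leading dot ("mp3" or ".mp3").
    """
    wanted = ext.lower() if ext.startswith(".") else "." + ext.lower()
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == wanted
    )
```

For example, `find_by_extension("/Users/you/Music", "mp3")` (the path is a placeholder) would return every MP3 in that folder and its subfolders; deleting or renaming them would be a separate, deliberately confirmed step.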

Sorry about before, I didn't truly understand what your app was aiming to do and now I do. For blind people it's like having a sighted person next to you if you need it.

By Ekaj on Friday, February 2, 2024 - 02:08

I just tried this out and I'm sad to say I didn't get very far. I went through the tutorial and attempted to find out Microsoft's stock price. I only got speech from VoiceOver, and didn't find any settings specific to the app. But I will definitely try it out again and see what happens. I do have one question, though. Is there a way to turn off VoiceOver's word prediction in Mail and just have this app do its thing? As much as I like VO, it does, as has been stated here previously, have its shortcomings. But I've not had nearly as much trouble with it as others. I also agree that "screen reader" alone is probably a bit of a stretch for this app at present. Perhaps "VoiceOver add-on", similar to NVDA's model? But I'm excited to see where your app goes in the future. It's a wonderful idea, and please keep up the good work. I just subscribed to your newsletter but have yet to receive a confirmation.

By kchro3 on Friday, February 9, 2024 - 02:08

Hey Ekaj, thanks for trying it out! If you'd like, I've been doing short 15/30min Zoom calls with people to help onboard and also just chat about feature requests. I'd be happy to hop on a call with you.

> Is there a way to turn off VoiceOver's word prediction in Mail and just have this app do its thing?

Hm, I'm not sure what the word prediction is. What you can do with the app is to ask it to write an email either verbatim or abstractly, and it will generate a draft and paste it into the Mail composer. It should work even when VoiceOver is enabled, although I could be wrong.

> RE: Screen Reader naming

I'm still unsure of what to call it, but I did consider calling it a plug-in / add-on. I think it's somewhere between VoiceOver + Voice Control + Be My AI. I want this to be a standalone app that is useful even if you don't use screen readers, for example, people who are elderly or people who have mobility issues.

That's the longer term picture, though.

> I just subscribed to your newsletter but have yet to receive a confirmation.

Ah, apologies, I don't have an automatic email confirmation yet. I also haven't sent anything out yet, but I will in the next week or so.

By Brad on Friday, February 9, 2024 - 02:08

I've not used a Mac in years, but I believe it's a VoiceOver/Mac feature, kind of like when you text on an iPhone and it gives you suggestions above the keyboard.

@Ekaj, this might help: How to turn off autocorrect on a Mac computer
1. Select "System Preferences" from the Applications toolbar. If you're having trouble finding it, press "Command" + the space bar on your keyboard to open "Spotlight Search" and type "System Preferences" into the search bar.
2. Click "Keyboard."
3. Click "Text" in the top bar.
4. Deselect "Correct spelling automatically." This will turn off autocorrect.

By PaulMartz on Saturday, March 16, 2024 - 02:08

I finally cleared my plate enough to give this a try.

I was unable to use it to toggle the transfer lock on one of my domains at Hover. There simply doesn't appear to be any way to bring VoiceOver focus to that control. It's mouse-only.

I was able to use this tool to interact with the multi-select genre picker at The Submission Grinder. Safari wasn't letting me select more than one genre, which forced me to Chrome. But latest Sonoma broke this website in Chrome, which forced me back to Safari. Typeahead has made this website usable in Safari.

This is an excellent use of AI. This is what I've wanted for close to two years now, a virtual assistant that will make life more accessible. We have a long way to go. I say that because the potential for AI to circumvent accessibility issues is enormous, and not in any way to minimize Typeahead. Typeahead has already proven itself useful. It took that brave first step. Now, let's see what Apple rolls out in June.

I hope I live long enough to see AI overcome every accessibility barrier.

By PaulMartz on Saturday, March 16, 2024 - 02:08

(Jeff OP, also emailing this to you directly.)

Each new release of macOS seems to change the Messages app just enough to render my muscle memory completely unusable.

One example is the method for switching to another conversation. In Ventura, starting with VoiceOver focus in the message text field, I used to be able to VO+Left arrow to the Apps button, then press VO+J to jump to the list of conversations. From there, I could interact (VO+Shift+Down arrow), then VO+arrow to the conversation of my choice, and finally VO+Space. It sounds like a PITA, but do it enough times and your fingers fly through those commands without even thinking about it.

But then Sonoma came out and changed how VO+J works in many apps. As a result, VO+J no longer jumps to the list of conversations. My new workflow, again starting from the message text field, is to press VO+Home to jump to the toolbar, then VO+Right arrow until I get to the list of conversations. It is going to take me days to get used to this.

Wouldn't it be better if I could simply Command+Option+F and tell Typeahead "Jennifer," and Typeahead would place focus on the button in the conversations list that corresponds to my conversation with Jennifer?

Unfortunately, this doesn't work. Current Typeahead complains it can't find a control fitting that description. And this is strange, because, if I open the item chooser with VO+I, I can type "Jen" and immediately narrow down the items to the button I'm looking for.

Typeahead needs to be at least as easy as using that Item Chooser.
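The Item Chooser behavior described here, typing "Jen" and seeing the list narrow, is essentially case-insensitive filtering over element labels. Here is a minimal Python sketch of that matching step, assuming the conversation buttons' labels are already available as strings; `choose` is a hypothetical helper, and it is only similar in spirit to VoiceOver's Item Chooser, not a copy of its exact matching rules.

```python
from typing import List


def choose(items: List[str], typed: str) -> List[str]:
    """Keep items whose label contains every typed word, ignoring case,
    in their original order."""
    words = typed.lower().split()
    return [item for item in items if all(w in item.lower() for w in words)]


conversations = ["Jenkins Plumbing", "Mom", "Jennifer", "Book club"]
print(choose(conversations, "jen"))       # ['Jenkins Plumbing', 'Jennifer']
print(choose(conversations, "JENNIFER"))  # ['Jennifer']
```

A cheap match like this could run before any model call, so a request like "Jennifer" only falls back to the AI when the text filter is ambiguous or finds nothing.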

Even better would be if I could Command+Option+Space and tell Typeahead "open Jennifer's conversation," and Typeahead would not only find the Jennifer button in the list of conversations, but select it.

By PaulMartz on Saturday, March 16, 2024 - 02:08

Hover, the domain registrar, sent me a customer satisfaction survey. Normally these things are completely tedious to navigate and fill out. Today, I used Typeahead. Wow. Wow. It was amazing. Walked me through the whole survey. Made it easy to select inaccessible checkboxes, radio buttons numbered from 1 to 10, and fill in comment fields. One of the pages had multiple questions. I asked Typeahead how many questions were on the page. It told me. Read each one to me as I asked. This is just outstanding technology. Thank you Typeahead.

By PaulMartz on Saturday, March 16, 2024 - 02:08

I'm amazed how many people I meet that have never used AI yet are of the opinion that AI is some kind of existential threat. When I tell them how I use AI every day as a blind person, I hope it changes some minds. But the media has this smear campaign going that will be hard to stop.

Keep cheerleading. We have to make sure they don't throw out the baby with the bath water.

By PaulMartz on Saturday, March 16, 2024 - 02:08

I got this to work. I was overthinking it. I simply told Typeahead: "Find the conversation with Jennifer and select it." Voila. Messages app immediately switched to displaying my conversation with Jennifer.

Is Typeahead perfect? No. But the more I play with it and discover what it can do, the harder it's becoming for me to say that. This tool rocks.

By Brian on Saturday, March 16, 2024 - 02:08

Type Ahead sounds promising. I cannot wait to see what Siri + Generative AI will bring us in iOS 18/macOS 15. 😎

By Brad on Saturday, March 16, 2024 - 02:08

I didn't even know there was a website.

By Brad on Saturday, March 16, 2024 - 02:08

My brain wasn't working when I read that comment.