AI-powered Screen Reader for Mac

By kchro3, 3 January, 2024

Hi folks, I am a (sighted) software engineer, and I am working on a next generation screen reader for Mac called Typeahead AI. The website isn't up-to-date because I've changed directions in the last few weeks, but I want to build a "Jarvis"-like experience of telling your computer what to do with simple voice/text commands.

I can share a YouTube video to demo it in action, and I'm curious to get your reactions and feedback to see if this is promising. To describe what's happening, there's a keyboard shortcut to open a chat window, and I typed in a command like "send an invoice from ABC to xyz@gmail.com." Typeahead recognizes that it's on the correct page, downloads the invoice as a PDF, opens the Mail app, and composes a message to the correct recipient and attaches the PDF. The app is also narrating each action so that it's transparent to the user what is happening, and there are buttons and keystrokes to interrupt it.

Behind the scenes, it's using ChatGPT and Accessibility tools to navigate different websites and apps, and there is a lot of engineering happening to keep it fast and cost efficient. Today, it can open apps and websites, click on buttons, and fill in text fields, but I've only been working on this for a couple of months, so I'm looking for early feedback.
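To make the idea concrete, here is a minimal sketch of the kind of loop a tool like this might run: a planner looks at the elements on screen and returns one action at a time, and an executor applies each action and narrates it. Everything in it is an assumption made for illustration. `Element`, `plan_next_action`, `find`, and `run` are hypothetical names, the planner is a hard-coded stub standing in for a language model call, and none of this is taken from Typeahead's actual code.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A simplified stand-in for one accessible UI element (hypothetical)."""
    role: str            # e.g. "button" or "textfield"
    label: str           # the accessible name
    value: str = ""      # current text, for text fields
    clicked: bool = False


def plan_next_action(goal, elements, history):
    """Stub planner. A real tool would ask a language model here.

    Returns {"type": "fill", "label": ..., "text": ...},
    {"type": "click", "label": ...}, or {"type": "done"}.
    """
    script = [
        {"type": "fill", "label": "To", "text": "xyz@gmail.com"},
        {"type": "click", "label": "Send"},
        {"type": "done"},
    ]
    return script[len(history)]


def find(elements, label):
    """Look up an element by its accessible label, not by screen position."""
    for element in elements:
        if element.label == label:
            return element
    raise LookupError(f"no element labelled {label!r}")


def run(goal, elements, max_steps=10):
    """Ask the planner for one action at a time, apply it, and narrate it."""
    history = []
    narration = []
    for _ in range(max_steps):
        action = plan_next_action(goal, elements, history)
        if action["type"] == "done":
            narration.append("Done.")
            break
        target = find(elements, action["label"])
        if action["type"] == "fill":
            target.value = action["text"]
            narration.append(f"Typing into {target.label}")
        elif action["type"] == "click":
            target.clicked = True
            narration.append(f"Clicking {target.label}")
        history.append(action)
    return narration


screen = [Element("textfield", "To"), Element("button", "Send")]
log = run("send an invoice to xyz@gmail.com", screen)
```

The `max_steps` cap is one simple way to keep a confused planner from looping forever, and returning the narration as a list makes it easy to speak each step or to stop between steps.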

I am thinking of this as a complementary tool to VoiceOver, and it is more like a replacement for Mac's Shortcuts, which I find limited and difficult to use.

I am trying to educate myself on the challenges and pain points of VoiceOver, and this community's forum is really amazing in that regard. I've already gained a lot of valuable insights by reading posts on this website, and I hope that you'll indulge me. I really appreciate any comments, and if this is something that interests you, I'd love to chat and talk more about future features.

Comments

By kchro3 on Thursday, January 4, 2024 - 05:54

Thanks Oliver! Yes, it will narrate each of its steps. For more complicated routines, you can spell out the steps yourself, and you can also "record" yourself doing the action once, so that it will learn by example.

The default text-to-speech API in Apple isn't great, but I'm open to other ideas. I think that using the ChatGPT text-to-speech is an option, but I think it would be slower and too expensive.

Yes, I think that it is like Copilot, and I think that Apple might do something similar as a next-gen Siri. I suspect that Apple might not do this per se because of privacy restrictions, but it's hard to tell.

Let me try to see if I can make a quick demo with your calendar example. I think that's a great and very practical use-case: https://www.youtube.com/watch?v=SmMaZT0boJA (*edit: works out ok for Google Calendar! I think with a bit of prompting, you could get it to add calendar invitees and a description. The Mac Calendar app was harder to get working, so I'll try to look into it.)

By PaulMartz on Thursday, January 4, 2024 - 05:54

I love seeing development in this direction. Thanks for your work.

Scheduling an event is an excellent use case, as many calendar apps or websites are challenging. Especially challenging is any website that uses one of those horrendous calendar widgets in which every day is a button and there is no landmark to allow you to skip all 30 buttons in a month. Give me an AI that interfaces with UIs like that on my behalf, and I'll die a happy man.

Here's another use case. I just had a pretty poor experience with the Brevo.com website. I'm in a group that uses this website to send out marketing emails. I tried to go into the settings and authenticate a domain for sending emails. But the account menu was not accessible and I was unable to select the option to authenticate a domain.

I'd love to be able to open this website, log in, then tell an AI to authenticate a domain. It would be up to the AI to figure out how to do that, and once it navigated to the appropriate screen, it could prompt me to speak or type in any information necessary to complete the process. Added bonus if it copies the DKIM public key onto the clipboard for me.

Keep up the good work.

By kchro3 on Thursday, January 4, 2024 - 05:54

Thank you, Paul! I am so glad I posted here already. I added a YouTube demo of Google Calendar to my earlier comment, so happy to hear what you think of that and/or what could be improved.

I'll look into the Brevo.com use-case and try to see if I run into problems, thanks for the suggestion! I think it would be really cool to interoperate with dictation if that's what you're describing. I didn't think of that, and it's a very elegant solution to dealing with clarifying questions.

(*edit: Here is a demo video of adding a domain with Brevo.com)

By Angel Blessing on Thursday, January 4, 2024 - 05:54

Is there any way that we as blind users can beta test this new screen reader for the Mac?

By kchro3 on Thursday, January 4, 2024 - 05:54

Yes, of course! I would love it if this community could try it out. The website is https://typeahead.ai, and you can click on the download button to install the app. It's currently free to use.

I was originally trying to take it a different direction (smart clipboard), so I would appreciate your feedback and patience as I fix things. If you find any issues or just want to know more, the best way to contact me is jeff@typeahead.ai. There is also a way to provide in-app feedback.

I am using ChatGPT as the backend, and I am keeping logs for up to 24 hrs for debugging and basic analytics.

By SiddarthM on Thursday, January 4, 2024 - 05:54

It's really a great initiative.

By Bruce Harrell on Thursday, January 4, 2024 - 05:54

Hi. I wrote to you separately, in which I urged you to concentrate on making your app enable the blind to define their own commands and strings of commands and to create a library of your own commands and a user library of user defined commands. I hope readers of my post here will add their own voice to these ideas. smile. Great job!

By kchro3 on Thursday, January 4, 2024 - 05:54

Thank you, Bruce! I mentioned this in my email, but in my app, you can press a hotkey to "record" yourself doing some action, and you'll be able to save that as a "Quick Action," which is like a shortcut or macro. I think that the goal is to let users define their own workflows by teaching the AI by example, but there is a lot of work to be done!

By Brad on Thursday, January 4, 2024 - 05:54

I'm not a Mac user but this seems quite cool.

Could I be lazy and record myself downloading youtube-dl and then just type in something like download youtube-dl?

Or, maybe you could make a set of instructions that downloads a blind friendly software list.

I can't use this, I'm on windows, but if you can let people add their own recordings to a list and share them then I feel it would be even better.

Does it work with voiceover on?

By JC on Thursday, January 4, 2024 - 05:54

This is really awesome stuff! I have a few questions. First, will this work with the Messages app for sending multiple messages to separate recipients as a single message? And is there an automatic update system implemented so that whenever a new update comes out with bug fixes and/or improvements it will be updated automatically just like all other apps? As you said, it could be a replacement for Shortcuts, as the stock Shortcuts app is very confusing. For example, I have a shortcut called Mass Text Message, where I send a message to multiple people in separate messages, and it would be cool if this app could do the same thing. For example, I could give the AI a command like "send a text to *friend's name* saying hello!" and it would write out the message, open the Messages app with the text inserted, and hit send, and it's done.

By Bruce Harrell on Thursday, January 4, 2024 - 05:54

Hi all,

Jeff, the app designer, hopes our first two emails below will encourage you to give input. His response comes first; my original email to him comes after. Bruce

"Hi Bruce,

Thank you so much for your insight and for reaching out directly. I quit my job to work on this startup last month because I felt compelled to work on this, and hearing your feedback is validation that there is something here. If you are a Mac user, I’d love to ask for your feedback on the app, but I know that many people are not.

. . ..

I totally agree that everyone has individual needs, and you have a very valuable point that making blind people become more independent is paramount for this idea to really be meaningful.

I am trying to do something similar with the macro system you are describing, where you can press a hotkey to begin recording and the AI will try to create a “Quick Action” that you would be able to playback. It is still buggy and has a lot of problems, but I would love to talk to you more about your experiences and see if I can do something similar to what you’re describing. I think where the power of AI will come in is being able to build these macros in a more user friendly, non-technical way.

I love the idea of a tab with commands and making it shareable with the community. I think that is brilliant, and I’d love to explore this further. I am afraid that there are technical limitations with making Typeahead iOS friendly, and going around those technical limitations would likely violate the App Store's terms of service, at least to the best of my knowledge.

Anyways, if you are open to having a call with me at some point or just staying in touch over email, I’d love to do that, and if there is anybody that you think would be interested in talking to me, I’d love that too!

My LinkedIn is here: https://www.linkedin.com/in/kchro/, if you’d like to know more about my background, and thank you again for the kind words.

Jeff

On Jan 4, 2024, at 12:01 PM, Bruce Harrell <bbh@mind.net> wrote:

Greetings,

When I noticed your AppleVis post, I thought to contact you directly. I am a retired lawyer, a former Judge Pro Tem, and have been blind since the age of twenty.

I am certain your project will have great interest among the visually impaired. A wide variety of possible TypeAhead functions and commands spring to mind, which I will set aside for now except one. Please be sure TypeAhead includes a simple, easy to understand and use UI to create, edit, save, delete, and execute commands, including stringing commands to build more complex often repeated tasks, with which the user would be enabled to:

1. Spare you from having to create an unimaginable variety of macros too numerous for an average lifespan;

2. Create their own commands and strings of commands to meet their individual needs.

Every user has their own needs. One can try to anticipate them when building the app, but if you focus on enabling the user to create their own in a simple, easy to use interface, all you yourself need do is create the means for the user to do it themselves. Please believe me when I tell you that doing things for themselves is far more important to a blind person in all aspects of life than for the sighted. The sighted take independence for granted. The blind can't, and every tool the blind can use to become more independent is extremely valuable to the blind.

I realize the Apple Shortcuts app can be used by the sophisticated — I am not one — and I strongly suspect average users avoid creating shortcuts because of the difficulty in using the interface and because of the complexity involved in creating shortcuts. Instead, what I am hoping is that the input vocabulary you are building in TypeAhead would vastly simplify the command function creation process, including the creation of strings of commands, and the saving, editing, deleting, and user's choice to execute by voice or by keystroke.

I am reminded of the MSDOS version of WordPerfect I used in my law practice in the 1980's. It had a fabulous, albeit unintelligent macro system: Push a function key to record your macro; type out your macro text and/or commands; push the same function key again to stop recording the macro; push another function key to save the macro; type the macro text name or keystroke to be used to execute the macro; push the same function key again to conclude the macro creation process; and, to execute the macro, either push the saved keystroke command or push another function key, type the macro name, and push the same function key again.

It was great! I had dozens of macros I used every day, and it vastly improved my workflow. Even though it lacked AI, it was easy and simple to use.

Like the shortcuts app, however, I hope you will include your library of commands as a tab, and create a tab for a library of user-created commands and strings of commands. I am confident the blind will add their creations. We know others similarly situated will benefit.

Incidentally, just to be sure, when I say string of commands, I mean a user defined command that executes a string of commands.

Last, I realize you might be well down a different road than what I am suggesting, and if so, oh well. Maybe in TypeAhead Ver 2.0. I do have one last request, however. Please be sure to make your app for IOS before making it for Mac. In IOS, many blind users have trouble with dictation accuracy. Worse, all blind people type incredibly slower on their onscreen keyboards as compared to their hardware keyboards. The greater need for TypeAhead is definitely in IOS.

Thank you for your hard work. You will benefit many thousands of the blind in the United States alone. Thank you, too, for being open to direct contact.

Joy to you and yours in this new year,

Bruce Harrell"

By aaron ramirez on Thursday, January 4, 2024 - 05:54

This definitely looks interesting and I'll be following this project closely. Out of curiosity, is this running on the self-operating-computer framework?

Based on my limited experimentation with it, it's not at the point where it would be useful for a blind person, because GPT-4-Vision is really bad at estimating mouse positions, so I'd definitely be curious to know if you found a workaround for that limitation.

By Earth on Thursday, January 4, 2024 - 05:54

I love your idea! I'm an instructor at the Braille Institute, and the majority of my students are retired seniors who recently lost their vision. It would normally be too difficult for them to do tasks on the computer. With your project, it could help many of my students become more independent. Thank you for your time and effort. I will test it out and see what I can do to help.

By kchro3 on Thursday, January 4, 2024 - 05:54

First and foremost, thank you to everyone who has engaged with this post! I am getting a few emails with detailed questions, and I have gotten a handful of people downloading the app and trying stuff out. I know it's very early, so please bear with me as I keep developing this app, but I am so energized by your feedback!

I'll try to respond to some comments.

Brad: Could I be lazy and record myself downloading youtube-dl and then just type in something like download youtube-dl? Or, maybe you could make a set of instructions that downloads a blind friendly software list.

Yes, that is the goal. That's a brilliant idea, and I'll try to work on a demo of this use-case. I think that what I'd like to build is a workflow where you can record yourself doing some task once, like downloading an mp3 from youtube-dl, and the AI will save that as a "Quick Action" where you can do it again but for different videos. I will do some testing on my end to validate that it will work, but this is what I'm picturing.
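As a rough illustration of "record once, replay for different videos," the sketch below turns one recorded run into a reusable template by swapping the value used during recording for a named placeholder. It is only a sketch under stated assumptions: the step format, the `generalize` and `replay` helpers, and the example URLs are all hypothetical, and this is not how Typeahead's Quick Actions are actually stored.

```python
def generalize(recorded_steps, example_value, slot):
    """Turn one recorded run into a template.

    Every occurrence of the value used during recording is replaced
    with a named placeholder such as "{url}".
    """
    placeholder = "{" + slot + "}"
    template = []
    for step in recorded_steps:
        step = dict(step)  # copy, so the original recording is untouched
        if "text" in step:
            step["text"] = step["text"].replace(example_value, placeholder)
        template.append(step)
    return template


def replay(template, **values):
    """Fill each placeholder in the template with a new value."""
    steps = []
    for step in template:
        step = dict(step)
        if "text" in step:
            for slot, value in values.items():
                step["text"] = step["text"].replace("{" + slot + "}", value)
        steps.append(step)
    return steps


# One recorded run (hypothetical step format and example URLs).
recorded = [
    {"action": "open", "target": "Terminal"},
    {"action": "type", "text": "youtube-dl https://example.com/video-one"},
    {"action": "press", "key": "Return"},
]

quick_action = generalize(recorded, "https://example.com/video-one", slot="url")
new_run = replay(quick_action, url="https://example.com/video-two")
```

In practice the hard part is deciding which recorded values should become placeholders. Here the caller names the value explicitly, which keeps the sketch simple; an AI-assisted version would have to infer it.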

Brad: if you can let people add their own recordings to a list and share them then I feel it would be even better.

Yes, I agree. I think it would be awesome to have a plug-in store, similar to NVDA's add-ons, where people can share workflows with each other.

Brad: Does it work with voiceover on?

Sort of, it's ironically not that accessible yet because I haven't added labels to things yet. I'll make sure to prioritize this, and I would love to know if people are running into issues. I haven't used VoiceOver before, so I would love to talk to people about what would make this a better user experience.

JC: Will this work with the Messages app for sending multiple messages to separate recipients as a single message?

Yes, I believe that you will be able to do this, but it might need very specific instructions. I would probably say something like "say Happy New Year to Bob and to Alice separately". I'll try to test it out on my end and post a video if I can get it working.

And is there an automatic update system implemented so that whenever a new update comes out with bug fixes and/or improvements it will be updated automatically just like all other apps?

Yes, I try to publish a new version every couple of days with bug fixes. Please email me at jeff@typeahead.ai if you run into issues, and I'll try to get on it ASAP.

Aaron: Out of curiosity, is this running on the self-operating-computer framework?

It is not, although I have heard of the self-operating-computer before. I think that vision is probably going to be an important component, but it is expensive and slow, so the tech may not be ready yet, in my opinion. I am getting around the issues of mouse positions by basically giving ChatGPT control of VoiceOver, if that makes sense.
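One way to picture how a text-based approach can avoid guessing mouse positions: flatten the tree of on-screen elements into numbered lines of text, let the model answer with a number, and map that number back to an element. This is a sketch of the general idea only. The nested-dict tree, the `flatten` and `describe` helpers, and the Voice Memos layout are all made up for illustration, and I am not claiming this is what Typeahead does internally.

```python
def flatten(node, depth=0, out=None):
    """Walk a nested element tree and return (index, depth, node) rows."""
    if out is None:
        out = []
    out.append((len(out), depth, node))
    for child in node.get("children", []):
        flatten(child, depth + 1, out)
    return out


def describe(rows):
    """Render the rows as indented, numbered text a language model could read."""
    lines = []
    for index, depth, node in rows:
        label = node.get("label", "(unlabelled)")
        lines.append(f"{'  ' * depth}[{index}] {node['role']}: {label}")
    return "\n".join(lines)


# A made-up element tree for illustration.
tree = {
    "role": "window",
    "label": "Voice Memos",
    "children": [
        {
            "role": "toolbar",
            "label": "Toolbar",
            "children": [{"role": "button", "label": "Record"}],
        },
        {"role": "list", "label": "Recordings"},
    ],
}

rows = flatten(tree)
text = describe(rows)

# If the model replies "2", that maps straight back to an element,
# with no pixel coordinates involved.
chosen = rows[2][2]
```

Because the model only ever sees roles and labels, unlabelled elements become the weak spot, which is the same problem screen reader users run into.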

Earth: I'm an instructor at the Braille Institute, and the majority of my students are retired seniors who recently lost their vision. It would normally be too difficult for them to do tasks on the computer.

Thank you so much, and I'd love to hear your feedback. I know that vision loss among seniors is a major issue, and I'm sure that it can be difficult to adjust to learning how to use screen readers. If there is anyone at the Braille institute that might be interested in talking to me, I would love an introduction. If you are interested in my background, my LinkedIn can be found here

By kchro3 on Thursday, January 4, 2024 - 05:54

Hi Oliver, I think that's a fair point for the Calendar example. I think a more interesting example would be if it could manipulate arbitrary calendar widgets on any website.

To the best of my knowledge, in order for Siri to be able to control another app, the app developer has to integrate with Siri, whereas with Typeahead, it can interact with any button or text field, making it more universal.

I think that it's easier said than done, so I will try to keep a pulse on what Siri is capable of too.

By Brad on Thursday, January 4, 2024 - 05:54

I've read the email from you and Bruce and am a bit concerned for you. You say you quit your job for this?

This might help the blind but is it really worth quitting a job where you have money coming in each month?

I really don't want to discourage you or put you down but quite a few sighted people do this: they try to make an app or a device for the blind, and then you find out they've not even tried, or in some cases aren't even aware, that something like their idea exists, for example, VoiceOver. What made you want to make this for those of us who use a mac when you've not tried the screen reader out first?

I only mentioned youtube-dl to be lazy. If I really wanted to, I could download it from the website and use it on a Mac. I use it on Windows with a batch file from time to time.

I think what I'm trying to ask is what makes this stand out? At the moment this does seem quite awesome if you're older and perhaps not so good at the tech side of things: type in a command with very natural language and hit enter and it will do it, and if that's your goal, great! But it seems you're aiming this at everyone, and without having tried VoiceOver I don't feel like you have a full idea of what can be done with it.

Sidenote: perhaps a talking function where you can speak to your computer like a smart assistant might be useful for those of us who want to be lazy or older people who can't type because they were never taught.

By JC on Thursday, January 4, 2024 - 05:54

I have another question: is it possible to tell the AI to record a voice memo by having it open up the app and click record without having to go into the app using VoiceOver, finding the record button, then VO spacebar on it? since I love to record audio, it would be cool if that could work.

By kchro3 on Thursday, January 4, 2024 - 05:54

This might help the blind but is it really worth quitting a job where you have money coming in each month?

I think so. I quit before I knew exactly what I wanted to work on, and part of the journey for me is figuring out what problem I want to solve. I really do think that the feedback I've gotten from this post and this forum is telling me that this is a good problem to focus on.

What made you want to make this for those of us who use a mac when you've not tried the screen reader out first?

For whatever it's worth, I want to emphasize that I'm not trying to replace VoiceOver. Not sure if this is a great analogy, but a self-driving car would probably still have a steering wheel. I have tried VoiceOver, and I have watched videos on YouTube of people showing how they use it. I am not proficient at it, but I can tell that it's not easy to learn because UI layouts like navbars, sidebars, and popup menus are built for sighted people. I also see a lot of posts in this forum saying that VoiceOver is not well maintained, so I think that the experience could be dramatically better.

What makes this stand out?

I don't think I have seen any screen reader do what I'm trying to do, although I'm happy to be proven wrong. I think that the ideal experience is that you shouldn't have to know or care about a website or app's version or layout because the AI will be able to figure out how to do what you want.

perhaps a talking function where you can speak to your computer like a smart assistant might be useful for those of us who want to be lazy or older people who can't type because they were never taught.

Yes, I totally agree, and I want to go in this direction. If you've tried ChatGPT's audio features, I really like the experience of being able to talk to an AI like you would to a person.

is it possible to tell the AI to record a voice memo by having it open up the app and click record without having to go into the app using VoiceOver, finding the record button, then VO spacebar on it?

It sounds possible, but I will look into it! I like this use-case too.

By JC on Thursday, January 4, 2024 - 05:54

I would like to have the option to turn off history. I know it saves the message history, but for me, I don't like to have the history active, forcing you to go into the history and clearing the message, and going into the chat and clear the history. Is there a way to have the history automatically disabled, for example, in the setting screen, have the option to disable history, so that you can use the app without having to go into the history in both the chat, and settings screen, to clear it every time. that would save plenty of time.

By kchro3 on Thursday, January 4, 2024 - 05:54

Hi JC, yes, I can add that in the next release

By JC on Thursday, January 4, 2024 - 05:54

Awesome! That way, once it's disabled, it will not save the history, and whenever I use the app, I can continue to use it without overloading the history. Also, I hope you recorded the videos with the use case. As I have said before, I use Messages to send individual messages, and I would like to have a workflow to have the AI send messages to individual people, for example, telling the AI to say happy birthday to XX, XX, and XX, and have it sent as separate messages. It would also be cool to have the AI record a voice memo by having it open up the app and click the record button, and after you are done speaking, have the AI click done so you can play the recording back. And finally, controlling the AI with your voice would be awesome. ChatGPT has this option, so why not the Mac AI? That would be cool.

By JC on Thursday, January 4, 2024 - 05:54

The keep history option is there as of the latest update. However, I still have to clear the chat history in the chat screen. Could you also add a keep history option for chats, so that when I turn it off, I don't have to worry about clearing the chat history every time in the chat screen?

By JC on Thursday, January 4, 2024 - 05:54

Hi, I have one more option for you to add to the general settings screen, and that is the option to control the AI using your voice. When this option is enabled, you can tell it to do a command, and it does it for you. If this checkbox is disabled, it goes back to the normal use of typing out the command.

By kchro3 on Thursday, January 4, 2024 - 05:54

RE: clearing chat history

If you want a fresh chat window, you can use the hotkey for a "new chat," which by default is cmd-option-N. You can reconfigure this in the Settings if you prefer.

By Brad on Thursday, January 4, 2024 - 05:54

I thought you'd not tried VoiceOver at all.

My last question is: how will you have the blind user know a website's layout? What I mean is we can jump around and use websites, but there are those that are trickier to use, and it seems you're wanting to try to fix that?

I can't really see it being done without already knowing the layout of the site. But one thing I can think of is forms: some aren't labeled, and ChatGPT-4 might be able to put you in a buffer window, I think it's called, with the labels added? I think that a buffer window is a fake window; in other words, it exists in the program's memory but not on the actual website/computer and is deleted as soon as the task is done.

If you make this as an IOS app, I can try it, but as it's on Mac I'll leave it to the rest of the community.

By mr grieves on Thursday, January 4, 2024 - 05:54

I like the idea of this and it's great that it is accessible. I think calling it a screen reader is a little misleading as it seems to serve a different purpose unless I am thinking about this the wrong way.

My current reservation with AI is that it nearly nails everything but is never quite reliable enough to trust. So I personally need a bit of convincing that this is something that I could genuinely use.

I tried asking it to send an email to my wife wishing her a happy Saturday. It switched to Mail and I could hear various things being opened. After a little while I had an email. I thought it looked like it had her name ok, so I went to change the sender. I selected the email and pressed enter and then the email sent. I got it back and it had been sent to hername@example.com and I think it had included an Applevis email notification as the body. I guess that was the email I had selected before starting.

I then turned on narration and tried again, and it told me it was opening Apple Mail, which it did. But then nothing happened. After a while I was told that there was an error.

So I tried a third time, this time specifying both her email address and my (sender) email address. Again it switched to Mail, and again nothing happened.

After a little while I switched to another window so I could start typing this. After a while, maybe a minute or two, it suddenly started opening up a mail window and then asking me to do things like entering in the emails, subject and body. Her email hadn't been filled in, so I did this but I kept being told "sender email address" or something and then focus would switch, and then it just went through all the fields. As it was doing this I think I switched back to this document and then heard "body" and moments later was told the message had been sent.

I presume that if it is sending down keystrokes, particularly if it is going to be this slow, then it's going to become affected by whatever I am doing. I certainly felt like I was fighting it and it kept talking over VoiceOver.

In this particular case I think I could have gone to Mail and written the whole message myself before the AI was even able to start the new message.

This probably isn't indicative of how it normally works, but for me if this sort of thing is to be useful it has to work reliably and understandably and also be faster than the alternative.

I am obviously not as skilled as most on here. As a sighted user I was very confident, now with a screen reader I feel I can work my way around the Mac, fun though it isn't. But I can also become easily disorientated. If the AI is flinging windows around and barking orders at me I think I will feel less in control than before. I feel nervous about playing with it as I don't want to suddenly find that I don't know what has happened and I'm having to crawl around trying to figure out apps I've possibly not used before. There's nothing worse than that sinking feeling where you are sure something has just happened, probably not what you wanted, but you have no idea what it was!

Apologies if this is coming across as negative. I realise that this is an early release and it is amazing that you have only been working on it a short time. I have also only tried to do the one task so far and not spent any time trying to understand it.

I am fascinated by AI and where we are heading with it. But sometimes I wonder if we are sometimes over-complicating simple things.

I've used this example elsewhere, but I've stood in my doorway practically begging my Echo to turn the lights on, and maybe after the fourth go it does, but I am literally stood next to the light switch. And I've never had an occasion where the light switch has misinterpreted my intent.

I think one of the problems with AI as it stands is that it literally promises everything, but you still need to understand what it can do and how to make best use of it.

Obviously this is where the world is going and no doubt things will improve as we go. I've not used Microsoft Copilot but I presume the idea is similar.

I would say for less computer savvy users it needs to be totally watertight. An experience like mine where I did the same task three times and got different results every time, and never quite as I expected would be enough for many people to quit and not try again. For experienced users it would need to be a lot faster.

Maybe its power isn't doing a single thing like sending an email but automating a workflow.

Anyway, sorry about this - I am a grumpy, cynical old man. But I am also fascinated by this and will definitely be keeping an eye on how it progresses. And when I get some more time I will try a few other things.

Thank you for making this available for us to try. And please don’t allow me to put you off!

By Brad on Thursday, January 4, 2024 - 05:54

The way I'd look at this is like this: the screen reader user can already do these tasks, opening mail, browsing the web, etc., so what makes this stand out?

Also, VoiceOver is only hard for you to get your head around because you've not used it as other blind Mac users have. For them, interacting with the bars and things is as easy as can be, so are you sure, as mr grieves says, you're not just overcomplicating things that can be done in seconds?

Other blind users seem to like this and I do too, but I'd still not have quit my day job over it. Why? Because you don't fully understand how VoiceOver works with webpages yet and things like that. Until you can come to us and say I've built something that is on par with or better than VoiceOver, I'd keep the day job.

I honestly think it's a bit silly just to drop a job to work on a program like this. Do you intend to charge for it, so you can get back the money you lost by quitting your day job? If so, how much will it be, keeping in mind that blind people don't get much disability money per month?

I hope you still have money coming in for your own safety.

I really should leave this topic alone lol, I'm not the intended audience, but it seems I can't let this job thing go.

By kchro3 on Thursday, January 4, 2024 - 05:54

Hi Mr. Grieves, thank you for sharing your negative experience and for your email as well with detailed feedback. I'll try to go through these examples myself, and this kind of feedback is important so that I know where things are going off the rails.

I agree that AI is in this weird place, where it's almost good enough for more applications. I think it's kind of like building a bridge, where you have to stress it until it breaks in order to build a better one. Not that I've ever built a bridge, but I imagine we had a lot of broken bridges before we got good at making them.

I appreciate your patience as I try to work through these things!

By kchro3 on Thursday, January 4, 2024 - 05:54

.

By Brad on Thursday, January 4, 2024 - 05:54

It seems like you've made a cool app but because you've not fully investigated voiceover and what it can and can't do you don't fully understand what blind people can and can't do on the mac.

Just because people say thing x is bad doesn't mean it's not doable, for example the safari not responding bug, it's slowly being fixed as far as we know.

My best advice to you is to play with voiceover, but the problem is, when sighted people do stuff blind people have been doing for years they get frustrated so assume it can't be done when they've just not pressed the right keystroke or aren't in the right mindset.
So ask us how to do thing x if you can't.

For example, and keep in mind I'm not a mac user, you might think that making it so that you can write press play and hitting enter and the AI does it would be cool, but, there's a shortcut, k, that does exactly that. So be careful you're not just reinventing the wheel but with an AI twist.

Was that dot post just a mistake or was it aimed at me? I can take it :) You're going to get push back like this because we aren't all the same person.

By Brad on Thursday, January 4, 2024 - 05:54

But now we need a nap...

All joking aside, I'm really concerned that the OP quit their day job to work on this, I hope you have money coming in in some way.

By Bruce Harrell on Thursday, January 4, 2024 - 05:54

Being human, we all have our little quirks, such as having completely different notions about what someone else is saying. My notion is based on the following quote. If I was the op, I'd be feeling a little discouraged after some of the comments here.

Anyway, the op, Jeff, wrote: " to demo it in action, and I'm curious to get your reactions and feedback to see if this is promising. To describe what's happening, there's a keyboard shortcut to open a chat window, and I typed in a command like "send an invoice from ABC to xyz@gmail.com." Typeahead recognizes that it's on the correct page, downloads the invoice as a PDF, opens the Mail app, and composes a message to the correct recipient and attaches the PDF. The app is also narrating each action so that it's transparent to the user what is happening, and there are buttons and keystrokes to interrupt it."

This isn't a screen reader. To me, what Jeff describes is a means to greatly speed up our workflow. However, because our workflows are different from one person to the next, it occurred to me the best approach would be a simple means in Jeff's app to record, save, and execute strings of actions with a single command. Just think about how much time that could save us. Is there something you do every day that takes a minute or two? How would you feel if you could enter a single voice or key command and voila! It's all done, just like the example Jeff offered.

Yes, that's what shortcuts is for, but what Jeff described is actually something the dim witted among us can use. I am one of those dim witted fellows, and I don't mean using a shortcut to keep track of the water I've had to drink today. I'm talking about Jeff's app, which might just allow me to record and save my own string of commands without first having to read a 300 page manual to get things done. So, while you continue finding fault, doubtless now with me and my silly thoughts, I hope you will consider sifting through the chaff to find the grain instead of sifting through the grain to find the chaff. One helps; the other doesn't. Yes, of course, I'm completely wrong. Smile. I can live with that.

By Panais on Thursday, January 4, 2024 - 05:54

What if this tool could enable blind folk to play totally inaccessible games, like Rome: Total War for example? I don’t know if this is doable, however this would solve one of my many long-standing problems :)

By mr grieves on Thursday, January 4, 2024 - 05:54

You can download it here: https://www.dropbox.com/scl/fi/0thdd4vx8d43lo1lv5v8k/TypeaheadAI-2.1.17.dmg?rlkey=4x6rke6siso3xtdv68z8oolxe&dl=1

For my part, I definitely don't want to dishearten the OP. I think this sort of thing is a genuinely interesting idea, but I think it is also fair to point out that it still maybe needs a bit of focus so that we can understand how it fits in. Maybe this is something we can contribute to. But sometimes it's a little hard to know what to do with something that promises everything. I personally sometimes lack the imagination to see all the possibilities of these things.

It's fun to play about with, so well worth a download in my opinion. And it's going to be exciting to see where this all leads.

By Brad on Thursday, January 4, 2024 - 05:54

I just think that sighted people have ideas like this and try to rush them a bit.

I'd not keep going on about it but the OP has said he quit his day job so I'll make this my last post on here by saying this: I hope you know what you're doing and have money coming in.

By JC on Thursday, January 4, 2024 - 05:54

After downloading the latest update, I got it working! I told the AI to open Voice Memos and click the record button to start a recording. It worked with no issues whatsoever. I'm going to experiment later with more apps to see if it'll work. I just have to give it a proper prompt.

By kchro3 on Thursday, January 4, 2024 - 05:54

Hi everyone, sorry for the late response. I was busy over the weekend so I didn't get a chance to respond to some of the comments.

I've been getting a bunch of new users and some emails with feedback and questions, so I am really happy with the response and feedback. I have my work cut out for me, but it's great to have enthusiastic users to build for so please continue to reach out to me (jeff@typeahead.ai).

There were a lot of things that I didn't even think about until someone mentioned it, like the fact that reading out the steps as an imperative ("click" instead of "clicking") makes it sound like a command to the user, so this feedback has been really amazing. I was even asked to translate the app into Italian, so if there are more localization requests, I'd be happy to oblige.

I agree that I want to tighten up the product focus and zero in on the types of actions that are doable but annoying or not doable at all. As an aside, I've been trying to practice using VoiceOver more myself and navigating around things with my eyes closed, which I know is not the same but hopefully with practice, I'll become more proficient with it. I've heard people mention that scrolling through Terms of Service agreements and account creation forms can be painful, and I saw a Tweet saying that things like paying a bill can take a long time even with a screen reader.

I would like to establish more trust within this community by building features that people here want to see, and after addressing some of the bugs that people have run into, I'd like to try what Oliver suggested and make a new post asking about inaccessible sites and apps to try to add support for.

> GUIs designed for visual interaction and to display detailed feedback may be good for the sighted, but they are dreadful for us. This would go far beyond OCR, it would be able to create objects we can interact with which the AI can then enact using mouse clicks and its version of sighted manipulation. Think of it as someone sitting beside a blind user describing the screen with understanding of what the application does and asking the blind user what they want to do.

I'd like to explore this idea more. There are some apps (such as games) which don't have any VoiceOver support, so it could be interesting to use computer vision to describe what buttons there are on the page. I don't think it's very reliable yet, but it could be better than nothing, so that's something I'd be interested to try out, especially if there's interest.

> Brad: I'd not keep going on about it but the OP has said he quit his day job so I'll make this my last post on here by saying this: I hope you know what you're doing and have money coming in.

I want the community to know that I'm doing this because I want to, and I wouldn't have taken a risk I couldn't afford. I will try to be as responsive as I can, and the best way to support me is to try it out for yourselves and tell me what you think could be better.

Thanks!

By Ash Rein on Thursday, January 11, 2024 - 05:54

I would like to say, I am very grateful for all of your efforts. I know that the road is hard. And I know that ultimately if you stick with it, several things are going to happen. One way or the other, things will change for the better. As you work, more and more people are going to end up encouraging you and even working with you. Everything in the world is hard until it’s actually done. And I for one am more than willing to donate money to ensure that you have a way to support yourself and to continue the work.

Keep going. Keep pushing. This is more than exciting and definitely definitely needed.

By kchro3 on Thursday, January 11, 2024 - 05:54

Thanks for the kind words Ash Rein! I see that you had participated in a conversation about 3rd party screen readers, where you advocated for the importance of choice.

From my standpoint, there is little point making an NVDA for Mac because: 1. Being incrementally better won't be good enough to convince people to switch and relearn key mappings, etc. 2. VoiceOver and the 3rd party apps will eventually converge, so any incremental improvements could be copied in the next release.

What I'd like to do is build something that's orthogonal to VoiceOver, so it adds value rather than trying to do the same thing. I will need the support of this community to get that right, but I think this is promising.

By the way, if you try out the app or if you would like a 1:1 onboarding session, please don't hesitate to reach out to me by email (jeff@typeahead.ai). The feedback so far has been amazing.

By Brad on Thursday, January 11, 2024 - 05:54

I know I said I'd leave this but I can't because of my 20 odd years of using a screen reader so will comment when I feel it's needed.

As someone else pointed out, this isn't a screen reader, it's an app. I really really do think you need to sit down, for a month let's say, and use voiceover, perhaps you could pay a blind person to help you, perhaps you could both be on a call and use team viewer or something like that to connect to each others computers and you will then see how screen readers work vs your app.

This might sound harsh and others can tell me if I've gone too far but at the moment, to me at least, it sounds like you've thought you'd fill a role without knowing anything about the role you want to fill.

Here's why I say this, at the moment you can't tell me what your app does better than a screen reader or why I should pay for it when a free screen reader exists.

Yes voiceover has bugs, yes there's a safari busy thing going on and I'm sure there are others who actually use the mac who can point out other issues but mate, at the moment I really don't think you understand what you've gotten yourself into because you don't understand how a screen reader works.

For example; I could type in to your app, how many headings are on this page, or I could just load the page with voiceover and get that question answered in seconds.
It's the same with buttons, form fields and so on. The only thing I can see your app possibly doing, and there's issues with overlays like this too, is adding button labels to unlabeled buttons but even then I think you'd have to tie your app into the Voiceover screen reader and I don't think apple allows you to do this.

What I'm trying to say is at the moment you're reinventing the wheel without fully understanding how a wheel works.

I know it seems I'm repeating myself but there's a good reason for this.

Sighted developers often do this; they'll come into a space like this and say I've made idea x! Isn't it great! Then people will say actually, we can already do this and the developer more often than not gets pissed off.

Well you can get as pissed off with me as you like, I don't mind.

But the fact you can't see why making NVDA on the mac would be a good idea tells me you've not used voiceover enough, or a windows machine, with NVDA or asked enough questions to fully understand what you're getting into.

NVDA on the mac would not just mean a couple shortcuts difference. It would mean there would be more addons, pages would load in a logical way, top to bottom instead of left to right, (that could just be a me thing but that never made sense to me,) there'd be an actual support group that would listen to the mac users and more.

Believe it or not, I'm really not trying to discourage you from making this, I just want you to fully understand that at the moment, what you're making isn't a screen reader.

At the moment it seems, to me at least, you've made a tool that allows people to be lazy and that's about it. I hope you can prove me wrong.

By kchro3 on Thursday, January 11, 2024 - 05:54

> I hope you can prove me wrong.

Haha, yeah, I will try. I think you're casting way too much judgement on something that you haven't tried.

By mr grieves on Thursday, January 11, 2024 - 05:54

One thing that has always slightly confused me about this app, right from the original post, is what makes it an app for the blind specifically?

Of course it is being developed with accessibility in mind. I'd naively like to think this should be how everyone develops any app, but we all know that isn't the case so it's always nice when it is.

The narration is, of course, a big part of that. But then we wouldn't say that Siri or Alexa are tools for the blind specifically, it's just we are beneficiaries. And it is fulfilling a similar role to the responses they give.

So I'm struggling to really answer that question. But that's OK, and maybe it helps to treat it as an app that is putting a lot of effort into being good for us to use, as opposed to being an app only for us to use.

As we all know, AI is being added to absolutely everything these days. It won't be long before tea bags have built in AI that can write you a poem as they are slowly dying in boiling water.

On Windows, there is Co-pilot. I've not used it, but I have very similar feelings about it - as in, what am I actually going to get out of it? Why is it there? Why, in that case, does it need its own button on the keyboard?

I'm an old man, set in his ways, so I am not the right person to answer that. But I feel like if we are trying to express how we use a computer now in terms of AI then maybe we are missing the point.

If you use Alexa to give you an answer, you aren't talking to it in terms of headings or form fields or whatever else we might use when we are working with VoiceOver. We are trying to accomplish tasks.

For something like this, or Co-pilot to succeed it needs to be able to do these things faster and more efficiently than we can, screen reader or not. Now maybe this is because it is working like a macro and batching together repetitive tasks, or whether it's doing something that we just don't know how to do. Or maybe in our case something that can't be done with a screen reader.

But I would expect to be able to tell it to perform a task and then not to have to worry that it might be opening a web browser to such and such a site, going to a particular form and filling it in, or opening up my calendar or whatever it is doing.

Although I echo the sentiment that AI seems to have been born a solution without a problem, we can't treat it like a normal app that we may have used in the past. We can already see evidence of real problems that have indeed been solved. The most obvious example to me is Be My AI which still blows my mind every time I use it. But AI is being used for all sorts of things, almost none of which were likely conceived from the beginning.

At times it can be like getting a big bucket of assorted Lego pieces. Some of us might be able to build incredible structures from those pieces, but others, like me, will just scratch their heads and just see funny little bobbly bricks and maybe we can see the potential but it's hard to know how to get there.

It's hard to know how best to take TypeAhead. As a final product, it's slow, unreliable and I just don't know what to type into that big box of it most of the time, although it's fun to play with. But taking it as a prototype and something that maybe we can help shape then it is quite interesting although I am maybe hoping that those with more imagination than me can help push things forward until it clicks.

Would I pay for it? Well, no, not until I understood what it would do for me. The arguments on here about how it maybe needs to do a small number of things is fair. I don't think an AI like this is really a one-task tool, but if we can see some small concrete examples of what it will do for us then maybe it will start to make more sense. It definitely needs some strong use cases. But maybe the problem is that we all do different things. So maybe creating an invoice and sending it in an email works for some, but not for me. Maybe some things that can automate my job might work, but then PyCharm has started bringing in AI smarts of its own that may be more suited to me. And maybe the rest of my life isn't complicated enough to warrant automation. Or maybe what I need is too specialised. Or maybe one day I will finally have the imagination to try a new career path and this will help me take my first steps. I don't know but feel that sometimes it's easy to be constrained by what you currently know vs the potential for going somewhere entirely new.

At times, AI powered things feel a bit like the Emperor's New Clothes, but I'm sure we've all seen places where it has been truly impressive and I've no doubt it is going to be a big part of our future. It's just hard to know exactly how right now.

On the subject of the dev giving up his job for this, that too concerns me a little. It's nice to see concern on here because it means we care. But also that we don't want to feel responsible for someone making a bad financial risk.

The way I'm trying to see this is that if this app doesn't strike gold, the experience and learning that will come from the building of it will be incredibly valuable. I only wish I had the mental capacity to engage with AI like this - maybe one day.

At any rate, let's not kick the dev for trying to engage us like that. I don't think any of us currently know where this project is going to end up, but that doesn't mean it's not worth taking a few steps to see where it leads.

And it's nice to be part of the conversation and not just an afterthought.

By Brad on Thursday, January 11, 2024 - 05:54

It's not because I don't want to talk; I do; it's just I want everyone to read our chats.

You mention that others have emailed you saying that they've had to wait for their partners to click a button for them and I then remembered that VO and NVDA are different in another way.

They both use drilling down ways of getting things done, (VO because of the way it interacts with stuff and NVDA with a couple of commands,) this is used on NVDA to see if you can get more info, there's also OCR for both platforms I believe but NVDA's is better as you can click items on a screen, I could be wrong; but I don't think you can do that with voiceover's one.

I've noticed that in the video you showed me, you don't use voiceover at all. If you're going to show videos, I'd highly recommend turning voiceover on as the community here use it and are very unlikely to turn it off to use an app.

You say in your message that you're thinking of ignoring me and that you want to work with more open minded people, and that's fine but I'm telling you, if these more open minded people can't clearly figure out a use for your app you're going to go back to square one.

Let me ask you some questions: 1. I'm using voiceover and am on a page with an unlabeled button, can I type in to this window, "what's the unlabeled button for?" Or would I have to type something like, "what's the unlabeled button for on www.test.com/text?"

2. I have an app open that is completely inaccessible to Voiceover, how am I getting around it with your app? How will I; as a blind person, know what is on the screen? Would I be able to type, what is on screen x and if i can, can I then arrow up and down with voiceover to view the response?

3. I have mail open and this inaccessible app, this was all done before I wrote the command, I then write, "click on the toolbar and click on open," is it going to know I mean the inaccessible app or will it open mail instead?

You say in your message that you've been on voice calls and that others seem to get what this can be used for and that's great! It really is, but I'm just thinking of use cases for myself.

I really like the idea of inaccessible apps being made accessible, the only downside to that is blind people are probably going to already have apps that work for them, although I'm sure they'd love to try out new apps.

The only thing is; I don't think people are going to pay for your app in its current form because even if you make apps accessible, (which would be amazing don't get me wrong,) why would they pay for an app that they'd use for trying out inaccessible apps when they have free ones that work?

By kool_turk on Thursday, January 11, 2024 - 05:54

Brad makes some very good points.

I'm not going to comment much because I don't use a mac.

Looks like someone beat me to mentioning the Rabbit R1.

My concern with that thing is, it has a touch screen, which you use to confirm some of the prompts, like when the guy was booking a holiday.

I'll have to see what their thoughts are on accessibility.

There is also the Humane AI Pin, with very few UI elements because a majority of the heavy lifting is done using what they are calling AI experiences.

Again, that one seems accessible, but how accessible is their web portal, which you would need a computer for.

If it's like their site, then it's not all that screen reader friendly, because there's a lot of elements where you can only see things when you hover the mouse over it.

I'll have to take a look at Rabbit now that they've announced their product, but I don't like the sound of carrying another thing in my pocket.

Oh and the voice on the Rabbit R1 sounds better than the one for the AI Pin, much more bright and bubbly.

The one on the AI Pin sounds clinical and bored.

Plus, you can teach the R1 things, like the 3d printing thing mentioned in a previous post.

You would probably need a sighted person to teach it a bunch of things that you can't do yourself, kind of like the old days when you would record macros.

It would be even better if you could share those with others.

Exciting times are ahead.

By Brad on Thursday, January 11, 2024 - 05:54

I haven't got a response yet, this was about 3 or so days ago, so I'm not sure they really care much about accessibility. Or they could just be dealing with a lot of back orders.

By kchro3 on Thursday, January 11, 2024 - 05:54

I'm setting up user interviews over the next couple weeks, and there have been multiple version updates since last week based on people's bug reports and requests. Thank you to everyone who has reached out and contributed their feedback!

If anyone is interested in chatting with me, I'd be happy to discuss over call or email.

I'll be sure to post a new thread when there is a major update.

By kchro3 on Thursday, January 11, 2024 - 05:54

If you're interested, I put together a demo use-case where you might want to ask detailed questions about a 3D model on printables.com. https://www.youtube.com/watch?v=d1MO3JG-3-s Open to feedback on how it could look.

I also tested with PrusaSlicer. It was able to change the infill, but it was flakier on other tasks. You can configure quick actions to add extra hints on how to do a task, so it could work with some tuning.

By kchro3 on Thursday, January 11, 2024 - 05:54

> basically pulling all the information into a screen reader friendly format we can read and act upon including finding log in and out buttons, contact, etc and repackaging it into something we're not struggling over.

Hm, that's interesting. Would that be better than just saying "log in" or "log out"?