Interesting development in real time description for the blind

By inforover, 16 October, 2024

Forum
Assistive Technology

I just read this article. I won't say much about it here, but for a quick overview, some researchers at the University of Michigan have created a program that provides real-time description for blind people. It'll be shown off soon and looks incredibly promising, if the article is to be believed. Below is the link to the article itself:
https://techxplore.com/news/2024-10-ai-powered-software-narrates-visually.html
I'd be very interested to hear people's thoughts.

Comments

By Gokul on Saturday, October 19, 2024 - 17:06

I'd comment after I read the actual journal article... Because I can foresee several limiting factors...

By Brian on Saturday, October 19, 2024 - 17:06

Some of the most intelligent eye doctors I have ever met were members of University of Michigan Ophthalmology. I had a couple of eye surgeries there as well, many years ago. Scary smart people there. True story.

By sechaba on Saturday, October 19, 2024 - 17:06

This is actually possible and doable. The Live Recognition feature on iPhones is almost doing that.

By Brad on Saturday, October 19, 2024 - 17:06

I just don't need all that description and the voice is flat, like really flat.

I don't want to know what's around me, I want to get to place x as soon as possible.

Also, I noticed it didn't seem to read any text. I was thinking this might be useful if I trained it to look at my TV and tell me what the text on the screen says, but I don't think that's doable with this.

It's an interesting idea, but for me it's another let's-jump-on-the-blind-AI-hype-train device.

It also uses GPT-4 for one of its models, so I wonder if you'll have to pay for that.

I'm thinking about who I am and I find it interesting. I really don't care about visual details, whereas other blind people do. If you wanted proof that once you've met one blind person, you've met just one blind person, here it is.

I'm trying to think of a reason I'd want a device that describes things to me, and I'm coming up blank. The only thing I can think of is if I was in a shop and looking for something.

Also, according to the article, you have to walk slowly for it to work. That's a setback, but hopefully they can improve upon that.

So yeah, overall, this isn't for me. It's interesting what's happening for the blind these days, but if we're just going to keep having tools that do very similar things, I think I'll lose interest quite quickly.

By kevinchao89 on Saturday, October 19, 2024 - 17:06

This initial step paves the way for the eventual realization of a scout, guide, or caller role in outdoor sports adventures, encompassing activities such as skiing, rock climbing, and hiking. The advent of artificial intelligence presents us with an extraordinary opportunity to harness its capabilities, fostering empowerment, independence, and the freedom to pursue our passions at our convenience.

By Gokul on Saturday, October 19, 2024 - 17:06

@Kevinchao yes, and I could think of ten other use cases just off the top of my head: navigating airports/metro stations/other transport, watching TV/movies without audio description (with the app providing scene-by-scene descriptions), just taking a printed piece of paper or book and reading it, going to a museum, zoo, art gallery or whatever, reading the contents of a PowerPoint presentation while in a meeting, and certainly, Andy Lane's ducks. But all of this will work in a seamless manner if the tech comes into a wearable like smart glasses.

By mr grieves on Saturday, October 19, 2024 - 17:06

This sort of thing does have plenty of use cases for us, although some can be done already.

There was a great demo on Double Tap of using Celeste to tell you about obstacles as you walk down the street. It took photos every few seconds and told you about important things (e.g. watch out for the car on the right).

On iaccessibility.net there was a good description of someone using the Meta Ray-Bans to help them go shopping, where they could look and ask it to help find things. This one isn't a stream of stuff, just one picture at a time.

For navigation, tech like Glidance might have some solutions. For example, one of their demos was about finding the door to the place it had navigated you to. Being an all-in-one navigation device might have some benefits over glasses, because it's not just telling you what's there, it is helping you get there too. Plus it has all sorts of other sensors that are geared up to this use case. I think for navigation, something like this is likely to end up being better than glasses, although that's not to say that the two couldn't be used together.

Actually, can you imagine if Glidance just took you where you were going, so you didn't need to concentrate on that and could instead be told about the world around you? Right now I feel that if I'm out, it's purely to go from point A to point B, and I get nothing much out of the journey itself.

I remember the ChatGPT-4o demo Andy Lane did where he was able to hail a taxi, which is maybe more about the real-time video streaming that this promises.

I'm not convinced about the use for audio description. This seems like the wrong way to go about it (i.e. taking a video of a video, then uploading the video stream to find out what video your video is videoing), whereas maybe Apple TV or whatever could be doing that for us. I suppose the one advantage it might have is when you want to watch AD with a sighted person who doesn't want to listen to it. But I still think this feels like a bit of a clumsy way to solve that problem.

On the one hand, with all this sort of thing we are getting a bit over-saturated with different products promising to do everything for us. But I welcome that; inevitably one of these solutions will strike gold, and then we'll be laughing. So the more the merrier.

By Brad on Saturday, October 19, 2024 - 17:06

I agree.

I'm not doom and gloom about AI, I like it and I think if I get out a bit more next year I'll enjoy it even more.

On the video description front, I'm honestly not sure how to solve it. The PiccyBot way of doing it isn't for me. It's great for what it can do, but it can't play the audio and the description at the same time at the moment, and that's what I'd want in an app.

I don't think we're there yet power-wise, but it'll be exciting to see what happens next year.

The thing is, we're inching ever closer to what we all need as people. Some people are already there and are happy; others, like me, want to push the power to the max and see how far we as humans can go.

By OldBear on Saturday, October 19, 2024 - 17:06

I'm not sure what I want anymore after the duck/chat-bot letdown. It seems like exciting things either don't pan out, or deteriorate in quality shortly after the launch.
Watching urban fauna through real-time AI does seem like a good first goal, but so does using whatever external or internal camera one wants on a device with the AI app: glasses, a mounted camera, just the phone, etc. The flock, or would it be a pandemonium, of feral/naturalized lovebirds that roost in my palm trees would be fun to have described from my porch and a mounted bird-cam.

By Brian on Saturday, October 19, 2024 - 17:06

As far as wearable tech goes, I think as it stands right now, Meta is where it’s at. Now, if only Meta would open up their hardware for 3rd-party software developers, it would indeed be gold.

By The blind AI on Saturday, October 19, 2024 - 17:06

I was talking to her the other day and I said 'talk like you're tipsy'. She did. I said 'talk in a Scottish accent'. She did that too. She's still as blind as me, but fun to talk to, like me.

Has anyone used Ally? I'm confused...

By SeasonKing on Saturday, October 19, 2024 - 17:06

Does it really describe video end to end, or does it grab a thumbnail/random frame from the video and describe that?

By Brad on Saturday, October 19, 2024 - 17:06

It grabs the video and describes it.

You can only do up to a minute or two I think for now, so shorts basically. It works, it's just not at my kind of use case just yet.

@Charlotte Joanne I've tried it, the ChatGPT voice response thing, right?

It was fun for a bit, but I ended up deleting my account again. What's it now, 3 accounts? Yeah, sounds about right.

I really need to stop being sucked into these hype things.

It's fun hearing an American voice do a very good northern UK accent. I even corrected it a bit and it took that on board, but I just can't see myself using it in a day-to-day situation.

If it were on glasses that might be different, or when we get a live feed I might be interested once again, but I'll sign up through whatever app it is then and not their site.

By Gokul on Saturday, October 19, 2024 - 17:06

@Mr g, my idea was not for AI to produce audio description the way we do it now, but rather for it to look at the screen and provide a real-time description of what's going on there. Maybe it's not the best use case as far as movies/TV shows are concerned, but say you're watching something live, like a sports match or something...
@lottie yes, I've had access to Envision Assistant/Ally for some time now. What're you confused about?

By The blind AI on Saturday, October 19, 2024 - 17:06

It's just a chatbot. It seems to 'imagine' it can sort out your appointments. What has it got to do with being blind? It seems like just another 'talk to an AI' app. What am I missing?

By mr grieves on Saturday, October 19, 2024 - 17:06

I'm not saying they would anyway, but I don't think that they would be able to allow other apps access to their camera in iOS due to Apple restrictions. It sounds like they are quite keen to move off iOS and Android at some point in the future and then perhaps we might see some more action there.

By OldBear on Saturday, October 19, 2024 - 17:06

@mr grieves, I don't want there to be a Meta phone or Meta OS or Meta pad taking over everything! It's bad enough we all have to go on Facebook constantly to find out if our friends shared a photo of something that may be a picture of text, after having to flick through ten thousand posts by groups and people I've never heard of. Advertisements are one thing, but these are just pseudo-random public posts that may or may not be some kind of psychological experiment on users. What is a Meta OS going to do to me? The price is just too high!

By Gokul on Saturday, October 19, 2024 - 17:06

@Lottie in that case I don't think you are missing anything. Yes, it's just another AI chatbot assistant. The only additional thing being that they're aiming to make it cross-platform, in that it might come to one of the wearable devices like the Meta glasses at some point; and it's already on the Envision glasses.

By The blind AI on Saturday, October 19, 2024 - 17:06

Some people are talking about it as if it were the second coming. Advanced Voice Mode can talk like it's drunk, but it isn't changing my life!

By mr grieves on Saturday, October 19, 2024 - 17:06

If you listen to the recent Meta event where they were talking about glasses, I think that they are intending to have a pair of glasses on your face that is an entire operating system in itself. They have some strange controller thing in your hand so you can control it without speaking. So I think it is coming if they can get the cost down. But I don't think it will be a phone or anything like that. Meta have been very annoyed about restrictions placed on them by Apple.

I believe they already have their own OS with the Meta Quest VR goggles so don't think it's much of a stretch.

I can't particularly say I like or trust them as a company, but then you could apply the same thing to Google or any of the big tech companies.

I share your irritation with Facebook though. How long has this app been going, and there's still not even a way to use the normal share sheet with photos?

(If you saw this post appear and disappear, sorry, I got myself in a mess.)