I just read this article. I won't say much about it here, but for a quick overview, some researchers at the University of Michigan have created a program that provides real-time description for blind people. It'll be shown off soon and looks incredibly promising, if the article is to be believed. Below is the link to the article itself:
https://techxplore.com/news/2024-10-ai-powered-software-narrates-visually.html
I'd be very interested to hear people's thoughts.
By inforover, 16 October, 2024
Forum: Assistive Technology
Comments
Interesting indeed.
I'll comment once I've read the actual journal article, because I can foresee several limiting factors...
Ann Arbor
Some of the most intelligent eye doctors I have ever met were members of University of Michigan Ophthalmology. I had a couple of eye surgeries there as well, many years ago. Scary smart people there. True story.
Interesting developments
This is actually possible and doable. The live recognition feature on iPhones is almost doing that.
I don't think this is for me.
I just don't need all that description and the voice is flat, like really flat.
I don't want to know what's around me, I want to get to place x as soon as possible.
Also, I noticed it didn't seem to read any text. I was thinking this might be useful if I trained it to look at my TV and tell me what the text on the screen says, but I don't think that's doable with this.
It's an interesting idea, but for me it's another "let's jump on the blind AI hype train" device.
It also uses ChatGPT-4 for one of its models, so I wonder if you'll have to pay for that.
I'm thinking about who I am and I find it interesting. I really don't care about visual details, whereas other blind people do. If you wanted proof that if you've just met one blind person then you've just met one blind person, here it is.
I'm trying to think of a reason I'd want a device that describes things to me, and I'm coming up blank; the only thing I can think of is if I was in a shop looking for something.
Also, according to the article, you have to walk slowly for it to work. That's a setback, but hopefully they can improve upon that.
So yeah, overall, this isn't for me. It's interesting what's happening for the blind these days, but if we're just going to keep having these tools that do very similar things, I think I'll lose interest quite quickly.
Outdoor sport adventures?
This initial step paves the way for the eventual realization of a scout, guide, or caller role in outdoor sports adventures, encompassing activities such as skiing, rock climbing, and hiking. The advent of artificial intelligence presents us with an extraordinary opportunity to harness its capabilities, fostering empowerment, independence, and the freedom to pursue our passions at our convenience.
Use-cases
@Kevinchao yes, and I could think of 10 other use-cases just off the top of my head: navigating airports, metro stations and other transport; watching TV/movies without audio description (with the app providing scene-by-scene descriptions); just taking a printed piece of paper or book and reading it; going to a museum, zoo, art gallery or whatever; reading the contents of a PowerPoint presentation while in a meeting; and certainly, Andy Lane's ducks. But all of this will work in a seamless manner if the tech comes into a wearable like smart glasses.
Use cases
This sort of thing does have plenty of use cases for us, although some can be done already.
There was a great demo on Double Tap of using Celeste to tell you about obstacles as you walk down the street. It took photos every few seconds and told you about important things (e.g. watch out for the car on the right).
On iaccessibility.net there was a good description of someone using the Meta Ray-Bans to help them go shopping, where they could look around and ask for help finding things. This one isn't a stream of stuff, just one picture at a time.
For navigation, tech like Glidance might have some solutions; for example, one of their demos was about finding the door to the place it had navigated you to. Being an all-in-one navigation thing might have some benefits over glasses, because it's not just telling you what's there, it is helping you get there too. Plus it has all sorts of other sensors that are geared up to this use case. I think for navigation, something like this is likely to end up being better than glasses, although that's not to say that the two couldn't be used together.
Actually, can you imagine if Glidance just took you where you were going, so you didn't need to concentrate on that and could instead be told about the world around you? Right now I feel that if I'm out it's purely to go from point A to point B, and I get nothing much out of the journey itself.
I remember the ChatGPT-4o demo Andy Lane did where he was able to hail a taxi, which is maybe more about the real-time video streaming that this promises.
I'm not convinced about the use for audio description - this seems like the wrong way to go about it (i.e. taking a video of a video, then uploading the video stream to find out what video your video is videoing), whereas maybe Apple TV or whatever could be doing that for us. I suppose the one advantage it might have is when you want to watch AD with a sighted person who doesn't want to listen to it. But I still think this feels like a bit of a clumsy way to solve that problem.
On the one hand with all this sort of thing we are getting a bit over-saturated with different things promising to do everything for us. But I welcome that - inevitably one of these solutions will strike gold and then we'll be laughing. So the more the merrier.
@mr grieves.
I agree.
I'm not doom and gloom about AI, I like it and I think if I get out a bit more next year I'll enjoy it even more.
On the video description front, I'm honestly not sure how to solve it. The PiccyBot way of doing it isn't for me; it's great for what it can do, but it can't play the audio and the description at the same time at the moment, and that's what I'd want in an app.
I don't think we're there yet power-wise, but it'll be exciting to see what happens next year.
The thing is, we're inching ever closer to what we all need as people. Some people are already there and are happy; others, like me, want to push the power to the max and see how far we as humans can go.
AI burns
I'm not sure what I want anymore after the duck/chat-bot letdown. It seems like exciting things either don't pan out, or deteriorate in quality shortly after the launch.
Watching urban fauna through real-time AI does seem like a good first goal, but so does using whatever external or internal camera one wants on a device with the AI app: glasses, a mounted camera, just the phone, etc. The flock, or would it be a pandemonium, of feral/naturalized lovebirds that roost in my palm trees would be fun to have described from my porch and a mounted bird-cam.
My thoughts
As far as wearable tech goes, I think as it stands right now, Meta is where it’s at. Now, if only Meta would open up their hardware for 3rd-party software developers, it would indeed be gold.
The duck-thing woman!
I was talking to her the other day and I said "talk like you're tipsy", and she did. I said "talk in a Scottish accent" and she did that too. She's still as blind as me, but fun to talk to, like me.
Has anyone used Ally? I'm confused...
Question on PiccyBot
Does it really describe video end to end, or does it grab a thumbnail/random frame from the video and describe that?
@SeasonKing
It grabs the video and describes it.
You can only do up to a minute or two I think for now, so shorts basically. It works, it's just not at my kind of use case just yet.
@Charlotte Joanne I've tried it, the ChatGPT voice response thing, right?
It was fun for a bit, but I ended up deleting my account again. What's it now, 3 accounts? Yeah, sounds about right.
I really need to stop being sucked into these hype things.
It's fun hearing an American voice do a very good northern UK accent, and I even corrected it a bit and it took that on board, but I just can't see myself using it in a day-to-day situation.
If it were on glasses that might be different, or when we get a live feed I might be interested once again, but I'll sign up through whatever app it is then and not their site.
Audio description
@Mr g, my idea was not for AI to produce audio description the way we do it now, but rather for it to look at the screen and provide a real-time description of what's going on there. Maybe it's not the best use-case as far as movies/TV shows are concerned, but say you're watching something live like a sports match or something...
@lottie yes, I've had access to Envision Assistant/Ally for some time now. What are you confused about?