The quest for better and better computer vision (CV) helps blind and visually impaired people, who can use it to replace or enhance the vision they have lost. So far this has mainly focused on reading, and reading printed material at that. The first widely available reading machines came to the UK in the late 1980s. I remember getting one at the time and, from memory, it cost about the same as my annual salary as a twenty-year-old. Now the free Seeing AI app on my iPhone reads almost everything I need.
In this latest AI for Accessibility scouting report, I have been out on the far frontier. I am introducing a different use for CV, drawing inspiration from another white-hot area of AI research: robotics. Specifically, I am looking at 3D movement analysis and sentiment analysis.
3D Movement Analysis:
3D movement analysis focuses on understanding and interpreting human movements in three-dimensional space. This technology is being applied to various domains, such as affective robotics, video surveillance, e-Health, and interactive gaming. By analyzing body language, AI systems can better understand human emotions and intentions.
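To make this a little more concrete, here is a minimal sketch of 3D pose estimation using the open-source MediaPipe library, assuming Python and a single still image. The crossed-arms heuristic at the end is purely my own illustrative assumption, not an established body-language model.

```python
# A minimal sketch of 3D pose estimation with MediaPipe.
# The crossed-arms heuristic is a made-up illustration, not a validated
# body-language model.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def describe_posture(image_path: str) -> str:
    image = cv2.imread(image_path)
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.pose_world_landmarks:
        return "No person detected."
    lm = results.pose_world_landmarks.landmark
    left_wrist = lm[mp_pose.PoseLandmark.LEFT_WRIST]
    right_wrist = lm[mp_pose.PoseLandmark.RIGHT_WRIST]
    left_shoulder = lm[mp_pose.PoseLandmark.LEFT_SHOULDER]
    right_shoulder = lm[mp_pose.PoseLandmark.RIGHT_SHOULDER]
    # Hypothetical heuristic: each wrist sitting on the far side of the body's
    # midline from its own shoulder reads as crossed arms, i.e. a closed posture.
    midline_x = (left_shoulder.x + right_shoulder.x) / 2
    left_crossed = (left_wrist.x - midline_x) * (left_shoulder.x - midline_x) < 0
    right_crossed = (right_wrist.x - midline_x) * (right_shoulder.x - midline_x) < 0
    if left_crossed and right_crossed:
        return "Arms appear crossed: a closed posture."
    return "Posture appears open."

print(describe_posture("person.jpg"))
```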
Sentiment Analysis:
Sentiment analysis is an NLP technique used to determine the emotional tone or sentiment conveyed in textual data. It involves categorizing sentiments as positive, negative, or neutral, providing valuable insights into public opinion, customer feedback, and the emotional context of textual information.
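For a rough sense of what this looks like in code, here is a minimal sketch using the Hugging Face transformers library's off-the-shelf sentiment pipeline; the example sentences are just placeholders.

```python
# A minimal sketch of sentiment analysis with the Hugging Face
# transformers pipeline. The sentences are placeholder examples.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

for sentence in [
    "Seeing AI reads almost everything I need.",
    "The reading machine cost a year's salary.",
]:
    result = classifier(sentence)[0]
    # Each result has a label (e.g. POSITIVE or NEGATIVE) and a confidence score.
    print(f"{sentence!r} -> {result['label']} ({result['score']:.2f})")
```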
By combining 3D movement analysis with sentiment analysis via smart glasses, a blind or visually impaired person would be able to receive spoken feedback on the body language of the person they are talking to.
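Purely as a thought experiment, here is a sketch of how those two pieces might be glued together into spoken feedback, reusing the hypothetical describe_posture function and sentiment classifier from the sketches above together with the pyttsx3 text-to-speech library; none of this reflects how any real smart-glasses product works.

```python
# Hypothetical glue code: combine a posture description and a sentiment label
# into one short spoken summary. describe_posture and classifier come from the
# earlier sketches; pyttsx3 is an offline text-to-speech library.
import pyttsx3

def speak_feedback(posture: str, transcript: str) -> None:
    sentiment = classifier(transcript)[0]["label"].lower()
    summary = f"{posture} Their tone sounds {sentiment}."
    engine = pyttsx3.init()
    engine.say(summary)
    engine.runAndWait()

speak_feedback(describe_posture("frame.jpg"), "That's a really interesting idea.")
```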
Imagine this being extended to facial expressions: you would get the kind of information that normally accompanies speech, even when the person isn't saying anything.
The proposed combination of 3D movement analysis and sentiment analysis through smart glasses represents an exciting step forward in leveraging AI and computer vision to enhance accessibility and social interactions for the blind and visually impaired community. However, privacy, accuracy, and the ethical implications would need careful consideration to ensure responsible and beneficial implementation.
I don't think this will happen. I don't even know if it should, or whether I would want it if it did. What do you think? Are we human or are we dancer?
Let me know in the comments.
Comments
I think this will happen.
Anything that gets research resources is very likely to happen, and there's so much research in this area at the moment. Self-driving cars need to understand context over time. Law enforcement already buys predictive models that forecast crime patterns. Understanding intention is going to be central to any AI that's capable of understanding, predicting and working with people. I think it's when, not if, and like you, I can't wait. Did you see the recent demo of a robot that understood someone wanted food, so it picked up and offered the only food item? Then, while it was clearing the rubbish into the bin, it explained why it did what it did with the food and checked whether it got everything right. If you haven't seen it, I'll find it for you. It's fascinating to see where we are already. These capabilities might not be distributed widely yet, but they already exist and will get more and more capable, all while the cost falls because of scaling and mass adoption.
Video.
https://youtu.be/Sq1QZB5baNw?si=q2B5AKrUtn7g0djv
It's called Figure 01. They only started work in 2022. NVIDIA are also doing crazy things in this area. For robots to work, they need to understand human intention, as you outline. If they can understand it, then they can surely explain it to blind and low-vision people.
The voice is incredible too.
It kind of sounds a bit unsure, down in the dumps and vulnerable. If I had this robot, I'd just want to give it a cuddle and tell it I'm going to be a nice robot owner and it's got nothing to worry about.
It's already been done
But not on a phone-sized scale, in regard to body language. In another thread, I brought up Alex Pentland's MIT experiments with computer analysis of body-language communication, as detailed in his book Honest Signals, from back in 2010. The point wasn't about computer processing, rather that we communicate on unconscious levels with body language, but the team used powerful computers to predict the likely outcome of interactions that were recorded by little cameras worn in badges by participants. It can only have gotten better since then. Plus, since a lot of this was visual, I can only imagine we blind folk are missing a lot of that unconscious communication.
I don't know what "NLP" is.
Oh
I should have been able to figure that out. I need better natural language processing and spelling, because trying to get my posts to sound right is exhausting.
The video of the robot... I guess maybe it's the pauses before answering and the background music that seemed a little creepy, as if it were considering whether to run amok and start attacking, or just answer. Have it say "hmmm" if it's having to process or think about something.
I read a science fiction novel about AI robots that served as friends to children. It is called Klara and the Sun, by Kazuo Ishiguro. Just throwing reading suggestions out there. I probably read way too many novels...
This will largely not happen, and there isn't a real need for it
The first problem you'll run into is that even if the research is done, most companies won't touch it due to the implications, such as the model discriminating on the basis of race, gender or a mental condition. This would also make targeted advertising and mass surveillance much, much worse, which is also a huge concern if I'm not in control of the model. I don't trust OpenAI not to misuse the data people are willingly providing to them.
As for myself, I'm not interested in having a computer talk about the body language of other people. My intuition has developed to the point that I don't really need to look at body language to know how a conversation is going to go.
My usual ramblings on
I don't think the only problem is with the AI interpreting the expressions of the person you are talking to; there's also the way it might convey that to me. If I am having a conversation with someone, I wouldn't want another voice in my ear giving me an audio description, because I don't have the attention span to cope with two voices. Sure, it could do a super-lidar-style tone going up and down, but I think that would also be a bit much. I suppose it could learn to only talk during quiet moments, but even then I think my brain often likes to spend that time processing what it's been told rather than taking in new information.
One thing I might find useful, though, is to know when someone has stopped talking to me. I find this very difficult socially. I've no idea if someone is looking in my direction or not. Or maybe they start talking and I don't know if it's to me or someone else. So it's risking being rude one way or another - either ignoring someone or talking over someone else. Or those situations where you carry on talking to someone who has left altogether. So maybe something that could convey changes like this could be helpful. But again, only if it's not talking over something I need to hear.
What might also be helpful is in those situations where you weren't concentrating because you didn't think someone was talking to you, only to find out that you have just been asked a question and have no idea what it was. A quick recap would be good. But maybe then the question is how I would ask for this information. If I say "hey meta, what did that guy say?" it's a bit of a giveaway that I wasn't listening.
I suppose also being able to tell if someone is waving at you or trying to get your attention would be helpful.
Taking how the info is conveyed to you out of the equation, I don't know how I feel about being told what someone's expressions are. I suppose as a blind man I should be more in tune with the tone of someone's voice and how they are speaking than perhaps I am. But if someone was flicking me the V-sign while talking to me, knowing I couldn't see, or rolling their eyes at me looking bored, scowling or whatever, would I want to know that? Maybe if someone was giving me the eye or leaning in close I might want to. Or maybe in the days before I got married, anyway. I've never had to go dating when blind - the thought horrifies me to be honest, so I hope I never have to go through that.
I can imagine going dating as a blind person could be quite dangerous too, so to get some extra info there could potentially be very useful. For example, if the AI told you that their bedroom wall was covered in polaroids of crime scenes you might decide that it would be a good idea to leave rather quickly.
There is a hell of a lot of information out there that may or may not be useful. So the problems, as I see it, are how we ask for that information, how we are told it and, of course, whether the AI is up to the job. And then whether the AI should be coldly telling you the facts or trying to interpret them. And then whether we trust what the AI has just said anyway.
If I've not already gone too far off topic with this, I could see it being helpful with other social situations. For example, knowing that there is a queue, getting you into position, telling you when it's time to move forward. Knowing that when someone has asked if they can help you at the end, they are talking to you and not the person you didn't know was standing next to you.
I should say I'm not skilled at these things, so maybe there are better blind-person techniques for dealing with them. But usually I just bumble about and let other people take control, so anything that gives me a bit more of an edge would help.
I'm a Dancer. And a Prancer…
I'm a Dancer. And a Prancer.
I was reading along, composing a comment in my head, and then Mr Grieves' comment pretty much summed up my thoughts. I completely agree that all this communication can be detected by an AI. The trick will be to communicate it to the blind user in a non-distracting manner.
I use an SSP - support service provider - when I attend conferences and other large events. The SSP communicates to me using a system of touch gestures called haptics, not to be confused with the vibrations of your iPhone. As an example, when I'm in a conversation with someone, I can tell from my SSP's haptic signals whether the person I'm speaking with is smiling, frowning, agreeing with a nod, or (more likely) completely ignoring me and playing with their cell. If I'm speaking to a group, my SSP indicates by touch when someone raises their hand, even indicates which direction I should face to acknowledge them. And because this is all done by touch, it's very unintrusive. As Mr Grieves points out, having some other voice talking in one's ear would be rather distracting by comparison.
When I think of all the ways I use SSPs at various events, the vast majority of assistance they provide could be handled by an AI. It's becoming more difficult to think of things my SSP does that an AI cannot, or could not, do. Technology marches forward.
Snake Vision
There is that BrainPort device that lets you see with your tongue. Generic Wikipedia article:
https://en.wikipedia.org/wiki/Brainport
Probably difficult to talk and look at the same time though.
I wonder how a haptic body suit would tell you someone is glaring at you... giving you the stink eye and such.
I read NLP, in this case, to…
I read NLP, in this case, to be neuro-linguistic programming which, as it happens, also works.
I think the issue with this, and where it becomes unnatural, is that it produces a quantified result. Part of human nature and communication between humans is ambiguity and interpretation by the one perceiving; to outsource this to an AI sounds bad.
I think there is a lot of concern that we, as blind people, miss out on a great deal of non-verbal communication, whereas I don't know that that is true. We can hear when someone is looking away from us, for example, hear them shifting, and there is a huge amount of information conveyed in voice. So, in the greater scheme of things, I don't think it's worth it.
If it can be monetised it will come out anyway, though. Just because there's a research paper does not mean it will ever see the light of day; there are millions that don't.
Mostly agree with Ollie
The more I think about it, the more using AI in these situations does seem a bit ridiculous, even though I'm enjoying thinking through the idea.
I think there are certain things that may be very difficult but where the solution is to get better skills and not to delegate responsibility to a piece of tech.
It's a similar thing to mobility skills. A listener to Double Tap wrote in and said that maybe, instead of spending a few hundred pounds on a robot guide dog or a toilet seat that sits on your neck and tells you where to go, you would be better off spending that money on better mobility training. I think there's a lot to be said for that. Not that you can buy social skills in quite the same way.
Regarding social stuff specifically, I don't have Ollie's skills. If someone is talking to the person to my right or to me I wouldn't have a clue. If they got up and left, I may or may not hear someone move but I probably wouldn't know if it was them.
I think with things like this a lot of the problem is with confidence and being able to use mistakes to your advantage. I suffer from social anxiety and tend to avoid those sorts of situations, which isn't helpful as it means if I do find myself in them I lack some of the basic skills. Maybe I do sometimes use blindness as an excuse for the lack of those skills, and maybe that is a cop-out. Still, would having a co-pilot in my ear really be a realistic or sensible solution? I suspect not. You could also see a situation where people start to depend on that kind of thing, and that's just a bizarre thought.
Anyway I'm enjoying these AI discussions - they are certainly getting the cogs going. I think AI is such a complex thing - it promises so much and doesn't quite deliver as much as it makes out that it does, but it also offers such an infinite amount of possibility. I think we have every right to get excited about where it is going.
Semiotics
Paul Martz gave a few examples of where it might be useful, if the information were conveyed in a non-distracting way. One situation I can think of is when noise effectively blots out anyone's ability to hear and sighted people revert to hand and body/facial gestures. I was in one situation where two sighted people were intending to harm me and were communicating with hand gestures as one of them talked to me. Luckily, someone else saw it too and intervened. On the other hand, I know someone who worked in a photography darkroom, who would flip the bird at his boss as soon as they turned out the lights.
I don't know that AI will ever be used in those ways for blind people, but the glasses told that one guy his mother was smiling, if I remember the post correctly.
thanks @Lottie
I think sometimes I struggle to know what I should realistically be able to do and what it's OK for me not to. I'm usually quite hard on myself, I think, and always feel like I'm not doing well enough.
And maybe that's why my opinion is flip-flopping so much.
I think we should at least aspire to be able to do certain things ourselves. For example, I think learning the white cane is pretty essential and if you decided to buy the shoulder toilet seat and that was your only way of getting around then that wouldn't be a good idea. (If no one knows what I'm on about, I seem to remember the biped or something being described like that but I might be talking rubbish like usual - it just makes me smile as a description.)
If there was something that could be used to give me a boost in social situations then that could be a good thing as long as I didn't rely on it to the point where I'm not developing my own skills in that area.
I guess my point about it being ridiculous is maybe that I started imagining having something like an audio description of my life, which I don't think would be workable.
So maybe it would have to be very simple, as Paul's example sounds. I think it's very easy to get sensory overload. So maybe this does only work if we need a fairly small and specific set of prompts.
I played The Last of Us Part 2 a couple of years back and they had a ton of audio noises that would tell you what was going on - maybe there was a pickup, or an enemy, or whatever - but there were so many. And I did struggle to remember what they all meant - I had to keep digging into the audio glossary all the time. Which is fine in a game but maybe less so in real life.
Anyway, if anyone is working on this then I am sure they have a bigger brain and more imagination than me, so if anything did come out that could do this, I'd certainly want to know everything about it, even if it wasn't a subscription I'd be willing to pay for.
I think the example of…
I think the example of nonverbal communication in a noisy environment is a good one. Simple physical gestures, orientation to others, and others' actions are all good pieces of information.
The point is not that we shouldn't have the information but that we should not rely on an AI to tell us someone's emotional state; we already have the means to do that verbally and, if anyone knows us, they will be more verbal.
The question I keep returning to is: will this make our lives easier or better? I'm not sure. I think there is a danger of being overwhelmed with information. We've worked hard to develop skills based on our input strengths; to negate those, or even, if possible, augment them, may be detrimental in the long run. The fact is, we have to work harder. I burn through a ton of spoons in social situations, hyperaware of my environment, keeping track of where people are and following conversation, not to mention my witty contributions. I don't know if such tech would make this easier or just add to a rather busy information highway.
Mr. G, I get the anxiety thing. I think many of us have to deal with it. All I can say is that the more I do, the easier it gets. Growth, unfortunately, is uncomfortable. I think we have to be mindful of burnout and getting through those spoons, though. I won't say yes to every social event and I have boundaries that, as I get older, I find easier to explain. Devices such as AI glasses may well make things easier, make us more independent. My concern is that by the time we get to sitting and chatting with people, we'll be knackered, our minds having heard the equivalent of the Bible just getting to the pub.
Good mobility is the foundation from which we must work. Assume your battery is dead and, as long as you can get home, that's when you start building on this with cyborg stuff... At least, that's my opinion. What is good for us as humans, built to live in the grasslands of the savannah, may not be more tech, but less. There is a lot of research being done that is starting to support the idea that exposure to rapid data switching - think a Twitter feed, Instagram, or any list of tasty data - is detrimental to our dopaminergic reaction: namely, we get a lot of little hits and then get a big bunch of low. We are not designed to suck on a hosepipe of data and be okay with it.
Weird...
I am sure I read the original post a couple of times, but the "emotion" part really didn't click with my mind. I was thinking really, really basic emotional stuff, like smiling, winking, scowling, eye rolling. Those quick facial expressions that might not even be accompanied by words, and of course, the hand gestures...
I have seen, back when I could, people smile with their lips, but not their eyes, so to speak; the hunched shoulders; the empty gazes. Not sure how AI would do with that.
The jabbering stream of descriptions and lack of squelch control that might come with these new devices do concern me a little. I shut my screen readers up all the time when I don't need to hear something like fifty words in an image's alt text. It's just a tap or key press. What would it be with a pair of AI glasses? I don't want to keep tapping the earpiece out in public, or doing something noticeable.
As far as cane travel goes, and almost as important, using hearing (if you have it), I'm all for learning cane mobility from the beginning. I wrote elsewhere that the phone app might stop being developed or go out of business, and your stand-alone mobility-tech device might outlast the company making it, maybe even last a lifetime with care, but you can always--and I speak from experience--grab a stick or broom handle and manage to get around if you have working limbs and cane skills.
Squelch control, never heard…
Squelch control - never heard it called that, but a very good point. Something like Apple's pinch gesture, as used with their Watch and Vision Pro, would be ideal for this, but of course, Apple don't have glasses... Yet.
I do think the idea of having Siri constantly on a user in a comfortable form (AirPods get uncomfortable after a time) is something they will be considering. I fear their privacy stance will hamstring much of the functionality. As we know, what Apple does, for the most part anyway, it does well, though with a high degree of restriction.
Good tip re the broom handle. Low tech is reliable.
Also, flipping the script somewhat, how would you all feel about a little glistening eye watching your body language and interpreting your internalised emotional state? It's one thing having a human do that and, most likely, getting it wrong due to the cognitive bias that makes us who we are. Having something that might actually be able to correctly distinguish our reaction and then, because that is why it would be developed, use it to profit from us in some way, chills me. Personally, I really hope the emotional recognition aspect of this is squelched. All the other stuff is fine.
I'm not good at remembering what things are called
I guess maybe verbosity. I couldn't think of the term, so I reached over into radio communication terminology. I still don't know what the term is for pressing one of the modifier keys to interrupt the screen reader's speech.
I've read in a few different neuroscience books, sometimes citing back to William James, that our emotion is intertwined with perceptions of our physical bodies: like a pounding heart and numbness of limbs for fear, tightness of the chest muscles for sadness, etc. The sensation of those emotions is much reduced when those physical sensations are no longer experienced. If that is the case, I'm not sure how well AI could interpret complex emotions. I'm not hearing anyone talk about the AI having a virtual body and sense of self that are placed in its models of the world around it, like what our brains do in simulating all our experiences. So how does the AI catch the sympathetic vibrations of emotion, so to speak, of a pounding heart and cold, numb limbs, as a person might feel just from reading a stalking scene in a thriller novel?
* Having difficulty getting the grammar to sound right in this one, and it's tormenting me.
Another Very Interesting Topic
This is something which I've wondered about for a while. One of the coaches at the fitness studio in my building--who is fully sighted and used to work at an adjoining nonprofit--actually put Seeing AI through its paces. She was able to get the app to describe the faces of some colleagues at the time, not just physical characteristics but emotions too. She said the app did okay, and she was very impressed with it. I've mainly been wondering about this because of some things having to do with mental health. I probably can't disclose much here, but suffice it to say it'd be awesome if we could get information like this. Btw, I'm no mental-health professional, but tbh I've felt sort of like one for the past several years.
It's not AI
Ekaj, but there was Joseph Weizenbaum's program "ELIZA," from back in the late 60s, for the beginning of natural language processing and mental health. A generic Wikipedia article:
https://en.wikipedia.org/wiki/ELIZA
I'd like to think that machine learning has come a long, long way since, but as Ollie points out, it's still just people with their own baggage creating the programming, and they're still part of the system. There are lots of novels out there, though, that imagine runaway AI that starts reaching back into itself and rewriting its own programming without safeguards.