I have both AIRA Explorer and Be My Eyes. I use both for AI-generated image descriptions; I don't have a preference yet.
Earlier today, I saw a Toot on Mastodon containing an image with no Alt Text, so I sent it to AIRA's Access AI. This is what it said:
“This image is a meme composed of four panels, each featuring characters from a well-known movie. The characters are engaged in a conversation about artificial intelligence (AI).
- The first panel shows a young man with the caption: "AI is going to be used to change the world."
- The second panel shows a young woman smiling with the caption: "As a tool to make the world better, right?"
- The third panel shows the young man again, looking somewhat skeptical or thoughtful.
- The fourth panel shows the young woman again, now with a more questioning or concerned expression, and the repeated caption: "As a tool to make the world better, right?"
The meme humorously highlights the ambiguity and potential concerns surrounding the use of AI, questioning whether its application will indeed be for the betterment of the world. 👩‍🦯”
I wondered what the ‘well-known movie’ was, so I decided to send it to Be My Eyes. This is what it said:
“The image is a meme featuring characters from the movie "Star Wars: Episode II – Attack of the Clones." It consists of four panels:
1. The first panel shows Anakin Skywalker with a serious expression. The text reads: "AI is going to be used to change the world."
2. The second panel shows Padmé Amidala smiling and looking hopeful. The text reads: "As a tool to make the world better, right?"
3. The third panel shows Anakin again, this time with a more intense and slightly skeptical expression.
4. The fourth panel shows Padmé looking concerned and slightly confused. The text repeats: "As a tool to make the world better, right?"
The meme humorously suggests a doubt or skepticism about whether AI will indeed be used for positive purposes. 🤷‍♀️”
So, there you go. Access AI and Be My Eyes both give basically the same description, but only Be My Eyes tells me the source of the meme.
Do people think I should email them about this? I like both descriptions, but I can't see why Access AI wouldn't let me know the source of the meme.
Comments
What ChatGPT-4o had to say
• Detail and Context: Be My Eyes provides a more comprehensive description by identifying the characters and the movie. This additional context can significantly enhance the user’s understanding and appreciation of the content.
• Possible System Prompt Restrictions: As you mentioned, AIRA’s system might have prompts that prevent it from identifying specific characters or sources, which could explain the difference in detail.
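For readers curious what a "system prompt restriction" might look like in practice, here is a minimal, hypothetical sketch using the OpenAI Python SDK. The prompt wording and model choice are my own assumptions; AIRA has not published its actual configuration, and this only illustrates the mechanism ChatGPT-4o is guessing at.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical restriction: the exact wording is an assumption for
# illustration, not AIRA's real system prompt.
RESTRICTED_SYSTEM_PROMPT = (
    "You describe images for blind and low-vision users. "
    "Describe what is visually present, but do not name real people, "
    "fictional characters, or the specific film, show, or franchise "
    "an image comes from."
)

def describe(image_url: str) -> str:
    """Ask a vision model for a description under the restricted prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": RESTRICTED_SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please describe this image."},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
    )
    return response.choices[0].message.content

# Under this prompt the model would say "a young man" rather than
# "Anakin Skywalker"; dropping the second sentence of the system prompt
# would let it name the film, as Be My Eyes did.
```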
Google Gemini
Google's Gemini also usually identifies the sources of things in images, like characters in a video game and such.
Hey Dev
Can you try it on Mastodon images? If so, you can find it in my feed several times!
GPT-4o
@AI, you should try attaching the image in the ChatGPT app and see what it comes up with.
If we are doing comparisons...
What about Microsoft's Copilot, or Amazon's Rufus? Or do these not work in the same way?
Keep in mind
These AI services can also make wrong assumptions sometimes. For example, just the other day, my character from Adventure Island was confused with a character from Super Mario. In your case, however, I'm sure the AI was correct, given the extra help from the text accompanying the meme. Just be extra careful when the AI starts mentioning names.
The reason why I only did these
Is that I am using services made for blind users. I do have access to Picture Smart in JAWS and the beta of the Envision Assistant, but the two I used are in my share sheet and I use them all the time.
Interesting
Maybe they are both using the same model, but the models have been tuned a little differently to suit each application. Did you ask Access AI for more specific information, since you weren't satisfied with the initial description? Sometimes you have to prompt the AI for more information, which I'm guessing you already know. If you asked and received the context, great; if not, I do think you should email the company. Let them know what happened here, then ask whether the contextual information can be generated automatically. Generally speaking, the initial description focuses on giving you an overview, and then you have to ask for additional information if clarification is something you desire.
One thing I would love to see is for us to be able to train the models, within set parameters, based on how we use these programs.
Roses, you are a genius!
I forgot about the 'tell me more' option - I hadn't wanted to use AIRA Verify, but I had forgotten I could simply ask for more. Here is what I got:
"Certainly! This meme utilizes a scene from the film "Star Wars: Episode II – Attack of the Clones," featuring the characters Anakin Skywalker and Padmé Amidala. The scene is set in a picturesque, grassy meadow, and the characters are dressed in attire that reflects the film's science fiction and fantasy elements.
Here's a more detailed breakdown of the panels:
1. **First Panel:**
- **Character:** Anakin Skywalker
- **Expression:** He has a contemplative look, indicating he is making a serious or thoughtful statement.
- **Text:** "AI is going to be used to change the world."
- **Context:** This statement reflects a common belief about the transformative potential of artificial intelligence, suggesting that AI will have a significant impact on various aspects of life and society.
2. **Second Panel:**
- **Character:** Padmé Amidala
- **Expression:** She is smiling brightly, appearing optimistic and hopeful.
- **Text:** "As a tool to make the world better, right?"
- **Context:** Padmé's response introduces a hopeful perspective, implying that the change AI brings should be positive and beneficial to society.
3. **Third Panel:**
- **Character:** Anakin Skywalker
- **Expression:** He looks more serious and somewhat conflicted, as if he is considering the implications of Padmé's question.
- **Text:** No text, but his expression conveys a sense of doubt or uncertainty.
- **Context:** This panel creates a moment of tension, suggesting that Anakin might have reservations about the optimistic view of AI.
4. **Fourth Panel:**
- **Character:** Padmé Amidala
- **Expression:** Her smile has faded, and she looks more concerned or uncertain.
- **Text:** "As a tool to make the world better, right?"
- **Context:** The repetition of her question underscores her concern and emphasizes the uncertainty about whether AI will indeed be used for the betterment of the world.
The meme plays on the juxtaposition of optimism and skepticism regarding AI. It humorously highlights the gap between the idealistic hopes for AI's positive impact and the realistic concerns about its potential misuse or unintended consequences. This type of humor resonates with many people who are aware of both the promises and the perils associated with the rapid advancement of AI technology."
OK, Access AI knows a lot
The question now is, what would be the best combination of all three answers? This might be a good advert for the Envision Assistant, which allows you to give it a 'personality' in advance to shape how it describes things.
Indeed impressive.
Using Be My AI, I got the AI not only to describe the options menu layout for the Nintendo game Crisis Force, but also to tell me which setting I was highlighting.
It gives me cognitive dissonance!
All I read all day is half the people screaming that AI is stupid: just fancy auto-complete, that 'stochastic parrot' thing, or that it never gets anything right!
But the image descriptions I get from it are amazing. I have checked a lot, and yes, they aren't perfect... but I have no useful vision at all! So it is a million times better than nothing.
Question
Let's say you receive a description of a picture from the artificial intelligence program, but the information you receive is not accurate. If you then received verification of the corrected information, is there a way to train the AI model, within the parameters of the application, to apply it in the future? I think that this should be the next step: community building.
Very interesting...
This is very interesting indeed, and I have on occasion experienced it myself. Here's my question though. Is there any way to train these apps to recognize which button performs which function on something like an inaccessible thermostat? I happen to have one of these inaccessible thermostats in my apartment, and it's unclear at this point whether or not I'm going to get a new one. To make a long story short, I asked the landlord for one right after I moved into this apartment. He was going to get all of us new thermostats anyway, but that has yet to happen and we've been waiting about 3 years. He and his crew have been great otherwise. But I digress. I don't really care at this point what kind of descriptions I get from these apps. I guess the reason for that is I cannot see anyway so why bother? I've always been able to verify the descriptions with friends/family, and have never gotten a wrong description. This is also true of audio description, although that's not AI-based at least for now.
Answer, of sorts, for Roses
AIRA have AIRA Verify: one of their professional visual interpreters checks the description and makes any necessary changes.
AIRA have also just launched Build AI, a program to do what you suggest. It is opt-in at present and only in the U.S. I think their aim is to produce a model that is fine-tuned for Alt Text.
Clarification
So, after you receive the updated description, let's say you want your artificial intelligence to provide descriptions in a certain manner, or to write in a certain style: for example, maybe you want the subject described first, including color, shape, and placement in the photograph. Is there a way to tell the artificial intelligence that the initial information was wrong, and that this is the updated information it should refer to in the future if it describes this photo again, instead of having to reiterate that every time and hoping to get the right results? Basically, what I'm saying is, it would be great if we could provide samples of what we want our descriptions to look like, whether for describing pictures or for another task, and have the AI learn from us in real time, within the parameters of the application, of course.
This can be done
@Roses, what you suggest can be done; it'll be most effective if the AI is run locally, in which case it can also pull in a lot of personal info and learn from that. But I don't think any existing blindness-specific service is focused on or interested in building an online personal profile, which is what would be required for this kind of thing. GPT-4o does, however, have a memory feature that lets you ask it to remember certain things, and it will.
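To make the idea concrete, here is a minimal, hypothetical sketch in Python of how an app could keep such a personal profile locally and fold it into every request. The file name, function names, and preference wording are all illustrative assumptions; no current blindness-specific service is known to work this way.

```python
# Hypothetical sketch of the "personal profile" idea: store the user's
# style preferences and verified corrections locally, then prepend them
# to every description request.
import json
from pathlib import Path

PROFILE_PATH = Path("description_profile.json")  # illustrative file name

def load_profile() -> dict:
    if PROFILE_PATH.exists():
        return json.loads(PROFILE_PATH.read_text())
    return {"style": "", "corrections": []}

def remember(style: str | None = None, correction: str | None = None) -> None:
    """Persist a style preference or a verified correction for future runs."""
    profile = load_profile()
    if style:
        profile["style"] = style
    if correction:
        profile["corrections"].append(correction)
    PROFILE_PATH.write_text(json.dumps(profile, indent=2))

def build_system_prompt() -> str:
    """Fold the stored profile into the system prompt sent with each image."""
    profile = load_profile()
    parts = ["You describe images for a blind user."]
    if profile["style"]:
        parts.append(f"Preferred style: {profile['style']}")
    for fact in profile["corrections"]:
        parts.append(f"Previously verified correction: {fact}")
    return " ".join(parts)

# Example: record a preference and a human-verified correction once;
# every later request then benefits without re-prompting.
remember(style="Describe the main subject first: color, shape, placement.")
remember(correction="The avatar character is from Adventure Island, not Super Mario.")
print(build_system_prompt())
```

Pairing a store like this with a human verification step, as AIRA Verify provides, would give something like the "community building" loop Roses describes.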
Unfortunately, I cannot try AIRA
Well, I cannot try AIRA, so I cannot compare the two. But I think Be My Eyes is really cool.