New AI app for describing images and video: PiccyBot

By Martijn - Spar…, 1 March, 2024

Forum
iOS and iPadOS

Hello guys,

I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.

I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317

I am really hoping it will be of use to some. I have earlier created the app 'Talking Goggles' which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully useful!

Thanks and best regards,

Martijn van der Spek

Options

Comments

By Gokul on Monday, June 17, 2024 - 12:19

I can capture a video in meta glasses and get it described with the app? Interesting! not realtime, but still...

By privatetai on Monday, June 17, 2024 - 12:19

"Someone on here mentioned a few comments back that they produced a slideshow and imported it (as a video) to PiccyBot. Simple question; how did that person do that? I can't find any way in the photos app that allows me to save a slideshow I have created. slightly off-topic I know but I've asked elsewhere and it didn't seem to register."
Personally I can't seem to get the photo app to share or save the slideshows I created. i read online you save the photos to a folder, and then you save that folder as memory and then you can go to memory and save it as video- but I've not been able to get that to work. So I use the alternative method: Imovie. The imovie app on your phone will allow you to join videos, photos, insert sound clips, add music and all that. And it's pretty accessible and self explanatory.

By Bingo Little on Monday, June 17, 2024 - 12:19

Subject line says it all. I share a video to PiccyBot, or a photo, and nothing happens. I'm returned to the share sheet with all its various options. Anyone else having this?

By mr grieves on Monday, June 17, 2024 - 12:19

I just went into the Photos app, selected a video and shared it to the pixie and it worked great.

Where are you sharing from?

By Bingo Little on Monday, June 17, 2024 - 12:19

I'm sharing from the photos app too. It doesn't seem to be playing ball.

By Martijn - Spar… on Monday, June 17, 2024 - 12:19

I tested it on a few devices and different iOS versions, and it all seems to work..

Do you have the latest update of PiccyBot? Which device and iOS are you using?

Note that PiccyBot currently only receives shares of images and videos from app library or any app that can save the images or videos (E.g. it works from Whatsapp or Messenger but doesn't work from Youtube or Instagram). Still hoping to expand that further.

By Bingo Little on Monday, June 24, 2024 - 12:19

HI Martijn, I'm using an iPhone 13, latest IOS 17.5.1 and latest version of PiccyBot. Sounds like it's just me, then..

By Gokul on Monday, June 24, 2024 - 12:19

Not just you, Bingo Little. Tried sharing a video from Whatsapp and it didn't work. Using iPhone14 pro with IOS 17.5.

By Martijn - Spar… on Monday, June 24, 2024 - 12:19

Ok, looking into it. Could be the format of the image or video. If you guys could check that it works for some images or videos and not for others, it would give me a clue.

By Lee on Monday, June 24, 2024 - 12:19

Just tried via photos app and no issues sharing photos or videos. So a little strange.

By mr grieves on Monday, June 24, 2024 - 12:19

I tried sharing a photo in WhatsApp and it worked OK, so WhatsApp isn't totally broken. But as was said, maybe something to do with the specific images or videos.

Everything I have tried has been captured with my Meta Ray-bans if that makes a difference.

By LaBoheme on Monday, June 24, 2024 - 12:19

hi Martijn, can you add the option for sending image to the ai without downloading the imager first?

example, when in safari, if you tap and hold an image, various options come up, share sheet is one of the options. tap share and all the possible share options pop up--mail, message, etc., but piccybot is not one of them.

By privatetai on Monday, June 24, 2024 - 12:19

"Subject line says it all. I share a video to PiccyBot, or a photo, and nothing happens. I'm returned to the share sheet with all its various options. Anyone else having this?"
I had the same issue, so I went to the app store, into account to see if my piccy is up to date, it said it was, then just to be sure, I double tapped on the piccybott in appstore to open up the app's page where it shows rating and description and all that, and wallaaa, there's an "update" button. After updating using the button there, everything works fine now. Weird how appstore told me it was up to date yet the update button only showed on the app's page.

By Ollie on Monday, June 24, 2024 - 12:19

The actual image is sent to the AI so it has first to be downloaded by the phone and then it is uploaded. What we'd need for the safari example is on device AI. For now, we have to shuttle any data we want manipulated into the cloud and, in this case, via our phones.

By LaBoheme on Monday, June 24, 2024 - 12:19

of course the image has to be downloaded, but it doesn't need to be saved to the phone, it can simply be temporarily cached and sent along. or more technically appropriate, it should simply be saved to the app and not the photo library.

right now, one has to save the image to the photo library first, and delete it when it no longer needed.

By Martijn - Spar… on Monday, June 24, 2024 - 12:19

Hi guys,

Updated the Claude 3 Sonnet model to Claude 3.5 Sonnet. Please try it out on images. It is definitely one of the best models, from my initial experience it seems to catch more personal expressions than GPT4o while GPT4o is better at background details.

No update on the sharing to PiccyBot yet. It does work in most cases, figuring out where not.

By SiddarthM on Monday, June 24, 2024 - 12:19

Hello Martijn,

I have been testing this app since this morning and it works very well. I am curious about a few things, though:

1. I cannot seem to see the currency rate in my country. I am located in India, where we use INR. The dollar is expensive here, so $20 is a lot of money in INR. Could you make the pricing a bit more reasonable? I would like to purchase the lifetime plan, but the current pricing seems a bit high for me.

2. Is there any possibility of adding live video description in the future? Instead of capturing or sharing a video, could we receive a live description as we turn on the camera? I understand that this would be difficult to implement and would require a lot of funding, but I would love to see it in your app first.

3. Can we not share a video or image directly from any app? For example, I tried sharing one from WhatsApp, but I could not find your app in the share sheet.

4. Is this app available for Android users as well? Some of my friends who use Android would like to try it out.

By Martijn - Spar… on Monday, June 24, 2024 - 12:19

Thank you. Regarding your questions:

1. PiccyBot is using seven different AI engines at the moment, plus an engine to generate the speech. Unfortunately these services all have costs associated to them. I can't afford to reduce the fixed price unfortunately.

2. I am definitely looking at live streaming. OpenAI has been teasing that with their new model but it is not yet available. As soon as it is, I want to integrate it right away (keeping in mind costs though).

3. This should be possible. Can you look further in the share sheet? It could be down the list?

4. PiccyBot is available for Android users as well, with very similar functionality. The link is https://play.google.com/store/apps/details?id=com.sparklingapps.piccybot

Hope this helps!

By Gokul on Monday, June 24, 2024 - 12:19

$20 is like INR 1600, and we're talking of a life-time subscription here. A monthly subscription costs like INR 299 otherwise.

By Ollie on Monday, June 24, 2024 - 12:19

Didn't know it was on android too. Great stuff. Will be interesting if the new google glass ever gets off the ground.

By Orlando on Monday, June 24, 2024 - 12:19

Hello, thank you for such a wonderful app! I am enjoying it very much and I am a subscriber!
I was scrolling through Reddit and I found the following video on a Shortcuts subbed about creating a shortcut that would let ChatGPT 4.0 describe images. Could something like this be done for this app?
I have attached the link to the video, and also the Reddit post for anybody who is interested. I’m not very good at programming apps or shortcuts so any help would be appreciated.
Thank you again for such a great app.

https://youtube.com/watch?v=AkmtCXlEldk&si=ln-h76JsO8pyQw3o

The Shortcuts sub reddit

https://reddit.com/r/shortcuts/comments/1d9go6a/creating_a_shortcut_using_gpt4o_to_explain_photos/

By Ambro on Monday, June 24, 2024 - 12:19

Hi everyone. I tried to get a one month subscription for this app, and first of all I thank the author for his excellent work. Knowing very well the photos and videos I have, and comparing them with Be My Eyes, I noticed that, even though I use the ChatGPT4O model, the description of BeMyEyes is better. For example, in one photo there was a man with a cigarette in his hand, described well by BMY, while PiccyBot said a piece of paper. I noticed these inaccuracies in more than one photo. BMY precisely described a little girl in a photo while PiccyBot said a person, without specifying whether man or woman.
Even in one video a woman was described as a man. I don't know if the people at BeMyEyes have optimized their algorithm, but very often their description is more accurate.

By Martijn - Spar… on Monday, June 24, 2024 - 12:19

Ambro, you could improve the description results by posing a more detailed question. The default is simply 'what is in this image?'. If you add more specifics what you would like described it could give better results. I will look into changing this initial question to be more useful for blind and low vision users, which could well be what other apps are doing.
Please compare GPT4o and Claude 3.5 Sonnet as well. I have found that the new model gives better descriptions, especially about expressions and emotions. But as you know, the models give slightly different descriptions each time so comparison is not that easy.

By Ambro on Monday, June 24, 2024 - 12:19

Thanks for your reply, and congratulations again for your work. Could you then add the default phrase to query the AI among the options? Because if I share a photo, for example from WhatsApp, the phrase chosen by you is always used.

By Ollie on Monday, June 24, 2024 - 12:19

I think it's possible to include a prompt behind the scenes, this is how it works with that shortcut mentioned earlier in the thread. Something like: You are an AI describing images to the blind in vivid detail, describing people, expressions, stance, clothing and attitude as best you can. Include perceived gendre, what they are holding or doing... And so on. I think the current prompt is too short and adding in this big block of text will be too much.

As I've suggested before, like with the chat GPT app, it might be worth having a means of a pre-prompt that is sent with every picture such as the one outlined that may also be editable by the user dependent on their interest and requirements for output. For example, some of the LLMs can be rather verbose, inserting mood or general commentary on the scene which I personally find patronising and annoying. 'A great time is being had by all'... And alike. Glibness isn't something I want in a description.

The other advantage of adding in the pre-prompt, aside from specifying the style of feedback is, of course, the personality of the feedback. I know these personalities are fun, but they are restrictive in choice and I can't imagine many people are using them. Adding in the ability to define one's own personality within the subscription, 'you are a dead pan ai who quips like james bond who is especially interested in beautiful cars' could be both more fun and adaptable to the end user.

Also, PiccyBatch forever!!!

By Missy Hoppe on Monday, June 24, 2024 - 12:19

I just wanted to come out here to thank the developer of this amazing app. I've been having so much fun with having it describe pictures from my photo library, and I've even had it describe a couple of videos. For myself, personally, I have, at least for now, turned off the personalities of the voices. Somehow, I seem to have better results with no personality. I've also set it to provide me the lengthiest descriptions possible, and it's amazing. There are times when it hasn't been entirely accurate. Most notebly, I'm thinking of a short video my friend took when I was trying some coffee she made. I was tempted by the flavor name, but it just tasted like yucky old coffee to me; not a coffee drinker at all. It smells great but I can't stand the taste. Anyway, when I used GbT as the ai model, It kind-a made up its own version of what happened. According to it, I said the coffee was delicious and smiled. That is, in fact, quite the opposite of my tru reaction. Geminy Pro seemed to be a bit more accurate, so I'm using that as my default for the moment. I'll most definitely check out some of the other AI models just for fun. For anyone on the fense about this app, please check it out. It's definitely worth every penny I paid for it.

By blindpk on Monday, June 24, 2024 - 12:19

Firstly, have read through the whole thread now and huge thanks to the developer for being so active here and for improving the app continously. After using the app for a while now I have some feature requests. I'm using the app without voice feedback, only with VO reading the descriptions.
  • Full-fledged conversation history: I would like to see all questions and answers I have asked about an image, like in most other apps of this kind.
  • "Clear conversation" button: A button to clear all data about the current image/conversation.
  • Share URL or similar: Whatever "magic" Be My Eyes has that makes it possible to share almost anything from e.g. Safari and have it described.
  • AI customizations: The ability to set your own "system prompt", that is sent with all requests. I don't know if all models have that feature, but OpenAI has it at least. A bit more advanced would be to also be able to control "temperature", i.e. how "random" a model response is (as I understand it the recommendation is to keep it low for scenarios like these where a more predictable answer is preferred).
I also have a bug to report. It seems that, even if the length is sett to 100%, responses are cut off. I noticed this just now, so I haven't done extensive testing, but sending a few images to GPT-4o it clearly cut off somewhere in the middle. I also tried typing "continue" as you do in e.g. ChatGPT to get the rest of the response but it re-generated it instead. Thanks again for all the hard work!

By neil foster on Wednesday, July 3, 2024 - 12:19

the video stuff is brilliant for the ring doorbell. any way u can get it to build up a database of people it can then recognise? so u would know who u missed or whatever.
be my eyes/ai doesnt do videos n'est pas?so piccybot has a big plus point there.
dont think theres any easy way to get facial recognition from a video.

By Gokul on Wednesday, July 3, 2024 - 12:19

I guess you'd need on-device AI to have that implimented because of the privacy concerns associated. But that'd definitely improve my life if it were there as I deal with a lot of staff every day and it'd be brilliant if I could know which one of them is coming into my cabin. Just one use-case.

By neil foster on Wednesday, July 3, 2024 - 12:19

ok yes piccybot is great, but has its limitations
i trusted it too much, and it told me somebody was breaking into my home!
so i dialed 999, and yes should obviously have gotten some sighted input first.
the sweeny arrived post haste and umm whoops!
i showed them the video and they said, err... this is yourself putting the ring doorbell back on! i'd recharged it earlier that day, and lost track of time.
ah if i had a quid for every banana skin i ever trod on :-)

ok my apologies to the tax payers and boys in blue for the f*kwittery, but, it is very probably worth mentioning that even nowadays, ai, has its limitations.

however, with reference to gokul point, about on device ai.
it is obviously possible to arrange for off device storage and code execution to be conducted securely, but u'd need procedures/practices to be put in place that would make such a thing trust worthy.
yes u r obviously right since be my eyes has hit this problem as well.

By Ollie on Wednesday, July 3, 2024 - 12:19

I wrote a short story a few years back about a blind man living alone and using an AI image recognition app that kept saying that there was someone else in the shot. 'A bowl of fruit, apples, oranges, pears with a man behind it smiling.' He thinks it is a false positive, what we'd now called an AI hallucination, but, oh boy, it wasn't.

By Martijn - Spar… on Wednesday, July 3, 2024 - 12:19

Hi guys,

Thanks for all the feedback! Sorry to hear about the scare Neil, but still glad you used PiccyBot for it :-)
Recognizing contacts would be hard privacy wise, Gokul. Maybe I can integrate it with Apple Intelligence later on, which should have the data..
Thanks Ollie and blindpk for the suggestions, I have actually adjusted the base prompt to be more in line with what you would need as blind or low vision user. Hope it helps. The Instagram share is one of the first 'generic' share ones. Will need a paid service for it to expand it fully, trying to balance it.

PiccyBot Pro users now get longer video descriptions (set video quality to 'high' in settings). And the Instagram video share to PiccyBot is hopefully useful. Do note it will take longer as I need to both download and then upload the video to generate the description.

Thanks for using the app, looking forward to further improve it!

By mr grieves on Wednesday, July 17, 2024 - 12:19

Love the interview on Double Tap. The thought of having the pixies in my meta ray-bans is incredibly exciting. I really hope you can make that happen.

By Martijn - Spar… on Wednesday, July 17, 2024 - 12:19

Was great to talk on Double Tap. I have been trying to get PiccyBot integrated into the Meta Raybans, but no luck so far. It looks like the only possible way to get it done is through a Whatsapp service. You could then say 'Hey Meta, Whatsapp last photo to PiccyBot' and then get a description back. Much slower than the Meta AI though and unfortunately you can't do video descriptions this way. I love my Meta Raybans but I really wish they had opened it up for us developers.

By mr grieves on Wednesday, July 17, 2024 - 12:19

Ah, I was optimistically hoping you had discovered some secret to make it all work. I can see how Meta might not want apps like yours integrated because the glasses do promise the same sort of thing using Meta AI and they seem to be very heavily pushing it. However, it's not a patch on what the other AI models that your app uses.

When I am out and about I do sometimes ask Meta AI to describe what I'm looking at but it's always a bit disappointing. I personally would love to be able to send the image via WhatsApp to the pixies and get those high quality descriptions. But I don't tend to get a huge number of WhatsApp messages so I turn notifications on. I can imagine many people not wanting to do that.

It feels like we are agonisingly close to the perfect solution.

By Ollie on Wednesday, July 17, 2024 - 12:19

Yeah, I also looked into the whatsapp solution for a self hosted thing with chat gpt 4o, but it just feels clunky.

I think, at least for the forseeable, the meta ray-bans are going to be completely locked down to meta services that is, of course, the reason they sell them. It's not to make money from the glasses themselves, but to provide a gateway to their own ervices. I have a gut feeling that the meta ray-bans, as tantalisingly close as they are, will never be the device we hope for. Meta's accessibility history has not been grat. Facebook is an explosion of nonsence, threads is a mess... Whatsapp is good, but that's the only product of theirs, aside from teh glasses I use and, to be honest, I hardly use the glasses. The AI is rubbish and if I need sighted help, my phone is already with me. They're not comfortable to wear all day long, the battery wouldn't last that long anyway, and the latency of voiceover through bluetooth is just a pain in the bottom.

I think you'll be able to get a Pici on glasses soon, just not Ray-bans.

By Ollie on Wednesday, July 17, 2024 - 12:19

#TeamPiccyBatch

By Assistive Inte… on Wednesday, July 17, 2024 - 12:19

plans to work on the Ray-Ban Meta glasses includes Access AI? I just realised, I should probs see if the Envision glasses work with Access AI?

By Ollie on Wednesday, July 17, 2024 - 12:19

I think the Aira thing is just going to be numbers accessed through whatsapp, IE, calling sighted guides. I don't think it will have any AI/independent image identification, at least, this is based on what I read on the sign up page.

By Assistive Inte… on Wednesday, July 17, 2024 - 12:19

It is just the live service, even on the Envision glasses. Next question is to find out about the ARX headset - that works with Seeing AI and an iPhone version comes out in two months.

By Gokul on Wednesday, July 17, 2024 - 12:19

Has anyone explored the possibility of linking some random smartglass camera with one of these apps, say, PiccyBot? Is that even possible in the IOS environment?

By Martijn - Spar… on Wednesday, July 17, 2024 - 12:19

Hi guys, I added GPT4o Mini to the list of PiccyBot models. It is supposedly very close to GPT4o but faster. It's also a lot cheaper, so I may consider using it as the default model for the free version of PiccyBot if performance is good enough. Please try it out and let me know what you think?

By Martijn - Spar… on Tuesday, September 10, 2024 - 12:19

Hello guys,

Good to have the AppleVis forum back online!

I released an update of PiccyBot today. The main improvements are in localizations, fixed some language support issues.

I also switched the home screen buttons around, after feedback from some of you.

The chat window will give brief and fast responses now. It doesn't remember earlier questions yet, I am working on that, will add it in the next update.

Hope you guys will be back with feedback in this forum, it was sorely missed!

Thanks,
Martijn

By blindpk on Tuesday, September 10, 2024 - 12:19

Thanks for the conversation function, really nice to see that implemented. I have a question though, is the app using the latest models from OpenAI and Google, "chatgpt-4o-latest" or "gpt-4o-2024-08-06" for OpenAI and "gemeni-pro-exp-0827" (or something similar) for Gemeni? Especially the Gemeni model seems promising after testing it on Chatbot Arena.

By Martijn - Spar… on Tuesday, September 10, 2024 - 12:19

Blindpk, regarding your question about the models, PiccyBot currently uses gpt-4o-2024-08-06 for the main OpenAI model and gemini-1.5-pro-latest for Google Gemini. In addition to this, you can use the GPT40-Mini and Gemini Flash models as faster but more limited options for these two.

As soon as new models surface I'll try to include them. Looking at Reflection and Grok 2 at the moment.

By Shannon on Tuesday, September 10, 2024 - 12:19

This is a really fascinating app. My daughter took a boat ride down the Mississippi and she sent me videos from that boat ride and piccyBot describe them just fine. It was awesome to get it described in real time as it were.

By blindpk on Tuesday, September 10, 2024 - 12:19

Thank you for the fast response and really nice that you stay on top of things. This is one of the strengths of this app, that you can try out different models for the descriptions.

By Ollie on Tuesday, September 10, 2024 - 12:19

Any more thoughts on PiccyBatch?

Man, it's nice to be able to talk to you all directly again. That was a kinda horrible dream there for a spell.