New AI app for describing images and video: PiccyBot

By Martijn - Spar…, 1 March, 2024

Hello guys,

I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.

I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317

I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully useful!

Thanks and best regards,

Martijn van der Spek

Comments

By TJT 2001 on Tuesday, September 10, 2024 - 10:02

I downloaded this app today, but I'm having some trouble sharing images from other apps to get descriptions with PiccyBot.

I was able to get descriptions of an image I shared from my camera roll and also one from WhatsApp, but when I shared one from Safari and another from Discord, the app continued to play the loading sound even after a whole minute had elapsed. When I tried to share a couple of images from the Dystopia app for Reddit, I was taken to a screen with no elements on it, and nothing further happened, even after waiting a whole minute.

I'd have no problem paying the subscription to help cover the development and operational costs of the app, but it doesn't seem to be able to meet my use case.

Am I doing something wrong? Why is the app able to only receive images shared with it from certain apps? This is not an issue with other image description apps.

By Martijn - Spar… on Tuesday, September 10, 2024 - 10:02

Correct, the app can describe videos and images from your phone library, WhatsApp, and Messenger. I am still working on supporting more apps. But for now, either open PiccyBot and select the media from there, or save it from the app to your library and share it to PiccyBot from there.
Can you confirm which image description app does this properly directly? I can then study their method and hopefully implement it in PiccyBot as well.

By TJT 2001 on Tuesday, September 10, 2024 - 10:02

Thanks for your prompt reply. I didn't realise that there were differences in how image recognition apps received data from other apps.

I'm able to get descriptions from the apps I mentioned using Seeing AI and the Be My AI feature in Be My Eyes.

By Bingo Little on Tuesday, September 10, 2024 - 10:02

I was on holiday in July and sent it a montage of photos put together into an MP4 video to describe. It did pretty well, but then started talking about 'we can hear the wind whispering through the leaves of the trees and the sound of the waves'. What nonsense! The video was silent, i.e. no audio! Not that it matters enormously, but to paraphrase and channel the spirit of cricket commentator Fazeer Mohammed: Why did it do that? Unbelievable! I do not have the personality thing turned on, so that doesn't explain it.

Great app though. Really fantastic.

How are people doing now getting it to describe the whole of a video? I still find that a bit hit and miss.

By LaBoheme on Tuesday, September 17, 2024 - 10:02

example, i was discussing baroque furniture and specifically talking about the fitness of the grand piano in the photo. then i uploaded a timelapse video of a city street. the ai explained that it is a city scene and there is no piano in sight. this behavior started in the latest version, and it happens 100% of the time (that is, if you ask a question and then send a new photo or video).

By Martijn - Spar… on Tuesday, September 17, 2024 - 10:02

LaBoheme, thanks for reporting this. Checking it out. I suspect it is a side effect of the chat mode, which normally only deals with questioning a single image or video. Should be a minor fix, hopefully backend only.

By Martijn - Spar… on Tuesday, September 17, 2024 - 10:02

LaBoheme, I checked the question carry over issue. It is a result of the feature to start with a specific question. It will then continue with that same question unless you clear it. I could clear it automatically, but I can imagine you have a specific question like 'Is there a house in the picture' while going through a number of images one by one. If I clear in-between that would not be practical anymore.
So right now, if you start 'blank' PiccyBot will give you a general description of the image or video. If you enter a question, PiccyBot will then continue to use this question for any further images or videos until you clear it or edit it. Separately, you can go into chat mode and ask specific follow up questions on the same image or video.
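
The behaviour described above can be sketched as a small state model. This is only an illustration, not PiccyBot's actual code: the `Session` class, its field names, and the prompt strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of the question box and chat mode."""
    sticky_question: str = ""  # stays set until the user clears or edits it
    chat_history: list = field(default_factory=list)  # follow-ups for the current media only

    def prompt_for(self, media_name: str) -> str:
        # A new image or video starts a fresh chat but keeps the sticky question.
        self.chat_history = []
        if self.sticky_question:
            return f"{self.sticky_question} [{media_name}]"
        return f"Describe this in general terms. [{media_name}]"

    def ask_follow_up(self, question: str) -> None:
        # Chat-mode questions apply to the current media and are not carried over.
        self.chat_history.append(question)

    def clear_question(self) -> None:
        self.sticky_question = ""


session = Session()
first = session.prompt_for("piano.jpg")       # blank start: general description
session.sticky_question = "Is there a house in the picture?"
second = session.prompt_for("street.mp4")     # question is reused
third = session.prompt_for("garden.jpg")      # and reused again
session.clear_question()
fourth = session.prompt_for("beach.jpg")      # back to a general description
```

In this sketch the carry-over LaBoheme saw corresponds to `sticky_question` surviving a change of media, while chat follow-ups are reset each time.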

If anyone has any suggestions for a better approach, please let me know?

By DMNagel on Tuesday, September 17, 2024 - 10:02

It would be nice if we could upload our picture along with our question at the same time, to avoid the full description and go straight to the important stuff. I once tried uploading a question alongside the picture at the same time, but the app ignored my question and gave me the full explanation. It seems like you can only really ask questions once the picture is already uploaded.

By blindpk on Tuesday, September 24, 2024 - 10:02

My suggestion would be a change of workflow. Instead of having the question box on the page before you choose an image/video, put a screen (or use an existing one) before you send the image/video to the model (the first screen would be empty except for the buttons). There you have the question box, the history, etc. When you have gotten your description, and the ability to chat and so on, and you choose another image/video, have the question box pre-filled with the question that was asked last, so that the flow will be speedy if you are to ask the same question about multiple images. If an image is shared to the app, put the user on the "question screen".
The only downside I see is that it might be one more button press before you get the description, but in my opinion the better logic outweighs that little inconvenience. As it is now, it is a bit hard to grasp how the question box works.

By LaBoheme on Tuesday, September 24, 2024 - 10:02

a clear button to clear the question. right now, one has to tap the text field for the clear button to appear; the clear button should be visible whether or not the user is editing the text area. that would make life ten times easier.

By kevinchao89 on Tuesday, September 24, 2024 - 10:02

I've been enjoying and loving PiccyBot, especially for describing videos, which no other app can do! I wanted more, so I paid for the full version.
Most of the videos that I've been having described are from Meta Ray-Bans, which are shorter clips, less than 3 minutes each. I've been having to either count the videos or remember timestamps of the one I just had described, find the next to have it described, and sometimes I lose count in all of this.
A feature request would be to select videos for a given day or a set of videos, have the first processed and described, then batch background process the others in the day/set, and play the descriptions in sequence.
1. Select videos for a day/set.
2. Process first/describe first.
3. While first is being described, Batch process other videos in set.
4. After first is done being described, play descriptions of other videos in sequence.
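
The four steps above could be sketched roughly as follows. This is an assumption-laden illustration rather than anything PiccyBot does today: `describe` is a hypothetical stand-in for the real, slow call to a description model, and `play` stands in for speaking a description aloud.

```python
from concurrent.futures import ThreadPoolExecutor


def describe(video: str) -> str:
    # Hypothetical placeholder for the real (slow) call to a description model.
    return f"Description of {video}"


def describe_in_sequence(videos, play):
    """Describe the first video right away, prepare the rest in the
    background, then play every description in the original order."""
    if not videos:
        return
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Step 3: queue the remaining videos so they are processed in the background.
        pending = [pool.submit(describe, v) for v in videos[1:]]
        # Step 2: process and play the first video.
        play(describe(videos[0]))
        # Step 4: play the others in order; result() waits if one is not ready yet.
        for future in pending:
            play(future.result())


played = []
describe_in_sequence(["clip1.mp4", "clip2.mp4", "clip3.mp4"], played.append)
```

Keeping the futures in submission order means the descriptions play in the order the videos were selected, even if a later clip finishes processing first.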

By DMNagel on Tuesday, September 24, 2024 - 10:02

I have used the app quite some time ago, and it appears that many things have since been fixed, so my bad.

By peter on Tuesday, September 24, 2024 - 10:02

I like the idea of having a Clear button. This seems to be a common way of handling the issue of clearing a text edit box in other apps throughout the OS.

--Pete

By Kaushik on Tuesday, September 24, 2024 - 10:02

As I am from India, I would like the app to detect South Indian languages in the images I share with it. Please improve this, and also add a feature for reading PDFs that supports our South Indian languages; that would make it better.

By Gokul on Tuesday, September 24, 2024 - 10:02

So I've been hearing that Pixtral, a vision-oriented model from Mistral, has been doing really well in the image description department. Never checked it out, but the demos I've seen are quite good. They show it being able to solve captchas etc. So thought it'd be interesting to check out and include as a model if it's worth it.

By Gokul on Tuesday, September 24, 2024 - 10:02

Speaking of which, apart from the regular image descriptions, I was thinking it'd be nice to have certain specific functions tailored to maybe specific models which can do it really well, and a preset prompt making things easy. For example, a function that's just designed for easy solving of captchas, which is something that's never addressed properly, or a function that helps us match colors of clothing or something which'd really help fully blind folks. Just throwing out random thoughts.

By Martijn - Spar… on Tuesday, September 24, 2024 - 10:02

Gokul, thanks for the suggestion. I have added Mistral's Pixtral model to the model list in settings. Please try it out? So far, I have seen it working well, but making some mistakes (switching left and right for example). But let me know what you think?

By Martijn - Spar… on Tuesday, September 24, 2024 - 10:02

LaBoheme, Pete, based on your input I added a clear button that is always available on the input text field. I released a new update with that change just now. Hope it makes the process at least one step easier.

By inforover on Tuesday, September 24, 2024 - 10:02

When sharing a video from Instagram, PiccyBot no longer seems to describe it. It's just stuck on "please wait" for ages.

By Gokul on Tuesday, September 24, 2024 - 10:02

So I played around with Pixtral for a bit; interestingly, it seems to describe colors more vividly? than any other models around. Maybe it's my impression, but there it is...

By Charlotte Joanne on Tuesday, September 24, 2024 - 10:02

This seems quite funky!

"The image presents a captivating digital illustration of a human eye, which is the central focus of the composition. The eye is depicted in a close-up view, with the iris and pupil clearly visible. The iris is a vibrant mix of blue and purple hues, while the pupil is a deep black color.

Surrounding the eye is a network of circuit-like lines and dots, suggesting a technological or futuristic theme. These elements are rendered in shades of blue, purple, and orange, adding a sense of depth and complexity to the image.

The background of the image is a dark blue color, which contrasts with the bright colors of the eye and circuit lines, making them stand out prominently. The overall effect is a striking blend of organic and technological elements, creating a sense of intrigue and curiosity."

By Missy Hoppe on Tuesday, September 24, 2024 - 10:02

I just have to say that I still enjoy this app very, very much. My best friend's husband posts a lot of his artwork and nature pictures on Facebook, so if there's one I'm extra curious about, I save it to my photo library and then have PiccyBot tell me all about it. Wish I could somehow use PiccyBot without saving the pictures first, but that's far from a big deal. :) Thanks for all the time and effort you've put into creating such a fabulous app. If I end up getting a new iPhone 16 at some point, I'll be curious if its camera takes better pictures than my current 13 Pro does.

By blindpk on Tuesday, September 24, 2024 - 10:02

Is it just for me or has the chat button disappeared? I've had a few images described today but the chat button is not there anymore.

By Martijn - Spar… on Tuesday, September 24, 2024 - 10:02

It is turning out to be a good model to have in the list, thanks Gokul and Charlotte. And thanks for the thumbs up, Missy! Not sure what happened with your chat button, blindpk, haven't heard that issue before.

Lots of chatter about the OpenAI voice model release. Hopefully this week. And then let's see how to implement it in PiccyBot..

By blindpk on Tuesday, September 24, 2024 - 10:02

Yes it is a really strange issue, I will try experimenting a bit more to see if I can make it appear again.
I also want to thank you again for making this app! In this AI landscape where new models appear all the time it is fantastic to have many of them available in the same place and the new ones added quickly.

By blindpk on Tuesday, September 24, 2024 - 10:02

Turns out it was, at least mostly, my fault. The chat button is called "microphone" for me, but it works like the chat button would. I didn't check that one out earlier.
Two other minor things as well:
* If you turn off "waiting sound" there is still a, very discreet, waiting sound in the chat view. Actually, this quiet waiting sound is more pleasant than the standard one and it would be great to have as an option for the main screen waiting sound as well.
* The feature where the app gets a new description when you change model and leave the settings screen has a small bug (or is it intentional?) in that it activates even if you choose the same model as you had before. It only happens, it seems, if you open the model selection menu; if you don't do that, nothing happens when you leave settings.
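
One way to avoid the second issue would be to compare the model on leaving settings with the one that was active when settings were opened, rather than reacting to the selection menu being used. The sketch below only illustrates that idea; the `Settings` class, its methods, and the model names are hypothetical and not taken from the app.

```python
class Settings:
    """Hypothetical settings screen that re-describes only on a real model change."""

    def __init__(self, model, on_model_changed):
        self.model = model
        self._on_model_changed = on_model_changed
        self._model_on_open = model

    def open(self):
        # Remember which model was active when the screen was opened.
        self._model_on_open = self.model

    def select_model(self, model):
        # Opening the menu and picking a model just records the choice.
        self.model = model

    def close(self):
        # Request a new description only if the model actually differs.
        if self.model != self._model_on_open:
            self._on_model_changed(self.model)


requests = []
settings = Settings("model-a", requests.append)

settings.open()
settings.select_model("model-a")  # same model re-selected: nothing should happen
settings.close()

settings.open()
settings.select_model("model-b")  # different model: one new description requested
settings.close()
```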

By hasajaza on Tuesday, September 24, 2024 - 10:02

Today I subscribed to the premium option, but nothing changed in the app for me. Is that normal?

By Martijn - Spar… on Tuesday, September 24, 2024 - 10:02

Hasajaza, subscribing should enable the settings screen. There, you can select voices, personality on/off, select an AI model, set the length of the description you want, enable longer and better video descriptions, and share the audio of the description.
If this was already working for you, you must have subscribed before? Odd, but if so, please cancel the subscription through Apple.
Blindpk, thanks for figuring this out. Clear points, will adjust the button description and make sure the waiting sound setting also adjusts the chat mode sounds.

By Gokul on Tuesday, September 24, 2024 - 10:02

or does the app not yet have an option to add a second/third picture so as to get comparisons etc?

By blindpk on Tuesday, September 24, 2024 - 10:02

No, I don't think you can feed the app more than one image at a time (I believe this is the feature that people earlier in this thread gave the name PiccyBatch).

By hasajaza on Tuesday, September 24, 2024 - 10:02

Hi
Unfortunately there is no settings option even after subscribing. In addition, there is still an ad shown in the app. I canceled the subscription but I want to know if there is a solution.

By Martijn - Spar… on Tuesday, September 24, 2024 - 10:02

Hasajaza, thanks for alerting me. I know there is a problem with Apple connect today; it could affect the reading of the subscription status and cause this issue. Not sure, have to check, but if so it should be working fine again soon.

By Gokul on Tuesday, September 24, 2024 - 10:02

@Blindpk: Nope. PiccyBatch was for processing and adding descriptions to a large number of pictures. What I am looking for is something simple, like what we have in Be My Eyes, where you have an add picture button and then you can take a new picture and compare etc.

By blindpk on Tuesday, September 24, 2024 - 10:02

Oh yes, you're right, that is what PiccyBatch is. The first part of my previous message stands though (and yes, I would also like to be able to add/process multiple images).

By Martijn - Spar… on Tuesday, September 24, 2024 - 10:02

Blindpk, I will look into multiple images, but at the moment it may complicate things (some models support, some not, how to mix video and images, etc.). It's on the list, but later.

By Martijn - Spar… on Tuesday, September 24, 2024 - 10:02

Inforover, thanks for the alert, the Instagram video sharing to PiccyBot should work again. Note that only posts from public accounts can be shared.
Apple connect is working fine again, so please try again Hasajaza, it should work.

By blindpk on Thursday, October 3, 2024 - 10:02

Over the last weeks or so a few new models have been released:
OpenAI: ChatGPT-4o-latest. This is not really new, but it seems it currently outperforms the standard GPT-4o with image descriptions (at a higher cost).
Google: New versions of Gemini Pro and Gemini Flash (don't know how their API is structured, but maybe it already uses them, they end in -002 I believe).
Meta: Llama 3.2 in one bigger and one smaller variant.

If you haven't already, can you check if any of these can be (and are worth) implementing in Piccybot?

By Martijn - Spar… on Thursday, October 3, 2024 - 10:02

Blindpk, thanks for asking. I am working on adding Llama 3.2, that should be available this week. The Gemini models PiccyBot uses are already the latest.
Regarding ChatGPT-4o-latest: looking into it. OpenAI lists it as a text only model like O1, so it is a bit confusing. The limits for it are also very low, seems to be a mistake on OpenAI's part.

But definitely keeping an eye on any new models to add to PiccyBot to improve it further!

By privatetai on Thursday, October 3, 2024 - 10:02

Now that Be My Eyes/AI allows PC access for Windows users, I'm again bringing this up: it'd be super super nice if I could access Piccy via my PC. It's my favorite photo describing app by far on my iPhone, but labeling/renaming photos on the iPhone is tedious at the best of times. If I can do my photo sorting on my PC, it'll be way faster and easier on my nerves :)

By blindpk on Thursday, October 3, 2024 - 10:02

Thanks a lot for the info! Very strange with ChatGPT-4o-Latest, it is definitely an image model, it is in the lead on Chatbot Arena's Vision leaderboard for one thing :) and I use it for images with my API key.

By Martijn - Spar… on Thursday, October 3, 2024 - 10:02

Privatetai, thanks for asking. I am thinking of creating a browser plugin for PiccyBot. Just have to check how to make the free and pro versions happen. I don't want to deny you guys the best features but don't want to go broke either.
I'll definitely look into ChatGPT-4o-Latest, blindpk. It is very likely a glitch in the OpenAI documentation.

By Martijn - Spar… on Thursday, October 3, 2024 - 10:02

As requested by you guys multiple times, I have added support for sharing videos from TikTok, Facebook, Snapchat and YouTube. Just open the video and use the share feature to send the video to PiccyBot.
I appreciate the feedback on this to get this working perfectly! Please let me know how it works for you. Note that it is still limited to shorter videos only, but I could increase the length later on when it is all stable.

By blindpk on Thursday, October 3, 2024 - 10:02

To have PiccyBot available on PC would of course be great! As I have understood it, some here use Chrome and some Firefox, so if you make a browser plugin it would ideally support both. Of course you must look for a good and sustainable financing option.
Then a question about sharing from Facebook, do I understand it correctly that this is only for videos and not for images?
Keep up the good work!

By Martijn - Spar… on Thursday, October 3, 2024 - 10:02

Blindpk, correct, I gave priority to video sharing from Facebook before images. You can describe images in multiple ways already and you could save to phone and describe it from there, so there is a workaround. But will add direct support for that as soon as the rest is stable.

By blindpk on Thursday, October 3, 2024 - 10:02

Yes, that sounds like a reasonable priority, as you say, there are workarounds for images, even if they are a bit annoying (especially since no description apps that I know of can describe images directly from FB specifically).

By Brad on Thursday, October 3, 2024 - 10:02

I know for a fact this took a lot of work to implement so I don't want to take away from that in the slightest but at the moment, it's like I'm still checking out a still image.

Would it be possible for the video to play and either have the AI make an audio description track, it would probably have to be trained but I honestly have no idea where to begin with that.

Or, play a bit of the video, explain that bit, play the next bit and so on?

This is interesting but I couldn't run a youtube video through it and sit back and relax knowing that the AI would describe things I'm missing visually, at least not yet, I hope it can get there one day though.

I tried it with a couple of shorts and it works but it feels a bit... I'm not sure, gimmicky? I don't like saying that but at the moment that's how I feel.

Perhaps you could contact the NFB, I think they describe things for Americans but I'm not sure as I'm in the UK, and ask them if they'd allow you to train your model on older AD scripts, if they exist.

Ideally what I'm looking for is an app on my PC where I can put a video through it, wait a while for all of it to be put together and then have it described in a professional manner along with the video playing.

I really think a donate button would be great in the app. Even though I'm giving quite negative feedback, in my opinion, I'd love to donate a bit more to this app.

Also, there doesn't seem to be a way to restore my purchase. I know I bought the app, but I keep removing it, and when I download it again it acts like I'm downloading it for the first time. Is that something that needs to be worked on or have I just not found the setting?

Also, you'll want to label the help for the settings screen button something a bit more helpful than what it is at the moment.

By GayBearUK on Thursday, October 3, 2024 - 10:02

Hi! Tried this app at work today. I'm very sad to say that my first impression of the app was awful. It gave me a basically correct description of what was in the picture, but added lots of flowery nonsense that was unnecessary. When I asked it to describe the man in the picture it refused to give me any details about what I looked like or what I was wearing.

Tried another picture and asked it to read some printed info to me. Again it described other things but refused to describe what I asked for. Again and again.

Not sure how this app could be helpful? Am I doing something wrong?

By Brad on Thursday, October 3, 2024 - 10:02

If you go into settings and turn the personality off it improves things but I'd personally not use it to describe stuff.

The video thing is great and the developer really does care about making the app the best it can be but for me I'd not use it to describe things.

Then again, perhaps it's because I was born blind; I don't need to know all this stuff.

Perhaps it's because I was born blind, but I find it a bit too much. It describes so much stuff that just isn't needed. That's not the developer's fault at all; all these AI models do that. You can go into the settings and lessen the tokens used to make a shorter description, but I just find it's a bit too personal?

No, personal isn't the right word but I don't know what else to call it.

For example, if I'm looking at a Harry Potter short, I don't need to know that character x is formidable, or character y is ready for action, that doesn't really tell me anything. Put it this way, you can tell it's written by an AI.

I think it really needs to be trained on AD data to get the tone right.

By Charlotte Joanne on Thursday, October 3, 2024 - 10:02

When I tried it, you had to pay to be able to tone it down. I objected to that, so use something else, but lately the work the dev has been doing is starting to make me think about changing my mind.

Brad, those things mean something to me, probs to anyone who has been able to see. Like colours and perspective and shadows I suppose.

I'm not sure about this app, but with others you can create your own prompt to, for example, not use any vision-related words, or not talk about people etc. It really can improve things.

Also, OpenAI launched visual fine tuning, I wonder if someone will use that to create better models for the blind?

By Brad on Thursday, October 3, 2024 - 10:02

The video is an animated re-enactment of the Battle of Hogwarts from the Harry Potter movies, using Lego figures. The scene opens with a large Lego Hogwarts castle, with Death Eaters flying overhead. The sound of battling and explosions can be heard in the background.

The first character we see is a Lego Harry Potter, who is standing in the center of a courtyard with other Lego characters. Harry is looking distressed and is holding a brown, pointed hat in his hand. He is dressed in a grey and white sweater and blue jeans.

Then, we see a Lego Voldemort, who is standing in front of a group of Death Eaters. Voldemort has a pale white face with dark, piercing eyes and a mischievous smile. He is wearing a long, green robe with splashes of brown dirt.

The camera moves to a Lego Bellatrix Lestrange, who is standing next to Voldemort. She has long, black, curly hair and a stern expression on her face. She is wearing a black dress with a necklace and black arm bands.

Next, we see Harry draw a silver sword, which is similar to the sword of Gryffindor. He is looking determined and ready to fight.

The video then shows a series of close-up shots of Lego characters battling, using magic and swords. We see Harry and Ron Weasley, who is also dressed in a school uniform, fighting alongside each other, showcasing their close friendship. There is also a Lego Hermione Granger, who is looking fierce and confident as she points her wand at the enemy.

Later, we see Voldemort and Harry fighting on top of one of the towers of Hogwarts. Harry throws a spell at Voldemort, who dodges it and then lunges at Harry. Harry tries to push Voldemort off the tower, but Voldemort grabs him. Finally, Harry manages to push Voldemort over the edge, and he falls off the tower, creating a huge explosion.

The video ends with a shot of the Lego Hogwarts castle.

It's cool, there's no doubt about that.

I know it seems I'm going back and forth on this, and I think that's because of the way it's formatted at the moment. If it could be trained on audio described data and play the video with the audio, I think that would make a massive difference, although it would have to learn, somehow, when to insert itself like an audio describer would.

The best part about this is, all of these feelings of mine are going to change next year, this stuff will keep improving and who knows where we'll be.

@Charlotte Joanne I'd personally pay for it even if you don't use it, why? Because this dev really does care about feedback and they understand alongside us that this is a new technology and they really are trying to please people, even people like me who can be quite negative at times.

I've been being a bit negative in my last posts but don't get me wrong, this app is amazing, I just can't see me using it out and about at the moment.

I really do want a donate button in the app. I'd honestly donate a lot to it even if I don't use it; the more money the dev has, the better their app can be.