New AI app for describing images and video: PiccyBot

By Martijn - Spar…, 1 March, 2024

iOS and iPadOS

Hello guys,

I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.

I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:

I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully useful!

Thanks and best regards,

Martijn van der Spek



By Brad on Friday, October 4, 2024 - 14:03

I can't seem to play the audio attached to my mail when sharing with PiccyBot. Is that a bug or just a me thing?

I'm using VLC to play it, if that helps.

By Brad on Friday, October 4, 2024 - 14:03

I think that a little description for each model would be nice. Why would I want to use model X over Y?

If that's already in the question mark thing on the top left, at least that's how it is for a blind user, great! I just thought I'd throw it out there.

The more I listen to AIs write, the more there's a pattern. At the moment I'd not say they write like a human does. I really do wonder where we'll be with that this time next year.

By Brad on Friday, October 4, 2024 - 14:03

Now, it's not for me, but once they manage to sync the description with the audio/visuals of the video, if that's possible, then it will be great, and if they make an add-on for Firefox, I'll pay for it straight away.

If possible I'd like more voices but that can be added later if at all.

Please focus on the video side of things and syncing audio and video together if you can.

By inforover on Friday, October 4, 2024 - 14:03

I love the amount of description you get, I think it's great. I want a picture painted in my head and the more info it gives, the better IMO. I was also born blind so I think this is a very subjective thing.
As for an audio description like experience, I think that'd be crazy difficult to implement. You'd have to have it understand the context of the video to put description in the right places somehow, and that's going to require a lot of work on Martijn's end, if it's even possible at all.
Love the idea of a donate button on the app as well.

By GayBearUK on Friday, October 4, 2024 - 14:03

Thank you, but what do you use the app for then if not descriptions? Maybe I'm thick here, but I'm not getting what the benefit of this app is? Just trying to understand.

By Gokul on Friday, October 4, 2024 - 14:03

I guess the amount of detail that one prefers in a description is entirely a subjective thing; I was born blind, and prefer rich, detailed descriptions with details of the colors, tone, etc. almost 70% of the time. It's only when dealing with work-related data or something that I prefer a concise approach. But maybe a setting to determine what kind of description one needs might be nice.

By TamagotchiTune on Friday, October 4, 2024 - 14:03

I was born blind and I am loving the detailed descriptions this and every app gives. I think it is a person to person preference. I am going to test this app on some YouTube Shorts. This is a great app!

By Brad on Friday, October 4, 2024 - 14:03

@GayBearUK, I don't really use the app.

I try it, find a feature to be neat and then delete it.

@inforover oh it would be really hard to do, I don't think we're there yet, but I do think we'll get there one day and it won't be far off.

By privatetai on Friday, October 4, 2024 - 14:03

After updating to the latest version, when I try to share a photo from Dropbox, whether via "share" or "export", PiccyBot shows "fetching data" and then "please wait", and then sits there forever doing nothing, as if it's frozen.

By Kaushik on Friday, October 4, 2024 - 14:03

Hello developer, this is Kaushik from India. Your app's accuracy is very good, but we need this app to recognise Indian regional languages so that we can use it seamlessly. Please also try to bring in a feature to read PDFs and other book formats, at the best affordable rate for everybody. Recently in India, iPhone purchases by our visually impaired community have increased. Do consider this. Thank you.

By Martijn - Spar… on Friday, October 11, 2024 - 14:03

Kaushik, at the moment PiccyBot supports the Indian languages Hindi, Bengali, Gujarati, Haryanvi, Marathi, Punjabi and Sindhi. I will add further languages in due time, when usage from those regions goes up.
Privatetai, I released an update today that should fix the Dropbox and WhatsApp sharing issue. Hope it all works ok again!

By Gokul on Friday, October 11, 2024 - 14:03

@Martijn, do consider adding a couple of South Indian languages while you are at it, because I know for a fact that there are a reasonable number of users from these parts to make the effort worthwhile.

By mr grieves on Friday, October 11, 2024 - 14:03

Sorry if I've missed this, but is sharing an image from email broken?

I'm using iOS 18.0.1 and I share to the pixies and it makes the waiting noise but never seems to get past it. I ended up using Be My AI, which worked OK, but the descriptions weren't as verbose as I get with this app.

(By the way, I hope the dev doesn't find it annoying that I always refer to this app as the pixies. By the time I realised it wasn't called Pixie Bot it was already cemented in my brain as that. And I kinda like that it has a pet name. And given my username I can say it's definitely not meant as demeaning or anything like that.)

By Brad on Friday, October 11, 2024 - 14:03

If we get that donate button I'll use it.

I can't imagine where this will be this time next year, or the amazing stuff to come out by then. I can't wait!

By privatetai on Friday, October 11, 2024 - 14:03

Thanks for fixing it :) I store most of my photos in Dropbox so I was sad not being able to use it :)

By Winter Roses on Friday, October 11, 2024 - 14:03

I want to start by saying I really love this app and the fact that it can describe videos. It’s an incredible feature, and I find it super useful. However, I do have some feedback that I think could make it even better.

Right now, the limitation is that the app can only describe 60 seconds of video, which is about one minute. I understand the challenges behind processing videos for descriptions, especially when the app needs to download and handle the video on the device. However, I wonder if there could be a way to work around this. For example, what if we could watch videos directly from platforms like YouTube, and somehow screen-share or sync it with the app to receive real-time descriptions?

As a blind person, I really appreciate being told what’s happening in a video, but it’s hard to know how frequently scenes are changing or what exactly is going on during more dynamic content. It would be great to have a way to know the timestamps for when events happen and how often things change from one scene to the next.

Another issue I’ve come across is that, to my knowledge, we currently can’t mute a video on YouTube and still have it described. I think it would be incredibly helpful if we could mute the original audio on videos, particularly for things like music videos, and have the app provide the description instead. This way, I could choose to listen to the video’s audio when I want, but also have the option to mute it and have the app describe the visual content for me.

I hope this feedback is helpful and that it’s something that can be looked into in the future. Thanks so much for all the hard work that’s gone into this app—I really appreciate it.

By mr grieves on Friday, October 11, 2024 - 14:03

I tried opening up the same image from my email later on and it worked fine, so must have been just a temporary glitch. Sorry, was a bit trigger happy with my post yesterday.

By Martijn - Spar… on Friday, October 11, 2024 - 14:03

Thanks for the feedback guys! Winter Roses, I will increase the length of the video that can be processed. Already did it for the Android version, the next iOS release will have at least double the duration for pro users.
I have also added support for the sharing of Reddit videos, as there were quite a few requests for that. If any of you have suggestions for more specific video sources that would be helpful to describe, let me know.

By privatetai on Friday, October 11, 2024 - 14:03

Not sure if it's set this way, but when I use the ask more feature, the responses are rather short. I prefer this app's long and detailed descriptions, and would like it if we could get similar length in chat, according to our setting preferences. Also, sometimes on long descriptions the text cuts out before getting to the end. Is there a way to put in a "continue" button or something to prompt it to finish from where it left off? From experience using chat AIs I know often all you have to do is type "continue" into the prompt, but when I did that PiccyBot simply re-analysed the photo and generated a new description rather than continuing the previous thought.

By Troy on Monday, October 14, 2024 - 14:03

I had a hilarious moment with this app. I took a selfie and I do have a lot of skin tags, and it said "That man must be in a lot of pain and discomfort with those skin lesions" lol.

By Laszlo on Monday, October 21, 2024 - 14:03

No matter what I ask in the chat interface about an image or video, all answers get severely truncated. This seems model-independent, as it is the same with Claude 3.5 Sonnet on images and also for videos, which use another model. Answers are truncated to nearly the same length for images (about 40-50 chars), and to a somewhat longer length for videos, but in the latter case truncation is also very severe.
Instructing it to continue doesn't help at all. In that case the initial description is reiterated, but also truncated severely. So I don't think the truncation occurs at the model level at all; instead it happens somewhere between the model and the displaying of the answer. What I get as an answer shows that it would be completely coherent and appropriate had it not been truncated so badly.
I set the length parameter in Settings to 100%. As I am a lifetime subscriber I have access to that screen and could adjust that.
I have the latest version (2.4). I use PiccyBot in Hungarian, however I seriously doubt this has any significance regarding this truncation phenomenon.
Unfortunately this bug renders the chat interface (invoked through the "Ask more" button at the bottom of the main screen) practically unusable. Otherwise I love the app very much!!!

By Martijn - Spar… on Monday, October 28, 2024 - 14:03

Privatetai, Laszlo, thanks for the feedback! An app update is available that should fix the chat responses. They should be medium length and take into account the information already given. Hope this works ok, let me know what you think.

By privatetai on Monday, October 28, 2024 - 14:03

Works great so far! Thanks for the fix! I want to take this chance to ask if we can adjust the volume of the AI in the app itself. Currently, my default VoiceOver is way louder than the AI, so when I have my normal volume, I can hear my VoiceOver but can't hear the AI. Then I turn the volume way up on my phone, and now I can hear the AI but everything else is way too loud LOL.

By peter on Monday, October 28, 2024 - 14:03

If you add VoiceOver volume to the rotor settings you can change the volume of VoiceOver relative to the overall volume on the phone. That also works on the Apple Watch.


By Laszlo on Monday, October 28, 2024 - 14:03

I got the 2.5 update with the chat interface fix. Although I have had it for only about an hour, I managed to test it in English and Hungarian, both with Mistral Pixtral and the video description model. I can report that I am satisfied: I've seen no truncation in the chat answers, they seem to come through fully and yes, they are to the point. I hope it stays so, and thanks for the quick fix!!!
I've experienced only one strange thing with the 2.5 update. There is that edit box one or two right flicks away from the top of the main screen. I call it the prompt edit box, as it contains the instruction that mainly guides the image/video description process, and so it classifies as a prompt in AI terminology. Before this 2.5 update, if PiccyBot was set to Hungarian, by default the prompt edit box contained "Mi van ezen a képen?" (what's in this picture?) for images and "Mi van ebben a videóban?" (what's in this video?) for videos. Now by default this prompt edit box seems to read "Kérdezz a piccybottól a kérdéseddel" (literally "ask PiccyBot with your question", a sentence that definitely sounds clumsy in Hungarian), which is not appropriate for a Hungarian prompt text. Nevertheless descriptions seem to work okay this way too, so far. PiccyBot is soooo versatile really that I simply haven't yet gone through all the combinations I use this app on with the new update: pictures and videos taken on the fly, pictures and videos from my gallery and also shared from other apps, like mail. So I don't know yet whether this prompt text thing is really a bug or not. Time will tell.
One more thing about Hungarian. Though every part of Piccybot seems to support it quite well, Hungarian cannot be selected from the supported languages list from the settings screen, because it is not listed in the dozens of languages there, nor can it be found with the search edit box on that screen. So I access it with the "phone system language" setting, and it works this way. This is only a very minor nuisance mostly, that can easily be fixed in one of the future versions with other bugfixes.
All in all, with the chat interface fix seemingly in place, PiccyBot is really a bright gem in my "vision toolbox" on my phone: many models, many languages, extremely diverse possibilities. So thanks much!!!

By Laszlo on Monday, November 4, 2024 - 14:03

Late at night this Tuesday (29 October) video descriptions stopped working abruptly and haven't returned to life since then. After the waiting sound, "server error" is displayed where the description should appear. This is independent of language: I tried with several languages and the result is the same. I suspect an API change on the side of the video description model. I ruled out other regular causes of such a disruption: net etc. are all fine.
By the way I noticed accidentally that PiccyBot now lets me record a video over one minute (I am a lifetime subscriber). Thanks very much for that!

By Martijn - Spar… on Monday, November 4, 2024 - 14:03

Laszlo, there was indeed a server issue on Tuesday, but it should be working ok now. Can you try restarting your phone and PiccyBot and try again?

I will be adding backup services for these situations when one provider goes offline.

By Laszlo on Monday, November 4, 2024 - 14:03

Thanks much! Closing PiccyBot from the app switcher and then starting it again was enough to get it working again. I was quite sure that I had tried that simple remedy before, but it turns out that in fact I hadn't.
By the way, after the app restart video descriptions come through in a drastically different style than before the server error. I know well that each generated text has a slightly different style and characteristics, but this time the difference is much more pronounced. The video description is more compact, has a more straightforward style with fewer details, and I experience many more hallucinations than before, and they are quite radical ones indeed. I haven't changed anything in the settings.
Have you somehow changed which model does the video descriptions or what may be going on?

By Martijn - Spar… on Monday, November 4, 2024 - 14:03

Laszlo, thanks for confirming that this is a workaround for now. I have not made any changes in the setup from my end, but on the side of the models things seem to have changed. I will be working on that in the coming days to get it back to a fully stable and reliable setup. My focus has been on getting the realtime voice to work in PiccyBot, but this gets priority now.

By blindpk on Monday, November 4, 2024 - 14:03

That sounds interesting. I'm not much for talking with tech but in this case I might give it a shot when it is ready.
There are also some new models out now that might be of interest, both a new version of Claude 3.5 Sonnet and a model called Molmo that is said to be quite good with images (in addition to Llama 3.2 and chatgpt-latest, which I mentioned before and which might be implemented already; it has been a while since I checked).

By Gokul on Monday, November 4, 2024 - 14:03

Does it mean what I think it means? No, right?

By Laszlo on Monday, November 11, 2024 - 14:03

After installing the latest update that - among others - aims at improving video processing stability, I once more get those much more detailed, much more accurate and much more useful video descriptions that I had got before the "server error crisis" of 29 October. Thank you much for the fix, I highly praise it!

By Brad on Monday, November 11, 2024 - 14:03

They manage to play a bit, describe it, then play the next bit. Honestly if you could do this, or have it as a toggle, you'd have them beat in my opinion, as you can already describe short YouTube videos.

By inforover on Monday, November 11, 2024 - 14:03

Could this be a toggle if you do look into doing this? I prefer the way PiccyBot does it rather than the way Seeing AI does it.
Thanks :)

By mr grieves on Monday, November 11, 2024 - 14:03

I am repeating myself a little from the Seeing AI thread, but the way I see it is that PiccyBot and Seeing AI are providing an entirely different perspective on a video and I really appreciate having both options.

The pixies describe a video like someone reporting back on what happened. It goes into more detail and paints a pretty vivid picture of what is happening.

Seeing AI on the other hand feels like I am watching the video myself but is giving me a lot less detail.

I honestly like having both options available because they both serve very different purposes. You couldn't get the level of details the pixies give you if you were to use the Seeing AI way.

Having said that, a lot of people are basically asking for Audio Description of the videos, so there is clearly an interest among users of PiccyBot. But I'd hate to sacrifice the level of detail I get to achieve it.

I personally am happy switching between two different apps for this as I usually use the share sheet to get videos into PiccyBot and Seeing AI anyway. But if something like this does come to PiccyBot I too would like it to be optional. Actually if it was a toggle in the UI that appeared in the main interface when I was watching a video that would be even better so I could quickly listen to one format or the other.

By Brad on Monday, November 11, 2024 - 14:03

I keep thinking of audio description then realising that we're not there yet.

I don't use this app so will let those that do write more about it and will stop. I don't need this app so shouldn't ask for things if I'm not going to use it.

I've tried the video feature and while it's not for me, I can see that a lot of work went into it. Perhaps one day in the future we'll have an app (perhaps this one, who knows?) that can be trained on Audio Description. Let's see what the future holds for us :)

By Martijn - Spar… on Wednesday, November 20, 2024 - 14:03

I have added a new AI image description model to my app PiccyBot. It's called 'Gemini experimental 114'. Some comparisons indicate that this might be the best image description model yet.
If you are a PiccyBot Pro user, please try this model for image description and compare it with 'regular' models like GPT4o, Claude Sonnet or Mistral Pixtral. I wonder what you guys think. If it is indeed the best model at the moment I will replace the default OpenAI model in PiccyBot with this one, for free users as well.

I am also working on enhancing the video descriptions. A lot of you have been asking for descriptions of long videos. I can't do very long videos as that would just become too slow and expensive for me, but I am working on a compromise where PiccyBot describes videos up to 5 minutes completely, and if a video is longer the app will look for any available transcription of it and summarise the video from that.

Still working on realtime speech and interacting with live video. That one is tough though, and OpenAI is only gradually reducing the API pricing for it. Progressing, but will be another few weeks I feel.

As always, really appreciate the feedback guys! Thanks!

By Brian on Wednesday, November 20, 2024 - 14:03

I am currently not a pro user. However, if you do enable this application to audio describe up to 5 minute videos, I will absolutely consider it. Because then I could use it to audio describe videos I take with my Meta smart glasses, which currently max out at 3 minutes.

By Louise on Wednesday, November 20, 2024 - 14:03

I tried the new model, but it just said processing, and then initializing, but never did describe the image. I switched back to GPT 4.0, and it worked as expected.
I do have pro.

Hope this helps.

By Gokul on Wednesday, November 20, 2024 - 14:03

I tried Gemini Experimental and I have to say this, it's brilliant! Of all the models so far, it's given me the most detailed, factual descriptions, describing almost all the elements that can be described. I only tried with pictures taken in professional settings, so can't say if it can bring into focus the more abstract, emotive elements the way GPT-4 can... Will try it some more. But it looks interesting, especially if what you want is detailed, vivid descriptions the way I want.

By Brassknucklebeauty on Wednesday, November 27, 2024 - 14:03

I only created this account because I was dismayed at some of the responses from individuals. Oftentimes, as blind individuals, the first thing we say is that we don’t like something or that there’s something else that could be made. Yet innovation and different ideas are what this world is all about. That’s why we’ve moved forward as a community. As an individual that does accessibility testing, I am thankful and grateful that there are different apps that have different functionality based on someone’s beautiful creative mind. Yes, there is Be My AI, which is a part of Be My Eyes, Seeing AI and too many more to name. Yet what I find amazing, as a person that supports hundreds of blind content creators, is the fact that this gives funny descriptions for both video and audio. We must remember that not everybody has equitable access to specific elements of technology advancements, or someone to teach them about these things. Essentially, I’m learning that more people in this community need to learn how to deliver a response in a better way.
Lastly, if everyone had just stopped at McDonald’s and believed one burger joint was enough, there wouldn’t be Burger King, Wendy’s, Nations, Red Robin, etc. There’s room for every technology advancement that is going to help blind people thrive.
Absolutely love the app, 10 out of 10, recommend. So many people in the blind community are using it on TikTok. Thank you for your work. I love how it has sensitivity to detail, such as how it describes African-American people, the scenery and so forth. This has been the first app that I’ve had easy access to that can describe things for me as I’m on my content creator journey. Thank you for so much care to detail, and the price point was amazing! I wish you the best of luck in your endeavor.

By inforover on Wednesday, November 27, 2024 - 14:03

Hey Martijn. I can't seem to get videos described, it seems as though it's just stuck on please wait. This is for both videos on my phone, and when sharing from social media sites.

By Martijn - Spar… on Wednesday, November 27, 2024 - 14:03

Hi guys,

I released a new update today, which increases the duration of the videos PiccyBot can describe to 5 minutes. This is for the pro version. Hope this helps!

Inforover, I am not sure what the issue is; it might have been a temporary glitch. If not, please restart your phone and/or enable/disable any VPN you might have running.

Gokul, Louise, thanks for trying out the new Gemini model. It may have been down due to high popularity initially. It seems fine now, although it still may be slow.

BrassKnuckleBeauty, really appreciate your feedback, very encouraging. Thanks!

By Emre TEO on Wednesday, November 27, 2024 - 14:03

I can't select an AI model in the app's settings. When I double click on the relevant button, instead of opening the model list, the Select AI model button is selected and tapping again deselects the button.

By Gokul on Wednesday, November 27, 2024 - 14:03

Yeah, the new Gemini model is a bit slow, but it does give incredible descriptions.
I really wish you'd at some point consider adding an option to have multiple images and a comparative description. Especially now with the new Gemini model in, that'd be incredible.

By LaBoheme on Wednesday, December 4, 2024 - 14:03

So, say there is a table with a vase at its upper right corner. If the user holds her phone in landscape and takes a perfect picture, the vase now appears at the lower right corner of the table. Didn't we fix this before?