New AI app for describing images and video: PiccyBot

By Martijn - Spar…, 1 March, 2024

Hello guys,

I have created PiccyBot, a free app that speaks a description of the photo or image you give it. You can then ask detailed questions about it.

I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317

I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully useful!

Thanks and best regards,

Martijn van der Spek

Comments

By Martijn - Spar… on Thursday, December 5, 2024 - 19:19

Regarding the auto orientation, it should work. Could you have turned on the orientation lock on your iPhone?
Reka should work, but it is not a huge company and the model could be facing slower responses now and then. Still, it is good to have smaller players remain there as an option.

By LaBoheme on Thursday, December 5, 2024 - 19:19

I don't know what happened. Wouldn't it be better if the camera function invoked the native iOS camera when taking photos? It seems a better interface for VoiceOver users: one can adjust zoom, exposure time, focus lock, etc. I don't know if one can do this from the current interface, but it's certainly not doable for VoiceOver users.

By Icosa on Thursday, December 5, 2024 - 19:19

The availability of multiple models is great and I found the price reasonable, thanks. If one model is less good or refuses to describe something, it's handy being able to switch to another, and the refusal can absolutely be a legitimate issue when shopping for clothing, whether for yourself or a friend/partner. GPT-4o is very quick to say "let's talk about something else".

By Martijn - Spar… on Thursday, December 5, 2024 - 19:19

LaBoheme: You are right that the native iOS camera offers great features. But using a custom camera helps to keep the app’s workflow simple and efficient. It lets you take photos instantly with the volume button, avoids the extra "retake/use" screen, and ensures front-camera images aren’t flipped by default. I'll keep it in mind for updates, though, as new iOS versions might further enhance the default camera.

By Martijn - Spar… on Saturday, December 14, 2024 - 19:19

Hi guys,

I released an update today that adds the audio description to the video when sharing it, using the share button on the home screen. For subscribed users only. This was a feature requested often, and I feel it will be quite helpful.

I am looking at adding a live video mode as well, either with OpenAI or Gemini or both. But I have to figure out if this is feasible: OpenAI's live speech mode was horribly expensive for a third-party developer. Now that they already have competition, it may be more economical from the start.

I am also working on running a WhatsApp service to describe videos, so PiccyBot can be indirectly used with Meta Ray-Bans and possibly other smart glasses.

Exciting times indeed!

By Ollie on Saturday, December 14, 2024 - 19:19

This is fantastic. I've not been out a great deal, hibernating, so not got much content to parse.

I'll have a play, and great call re the WhatsApp solution. They do seem to be moving in lockstep with the video service so they should become more competitive cost-wise. It might be worth looking at the frames per second. I'm assuming that is the defining cost factor.

To be honest, if you included a means of adding in one's own API, for now at least, in the paid version, it's something I'd be happy with. Even if you switched to a subscription model (I think I had a lifetime purchase), though give those with lifetime a break in terms of free for a year etc.

I think we all understand that there are continuous background costs here. I know few of us have much disposable, but this tech is life changing. I'm comfortable with having a premium plus service that's a few bucks a month, if it works and is easy to use.

Also, voted for your cool app in the AppleVis awards. You deserve it.

By miguel3025 on Saturday, December 14, 2024 - 19:19

Hey!
I've been using the app more frequently, and I've been considering purchasing a lifetime license. What models and configurations are available in the paid version? For those who paid, did you like it? Is it worth it?

By Gokul on Saturday, December 14, 2024 - 19:19

Yes, it's more than worth it, and you get almost all the models you can think of in there as far as visual processing is concerned. And if you find a model that isn't there and which is really good, the dev has been really, really responsive so far.

By Brian on Saturday, December 14, 2024 - 19:19

So I finally gave this app a whirl, used my Meta Smart glasses to record myself playing a round of Mortal Kombat 11 on my PC, and then used PiccyBot to describe the video. It was awesome!
I am using the monthly subscription, but will absolutely be purchasing the lifetime access. This is a marvelous application!
Also, and I hope I do not get in trouble for this, but I may or may not have voted for PiccyBot for the Golden Apples of 2024. 😇

PS I went with Gemini Pro for my AI engine, not sure if that one is any better or any worse than the others, but it is what I went with.

By mr grieves on Sunday, December 15, 2024 - 19:19

The lifetime subscription is a bargain. So yes, it's definitely worth it, if only to be able to disable personality mode. The dev deserves all the credit he gets for his dedication to the app.

I'm also really excited by the thought of being able to use this via WhatsApp on the Meta Ray-bans.

By Laszlo on Sunday, December 15, 2024 - 19:19

Sometimes the "Record video" button is greyed out in the video recording interface. I tried to find a pattern in when this happens, but to no avail; it seems entirely random. I first noticed this with 2.6, but waited in order to see whether it would go away. But no, today I updated to 2.8 and still saw this multiple times.
When this happens, the "Record video" button stays greyed out and is stuck in this state regardless of how many times I press "Cancel" to return to the main screen and retry recording video. The only really reliable method to bring it back to life is to close PiccyBot from the app switcher and start it again.
Furthermore, I suggest that there should be a "retry" button on the main screen near the description area. Sometimes the "server is overloaded, please try again in some time" error message appears instead of the description, and then there is no way to resend the video or image I have just taken to the server. It is only possible to take another video/image and to try with that, but the interesting moment captured beforehand would be lost this way. For images it also sometimes happens that no error is displayed, but simply no description is presented. The "retry" facility would be an immense help in all these scenarios.
Gemini experimental 1206 now seems quite stable, that is, the image doesn't get rerouted to some other "inferior" model due to overloading, which was quite often the case 2-3 weeks ago. So now I especially like this model, as it provides all the details I am interested in: people, shapes, actions, spatial positioning of each content element, colours, lighting, atmosphere etc. And all this in vivid detail, but in a balanced and not "overdone" way, and what's more, practically hallucination-free.
What I particularly like about Piccybot is that it is extremely light on battery. 3 weeks ago I watched a famous soccer match with the help of Piccybot and I took about half-an-hour of video altogether in several pieces of course to get them described. During all this it consumed only about 15 % of battery charge. This is very impressive! So keep up the good work!

By blindpk on Monday, December 16, 2024 - 19:19

I second Laszlo's idea of a "retry" button. It doesn't happen often for me that the request doesn't go through, but when it does such a button would be great.
As I've said before, one of the coolest things with PiccyBot is the number of models to choose from, both the "pro" and "fast" ones. There are some models that I'd like to try out that are not in the app now (or maybe they are, but not under those names):
OpenAI: Chatgpt-4o-latest (2024-11-20) (I mentioned this before, but there was some bug with it if I recall correctly). We also have the o1 model getting image support on the horizon, but that might be too expensive to be practical.
Meta: In the app there is a Llama 3, but there has been a Llama 3.2 Vision released (maybe 3.3 too but I'm not sure if this has vision support).
Anthropic: There is a Claude 3.5 Haiku model out now that could perhaps replace (or already has replaced) the Claude 3 Haiku model already in the app.
Mistral: Pixtral-large-2411 (might also already be in the app under the "Mixtral pixtral" name)

By Earle on Wednesday, December 18, 2024 - 19:19

Like the subject says, I'm wanting to share TikToks to PiccyBot. I thought I read somewhere in this thread that I could choose share while viewing a TikTok and share it directly to PiccyBot. This isn't working for me. I see messages, WhatsApp, and other options, but not PiccyBot. I don't even see an option to see a list of additional apps. Saving the TikTok to my photos works, and I can share to PiccyBot from there, but I thought you could share directly to PiccyBot from TikTok. What am I doing wrong, or am I just misunderstanding how this is supposed to work? Any help is appreciated. I'm loving PiccyBot and it's worth every penny.

By Brian on Wednesday, December 18, 2024 - 19:19

Never even thought of describing TikTok vids, but if we can then cool. 🙂

By Martijn - Spar… on Wednesday, December 18, 2024 - 19:19

Earle, Brian, you can share the TikTok video to PiccyBot and it should describe it. PiccyBot is usually a bit hidden in the share sheet under 'more', but it is there.

By Martijn - Spar… on Wednesday, December 18, 2024 - 19:19

Note that TikTok sharing is definitely not 100% stable; they seem to change the format on a regular basis. But most of the time it works.

By Martijn - Spar… on Thursday, December 19, 2024 - 16:19

Laszlo, Blindpk, thanks for the suggestion on the retry button. I added it with the latest update and I have to say it helped me as well, since sometimes for whatever network or model reason, the result doesn't come the first time.
I have also added an audio mixer option for subscribed users. In settings, you can set, as a percentage, how loud you want the original audio and the PiccyBot description audio of a video to be. PiccyBot will now combine the two audio streams when you are sharing the video. So this should give complete freedom in whether you want a description-only video, or some of the original sound, or whatever you like.
I know it adds to an already complex settings screen, so if you have any recommendations, please let me know.
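For anyone curious how a percentage-based mix like this can work in principle, here is a minimal Python sketch. It is not PiccyBot's actual code: the function name `mix_audio`, its parameters, and the assumption of mono tracks stored as float samples between -1.0 and 1.0 are all hypothetical, chosen only for illustration.

```python
def mix_audio(original, description, original_pct, description_pct):
    """Mix two mono tracks of float samples in [-1.0, 1.0].

    Hypothetical illustration only: each track is scaled by its
    percentage (0-100), the two are summed sample by sample, and the
    result is clipped so it stays within the valid range.
    """
    for pct in (original_pct, description_pct):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")

    original_gain = original_pct / 100.0
    description_gain = description_pct / 100.0

    # The shorter track is treated as silence once it runs out.
    length = max(len(original), len(description))
    mixed = []
    for i in range(length):
        a = original[i] if i < len(original) else 0.0
        b = description[i] if i < len(description) else 0.0
        sample = a * original_gain + b * description_gain
        mixed.append(max(-1.0, min(1.0, sample)))  # clip to [-1.0, 1.0]
    return mixed
```

With the original audio at 0% and the description at 100%, `mix_audio([0.5, 0.5], [1.0], 0, 100)` returns `[1.0, 0.0]`, which corresponds to a description-only video. Clipping is the simplest way to keep the sum in range; a real mixer might prefer to normalize or limit the signal instead.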

By blindpk on Thursday, December 19, 2024 - 17:19

Thank you very much for the fast implementation. As for the settings screen, I don't personally find it that cluttered. You could of course put some settings under separate screens, like "video settings" and "voice settings", with the obvious drawback that it would take longer if the user wants to change a setting.

By Shaik meharaj on Friday, December 20, 2024 - 05:19

Like Speakaboo, it would be great if we could assign a shortcut to the Action button that could directly capture and describe the scene.

By LaBoheme on Saturday, December 21, 2024 - 10:19

When describing video using Claude 3.5 Sonnet and the "ask more" function, it always starts the answer with "using only English and without relying on visual references".
For example, when asked to describe the fingernails of the person, it said "Okay, let's focus on the fingernails of the hand, using only English and without relying on visual references."
Why is it doing this? How can the model describe anything without any visual references? And I didn't ask the model to describe in other languages.

By Firefly on Saturday, December 21, 2024 - 11:19

So since the ability to share the new audio described videos with yourself or another device has been implemented, I have been able to do something very special. Last week, my wife unfortunately lost her battle with leukemia. This has been an extremely difficult and trying time for me. It’s been very difficult for me and is still difficult for me to come to terms with this. But I found something that may make it a little less painful. I have taken all of the videos that my wife and I have ever done together on my phone, run them through PiccyBot, had them audio described and then saved the new audio described video to my device, which now includes the original audio alongside it. So now I can look back on all of our videos and remember each good memory as if it was happening all over again. So I’d like to say a personal thank you to the developers of this app.

By Brian on Saturday, December 21, 2024 - 14:19

First I just wanted to say that, while I will never know what you are going through, nor could I ever know just what your significant other meant to you, I am truly sorry for your loss. For what it's worth.
Second, I think it is really awesome that you are using software such as PiccyBot, to enhance your digital memories of the life you shared with your wife.
May they bring you some semblance of joy in your darkest hour. 🙇‍♂️

By Martijn - Spar… on Sunday, December 22, 2024 - 08:19

Thanks for sharing your experience. I cannot imagine how hard it must be, but I appreciate you sharing this feedback, and thanks, despite it all. It is very rewarding for me to know that the effort on PiccyBot can be so impactful. It definitely motivates me to keep improving the app further. Thank you.

By privatetai on Monday, December 23, 2024 - 04:19

I am mainly using the Mistral Pixtral model with videos shared from YouTube. Not sure if that makes a difference. But when I get the audio description, it gives something like a summary of the video rather than a scene-by-scene description in sequential order. So I go back and ask it to do the description in sequential scene-by-scene order, and after processing, it says something like "merging audio failed" and just displays the text on the screen with the new result.

By Martijn - Spar… on Monday, December 23, 2024 - 12:19

Privatetai, I have just released an update that should fix the merging audio failure. Please try it out and let me know if this works. How did the scene-by-scene description work out?

By privatetai on Tuesday, December 24, 2024 - 05:19

First, let me say that I am truly amazed at how quickly issues get addressed. Now on to the testing result. Yes, the merge audio error message is gone, but the new result is not being read out by the AI; it is just displayed on the screen. No problem, I thought to myself, it should be OK if I save the video with the new description, which is better because it goes scene by scene sequentially. When I saved the video, it had this new description in the video file name, but when I played the video, the audio track doing the description was still using the original description that sounds like a summary rather than describing each scene. So it looks like the audio merge is done initially when the video is first described, but when you ask for a new description, it does not merge the audio again?