New AI app for describing images and video: PiccyBot

By Martijn - Spar…, 1 March, 2024

Forum: iOS and iPadOS

Hello guys,

I have created the free app PiccyBot, which speaks out a description of the photo or image you give it. You can then ask detailed questions about it.

I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317

I am really hoping it will be of use to some. I earlier created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully useful!

Thanks and best regards,

Martijn van der Spek


Comments

By Brad on Tuesday, January 28, 2025 - 21:11

If anyone else wants to try it, you won't be able to navigate using headings, buttons, or any of those features.

Like I said, I love the speed of this thing, it's so smooth, but if I can't use navigation keys in the public version then I'd not want to buy the more enhanced one.

Sorry for being off topic but I just thought I'd let other blind people know.

By Laszlo on Tuesday, January 28, 2025 - 22:11

Hi Brad,

You CAN now use those navigation features in the public welfare version with Chromium-based browsers (e.g. Chrome, Edge, etc.), so this restriction has been partly lifted.
For much more information, please check your e-mail; you will find my detailed reply to all your questions there. I did my best to answer them.

By Brad on Tuesday, January 28, 2025 - 22:11

Thanks, I will do so.

By Martijn - Spar… on Wednesday, January 29, 2025 - 07:11

Laszlo, thanks for noticing the DeepSeek addition. It's the 7B model, which I installed locally on one of my own servers, so it's not very powerful, as this server is not the best. It is more a proof of concept. One of the good things about it is that I have full control over it. I love open source, and DeepSeek was clearly built on top of Meta's Llama, with a lot of smart optimisation steps.
The version I am running for PiccyBot only describes images; for video it will default to Gemini at the moment.
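As a rough illustration of that split, here is a minimal Swift sketch of how such routing could look: images go to a self-hosted DeepSeek endpoint, video falls back to Gemini. Every name and URL in it is hypothetical; this is not PiccyBot's actual code.

```swift
import Foundation

// Hypothetical sketch of the routing described above. All names and
// URLs are illustrative placeholders, not PiccyBot's real endpoints.
enum MediaKind {
    case image(Data)
    case video(URL)
}

struct DescriptionRouter {
    // Placeholder endpoints; a real deployment would supply its own.
    let localDeepSeekURL = URL(string: "https://example.com/deepseek-7b/describe")!
    let geminiURL = URL(string: "https://example.com/gemini/describe")!

    func endpoint(for media: MediaKind) -> URL {
        switch media {
        case .image:
            // Local open-source model: full control, but limited by server power.
            return localDeepSeekURL
        case .video:
            // Video descriptions default to Gemini for now.
            return geminiURL
        }
    }
}
```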

Now the stage is set. With these kinds of open source models available, it shouldn't be too expensive to train a model specifically tailored for blind and low vision use.

Another point is the censorship. At the moment the model will still follow the Chinese government's rules and limit its output that way. I am sure there will soon be models that strip these restrictions. The current local model may be less censored as far as sexuality and such goes; I still have to check that.

I have also updated PiccyBot; it should be more stable now, as earlier it could get 'stuck' after many requests. It also includes a push notification to tell you when the processing of a video is finished. And you can now minimise PiccyBot while it is processing; it will play the description in the background even while you continue with another app.
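For anyone curious how background playback and a "processing finished" notification are typically wired up on iOS, here is a minimal sketch using AVAudioSession and UserNotifications. It is an assumption about the general technique, not PiccyBot's actual implementation, and it also requires the Audio background mode to be enabled on the app target.

```swift
import AVFoundation
import UserNotifications

// Configure the audio session so spoken descriptions keep playing
// when the app is minimised or the phone is locked.
func configureBackgroundPlayback() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playback, mode: .spokenAudio)
    try session.setActive(true)
}

// Post a local notification once a video description is ready.
// Assumes notification permission was already requested elsewhere.
func notifyProcessingFinished() {
    let content = UNMutableNotificationContent()
    content.title = "Description ready"
    content.body = "Your video description has finished processing."

    // A nil trigger delivers the notification immediately.
    let request = UNNotificationRequest(identifier: UUID().uuidString,
                                        content: content,
                                        trigger: nil)
    UNUserNotificationCenter.current().add(request)
}
```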

Another development is the PiccyBot WhatsApp service, particularly useful for Meta Ray-Ban users who are banned from the 'look and tell' function. Sending a video or image to PiccyBot on WhatsApp will result in an audio description. It's a bit slow and somewhat clunky, but at least it enables handsfree video descriptions while wearing the glasses.

Good luck with the app, guys. Let me know how things work for you!

By mr grieves on Wednesday, January 29, 2025 - 14:11

This sounds great - is it available now? If so, how do I use it?

By Maldalain on Wednesday, January 29, 2025 - 15:11

Please make it available on macOS. We need an app like this.

By Gokul on Wednesday, January 29, 2025 - 16:11

@Martijn, that's exactly what I am excited about! Even with DeepSeek, everything is open source and available out there, isn't it? Speaking of which, what about Llama?

By privatetai on Wednesday, January 29, 2025 - 20:29

Thanks for the new update! I have tried it and can confirm that the audio will continue to play even when you lock your phone or go to another app. However, if I lock my phone or minimise the app and go to another app while it is still processing, it seems to stop processing, because when I come back to the screen, all it shows me is retry and no description was generated.
Incidentally, I don't know if anyone has requested this feature yet, but it would be nice to have a setting where the app auto-retries when a description fails or the audio fails to mix, etc. Waiting four or five minutes only to come back and have to manually hit retry again and again gets a little tedious. Especially now, if the goal is to let it process in the background, it makes sense for it to auto-retry when it fails. Maybe not indefinitely? Maybe auto-retry five times or something, and then send a notification that says it has failed five times, please check the video, or something like that?
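As a sketch of what that bounded auto-retry could look like, here is a hypothetical Swift helper: it retries a failed description a few times, then gives up and notifies. The function and closure names are made up for illustration and are not part of PiccyBot.

```swift
import Foundation

// Retry a failing description request up to `maxAttempts` times,
// then report the failure (e.g. via a local notification).
// `requestDescription` and `notifyFailure` are hypothetical stand-ins.
func describeWithAutoRetry(maxAttempts: Int = 5,
                           requestDescription: () async throws -> String,
                           notifyFailure: (Int) -> Void) async -> String? {
    for attempt in 1...maxAttempts {
        do {
            return try await requestDescription()
        } catch {
            if attempt == maxAttempts {
                notifyFailure(maxAttempts)
            } else {
                // Brief pause before the next attempt.
                try? await Task.sleep(nanoseconds: 2_000_000_000)
            }
        }
    }
    return nil
}
```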