New AI app for describing images and video: PiccyBot

By Martijn - Spar…, 1 March, 2024


Hello guys,

I have created the free app PiccyBot, which speaks out a description of the photo or image you give it. You can then ask detailed questions about it.

I have adjusted the app to make it as low-vision-friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317

I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully more useful!

Thanks and best regards,

Martijn van der Spek


Comments

By Gokul on Friday, May 3, 2024 - 09:12

It's been said here multiple times, but again: you can turn the personality off and it'll give you accurate descriptions.

By privatetai on Friday, May 3, 2024 - 09:12

If you haven't already, turning off personality in the settings will make it less... mouthy and more factual. Also, if that's still not good enough, I usually prompt it with "Give me a 200 words factual description. Do not insert speculation or opinion, don't be poetic, give me facts only."

By Ambro on Friday, May 3, 2024 - 09:12

Do I need to subscribe to premium to deactivate the personality? In my app I don't see any way to access the settings. It just gives me the subscription button, then takes photos, etc. Nothing about settings.

By mr grieves on Friday, May 3, 2024 - 09:12

Whereas I really like having the personality option available as it is fun to play with, it also pretty much renders the app useless in my opinion. So I appreciate it being there, but think it really should default to off. It seems like it is rightly confusing people, particularly if the option isn't available for non-subscribers.

By privatetai on Friday, May 3, 2024 - 09:12

"Whereas I really like having the personality option available as it is fun to play with, it also pretty much renders the app useless in my opinion. So I appreciate it being there, but think it really should default to off. It seems like it is rightly confusing people, particularly if the option isn't available for non-subscribers."

Totally agree. While it is fun to play with, I generally keep it off because its accuracy suffers too much when personality is on, not to mention it's really quite annoying if you're having an off day :) So defaulting it to off for new users would make sense, and it could just be a perk for subscribers.

Although, and I'm smiling as I type this, from a business point of view it almost makes sense to default it to on. It's so annoying that, when you pay for membership and can finally switch it off, in your head you go, "Oh thank god..." But if it's off by default and I pay, then find out what the "personality" is like, I may think, "Ugh, I paid because I thought it was cool! I want my money back!"

By mr grieves on Friday, May 3, 2024 - 09:12

I don't think anyone is going to try an app, think "this is terrible, I'd better subscribe to see if that makes it good". But maybe that's just me.

By Ollie on Friday, May 3, 2024 - 09:12

I think there is, for me at least, an upgrade mentality that, if I throw some money at it, it will unlock the good stuff. This is despite over 40 years of disappointment...

The personality, to me, is the weakest part of the app. It's a novelty and kinda janky too. I get the idea behind it, but it should go one way or the other: either be completely removed or be completely customisable with a user-defined prompt, as with ChatGPT. As it stands, it is wandering down the uncanny valley, confusing some and being a barrier to purchase for others.

As it isn't a subscription (and I may well be wrong about this), I don't know if the dev can provide a trial period. I think that's an Apple thing, as they favour subscriptions over outright purchases. There is an app I use for my audiobook library on Plex, called Prologue, which works similarly: it has partial functionality until you make the full purchase. In that case, though, it has a far bigger user base, and the purchase is framed as supporting future development, with only a small improvement in the app's functionality.

By mr grieves on Friday, May 3, 2024 - 09:12

I don't disagree that the voices can be thought of as a bit of a gimmick as they don't add any specific extra utility to the app. However, for me personally I find listening to more natural, human voices is just more pleasing so I enjoy using the app more because of them.

It's a bit like if I am online and have a long article to read, I will switch over to Edge and ask it to read it instead of using VoiceOver.

The VO voices are fine and do the job well enough, but I wouldn't say I actually like listening to any of them.

You can switch it off and just use VoiceOver, though maybe this also needs the subscription.

I think if you do like the personality option, then they would make even more sense.

I'm assuming that now we have them, there isn't a huge amount of dev work going into that area but I might be wrong.

By LaBoheme on Friday, May 3, 2024 - 09:12

I just talked about how good it is, and now it's not there any more.

As for the personality thing, the regular output is still in the text. That is, if you have personality on, the voice output and the text are different. You can read the "non-personality output" using VoiceOver.

By Martijn - Spar… on Friday, May 3, 2024 - 09:12

Hi LaBoheme, there have been some problems with the Claude Opus model for about a day now. It is mixing up images, so I had to disable it while I work on it. I hope I can add it back soon!
The Google Gemini model has been upgraded and has improved significantly from what I can tell. Maybe have a look at that as well?

By Corsia on Friday, May 3, 2024 - 09:12

First, thanks for your efforts. My thoughts have already been mentioned above. For image description, the Be My AI feature of the Be My Eyes app is sufficient for me now. I would also suggest video description as a future development, or perhaps simply an EPUB ebook reader first. Since the voice quality of VoiceOver has suddenly dropped and the voice is very muffled now, AI voices for reading text and documents are very much needed.
Thank you for all your kindness again.

By Martijn - Spar… on Friday, May 3, 2024 - 09:12

As Martin reported, there was an issue with the sharing of images to PiccyBot. I have just released an update that should fix that. Please try it out.
The bug reported by PrivateAI was related to the Claude 3 Opus model. It somehow gets stuck on an image and/or mixes multiple images. I have disabled the model for now to sort this out.
The settings page has also been updated to refresh the image more consistently, as pointed out by Mr Grieves.
Thanks guys, keep bugging me and I'll see what I can do to further improve the app.

By Martin on Friday, May 3, 2024 - 09:12

Hi Martijn
It's still amazing, even though I still can't share photos directly from X with the new update. I can always use other AI apps to get the descriptions, so it's not a dealbreaker. I'm still paying for the monthly subscription. And yes, please add Opus back to the AI models ASAP! I used that one the most, but I will give the other ones a go. Thanks a lot for your help and work! You are doing an absolutely outstanding job! I really do appreciate this.

By Martijn - Spar… on Friday, May 3, 2024 - 09:12

Thanks Martin! I hoped the fix for sharing larger images to PiccyBot would have helped you, but there is clearly still something to be done there. I will check that.
Claude 3 Opus is behaving again and I have added it back as an option. Good luck! If this or any of the other models responds wrongly, let me know please.

By LaBoheme on Friday, May 3, 2024 - 09:12

Interestingly, one would think Haiku is better than Sonnet, but in my own experience Sonnet is actually better when it comes to image analysis.

By Brad on Friday, May 3, 2024 - 09:12

If you want to use VoiceOver without them, turn them off. The dev has every right to spice up their app with voices if they want to.

By Brad on Friday, May 3, 2024 - 09:12

You're very critical of this app and I don't see why. Just stop using it if you don't like it. The dev clearly just wants to have a bit of fun, and they've done tonnes more by implementing other LLMs, compared to Be My Eyes, which is still just using GPT-4.

As for not wanting opinions: in that case, stop posting, because you're going to get opinions on your posts and their replies; that's how this whole thing works.

By Brad on Friday, May 3, 2024 - 09:12

I always find these kinds of arguments amusing. It's an attack when someone disagrees with people like yourself, but when you go after apps, that's OK, right? I mean, after all, you just want them to be improved, right?

It's not like your opinion could be seen as an attack on the dev's hard work. Come on, Lotty, I thought you were smarter than this.

Questioning is allowed, of course it is, but if you put your opinion out there, then you must be prepared for people to find it wrong or disagree with you; that's how the internet works.

And I do disagree with you; I find some of your replies on this thread to be overly harsh for no reason. Just because an app is made for the blind doesn't mean we should crap all over the work if it doesn't meet our standards. And before you say it, no, that's not me being soft; it's me being tired of these harsh attacks towards devs with no consequences.

As for voices costing money, sure they do, but I'm sure the dev knows this and doesn't mind paying for them, as it's a fun product for them.

If you really don't want to use the voices, pay for the product and turn them off, although I'm quite sure the free version has that option now.

If you want to see this as me jumping to the dev's defence, so be it. They're making an app that they enjoy, and honestly? They didn't need to go as far as they have. It has tonnes of language models, and even though I don't use it any more, I'm very impressed by that fact.

I just remembered something: the dev's first release didn't have VoiceOver reading support. We got on here and asked for it, and not only did we get it, but buttons were labeled, language models were added and more. So yeah, I'm jumping to this person's defence, because this is a damn good app with amazing response times from the dev.

@lotty you can think what you like about me, that doesn't bother me anymore; I'm a big boy and have my big boy pants on. But no, I won't stand for this crap. The dev is responsive and very agreeable to changes, and damn it, that's to be commended.

By privatetai on Friday, May 3, 2024 - 09:12

So with all this discussion regarding the included voices, some say they're really nice, some don't think they are all that useful... Earlier, I came across a YouTube video of a guy reviewing all the AI chatbot apps out there, and during the video he mentioned how lousy the voices are on some of them. The idea just popped into my head: what if this were not just a photo description app? What if you could chat with the AI using its super nice voices? I immediately went to the app and tried to chat with the AI. Unfortunately, it doesn't seem to want to talk about anything not related to the photo, and won't talk at all without a photo. And of course, the response speed is too slow and draggy for it to be a good chat AI, but I can totally see this being a great function if the response time were quick and you could chat with the AI without describing photos.
Now, I don't know how complicated it is to turn a photo description app into a chatbot app, but I figure the voices are there, the personalities are there, and the AIs are there, so it may be quite doable.
BTW, there's a chatbot app called Call Annie that can chat, do scenarios, role-play, and even describe photos. The voices it uses suck, though :)

By Ollie on Friday, May 3, 2024 - 09:12

I'd suggest Pi AI if you want something that is more conversational. It doesn't do images as yet, but it's built to have deeper conversations than something like ChatGPT, which will just get the info for you in a chatty way. I don't know what Pi uses for its voice, but it's very good most of the time; it's when she/he laughs that things get a bit weird.

By Martijn - Spar… on Friday, May 3, 2024 - 09:12

Thank you Brad for your support despite not even using the app. I enjoy developing these apps as the feedback from the community is great and inspiring.
Lottie, the app is set up like this at the moment and I don't want to change too much. I realise it is not ideal, but I have to balance income and costs, as I am not sponsored by any outside party like some of the other solutions are. It is what it is.
PrivateAI, thanks. I have created chatbots before; my app Voice Answer was actually a hit at the time Apple launched Siri. But I don't want to mix chatting into PiccyBot; instead I want it to remain focused on image recognition and perfect that as much as possible.
I have just added one more model to PiccyBot's model list: Llama3. It is not the best model at the moment, but the special thing about it is that I am running it on my own machines, so there is no dependence on third parties for the image processing. This gives me full control to improve it specifically for low vision purposes, and costs will be minimal. Please try it out when you can.

By Martijn - Spar… on Friday, May 10, 2024 - 09:12

Laboheme, this local Llama3 currently runs on a single PC with 32 GB of RAM and an RTX 3060. It uses the smallest Llama3 model. The next step would be to run a larger model on a more powerful machine. But I agree, it's a very encouraging start.

By Brad on Friday, May 10, 2024 - 09:12

I too would recommend Pi AI if you want to chat. It's a bit hit and miss, but the British male voice is quite nice in my opinion; you can preview each voice too, and it's free.

The accessibility of the app could do with a bit of work, and as far as I know there's no real way to delete your chats apart from deleting your account, so think about that if you like to remove chats like I would. It's not even because of anything sexual or anything like that; I'd just like a bit more control.

By Martijn - Spar… on Friday, May 10, 2024 - 09:12

Hi guys,

I have added OpenAI's new GPT-4o model to the list of AI model options. For now, it is only accessible to signed-up users. If it works well, I'll make it the default model in PiccyBot later this week.
To get the best impression of GPT-4o's improved speed, set the voice to 'None' and turn personality off.

I expect to make more use of the new OpenAI features and integrate them into PiccyBot in the coming days.

Good luck trying it out!

By Gokul on Friday, May 17, 2024 - 09:12

I think the speed has improved somewhat. But the groundbreaking thing would be to integrate the incredible things the OpenAI guys were demoing yesterday. Otherwise, the quality of the descriptions is more or less the same, and honestly I think they are as good as they can get.

By Martin on Friday, May 17, 2024 - 09:12

Dear developer, I don't usually have the time and energy to come back and check this but I do eventually. I have a few questions for you.
Whatever happened to Claude 3 Opus? Is it coming back? Which AI model gives the most accurate descriptions of photos? Does the voice I choose affect how accurate the descriptions are, or does it only depend on which model I choose? Are you going to implement any of the new OpenAI developments in this app, such as live image description? I would really enjoy that. I hope this app does not become obsolete when all of that gets implemented into ChatGPT and when Be My Eyes implements it into their app. I see that you've added image processing with GPT-4o.
I appreciate your response time and the quick development of everything you've done for this app so far. It's awesome! This is possibly one of the best apps I've dealt with over the last couple of months, due to all the amazing development you've done. I'm happy that you've implemented so many tools from the ideas of the paid supporters; it's very kind of you to help us. I'm so grateful to you. The main reason I support this app is that I like being able to choose the different voices, the personality if I want it (since that makes me laugh), and the choice of AI models. I enjoy getting all of those different models to describe my different types of photos and then saving them to my albums with descriptions. I was once fully sighted, so photos do mean something to me. ☺️

By Martijn - Spar… on Friday, May 17, 2024 - 09:12

Thanks a lot for supporting PiccyBot! Due to changes at Anthropic, I have taken Opus off for now, but will add it back as soon as I can. Or maybe replace it with a new model by them, things keep moving. I did add Gemini Flash. I feel the current Gemini models are underappreciated.
I definitely aim to implement any new development by OpenAI into PiccyBot as soon as it becomes available.
You are right, ChatGPT and then Be My Eyes will most probably be the first to include the new OpenAI models in their apps. However, as an independent developer I hope to benefit from new developments by Google, Meta and Anthropic as well, and pick and choose elements from the best.
I am now trying to add video processing to PiccyBot. No promises on speed, but it is the way forward, so let's start with it.
I created a quick demo of loading PiccyBot on Vuzix smartglasses and having it describe the environment while I walk into my office: https://youtu.be/o9QeVxnkvzE
Getting this to work smoothly will be the challenge, but with the current AI developments it could be weeks rather than months.
Once again, thanks a lot for supporting PiccyBot! Exciting times ahead for sure!

By Ollie on Friday, May 17, 2024 - 09:12

Very cool in principle. The AI's delight at being able to see the world is rather charming. Ooh, look, a door!

Good work. Will be interested to see how this works with various other AIs.

By Martin on Friday, May 17, 2024 - 09:12

Very cool development Martijn!
The way you have engineered that into smart glasses is amazing. You should definitely be proud of that.
I'm looking forward to seeing what develops with this app when OpenAI releases that to the public. In the meantime, thank you for your response and all the cool stuff you're doing. I'm certainly looking forward to the progression of your AI models, hopefully as soon as possible. Thank you for the response... Godspeed and blessings to you.

By Martin on Friday, May 17, 2024 - 09:12

Are those smart glasses available to the general public, and if so, how do we get a pair? I hope they're not very expensive, but maybe they have a subscription model. I'm interested in any info about these!
You're welcome for my support.
Thank you.

By Martin on Friday, May 17, 2024 - 09:12

I always get excited about these inventions, but they market them to a community that is not able to afford them, which sickens me. It makes absolutely no sense that they make these cool inventions for our community but we can't afford them! I heard about the Seleste smart glasses too, and those were more affordable for me, but sending them the money up front like that and then waiting several weeks to get them seems outrageous to me. I understand that this is a small niche community, but they certainly need to work with us by understanding our finances and making things more accessible and affordable for us to use.
I know this was off-topic however, I like that these things are available but I don't like that I can't easily get them.
Oh well. Thanks for putting that info out there.

By Martijn - Spar… on Friday, May 17, 2024 - 09:12

I agree that we should aim for the most practical and economical solution. I think at the moment that is simply developing PiccyBot further as an app. Everyone already has a phone, so let's use that, or otherwise use a device like smart glasses linked to that phone. The glasses I used here are standalone, which is useful for institutions where they can easily be shared between users, but less so for individual use.
So I believe apps and your phone are not dead yet. I will keep the focus on PiccyBot. Glasses do catch quite some attention though ;-)

By Ollie on Friday, May 17, 2024 - 09:12

The trouble we have is economy of scale. We just need more blind people in the world and it will bring the costs down.

I really don't think there is a conspiracy here. It's rare that businesses use premium pricing to drive up exclusivity; instead, they make it as cheap as they can because they want to sell multiple units. The fact is, if it is a low-yield product, the manufacturing costs per unit are sky high. Personally, I find things like tooling, production lines, etc. really interesting because I'm an utter nerd.

Here, this app is great on the phone because phones are mass-produced and affordable. I think it will be the same with smart glasses. Google have some in the works (hail the return of Google Glass), but I don't doubt Apple will jump in within a couple of years' time, or maybe that will be their end-game glasses with a display, etc.

This is what is so exciting about the modern age: mass-consumption devices that can be used for accessible applications. No longer do we need a bunch of different devices doing different things and costing an arm and a leg; now we have our phones which, for me at least, cover about 80% of my IT interaction, the rest being on my Mac and Apple Watch, and then my iPad mini when I'm watching something or just want a break from the head space of work or distraction.

Anyway, went on a bit of a rant there.

By Brad on Friday, May 17, 2024 - 09:12

At least with those you don't have to take out your phone each time.

I'd not mind paying a monthly fee, but to own the glasses, not to keep paying for them.

I'd not mind donating towards server costs, but I do want to buy the glasses outright if this becomes an option.

I'm really not a fan of the subscription model way of doing things.

The AI sounded scared, lol. It's like, um, there's a door, please don't end my existence!

By Ollie on Friday, May 17, 2024 - 09:12

The AR glasses will be fine; it's just one aspect we'll not be able to use. There are already aspects of the Meta glasses which aren't really useful to us, for example the image part of videos. No doubt there will be other improvements in the V3 Meta glasses too, such as better sound, so I'm looking forward to them.

By Ollie on Friday, May 17, 2024 - 09:12

True, though I think that, like phones, there will be many ways to use these devices. I guess we just won't know until they come out. Until then we've got the rather average experience of the Meta glasses, for which we're also fringe users. We can use them, certainly, but the experience is far from optimised for us. It will be interesting to see if Meta does start catering to us. The current glasses might be a good indicator of things to come. If not Meta, though, there will be others that will cater for us: Google, Apple, and various open source projects.

By Winter Roses on Friday, May 17, 2024 - 09:12

If you're able to get the app to describe videos, can you please post a demo? Thank you for all the hard work you're doing. I truly appreciate it, and the customer feedback is amazing. Keep up the good work!

By Brad on Friday, May 17, 2024 - 09:12

It sounds great on paper, but let's say you have a 20-minute gameplay video: will you be able to run it through an AI and then, after it's done processing, listen to it with the audio, or will it be done on the fly?

Also, this AI isn't trained on audio description data so I don't know how useful it will be.

By Ollie on Friday, May 17, 2024 - 09:12

It's done on the fly, though others over on the OpenAI subreddit (as toxic as it sounds) did point out the credit limit, even for paying customers. I don't think we can currently just leave it running forever, unless the app provider pays through the API. I'm not sure if there is going to be a deal with Be My Eyes. So few people are blind that it probably wouldn't overload things, and we'd not want our visual guide dropping out at an awkward moment.

By Winter Roses on Friday, May 17, 2024 - 09:12

Realistically, I wasn't necessarily thinking of videos that are 20 minutes long, more like less than five minutes, like music videos, or short presentations.

By Gokul on Friday, May 17, 2024 - 09:12

@the dev: Very interesting that you're trying the live video thing out. Question: are we able to ask specific real-time questions and get the answers on the fly? For example, in your demo, could I ask whether the chairs are empty or occupied, or where a given door is with respect to my position? And will it be able to guide me towards an empty chair or a door handle when prompted?

By Martin on Friday, May 17, 2024 - 09:12

Martijn, I appreciate the forward thoughts on this project.
It would also be appreciated if these companies would actually pay it forward by giving some donations to users who can't afford the smart glasses. I am sure they're making enough profit at those prices, so why not make the company look better by offering some less fortunate individuals a pair or two several times a year? People like me would so appreciate that.
In the meantime, yes, smartphones do a good job as well, and they are quite expensive too.
Some of us have payment plans for our smartphones so we can have them. *wink

Looking forward to your next update on PiccyBot.
By the way, I was wondering if you could remove the screen where you pick a photo from your albums and make adjustments to it. I just skip past that part; I don't know why it's there, but it certainly would not be missed.
I am legally blind and I don't think I would care if the photo needs to be rotated or not but that's just me.

By Ollie on Friday, May 17, 2024 - 09:12

Ha, okay, yes, phones are very expensive, I agree, but there is a wide enough range of them and of price plans that they are affordable for most; more than that, they are essential for a variety of reasons. A single-use product that costs several times as much is just out of most of our budgets.

I also agree, the crop and rotate screen is more of an annoyance than anything useful, though I notice that if I add a picture via the share sheet from WhatsApp, for example, it doesn't seem to appear. Also, I'd still request the ability to give the AI custom default instructions for descriptions.

By Eglė on Friday, May 24, 2024 - 09:12

Hey there, I really like this app; it gives more detailed descriptions, with voice feedback, than Be My Eyes, but it's buggy. Sometimes, after I send a picture for recognition, it takes another picture when my phone is already pointed away from the thing I captured. Sometimes I need to press the share audio button a few times, and this button should be in the main app window, not in settings; I should be able to choose what I want to share from the main app window. I understand this is the first version, and hopefully the bugs will be fixed.

By Martijn - Spar… on Friday, May 24, 2024 - 09:12

Hi guys, I have added video processing to PiccyBot. It's still a work in progress, the sharing of audio doesn't always work, the quality of the descriptions is not the best yet and the speed has to improve still. But since quite a few of you had mentioned you were looking forward to this, I decided to release this version.
Now there are two buttons, video and camera, and you get the option to take a direct video or photo, or select a video or photo from your image library.
The upload of the video takes the most time, depending on your network. After that you can go to settings (if subscribed) and switch voices, personality, language or ask specific questions about the video, which will be faster.
Have fun playing with it and let me know your impressions. As always, I'll take your feedback on board.

By Brad on Friday, May 24, 2024 - 09:12

Personally, I'll wait till these apps get live feeds, but again I have to commend you on getting features out so fast. This app is so cheap for what you get, guys.

Just because this video part isn't for me just yet does not by any means mean it's bad. I've not tested it, and honestly, if there was a donate button, I'd gladly donate.

I've never been so impressed with an app that isn't for me every single time an update comes out, well done dev, well done!

By Gokul on Friday, May 24, 2024 - 09:12

First off, thanks for the wonderful work! Going to try it!
BTW, does video processing here mean live processing, or do I have to take a video and upload it? Also, do I have to select any particular model for this?