New AI app for describing images and video: PiccyBot

By Martijn - Spar…, 1 March, 2024

Hello guys,

I have created PiccyBot, a free app that speaks a description of the photo or image you give it. You can then ask detailed questions about it.

I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!

The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317

I am really hoping it will be of use to some of you. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully more useful!

Thanks and best regards,

Martijn van der Spek

Comments

By Brad on Sunday, February 25, 2024 - 18:52

I'll check out the app, but we already have one that is powered by GPT-4: the Be My AI feature in BeMyEyes.

By Brad on Sunday, February 25, 2024 - 18:52

First, BeMyAI gives you what a subscription to this app would: detailed descriptions.

Second, the voice is nice; it's OpenAI's ChatGPT male voice. You can unlock more voices with a $2 monthly subscription.

Thirdly, you can't share images with the app like you can with BeMyEyes.

Overall, this app doesn't do anything new, so I'd recommend sticking with BeMyEyes.

The only thing I like a lot more than BeMyEyes is that the app uses the ChatGPT voices. I wish BeMyEyes did that.

By Martijn - Spar… on Sunday, February 25, 2024 - 18:52

Hi Brad, thanks for the thoughts on my app. It's a first version so all feedback is very welcome!

I can add image sharing and/or description sharing in the next update this coming week. It was in my earlier app, Phoice, which I launched a few months ago. But the idea of PiccyBot is for you to ask for details about the image and have a conversation with the bot about it. Phoice just gave a single description.

I am a bit rusty in developing for the community, my app Talking Goggles is now ten years old. But it is very rewarding to do and I aim to further improve PiccyBot fast.

Cheers,
Martijn

By Erick on Sunday, February 25, 2024 - 18:52

I use Be my AI a lot, so I don't really see any reason to use this app but I will check it out just in case.

By Malcolm13 on Sunday, February 25, 2024 - 18:52

One thing that would be useful is the ability to review the description with VoiceOver. Having the descriptions read out is fine, but they cannot be reviewed afterwards.

By Brennen on Sunday, March 3, 2024 - 18:52

It would be nice if there were a way to be guided into taking a picture. For example, if you're going to take a picture of something and have it described, it would be helpful if there were some kind of sound or voice telling you whether you are on the right track in aiming the camera, and whether there is enough light to get a good picture. In my opinion, that would make this app stand out over Be My Eyes for sure.

By Gokul on Sunday, March 3, 2024 - 18:52

One handy feature that would be nice to have is the ability to save and then identify people. Say I'm logged in to an account and able to save images of people with specific names, for example from my family or workplace. Whenever a picture is next taken with those people in the frame, the app should be able to identify them and automatically use their names in the description.

By OldBear on Sunday, March 3, 2024 - 18:52

Lottie wrote, "I like posting Alt Text on Mastodon and forgetting to attach the image!"
Oh, that's good... Making the sighted see with their imagination, not their eyes.

By Daniel Angus M… on Sunday, March 3, 2024 - 18:52

When I heard there is a subscription, I was interested to see what it offers beyond the free tier. Some apps have a lifetime option, which this app thankfully does. The voice choice is the main selling point, it seems, along with some other settings I haven't played around with. Nice app so far.

By Martijn - Spar… on Sunday, March 3, 2024 - 18:52

Thank you for the suggestions, looking at them closely! Regarding the copying and reviewing of descriptions, is the share feature not sufficient for that? You can share the text to a file or any other app and process it there? What can I do to improve on that?

By Malcolm13 on Sunday, March 3, 2024 - 18:52

Hello,

Once the descriptions are retrieved, it would be useful if one could review the results with VoiceOver. Sharing to another app or file is a clumsy way of accessing the results. Other apps let you review their results, and this is useful. Thanks for the app and keep up the good work.

By HarmonicaPlayer on Sunday, March 3, 2024 - 18:52

I was a Talking Goggles user. I have also used the apps mentioned above occasionally, and I might give your new app a spin sometime.

By IPhoneski on Sunday, March 3, 2024 - 18:52

I have one remark: the app provides descriptions in English. I’m familiar with it, but the description should be given in the system’s set language.

By Michael on Sunday, March 3, 2024 - 18:52

I just purchased a lifetime subscription to the app even before trying it out because I want to support the developer.
I have one question. Most if not all of the photos I'd want described are in my camera roll, and I noticed PiccyBot is not available in the share sheet to have those photos described. Can we get this added, please?

By Missy Hoppe on Sunday, March 3, 2024 - 18:52

I've just played around with this app a little. I like that the voices it uses are unique, but would like it if we could hear samples of each voice while in settings. Also, the ability to change speech rate doesn't seem to be working as expected, or, if it is, I couldn't figure it out. I had a bit of a tough time trying to ask follow-up questions, but I think that's more of a server problem than any fault of this app. In any case, I think it's a great start, and I always like to welcome new vision assistance apps to my arsenal.

By Michael on Sunday, March 3, 2024 - 18:52

I am finding the follow-up question and answer feature very inconsistent. For instance, I asked it what the hair color of the individual in the photo was. Rather than telling me the hair color, it kept repeating the previous description.
In addition, I do not like that one has to hit the return key to send the message. We need a dedicated send button similar to the one found in Be My AI.
Finally, the descriptions provided are not showing up when you swipe through. If you missed the description spoken aloud, you can't review it, as there is no text. I would like this to be made available as well.

By Martijn - Spar… on Sunday, March 3, 2024 - 18:52

Hi guys,

Based on the feedback you have given me, I have updated the app. It is available in the Apple app store now: https://apps.apple.com/us/app/piccybot/id6476859317

The main change is that it will now display the text description as well as speak it out, so it can be read again using VoiceOver.

I have also added a preview sample of the voices, as was requested by you.

The duration of the output can now be selected in steps instead of a slider, so it works properly using VoiceOver.

I have also added a new AI model to the app, Claude 3. This model was only launched last week and is on par with GPT4. So now you have a choice of AI models: GPT4, Google Gemini or Claude 3. I don't believe any other app offers this yet. Each model has its advantages. GPT4 is more witty, Gemini can recognize people best, and Claude 3 can be very descriptive.

I really appreciate the feedback I have been getting from you guys, please continue so together we can create a great vision app!

Cheers,
Martijn

By Malcolm13 on Sunday, March 3, 2024 - 18:52

Thanks, Martijn, for implementing our suggestions. I’m sure there will be more in the future. Keep up the good work.

By Martijn - Spar… on Wednesday, March 13, 2024 - 18:52

Based on the feedback from Brad and Michael, I have implemented the share sheet for the app. So now you can go to your camera roll and share any picture directly to PiccyBot to describe it for you.

The update is available now in the Apple app store.

By Emre TEO on Wednesday, March 13, 2024 - 18:52

The AI models support many languages for descriptions, but the application gives an English description by default for each query. This setting should be editable from within the application, as in ChatGPT. In addition, support for a Siri shortcut would be quite useful.

By Brad on Wednesday, March 13, 2024 - 18:52

I already have an app for when I need images described, but I do agree. People should be able to select their language at setup or in settings.

By Martijn - Spar… on Wednesday, March 13, 2024 - 18:52

I'll see what I can do about the language setup. But note that if you ask a question about the image in a particular language (I tried Dutch and French), PiccyBot will answer your question in that language. And at least my native Dutch actually sounds very good.

This only works if you have selected OpenAI or Claude 3 as the model, though; Google Gemini defaults to English.

By Michael on Wednesday, March 13, 2024 - 18:52

Firstly, I want to take the time to thank the developer for implementing the feedback in such a timely fashion.
I wanted to inquire about the quality of the descriptions. When I have ChatGPT enabled, I find that the quality of the descriptions is still different from Be My Eyes. Can the developer explain why this might be the case? Is it a different LLM version?
Secondly, can you add an option to disable text to speech for descriptions? Now that we have text descriptions, I am more than happy just reading the text using VoiceOver.
Lastly, a dedicated send button would be nice as I am finding the search button in the text box unreliable. For instance, when I tried it this morning, I found that double tapping on the send or return didn't actually send the message.

By Laurent Cadet … on Wednesday, March 13, 2024 - 18:52

Thanks Martijn for developing this app. While I believe that Be My AI from Be My Eyes certainly does the job, PiccyBot has a more premium feel/sound to it. It feels like it's still a young app, with BeMyAI having a smoother feel to it at the moment, but you appear to be a very responsive developer, so I'm confident that wrinkles will be ironed out very soon, and so I bought the lifetime subscription with no hesitation. For only $14, it's great value, and I'd encourage everybody to do the same if you can.

Also, the app has a 1 star rating on the App Store. It certainly deserves more than that, so please show the dev some love and rate it.

By Brad on Wednesday, March 13, 2024 - 18:52

I'm going to buy a lifetime subscription to support the dev.

I love the voice that's used. If I'm correct, it's the ChatGPT voice, isn't it? 11Labs is amazing, but I wish it had a voice like that.

I'd honestly love it to be a screen reader option.

By Brad on Wednesday, March 13, 2024 - 18:52

I hope it helps you guys who use the app.

I'd love for these voices to come to our screen readers one day, I can only hope.

By Martijn - Spar… on Wednesday, March 13, 2024 - 18:52

Thanks for the amazing feedback and support, Brad, Laurent and Michael!

The latest update covers the point Michael raised, by offering a 'no voice' option. The performance of the app will be better, and you can use your own VoiceOver on the text.
I still feel the OpenAI voices (correct, Brad) are nice, so I am keeping those in. 11Labs is pricey, but I will see what I can do there.
Michael, regarding the difference between BeMyAI and PiccyBot when using GPT4, I am not sure. It could be the initial question sent to OpenAI is simply different.

The other thing the update handles is localisation of the result. PiccyBot will use the default language set in your iPhone/iPad to present the text and voice in that language. In a future update I will make the language selection fully free within the app itself.

The update also makes some minor adjustments to the interface. It's now faster to switch between AI models and compare their output. Also, you can swipe between image and text. I get that swiping is not low vision friendly, so I will look into that further.

Let me know what you guys think?

By Laurent Cadet … on Wednesday, March 13, 2024 - 18:52

Hi!
I've found that the default OpenAI responses tend to end abruptly, i.e. starting to read something and then ending mid-thought, like saying that there is a list, and ending by saying "1". The Claude responses, on the other hand, are often brilliant, but take longer to generate. Has anybody else noticed this?
Martijn, anything you know of that might be making the OpenAI responses cut off?

By Martijn - Spar… on Wednesday, March 13, 2024 - 18:52

Thanks for letting me know, Laurent. The duration setting determines the number of tokens used for the response. The 100% setting should give a lengthy response from GPT4, but I see what you mean about the incompleteness. I will look into it. In general, though, Claude 3 can give very elaborate responses, which can be great but takes time. So depending on your need you can choose a model and a duration.
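
As a rough illustration of how a duration percentage could translate into a token limit, here is a minimal Python sketch. It is not PiccyBot's actual code: the 500-token ceiling and both function names are assumptions made up for this example. The one documented detail it relies on is that OpenAI's Chat Completions API reports a finish reason of "length" when a reply stops because it hit the token limit, which is one plausible cause of a response ending mid-thought.

```python
# Illustrative only: not PiccyBot's actual code.
FULL_LENGTH_TOKENS = 500  # assumed (hypothetical) token cap for the 100% setting


def max_tokens_for_duration(duration_percent: int) -> int:
    """Convert a duration setting (1-100%) into a response token cap."""
    if not 1 <= duration_percent <= 100:
        raise ValueError("duration_percent must be between 1 and 100")
    # Integer arithmetic keeps the result exact; never return less than 1.
    return max(1, FULL_LENGTH_TOKENS * duration_percent // 100)


def was_cut_off(finish_reason: str) -> bool:
    """True when the model stopped because it reached the token cap."""
    # OpenAI's Chat Completions API uses finish_reason == "length" for this case.
    return finish_reason == "length"


if __name__ == "__main__":
    for percent in (40, 100):
        print(f"{percent}% -> {max_tokens_for_duration(percent)} tokens")
```

With a scheme like this, a lower duration setting simply means a smaller cap, so a model that wants to write a long list can run out of tokens partway through.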

LaBoheme: The default setting for the response is 40%, with a fixed voice and model. After upgrading you can adjust the response length and set different voices and models.

By LaBoheme on Wednesday, March 13, 2024 - 18:52

After subscribing, it is now very generous, no longer stingy.
If you can add an option to prevent screen lock, that would be even better. When the analysis takes a longer time or the answer is a bit longer, sometimes the screen locks due to inactivity. Either add a prevent-lock option or make the system think the user is actively touching the screen.

By Michael on Wednesday, March 13, 2024 - 18:52

I have my AI engine set to Claude 3, and I am getting a 404 error when I try to have a photo described.

By Martijn - Spar… on Wednesday, March 13, 2024 - 18:52

Michael, thanks for the heads up. There was an issue with the server routing the requests, it should all be working again.

LaBoheme, will look into avoiding screen lock. You should always remain in control, agreed.

By LaBoheme on Wednesday, March 13, 2024 - 18:52

When taking a picture, the app always assumes the user is holding the device in the upright position, causing the AI engine to interpret the picture incorrectly. Although there are ways to adjust the picture before submitting, it's not always easy for people who can't see the picture in the first place. Plus, landscape mode is preferable in some situations.

It would be very helpful if the app took pictures based on how the user orients the device, portrait or landscape.

By Michael on Wednesday, March 13, 2024 - 18:52

I just wanted to let people know that Claude 3 appears to be significantly less accurate than Google or ChatGPT. I had a photo of two females described, but it was described as two little boys playing soccer.
This is by no means a hit against the amazing developer, but merely a serious drawback of using Claude 3. I will probably stick with ChatGPT or Google.

By LaBoheme on Wednesday, March 13, 2024 - 18:52

In an earlier message, you wrote "The update gives some minor adjustments in the interface. It's now faster to switch between AI models"

The only way I can figure out to change the AI model is to go into settings. Is that what you mean?

Well, that's easy enough, but maybe there is an easier way. How about a slider on the main screen to switch between models? Also, make it possible to arrange the order of the models.

Most people should find out sooner or later which model they prefer. For example, personally Claude 3 works best for me, GPT4 next, and Gemini last if I feel I need to verify something the other two models give me. So with the preferred AI models properly arranged, I can slide up to resend my query to GPT4, slide down to go between my preferred models, or slide all the way up if I need to verify something.

Also, when I clear a follow-up question and then switch to another AI engine, the follow-up question still gets sent. Here is the scenario.

I take a picture and get the following reply:
The picture shows a tote bag with a beach scene printed on it. The bag is resting on a wooden floor. The top of the bag is a deep teal color, which fades into a lighter teal, representing the ocean. There is a white foamy wave in the middle, and the bottom of the bag is a soft pink, like sand. The handles of the bag are gray and appear to be made of a sturdy fabric. In the background, there's a glimpse of a person's foot.

I then ask if the person is wearing any footwear and am told the person is indeed wearing blue sandals.

Now I want to compare what another AI model has to say about the bag, so I clear the follow-up question from the text field and switch to another model. The new model also tells me the person is wearing blue flip-flops. OK, close enough, but I no longer care about the footwear; I want to know more about the bag instead. Shouldn't clearing the text field also clear the question?

By Martijn - Spar… on Wednesday, March 20, 2024 - 18:52

Hi guys,

The new update in the App Store has the following:

- Orientation: the camera now assumes the orientation of the phone (based on LaBoheme's feedback regarding this)

- Screen lock should no longer occur (again from LaBoheme's feedback)

- During processing, a sound effect is played (borrowed this idea from BeMyEyes)

- Each voice now has a personality as well. So responses will be in a different style depending on the voice selected. Note that the text output will still be 'regular' and can be played with VoiceOver. You can disable the voice completely or disable the personality effect. The choice is yours.

Let me know what you think?

By Jokyboy129 on Wednesday, March 20, 2024 - 18:52

It would be cool to get a free trial before subscription

By Winter Roses on Wednesday, March 20, 2024 - 18:52

OK, so I tried this app on my phone, and, except for the ads, it's not too bad. I wonder if there is a specific way to ask for clarification about a picture. There was a text field, but when I typed in the box, I saw a search button, and it said something like, connecting to server? Yeah, I'm not entirely sure what that was all about. Also, there is a voice that narrates the content of the picture, and I guess it's a little bit sarcastic, and might even be funny, if you're into that kind of thing.

Like, one of the descriptions that I got said, "this is a person, and they have blessed us, yes, blessed us, with a side view of their profile. Imagine this, you are lying on a beach in a tropical resort with a bottle of sparkling champagne, like, the most expensive bottle, that money can buy, yeah, my friend, this is what this picture is portraying. They have long, golden brown hair, with blazing brown eyes to match. They are wearing a pink tank top with white studs that flare off at the bottom like a flag swaying in the breeze, and a black, like solid Black, pair of sandals. I wonder where they got it? Right hand propped on hip, left hand holding a glass containing their beverage. They are smiling wide, but not just any smile, no, a grin that could turn your entire day around just by looking at it. Yeah, they are certainly giving Mona Lisa a run for their money in the smiling department, although, that could just be me, because I'm not sure if that's the right comparison here. They have a porcelain complexion, one that could rival any debutant. Can you hear the wind? Can you taste the rhythm of the tropics? Smell the salty air. Listen to the sound of the seagulls. The ocean is calling, singing like a whale. Yep. I'm jealous of you right now, my friend. Can you hear my heart cracking from the pain? Oh, the horror, the horror. Hey, would you mind bringing me back some of those yummy coconut floats? I hear they are literally the bomb, and there you have it folks, your photo, in a sparkling, dazzling, mind blowing, nutshell."

I do want to mention that this is paraphrasing since I couldn't exactly type out what they said word for word. That text is not shown on screen, only the basics, meaning the content, of the picture. The voice narrates the details, and the text on screen is a basic summary of what you heard, without the sarcastic elements. I didn't get a chance to play around with the application much, but I don't recall seeing a repeat button, or a reanalyze button. Personally, I prefer to receive my descriptions without the funny or sarcastic elements, especially if I'm in a hurry, but I can certainly appreciate the thought and the details that went into this, instead of making the descriptions cut and dry. It would be nice if we could copy the entire description, or have the output done via text or voice, instead of both.

I'm not sure how this application is different from the Be My Eyes app, but I'm only one common user, and there is always room for improvement. Regardless, it's always good to have choices on the market, so, if you're reading this, give it a try for yourself, and see if it's something you'd like. If the developer reads this comment, keep going. My opinion is only one of many, and this application is still in the development stages.

By Martijn - Spar… on Wednesday, March 20, 2024 - 18:52

Thanks for the feedback Winter Roses. PiccyBot now has a personality linked to each voice available. Still tweaking the behavior of it though.
The pro version (monthly subscription or lifetime) removes the ads, gives the choice of voices (or none) and a selection of different AI engines. You can also remove the personality effect completely.
But yes, the free version is limited in that regard. I'll see how the feedback is about PiccyBot's basic responses at the moment and adjust it accordingly.

By Emre TEO on Wednesday, March 20, 2024 - 18:52

Hello, I have come to give you new suggestions.
1. Can you add a description option that starts taking and processing photos with a single-finger gesture, as in Be My Eyes? Currently, it requires a four-finger movement after opening the application, and this negatively affects practicality for instant captures.
2. The app crashes, and the conversation interface is unreliable, when the mic is used in between.
3. The speech speed of the AI voice in the application should be adjustable.
4. When a photo is uploaded through the share menu, it would be useful to give the description in a window without opening the application, to avoid breaking up the workflow.
5. Having a conversation history would be very useful.

By Martijn - Spar… on Wednesday, March 20, 2024 - 18:52

Hi Emre TEO,

Thanks. The microphone bug fix and the conversation history should be done this coming week. Your other points are also on the list.

By Lee on Wednesday, March 20, 2024 - 18:52

Hi, is this not available in the UK? I did a search and, weirdly, all that came up was pizza places from all around the world, but not this app.

By Winter Roses on Wednesday, March 27, 2024 - 18:52

I'm not sure where I stand with regard to this app, but I'll definitely keep checking back on this post for future updates. Either way, I'm all about having a choice on the market. It's not good to put all your eggs in one basket, as they say. Hey, keep up the good work. Never stop improving and creating.

By Gokul on Wednesday, March 27, 2024 - 18:52

So how does one make the personality tuning work? That sounds interesting for when you want to have some fun.

By Brad on Wednesday, March 27, 2024 - 18:52

For some reason the voices aren't reading out the text.

By Brooke on Wednesday, March 27, 2024 - 18:52

I just bought the lifetime subscription and like how customizable the app is. Thanks for all your hard work!

By mr grieves on Wednesday, March 27, 2024 - 18:52

Almost ignored this but I thought I'd download and it seems good. I bought the lifetime subscription and now have a load of options to play with, which I will do when I get some time.

One option I would love is a way to paste in an image from the clipboard. This would make it easy to get image descriptions from Facebook which doesn't appear to have a way to use the normal share sheet. Is that possible?

By Brooke on Wednesday, March 27, 2024 - 18:52

How do you even add an image to the clipboard, is that possible? I know doing the 3-finger quadruple tap will save last spoken text...