Hello guys,
I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.
I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!
The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317
I am really hoping it will be of use to some. I have earlier created the app 'Talking Goggles' which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully useful!
Thanks and best regards,
Martijn van der Spek
Comments
@the dev
I can't seem to play the audio attached to my mail when sharing with PiccyBot. Is that a bug or just a me thing?
I'm using vlc to play it if that helps.
The AI models.
I think that a little description for each model would be nice. Why would I want to use model X over Y?
If that's already in the question mark thing on the top left, at least that's how it is for a blind user, great! I just thought I'd throw it out there.
The more I listen to AIs write, the more there's a pattern. At the moment I'd not say they write like a human does. I really do wonder where we'll be with that this time next year.
Oh my goodness Brad, that is fantastic isn't it!
As I said, I don't use the app. That description was totally amazing!
It was!
Now, it's not for me, but once they manage to sync the description with the audio/visuals of the video, if that's possible, then it will be great, and if they make an add-on for Firefox, I'll pay for it straight away.
If possible I'd like more voices but that can be added later if at all.
Please focus on the video side of things and syncing audio and video together if you can.
@brad
I love the amount of description you get, I think it's great. I want a picture painted in my head and the more info it gives, the better IMO. I was also born blind so I think this is a very subjective thing.
As for an audio-description-like experience, I think that'd be crazy difficult to implement. You'd have to have it understand the context of the video to put description in the right places somehow, and that's going to require a lot of work on Martijn's end, if it's even possible at all.
Love the idea of a donate button on the app as well.
@Brad So what do you use the app for?
Thank you, but what do you use the app for then if not descriptions? Maybe I'm thick here, but I'm not getting what the benefit of this app is? Just trying to understand.
Subjectivity.
I guess the amount of detail that one prefers in a description is entirely a subjective thing; I was born blind, and prefer rich, detailed descriptions covering the colors, tone etc. almost 70% of the time. It's only when dealing with work-related data or something that I prefer a concise approach. But maybe a setting to determine what kind of description one needs might be nice.
Born blind here as well
I was born blind and I am loving the detailed descriptions this and every app gives. I think it is a person-to-person preference. I am going to test this app on some YouTube Shorts. This is a great app!
Oh it totally is a preference thing.
@gaybearUK, I don't really use the app.
I try it, find a feature to be neat and then delete it.
@inforover oh it would be really hard to do, I don't think we're there yet, but I do think we'll get there one day and it won't be far off.
Sharing from Dropbox broken?
After updating to the latest version, when I try to share a photo from Dropbox, whether via "share" or "export", PiccyBot shows "fetching data" and then "please wait", and then sits there forever doing nothing, like it's frozen.
Add Indian regional languages for recognising images
Hello developer, this is Kaushik from India. Your app's accuracy is very good, but we need this app to recognise Indian regional languages so that we can use it seamlessly. Please also try to bring in a feature to read PDFs and other book formats at an affordable rate for everybody. Recently in India, iPhone purchases have increased in our visually impaired community. Do consider this. Thank you.
Dropbox and regional languages
Kaushik, at the moment PiccyBot supports the Indian languages Hindi, Bengali, Gujarati, Haryanvi, Marathi, Punjabi and Sindhi. Will add further languages in due time, when usage from those regions goes up.
Privateai, I released an update today that should fix the Dropbox and WhatsApp sharing issue. Hope it all works ok again!
Regional languages
@Martijn, do consider adding a couple of South Indian languages if you are at it, because I know for a fact that there are a reasonable number of users from these parts to make the effort worthwhile.
Sharing images from email
Sorry if I've missed this, but is sharing an image from email broken?
I'm using iOS 18.0.1 and I share to the pixies and it makes the waiting noise but never seems to get past it. I ended up using Be My AI which worked OK but the descriptions weren't as verbose as I get with this app.
(By the way, I hope the dev doesn't find it annoying that I always refer to this app as the pixies. By the time I realised it wasn't called Pixie Bot it was already cemented in my brain as that. And I kinda like that it has a pet name. And given my username I can say it's definitely not meant as demeaning or anything like that.)
It's like having your very own Dev Shop
The way this app is improved, almost on request, is so very impressive!
It really is.
If we get that donate button I'll use it.
I can't imagine where this will be this time next year, or the amazing stuff to come out by then. I can't wait!
re: Dropbox and regional languages
Thanks for fixing it :) I store most of my photos in Dropbox so I was sad not being able to use it :)
Thoughts
I want to start by saying I really love this app and the fact that it can describe videos. It’s an incredible feature, and I find it super useful. However, I do have some feedback that I think could make it even better.
Right now, the limitation is that the app can only describe up to 60 seconds of video. I understand the challenges behind processing videos for descriptions, especially when the app needs to download and handle the video on the device. However, I wonder if there could be a way to work around this. For example, what if we could watch videos directly from platforms like YouTube, and somehow screen-share or sync it with the app to receive real-time descriptions?
As a blind person, I really appreciate being told what’s happening in a video, but it’s hard to know how frequently scenes are changing or what exactly is going on during more dynamic content. It would be great to have a way to know the timestamps for when events happen and how often things change from one scene to the next.
Another issue I’ve come across is that, to my knowledge, we currently can’t mute a video on YouTube and still have it described. I think it would be incredibly helpful if we could mute the original audio on videos, particularly for things like music videos, and have the app provide the description instead. This way, I could choose to listen to the video’s audio when I want, but also have the option to mute it and have the app describe the visual content for me.
I hope this feedback is helpful and that it’s something that can be looked into in the future. Thanks so much for all the hard work that’s gone into this app—I really appreciate it.
Email pt 2
I tried opening up the same image from my email later on and it worked fine, so must have been just a temporary glitch. Sorry, was a bit trigger happy with my post yesterday.
Increased video duration and Reddit support
Thanks for the feedback guys! Winter Roses, I will increase the length of the video that can be processed. Already did it for the Android version, the next iOS release will have at least double the duration for pro users.
I have also added support for the sharing of Reddit videos, as there were quite a few requests for that. If any of you have suggestions for more specific video sources that would be helpful to describe, let me know.
Text length in chat
Not sure if it's set this way, but when I use the ask more feature, the responses are rather short. I prefer this app's long and detailed descriptions, and would like it if we could get similar length in chat, according to our setting preferences. Also, sometimes on long descriptions the text cuts out before getting to the end. Is there a way to put in a "continue" button or something to prompt it to finish from where it left off? From experience using chat AIs I know often all you have to do is type "continue" into the prompt, but when I did that PiccyBot simply re-analysed the photo and generated a new description rather than continuing the previous thought.
I had a hilarious moment…
I had a hilarious moment with this app. I took a selfie and I do have a lot of skin tags and it said "That man must be in a lot of pain and discomfort with those skin lesions." LOL.
Bug: severe truncation on all answers in the chat interface
No matter what I ask in the chat interface about an image or video, all answers get severely truncated. This seems model-independent, as it is the same with Claude 3.5 Sonnet on images and also for videos, which use another model. Answers are truncated to nearly the same length for images (about 40-50 chars), and to a somewhat longer length for videos, but in the latter case truncation is also very severe.
Instructing it to continue doesn't help at all. In that case the initial description is reiterated, but also truncated severely. So I don't think the truncation occurs at the model level at all; instead it happens somewhere between the model and the displaying of the answer. What I get as an answer shows that it would be completely coherent and appropriate had it not been so badly truncated.
I set the length parameter in Settings to 100 %. As I am a lifetime subscriber I have access to that screen and I could adjust that.
I have the latest version (2.4). I use PiccyBot in Hungarian, however I seriously doubt this has any significance regarding this truncation phenomenon.
Unfortunately this bug renders the chat interface (invoked through the "Ask more" button at the bottom of the main screen) practically unusable. Otherwise I love the app very much!!!
Updated app with chat interface fix
PrivateAI, Laszio, thanks for the feedback! An app update is available that should fix the chat responses. They should be medium length and take into account the information already given. Hope this works ok, let me know what you think.
re: Updated app with chat interface fix
Works great so far! Thanks for the fix! I want to take this chance to ask whether we can adjust the volume of the AI in the app itself. Currently, my default VoiceOver is way louder than the AI, so when I have my normal volume, I can hear VoiceOver but can't hear the AI. Then I turn the volume way up on my phone, and now I can hear the AI but everything else is way too loud LOL.
Voiceover volume
If you add VoiceOver volume to the rotor settings you can change the volume of VoiceOver relative to the overall volume on the phone. That also works on the Apple Watch.
--Pete
2.5 update
I got the 2.5 update with the chat interface fix. Although I have had it for only about an hour, I managed to test it in English and Hungarian, both with Mistral Pixtral and the video description model. I can report that I am satisfied. I've seen no truncation in the chat answers, they seem to come through fully and yes, they are to the point. I hope it stays so, and thanks for the quick fix!!!
I've experienced only one strange thing with the 2.5 update. There is that edit box one or two right flicks away from the top of the main screen. I call it the prompt edit box, as it contains the instruction that mainly guides the image/video description process, and so that classifies as a prompt in AI terminology. Before this 2.5 update, if PiccyBot was set to Hungarian, by default the prompt edit box contained "Mi van ezen a képen?" (what's in this picture?) for images and "Mi van ebben a videóban?" (what's in this video?) for videos. Now by default this prompt edit box seems to read "Kérdezz a piccybottól a kérdéseddel" (literally "ask Piccybot with your question", a sentence that definitely sounds clumsy in Hungarian), which is not appropriate for a Hungarian prompt text. Nevertheless descriptions seem to work okay this way too, so far. PiccyBot is soooo versatile really that I simply haven't yet gone through all the combinations I use this app on with the new update: pictures and videos taken on the fly, pictures and videos from my gallery and also shared from other apps, like mail. So I don't know yet whether this prompt text thing is really a bug or not. Time will tell.
One more thing about Hungarian. Though every part of Piccybot seems to support it quite well, Hungarian cannot be selected from the supported languages list from the settings screen, because it is not listed in the dozens of languages there, nor can it be found with the search edit box on that screen. So I access it with the "phone system language" setting, and it works this way. This is only a very minor nuisance mostly, that can easily be fixed in one of the future versions with other bugfixes.
All in all, with the chat interface fix seemingly in place, PiccyBot is really a bright gem in my "vision toolbox" on my phone: many models, many languages, extremely diverse possibilities. So thanks much!!!
Video descriptions stopped working
Late at night this Tuesday (29 October) video descriptions stopped working abruptly and haven't returned to life since then. After the waiting sound, "server error" is displayed where the description should appear. This is independent of language: I tried with several languages and the result is the same. I suspect an API change on the side of the video description model. I ruled out other regular causes of such a disruption: net etc. are all fine.
By the way, I noticed accidentally that PiccyBot now lets me record a video over one minute (I am a lifetime subscriber). Thanks very much for that!
Video disruption
Laszio, there was indeed a server issue on Tuesday, but it should be working ok now. Can you try restarting your phone and PiccyBot and try again?
I will be adding backup services for these situations when one provider goes offline.
Success!
Thanks much! Closing PiccyBot from the app switcher and then starting it again was enough to get it working again. I was quite sure that I had tried that simple remedy before, but it turns out that I in fact hadn't.
By the way, after the app restart video descriptions come through in a drastically different style than before the server error. I know well that each generated text has a slightly different style and characteristics, but this time the difference is much more pronounced. The video description is more compact, has a more straightforward style with fewer details, and I experience many more hallucinations than before, and they are quite radical ones indeed. I haven't changed anything in the settings.
Have you somehow changed which model does the video descriptions or what may be going on?
Video update
Laszio, thanks for confirming that this is a workaround for now. I have not made any changes in the setup from my end, but on the side of the models things seem to have changed. I will be working on that in the coming days to get it back to a fully stable and reliable setup. My focus has been on getting the realtime voice to work in PiccyBot, but this gets priority now.
Real-time voice
That sounds interesting. I'm not much for talking with tech but in this case I might give it a shot when it is ready.
There are also some new models out now that might be of interest, both a new version of Claude 3.5 Sonnet and a model called Molmo that is said to be quite good with images (in addition to Llama 3.2 and chatgpt-latest, which I mentioned before and which might be implemented already; it has been a while since I checked).
Realtime voice?
Does it mean what I think it means? No right?
video descriptions
Hi all,
unfortunately I haven't got a short enough video to try this out on, but I checked out this demo on YouTube. For those interested, it's about 10 minutes in. It's a really cool feature.
https://www.youtube.com/watch?v=AGGKaw6V7Y8
2.6 update
After installing the latest update that - among others - aims at improving video processing stability, I once more get those much more detailed, much more accurate and much more useful video descriptions that I had got before the "server error crisis" of 29 October. Thank you much for the fix, I highly praise it!
Could you look into how Seeing AI does their descriptions?
They manage to play a bit, describe it, then play the next bit. Honestly if you could do this, or have it as a toggle, you'd have them beat in my opinion, as you can already describe short YouTube videos.
How Seeing AI does it
Could this be a toggle if you do look into doing this? I prefer the way PiccyBot does it rather than the way Seeing AI does it.
Thanks :)
Seeing AI vs the Pixies
I am repeating myself a little from the Seeing AI thread, but the way I see it is that PiccyBot and Seeing AI are providing an entirely different perspective on a video and I really appreciate having both options.
The pixies describe a video like someone reporting back on what happened. It goes into more detail and paints a pretty vivid picture of what is happening.
Seeing AI on the other hand feels like I am watching the video myself but is giving me a lot less detail.
I honestly like having both options available because they both serve very different purposes. You couldn't get the level of details the pixies give you if you were to use the Seeing AI way.
Having said that, a lot of people are basically asking for Audio Description of the videos, so there is clearly an interest among users of PiccyBot. But I'd hate to sacrifice the level of detail I get to achieve it.
I personally am happy switching between two different apps for this as I usually use the share sheet to get videos into PiccyBot and Seeing AI anyway. But if something like this does come to PiccyBot I too would like it to be optional. Actually if it was a toggle in the UI that appeared in the main interface when I was watching a video that would be even better so I could quickly listen to one format or the other.
I think I'll take back what I said.
I keep thinking of audio description then realising that we're not there yet.
I don't use this app so will let those that do write more about it and will stop. I don't need this app so shouldn't ask for things if I'm not going to use it.
I've tried the video feature and while it's not for me, I can see that a lot of work went into it. Perhaps one day in the future we'll have an app, perhaps this one, who knows, that can be trained on Audio Description. Let's see what the future holds for us :)