Hi,
I know ChatGPT, Be My Eyes and other such AI-driven apps can describe images. What I'm looking for, though, is a way for us blind folk to take pictures from family and friends, Facebook and Instagram etc., and be able to easily mark them up with descriptions. The workflow, to my mind, would be:
Choose picture > Share > AI Photo Description App (AIDI) > automatically adds description > Save button > Sends image back to origin/album in app.
There are expansions on this: the ability to ask more about the image, to change the way it is described (focusing more on the building, for example, than the people around it), and also the ability to pull images from the app's own album and ask questions about them.
I'm really looking for something to simplify the way we mark up and catalogue images. We live in a wonderful world now where we're not excluded from such things any more; it's just a matter of getting the technology into a format that works best for us. Going through many photographs, copying and pasting, etc., is a pain in the bottom.
I know there is a shortcut kicking around that uses AI to describe a screen, but we need to be able to caption these photographs so we can flick through them quickly and then, if we want, interrogate their content.
Thanks...
Comments
Integrating this feature into an already-existing app
Integrating this feature into an already-existing app sounds more feasible than making an entirely new app for this task. I know there are apps that provide image descriptions and others that let you label images or search your photos for a particular thing by entering keywords. What has to be done is to combine these two functions in a single app, and it's most likely much easier to begin with an app that already does one of them.
I'll leave that for…
I'll leave that for developers to decide. It seems like a nice little project for a smaller developer, and it comes with the benefit of not being feature-bloated like Seeing AI or Be My Eyes, which struggle to present things in an intuitive way.
It wouldn't be bad if Object Voice had these features.
Object Voice is an app that has one simple function: describe whatever it detects in taken or imported photos, or when you point the camera at something. Plus, it's offline, so everything should be more privacy-oriented, smoother and faster. Another thing is that apps that use online image processing often tend to omit certain details, especially when describing people, and people are one of the most essential things that need to be identified in images so that we know who is featured in each photo. It would be great to be able to hear names instead of long descriptions, without having to mess around with the technical limitations and measures that make the current models slower and their descriptions often lacking in certain details. Object Voice should gain new and better capabilities if it can make use of the AI expected to be incorporated into iOS 18.
An Issue
I just added a caption to a picture of a hawk in my camera roll. VO describes it as an owl sitting on a pile of trash. Close enough; it was on a bird it had just preyed upon.
It turns out that this caption is not detected and read by VO, so it still says it's an owl on a pile of trash. The only way I can find to read the caption is to double-tap on the picture, then go into the picture info, as if I were going to add the caption.
That's a big ordeal if I'm browsing through an album, almost useless. But hey, if there's a way to directly display the picture captions, please tell me.
I think a better way would be if this hypothetical AI app renamed the picture file with a short description, then saved it wherever it's going to end up.
It's a little like what I do now, manually: share it to the Files app or my Dropbox, then rename it and keep it in a folder either on my computer or on my phone.
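Just to illustrate, the renaming part of that manual workflow is the kind of thing a small script could do once you have a short description from somewhere. A rough Python sketch (the file name and description below are made up):

```python
# A rough sketch of the manual rename step: turn a short description into a
# safe file name and rename the photo. File name and description are made up.
import re
from pathlib import Path

def rename_with_description(photo: Path, description: str) -> Path:
    # Keep letters, digits and spaces, then swap spaces for underscores.
    safe = re.sub(r"[^A-Za-z0-9 ]+", "", description).strip().replace(" ", "_")
    new_path = photo.with_name(f"{safe[:60]}{photo.suffix}")
    photo.rename(new_path)
    return new_path

# e.g. IMG_0421.jpg -> Hawk_standing_over_its_prey.jpg
rename_with_description(Path("IMG_0421.jpg"), "Hawk standing over its prey")
```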
Of course, the ideal is that…
Of course, the ideal is that Apple uses all its tasty new AI to do this natively. I've found the image description native to VoiceOver on iOS really hit and miss: sometimes it works, sometimes it doesn't, and sometimes it continues to describe every item I move over, including apps on the home screen, etc.
Maybe I should have a go at creating such an app with GPT-4. It might also be done using a shortcut, something like:
Share picture, describe picture, caption picture with output... I just don't know if that sort of communication between apps is allowed on iOS. I'll have a play when I have some time.
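For anyone who fancies experimenting, here is a very rough Python sketch of the "describe picture" step using the OpenAI API. It isn't a shortcut or an app, just the core call; the model name, prompt and file name are my own placeholders, and you would still need a way to get the photo out of the Photos app and the caption back in.

```python
# A rough sketch (not a finished app) of asking OpenAI to caption a photo.
# Assumes the official "openai" Python package and an OPENAI_API_KEY set in
# the environment; the model, prompt and file name are placeholders.
import base64
from openai import OpenAI

client = OpenAI()

def describe_photo(path: str) -> str:
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable model should do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this photo in one short sentence, suitable as a caption."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(describe_photo("hawk.jpg"))
```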
Slight progress for me
That caption feature in the Photos app is useless with VoiceOver, and it seems that sighted people find it almost useless too, going by some of my Google searches. However, some of the terms you used in the OP led me to dig deeper in the info tab, such as the markup option. Somewhere in all the options (I hope I can find it again) there is an "add description" option. This does get announced by VO right after the photo date when browsing through the thumbnails, even though the process of adding it is a huge pain, as you point out. Thank you, thank you, thank you, Ollie! I've been trying to figure this out for years!
I think I understand where you're coming from now, and it would be a good app, if it could do all that for the blind user in an easy way.
As far as the native VO image description goes... it isn't too bad, but I usually have it turned off for most of iOS. It gives better image descriptions and text recognition than the alt text Facebook inserts into its pictures, so I like it there, and in the Photos app.
How to add descriptions manually, thanks to Old Bear
This is a rough transcript of an Apple YouTube video that covers this:
To add descriptions that VoiceOver can read in the Photos app:
1. Open a photograph.
2. Tap on edit.
3. Tap on markup.
4. Tap on add, it's right at the end of the screen.
5. Tap on descriptions and add your description in the edit box. Hit done.
6. Hit done again.
7. Hit done again.. I think...
It's really buried deep, but now, when you open the Photos app, it will give you the description you just added to the photograph. I'm using the OpenAI shortcut listed elsewhere on this site to get the descriptions for the photographs, and now I can apply them, albeit very slowly, to my photographs. I'm going to do some more digging. If markup has some shortcuts, there may be a way of applying OpenAI descriptions much faster. Fingers crossed.
I've also sent a request to Apple to include this in its new AI suite of tools, though knowing the speed at which Apple pushes out new cutting-edge tech, it might be a couple of years, hence the space for an app that can grind through a photo album and insert AI-created descriptions into the metadata.
Does anyone know if the metadata is in the photo file or is referential in the Photos app? It might be that getting photos out of the Photos app will prove the quickest way to process them.
@Ollie
Ah, I would never have found that.
A little while ago I started writing a Python script that would go through a load of files and add captions to images from OpenAI. I hadn't found the description field and was using caption instead, which you get to through Info.
There is a standard (I think) way of adding metadata to images using EXIF. I found that captions were not stored there and had to be retrieved using a third-party library, which I think was probably simulating some Apple SDK. I chickened out of trying to write the data back.
But it's possible the description is different. I will have to have another play around and see.
I'm guessing it will be an Apple proprietary thing. In which case it might be better to write using exif and then have something else that reads those and adds them to the photos in an Apple way. That way the data is still available if you are on Windows or Android and can find a way to get it.
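In case it's useful, here is a minimal Python sketch of the "write it into EXIF" idea, using the third-party piexif library to put text into the standard ImageDescription tag. This only touches the file itself, not whatever Photos keeps in its own database, and piexif only handles JPEGs, so HEIC shots would need converting or a different tool.

```python
# A minimal sketch, assuming JPEG files and the third-party "piexif" library
# (pip install piexif). It reads and writes the standard EXIF ImageDescription
# tag, which is separate from whatever the Photos app stores internally.
import piexif

def write_exif_description(path: str, description: str) -> None:
    exif_dict = piexif.load(path)                      # existing EXIF data
    exif_dict["0th"][piexif.ImageIFD.ImageDescription] = description.encode("utf-8")
    piexif.insert(piexif.dump(exif_dict), path)        # write back into the file

def read_exif_description(path: str) -> str:
    exif_dict = piexif.load(path)
    raw = exif_dict["0th"].get(piexif.ImageIFD.ImageDescription, b"")
    return raw.decode("utf-8", errors="replace")

write_exif_description("hawk.jpg", "A hawk standing over its prey on the ground.")
print(read_exif_description("hawk.jpg"))
```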
Yeah, the python solution…
Yeah, the Python solution seems like it might be the most workable.
As far as I can tell, if I take a photograph I've added a description to on the Mac and check it out with a quick look at the file (Command-I), the description remains, suggesting it's just metadata that it's possible to write. I'm really not a coder, so I'm probably oversimplifying things here. I'm not sure if there is something like Mp3tag but for photographs.
It seems that generating the descriptions is one half of it; getting that description into the metadata of the photo, or markup or whatever they're calling it, is the second part.
Yeah, I did label a load of images this morning and foolishly put them into captions.
There does seem to be a character limit on the descriptions displayed in the Photos app, though, which we need to be aware of. Furthermore, I couldn't find a way of reading the whole description.
Apple is Maddening
It's unclear to me if the description in the markup is added to the metadata of the actual photo file. I shared the photo of the hawk to my Dropbox and looked at the properties over in Linux. The caption that VO does not announce is there, but not the markup description. There's also some other information missing in the shared photo that can be accessed in the information tab in the photos app.
I'm going to guess that this information is stored in iCloud and on Apple devices, but doesn't get shared outside of the "garden." I've also read that on the Mac, you can add a title, but that's not on iOS, or at least not easily accessible on iOS. *Slams face into pillow and screams*
Re: caption
I think EXIF is the MP3-tagging equivalent for photos. I definitely couldn't find the caption from my Python script when I looked.
However I must admit I get totally lost in the Mac Photos app.
I think what I would really want for this sort of thing is for it to be aware of who was in the photo and where it was taken. I guess OpenAI won't know who anyone is. I did see in Photos that you can select faces in a photo and name them, but I presume you would have to do that for every photo individually which is a bit tricky when you can't see the faces. The location should be available somewhere though.
I think it wouldn't be too much of a stretch to think that AI will soon be able to recognise faces it has seen before in your photos and be able to automatically detect them for you. This feels like the kind of thing Apple could bring in if it can do it on-device.
My gut feeling is that although the AI descriptions are amazing, and in their current form add a hell of a lot of detail to a photo that I would love to be able to get at more easily, they wouldn't be suitable for generating something I could flip through quickly. For example, as I swipe I might want something like "Bob and Dave leaning against a stile in the Lake District", but I wouldn't want to know about every blade of grass or what else was going on unless I specifically asked for more detail. I think the full Be My AI description would get a bit tiring after the first few swipes.
Sorry I realise I am going off on a little tangent. But I think the three problems we have are getting the right info, being able to apply it in batch to our photos and then being able to easily browse it later.
Agreed. I think the ideal is…
Agreed. I think the ideal is having a short title:
Well-dressed and handsome man on horse in the snow fighting goblins... and include location and date.
As far as I understand it, once you have enough people in your album labelled up, you should be able to have people detected automatically, which would be amazing.
At the moment, or in the interim, I'm just looking for something to differentiate images in an album. The fictional app I'm hoping for would allow us to go through the album, find the picture we're interested in through the short description, then generate a longer description if we want, and allow us to ask questions about it.
Just imagine us uploading all our photographs, short descriptions being generated automatically, then allowing us to browse and dig deeper as we wish.
I think this is really how Apple Photos should work, but there are too many hoops to jump through, as well as navigation of photos being broken, in that swiping doesn't take me through photos; I've got to find them with direct touch instead.