Update 09/17/2024
Remember when it used to take 10-15 seconds to get image descriptions back? After this update, it should take only 3 to 5 seconds. You should also be able to send 15-20 images through Describe Photos with no issues. I haven't yet updated the messaging for sharing large batches of photos, simply because I don't have enough data on what the limits will be. You should also no longer see the "image too large" or "image not compatible" errors, which typically happened when sharing Live Photos. Finally, prices are even lower now; I typically get image descriptions for around half a penny.
Here are a couple more new things:
- You can now use the /share command in a reply box to share the last photo or screenshot with a friend. Useful for getting sighted people to verify stuff for you.
- If you use a comma as your decimal separator, you can now use these Shortcuts without having to adjust anything.
Visit the Shortcut documentation at ShortcutJar!
Update 05/22/2024
As the subject says, the Shortcuts now use OpenAI's new GPT-4o model, which is 50% cheaper, a bit faster, and more accurate than before!
Note: This isn't the same thing as the new voice / video chatting features that OpenAI recently announced. You cannot voice chat with these Shortcuts!
Anyway, apart from using the new model, the Describe Photo Shortcut now lets you share multiple images from the share sheet to be described all at once!
This is useful when you want descriptions across several photos in one go. I love sending it 3-4 pictures of my dogs and having it try to tell a coherent story that follows what's happening across all four images.
Anyway, to visit the shortcuts' dedicated site, click Here!
Already familiar with the Shortcuts and just want to get going with the new versions?
Update 03/26/2024
I've just released what I'm tentatively calling Version 1 of both my Describe Screenshot and Describe Photo shortcuts.
They can both be found on their new dedicated site!
Some of the new changes include:
- Conversations: You can now reply to the descriptions you are given. To do so, press the OK button on the description's alert. Have nothing to say? No worries! Hit Cancel, and the Shortcut will leave you in peace!
- Slash commands: When typing a reply, you can use /save with either Shortcut, and the last photo or screenshot taken will be saved to the photo album of your choosing. Additionally, Describe Photo also has /add, which will allow you to take another picture to accompany your replies.
- Describe Photo now supports the Apple Vision Pro! If you run the shortcut on Vision Pro, it will grab the latest photo from your camera roll rather than having you take one. This is because the Shortcuts app on Vision Pro doesn't support taking photos in shortcuts. If you intend to use this shortcut with other smart glasses or prefer to take your photos in the Camera app, you can make grabbing the latest photo the default behavior in the set up screen.
That's everything. Share and Enjoy! :)
Update
There are now two Shortcuts.
- Describe Screenshots, which can be found here: Describe Screenshots. After being assigned to a VoiceOver gesture, this one will take a screenshot when run and have GPT-4 generate a description for you. It also gives you the opportunity to ask a question before sending your image.
- Describe Photo, which can be found here: Describe Photo. This one can also be assigned to a VoiceOver gesture; when run, it will pull up the iOS or macOS camera interface for you to take a photo, which will then be described for you. Additionally, you can share pictures to this Shortcut, either from the iOS and macOS share sheets or from macOS's Quick Actions menu.
Setting both Shortcuts up is identical to before, though now you can configure the system prompt and other parameters from the setup screen if you choose. I did this because I hate editing shortcuts directly, and the setup screen can be brought back up whenever you want, even long after you've originally installed the shortcut.
On iOS, the setup screen can be reached by editing the shortcut, tapping Shortcut Info at the bottom right, then tapping Setup at the top right (immediately beneath the Done button).
From there, you can tap the Customize Shortcut button and you'll be asked all the setup questions again.
Note: The API key field will be blank when setting up your shortcut again, but as long as you've entered it once before, you don't have to fill this field out again. The rest of the set up process and usage is identical, so I'll leave the original post as well.
Original Post
Hi all! The other day, it occurred to me that getting screenshots described is a pain with Be My Eyes and/or the ChatGPT app. You have to take the screenshot, hit the screenshot button before it disappears, hit Share, then hit Describe with Be My AI, which is far too many steps for me.
I've written a shortcut using the built-in Apple Shortcuts app that takes a screenshot and describes it using the same technology Be My AI uses. The best part is that, since it's a Shortcut, you can assign it to a VoiceOver gesture. This works on both iOS and macOS; I just put it in the iOS forum because I figure more people are likely to see it here. Anyway, the Shortcut can be found right here! Unfortunately, I'm not rich and can't afford to pay for everyone's usage, so this does cost money (two to three cents per image), and there is a bit of setup involved.
So how do I set this thing up?
I'm glad you asked! Before you install this Shortcut, you need to do a few things:
1. Create an OpenAI account. This can be done at platform.openai.com. If you already have a ChatGPT account, you may skip this step. Otherwise, just head to that site, press the sign-up button, and follow the instructions.
2. Sign into your OpenAI account (if you're not already) and head to their billing page. Here, you'll follow their instructions to set up a billing plan. It's not as complicated as it sounds: you basically load your account with money ahead of time, and every image you have described pulls a couple of cents from that balance until it reaches 0, at which point you can refill it or never use the account again. Note that this is not the same thing as a ChatGPT Plus subscription; if you have a ChatGPT Plus subscription, you still have to do this.
3. Acquire an API key. You can do this on their API keys page. Just hit the create button, type a name for it, and hit Create. A text box will then appear with your key. Copy this key and save it somewhere safe. OpenAI will not show you this key again, so if you lose it, you'll have to create another.
Also, don't share this key with anyone. Anyone who has access to your key can use OpenAI's services while pretending to be you, which will cost you money. If somebody does get their hands on your key, you can delete it on that same page.
4. Install the Shortcut! Once again, the Shortcut can be found here. When you install it, it will ask for your API key. Paste it into the box and tap Install. At this point, it should be ready to use.
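For the technically curious: under the hood, each description boils down to a single HTTPS request to OpenAI's chat completions endpoint, with your API key in the Authorization header and the image embedded as a base64 data URL. Here's a rough Python sketch of how such a request body is put together; the model name, system prompt, and max_tokens values are illustrative, not necessarily what the Shortcut actually sends:

```python
import base64
import json

def build_request_body(image_bytes: bytes, question: str,
                       model: str = "gpt-4o", max_tokens: int = 1500) -> dict:
    """Build a chat-completions payload with one image attached."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            # Illustrative system prompt; the Shortcut lets you customize yours.
            {"role": "system", "content": "You describe images for blind users."},
            {"role": "user", "content": [
                {"type": "text", "text": question},
                # Images can be sent inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ]},
        ],
    }

body = build_request_body(b"\xff\xd8\xff...", "What's on my screen?")
# This dict would be POSTed to https://api.openai.com/v1/chat/completions
print(json.dumps(body)[:80])
```

This is why the API key is the only truly mandatory setup question: everything else in the payload has a sensible default.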
Assigning it to a VoiceOver gesture.
This part's pretty easy. Just go to Settings > Accessibility > VoiceOver > Commands > All Commands > Shortcuts, then select the Shortcut's name (Describe Screenshot). You'll then be given the option to add a gesture or keyboard shortcut. Once you add either or both, any time you use that gesture or keyboard shortcut, the Shortcut will run.
I've installed the shortcut and set up a VoiceOver gesture but how do I use it?
Pretty simple: whenever you want your screen described, make sure Screen Curtain is off, then use your VO gesture to activate the shortcut. Your phone will take the screenshot, then open the Shortcuts app so you can include a question with the image. Type in your question (if you have one), then tap Done. You can then return to what you were doing.
The description will take somewhere between 10 and 30 seconds to come back, but you don't have to wait in the Shortcuts app; just go back to your YouTube video or whatever. Once the description appears (the shortcut should play the tri-tone notification sound to let you know it's there), you can feel around the top center of your screen until VO focus lands on the description field. From there, you can swipe through the description and hit the Done button when you're done reading. At the end of the description, you will be told exactly how much it cost you, so if you're conscious about money, be sure to read through to the end. If enough people want me to move the total cost to the top of the description, I can definitely do that.
I don't like how it talks! Can I change it?
Yes, you absolutely can. If you go into your Shortcuts app, find the Describe Screenshot shortcut and hit the Edit action (using the rotor), the first four or five text fields of the shortcut are all parameters you can modify to your heart's content. If you specifically want to modify the way it talks, just edit the text in the system prompt field. There's a comment box immediately before it that will tell you which one it is.
Dude, you talk a LOT!
I know! I know! I hope this Shortcut is as useful to all of you as it is to me. Please let me know what you think, and if you like it, share it with your friends who might benefit! :)
Comments
Very cool! Thanks for making…
Very cool! Thanks for making this!
Really cool but you may want to change the default system prompt
Thanks for this useful shortcut. I had to go in and change the "snarky" tone of the system prompt. You may want to do the same, at least as a default option!
Once again though, very cool and thank you!
default system prompt
I've updated the original post with a link to the updated shortcut with the formal / professional system prompt.
Good on you for going in and editing it yourself though, that's what I want to see from people using the Shortcut, and why I provided those comments before each text field.
Knowing how to edit and write system prompts is a really powerful tool because you can better tailor the AI to your use-cases.
Also, the more each of us learns, the more we can experiment to find the best prompts for each use case or app.
The best part is writing prompts isn't actually that difficult. Anybody who's decently okay at writing in their native language can do it. :)
Awesome!
This is great stuff indeed! I also changed your original prompt, though I haven't edited out its snarky, humorous tone yet. I live in Norway, so I copied your prompt into a prompt-optimizer thread I have on ChatGPT (in the OpenCat app, to be specific), told it to translate it to Norwegian, and added that I want all answers back in Norwegian if nothing else is specified. Then I pasted it right back into the shortcut.
I also translated your question phrase, the one asking what I want to know about the picture. It mostly works fine, but I've seen a few issues, e.g. the screenshot being taken of the Siri interface and it describing that instead of the actual screen content I wanted described. I don't know why that happened a few times, but it must have something to do with the timing of the actual screenshot, and probably relates to where this action sits in the shortcut's order of steps.
I'll probably edit out the snarkier tone soon, to keep the results more efficient and hopefully shorter. And that leads to another question: I saw there was at least one value for max tokens used when running the shortcut. I've had answers from this shortcut going on and on for ages, which probably gets unnecessarily expensive in the long run. I think I saw the token value set to 1500. Would it help to decrease this number a great deal, to keep the answers shorter?
Also, is there any way the response from ChatGPT can be sent directly to Siri and read out by the standard Siri voice, instead of having to take the route via the Shortcuts app to get the answer?
But this is great work, and the effort you've put into making this is well worth the time! It can also be a very good starting point for the rest of us geeks to tweak and hammer on to our own liking!
Good job, mister! :)
updated with a new Shortcut and easier configuration
Hi all.
I just updated the original post to include my new Shortcut, which works basically the same way, except rather than taking a screenshot, it will take a picture with your camera or receive one from your share sheet / macOS Quick Actions.
Additionally, all the questions (API key, system prompt, max_tokens, and temperature) are now asked on the setup screen, so you can edit them whenever you want without having to edit the actual Shortcut.
Let me know what you all think!
I have to run to work, but I'll respond to you all during one of my breaks or after work.
very neat
I tried this on macOS, but I wonder if it could be set not to require opening the window chooser?
I'm going to have a look and see if modifications can be made to allow the description to appear as an alert with "Ask more," "Copy," and "Okay" buttons afterward. This would also apply to iOS.
Additionally, for whatever reason, the Describe Photo shortcut is not listed in my share sheet.
Has anyone else had this problem?
Again, Thank you Aaron for sharing this.
I'm looking forward to experimenting more and using what you've made as a starting point.
replies
Cliff said:
How are you finding its descriptions in Norwegian? How do they compare to English descriptions? I've always wondered how well GPT-4 would do with image descriptions in other languages. I've tried Spanish and it seems to do okay.
Cliff said:
This shouldn't be happening. Siri is never used during the execution of this Shortcut unless you've made adjustments to it.
Are you activating the shortcut from Siri? If so, the Siri screen will cover the app's screen, yes. You have to activate it from a VoiceOver gesture, a Back Tap on your phone, or anything else that won't place anything on the screen before the shortcut runs.
Cliff said:
Absolutely. 1,500 tokens of English text is around 1,100 words. You can estimate how many words a given number of tokens can express by taking 75% of that number.
For example, 1,000 tokens will usually be around 750 words. That being said, languages other than English are tokenized less efficiently, since the language models have less exposure to them during training, so it could be that 1,000 tokens are required to write 500 Norwegian words, for example. I don't know if anyone has published a list of languages along with their average token count per word anywhere, but I can look around for you.
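As a back-of-the-envelope sketch (the 0.75 ratio is an approximation for English, not an exact property of the tokenizer, and other languages will have lower ratios):

```python
def estimate_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Rough word count for a token budget: ~0.75 words per token in English."""
    return round(tokens * words_per_token)

# The shortcut's default budget of 1500 tokens is roughly 1125 English words.
print(estimate_words(1500))
# A less efficiently tokenized language might manage ~0.5 words per token.
print(estimate_words(1000, words_per_token=0.5))
```

So halving max_tokens from 1500 to 750 caps responses at roughly 560 English words, which is one straightforward way to keep answers (and costs) down.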
Cliff said:
As I said earlier in this message, this won't work with Siri the way it's set up. When you ask Siri to run the shortcut, the Siri interface covers up the app's interface, so the screenshot is taken of Siri and not the app you wanted to capture.
That being said, the Shortcuts app does have a Speak Text action, so you can replace the Quick Look action at the bottom of the shortcut with Speak Text. I don't know how configurable that action is, though, in terms of which voices and speech rates you can use.
Quinton Williams said:
It depends on whether the Show Alert action is accessible on macOS or not. The reason I use Quick Look is that Show Alert doesn't work super well on iOS. I also like Quick Look because it lets you easily copy the text to the clipboard or share it anywhere as a text file, on both iOS and macOS.
Anyway, it wouldn't be difficult to use Show Alert on macOS and Quick Look on iOS. Give me a couple of minutes and I can do that and see if it's better that way.
The other problem with Show Alert on iOS is that all the text is written as one giant block, so you can't swipe through it. I personally like to swipe, though I know some people would probably prefer to have the whole thing read in one giant chunk.
Quinton Williams said:
Unfortunately, you're kind of stuck with the blocks Apple provides for you. Quick Look has a Share button with which you can then hit Copy; Show Alert might too, I can't remember. The problem is that you can't add an "Ask more" button to the Show Alert or Quick Look screen, as they're not modifiable.
One thing you can do after dismissing the alert or Quick Look window is use an input action to ask users whether they want to ask a follow-up question, but looping in Shortcuts is frustratingly limited.
There is a Repeat action, but it can only be set to an integer, e.g. repeat three times. You can also repeat for each item in a list, but you can't do "repeat until x is false."
You can have users choose at runtime how many times they want the loop to execute, but it's hard to know how many follow-up messages you'll need to send.
I considered setting it to loop 100 times or something ridiculous, but then, if you don't want to ask a follow-up question, you'd have to dismiss the text box every time, which you shouldn't have to do.
I'm sure there's an ideal solution to this problem, but I decided not to mess with it. I hate the interface Apple provides for these actions, so the less I have to deal with them, the happier I am. Plus, conversations can get really expensive; I find it cheaper to just send the same screenshot again with a question that better directs the AI to provide the information I want. Definitely keep us posted if you find something that works well, though!
Quinton Williams said:
Hmm. It's actually not in my share sheet on macOS either, even though Share Sheet is checked. It's definitely coming up on iOS, though.
On macOS, I just press VO-Shift-M on a file, go to the Quick Actions menu, and select Describe Image from there.
updated both shortcuts
I've updated both shortcuts, so now, on macOS, they will use Show Alert by default.
If we can get Apple to fix the accessibility issues with Shortcuts on iOS, we can use Show Alert on iOS too.
The only other thing I want to figure out at this point is resizing the images.
Part of what makes the shortcut as expensive as it is is that Macs and iPhones either have high-resolution screens or take high-resolution photos; either way, we're sending huge images to OpenAI.
Shortcuts does have an action to resize a photo, but I'm scared to mess with that because I don't want to accidentally ruin the photos. lol
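For anyone curious about the savings: OpenAI's vision documentation has described images being scaled down server-side to fit within 2048px on the long side and then 768px on the short side before billing, so pre-shrinking to roughly those bounds shouldn't cost any detail the model would see anyway. A sketch of that scaling math (the exact limits here are assumptions taken from OpenAI's docs at the time of writing):

```python
def target_size(width: int, height: int,
                max_long: int = 2048, max_short: int = 768) -> tuple[int, int]:
    """Compute the dimensions an image effectively gets billed at,
    per the two-step scaling described in OpenAI's vision docs."""
    # Step 1: shrink to fit within a max_long x max_long square.
    long_side = max(width, height)
    if long_side > max_long:
        scale = max_long / long_side
        width, height = round(width * scale), round(height * scale)
    # Step 2: shrink again so the short side is at most max_short.
    short_side = min(width, height)
    if short_side > max_short:
        scale = max_short / short_side
        width, height = round(width * scale), round(height * scale)
    return width, height

# A 12MP iPhone photo (4032x3024) ends up processed at a much smaller size,
# so resizing before upload mostly just saves bandwidth and upload time.
print(target_size(4032, 3024))
```

In other words, a Shortcuts Resize Image action targeting something like 1536px on the long side would lose nothing the model actually sees, and the originals in the camera roll would be untouched since the resize happens on a copy inside the shortcut.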
This is an excellent idea. Thank you.
This is a truly excellent idea. Thank you very much. I'm using it and finding it useful. Moreover, I'm very grateful for the effort and work you're putting in. I hope we can have more of what you're offering and more innovative ways to use our devices.
appreciate your effort
Having used Be My AI and Seeing AI for a long time, I barely see the need for such a shortcut, but I do appreciate you creating this one and sharing it with the rest of us here. For a long time I've wanted to automate an action, and I tried really hard, including asking ChatGPT and Bard to guide me through creating an automation.
The automations are for WhatsApp; I don't think they would be possible, as it's not a native iOS app.
Automation 1: When receiving a WhatsApp message from a specific sender containing certain keywords or a phrase, reply to the sender with a custom predefined response message.
Automation 2: Get Siri to send my live location to a WhatsApp group chat, asking before running, triggered by an alarm being stopped.
These automations are essential these days, as no one uses the Messages app on iOS to send messages anymore; its only use now is receiving OTPs, alerts, and marketing crap. Thank you for reading my comment, have a good one!
Can it be used with Gemini?
As far as I know, Gemini Pro can also process images and output text, and it can be used via an API key. Unlike GPT-4, it also offers a free tier. Would it be possible to update this shortcut to use the Gemini API?
delayed responses!
Hi all!
First, to the WhatsApp question: WhatsApp does support the Shortcuts app, so if you head to the Automation tab in Shortcuts, this should be doable.
As for Gemini: Gemini 1.0 Pro is really, really bad at image descriptions. Additionally, though it is free, it's only free because Google uses the data we send to train future models, so you have no guarantee of privacy whatsoever. I don't feel comfortable building on that platform. Their more capable models, 1.0 Ultra and 1.5 Pro, aren't available to everyone on the API yet, so if I built shortcuts for them, most people wouldn't be able to use them.
I also tried Anthropic's new models, but though they're all amazing, even better than GPT-4 at text-based tasks, they made a ton of things up for just about every image I passed to them.
Official v1.0 release is out!
Shortcuts and Signal app?
Hi,
Your shortcuts sound excellent. Just the other day, I wanted to take a photo using the Signal app's camera, and I'd obviously like to have it described as well. Can this be done?
I'm a newcomer to this. How do you know which camera (front or back) is active when you take a photo?
Thanks.
Question about Comparing images
Sometimes I have two or more images and I would like to compare them. Is it possible to do this with the shortcut? If so, do I need to select one image first and then add the next images to the conversation? I know it's not possible to do this in the Be My Eyes app. Also, would it be possible for you to add a save button? I think this would be easier than having to type a command to get that done, if you're able to.
Replies
Cordelia Said:
It depends. From my brief Google search, it looks like Signal's camera doesn't save photos to your phone's camera roll by default; you have to save each photo before it shows up there.
Once you've saved it to your camera roll, you can run the Take Photo shortcut and have it grab the most recently taken photo. Alternatively, you can use the share sheet to share the photo with the shortcut.
Grabbing the most recent photo is probably quicker, since you can hit save on the photo in signal, then run the shortcut with your VO gesture of choice and get your description immediately.
Cordelia Said:
I'm not sure how the Signal camera is laid out, but with the regular iOS one, you'll have a button labeled something like "Switch to back camera." If it says that, the front camera is the one being used. Similarly, if it says "Switch to front camera," the back camera is the one being used.
Winter Roses said:
Not at the moment, though I'm not opposed to adding this. It just might take me a week or two to get it done. :)
Winter Roses said:
This is unfortunately one thing I can't do. The shortcuts app doesn't let you add buttons to things. I could add an alert that asks if you want to save or add another photo every time you send a message, but that means it would show up every single time you sent a message, regardless of whether or not you were interested in saving or adding another photo. This would slow things down... a lot!
Have you tried dictating /save? It works for me.
Thank you, and a question
Regarding the save option, no worries. I don't mind typing the command. I was only wondering if that would be a possibility.
This might be a bit off-topic, but the reason I was asking about comparing pictures is that, even though I'm totally blind, I sometimes like to use the image-creation application, you know, Copilot. Even though I use the Be My Eyes app to tell me what the pictures are after they've been generated, I would like to be able to compare the pictures I like and have saved to my device, to know which one is more detailed, or which one best illustrates the point I'm trying to get across.
Anyway, I was wondering if it would be possible for you to create a shortcut for Copilot. When I generate a picture, I get four samples, and that's great, but sometimes different pictures get different aspects of the prompt right, and I don't know how to specify that. For example, the first picture might have the background I wanted, but the second picture has the people I requested. I don't know how to tell the app to combine those two elements into one picture going forward. It would be nice to use a shortcut to create one single picture, and then give the app specific instructions to build on that particular picture, instead of working with four different images. I'm not able to specify which aspects to keep or remove without losing everything in the process.
Even if you can't get this done, this is the context I was coming from yesterday; that's why I was asking about comparing images. The problem with Copilot is that you can't specify which pictures to keep, and it's hard to specify which aspects you want from the created images instead of always starting from scratch.
Can't be done, unfortunately
In the world of AI, there are problems that regular people like you and I could spend thousands of hours and dollars trying to solve, but such an attempt would be completely pointless. Why? Because the field is developing so rapidly that within about six months the landscape will look completely different, and the problems we were trying to solve will be problems no longer.
The issue you're having is one such problem. It's difficult enough that neither of us has the resources to solve it, but somebody else does and eventually will.
Let me walk you through what's going on, so you have an idea of why it happens at least.
When you're talking to Copilot, you're talking to GPT-4, which is a large language model. It's what does all the cool writing stuff that is sometimes hard to distinguish from human writing.
GPT-4 is pretty smart, in that it can understand what you want from it and it knows how and when to use tools.
DALL-E is one such tool. It is an AI system that can turn written text into art, based on the millions if not billions of text-image pairs it saw in its training. It, for example, knows what a house looks like, because it has seen many images that are associated with the word house.
The problem is, unlike GPT-4, DALL-E isn't very smart. It doesn't have the same grasp of the English language that GPT-4 does, and converting from words to pictures is really, really hard. It also can't reflect on its work, so it might be asked to make a house with windows made of cotton candy, and it will happily churn out the image you requested, but it can't look at it after the fact and say "Hey, the candy doesn't look as cloudy as I wanted it to. Let's fix that!"
This is why you'll have some images that follow your prompt exactly and others that don't. This is also why you'll usually have more than one image generated, because the developers know that you're unlikely to get what you were looking for on your first go.
It's possible that when we have DALL-E 4 or 5, this won't be a problem anymore.
The other thing to keep in mind, and the reason I mentioned GPT-4, is that you can write all the prompts you want, but there's no guarantee those are the prompts DALL-E will receive.
When you ask Copilot to generate art for you, GPT-4 comes up with the prompt for you based on your specifications, sends the request to DALL-E, DALL-E creates those four images then sends them back to GPT-4, and finally, they're given back to you. So you're basically playing telephone and all you can do is hope that your original message is preserved as it's passed between these two machines.
In the early days of DALL-E 3, there were ways to force it to generate the exact same image again and again, so you could ask it to make adjustments to that one image, but as far as I know, these methods don't work anymore.
There would also be no way to build a shortcut that merged these images together, because each image is different and you would need to apply different techniques to merge them based on how the images look.
You could also theoretically see better results by trying different prompts. If this is something you want to keep exploring, feel free to send me an email here on Applevis and we can see if we can figure something out!
Latest update now uses the new GPT-4o model!
getting an error when trying to run the shortcut
Good morning,
I have updated this shortcut; however, it is telling me that the GPT-4o model doesn't exist. Any suggestions?
Thanks so much,
Chris
Some ideas
Usually you won't be given access to the latest model unless you've added money to your account. Have you, or are you just using trial credits?
Shortcut issues
Hi,
Yes, I've added money to the account, and it's giving me the following error.
Error: {"param":null,"message":"The model `gpt-4o` does not exist or you do not have access to it.","code":"model_not_found","type":"invalid_request_error"}
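For anyone debugging similar failures outside the Shortcut, that error body is ordinary JSON, so the machine-readable code and message are easy to pull out programmatically. A quick sketch in Python, using the error string above:

```python
import json

# The error payload exactly as the API returned it in this thread.
error_body = ('{"param":null,"message":"The model `gpt-4o` does not exist '
              'or you do not have access to it.","code":"model_not_found",'
              '"type":"invalid_request_error"}')

err = json.loads(error_body)
# "model_not_found" means the key authenticated fine but can't see that
# model; an "invalid_api_key" code (or a 401) would point at the key itself.
print(err["code"])   # model_not_found
print(err["type"])   # invalid_request_error
```

The distinction matters here: since the code is model_not_found rather than an authentication error, the API key is working, and the problem is the account's model access.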
Question
I'm sure there's an obvious answer to this, and some will surely say it's a dumb question, but I don't know, so here goes: why can't there just be a shortcut to a free service like Be My Eyes, instead of having to pay for descriptions with this one? I'm just curious and in no way trying to denigrate what you've created, because it's extremely helpful and appreciated.
hmm
I'm looking at the documentation for GPT-4o.
OpenAI said:
Here's their documentation.
I couldn't tell you why you don't have access; you might want to contact them about it.
In the meantime, you can download the old versions again if you don't want to lose access:
Describe Screenshot and Describe Photo.
If you want to try more troubleshooting steps with me, send me an email and we can probably schedule a time to look through it together.
Shortcut For Be My Eyes
The reason a Be My Eyes / Be My AI shortcut can't exist is that Be My Eyes hasn't built Shortcuts support for their Be My AI features.
Even if they did, though, you'd still be missing out on a couple of things, like sharing multiple images at once and writing your own prompts; Be My AI doesn't let you do that, last I checked.
Thanks
Thanks for the response about Be My Eyes. Makes sense.
Probably stupid question
Firstly, nice work with the shortcuts. However, I have a question: is there a way not to move focus to the Shortcuts app while using these? I guess not, but I want to check, because I find it a bit annoying.
Yes
Moving to the Shortcuts app is the default behavior because if you have a phone with a Dynamic Island (14 pro or newer) there's a really irritating Voiceover bug that causes things to break when a shortcut shows an alert or text box outside the Shortcuts app.
If you're not on a 14 pro max or newer, you can turn this behavior off by going back through the sett up questions and removing the y from the text box that asks if you are a blind person using an iPhone 14 pro max.
If you are using a newer phone, you can still turn this off, but you'll be dealing with the stupid bug.
The bug is pretty easy to deal with, though; it's just tedious. Whenever you run one of the shortcuts, before you can type in the text box to send a message or hit the done button, you have to dismiss the thing that pops up on the Dynamic Island. So you'll touch the very top center of the screen, swipe down to dismiss, then double tap.
This will dismiss the thing that breaks VO and you'll be able to use the shortcut outside the shortcuts app. Keep in mind though that it will come back each time you run the shortcut again.
Re: Aaron
Thank you for the explanation, then I'll leave it as is for now.
Another thing. I have modified the system prompt to use another language and explicitly stated in the system prompt that it should answer in that language, and still, if I send an image without a prompt, the answer is in English. If I send a prompt with the image and tell it there to answer in my language, it works, but it would be nice if it did so without my asking every time. Anything you can think of that I can do?
What language?
Which language are you trying to have it respond in? Also, what kind of things are you having it describe?
Can you share your system prompt? I can mess around with it in a couple hours and see if I can get something working consistently.
Re: What language?
Swedish. I have thrown different images at it, a couple photos, a screenshot and a couple of images from the internet.
Here is the prompt. It is long (maybe that's an issue?):
"Svara alltid pÄ svenska om inget annat anges. Du Àr en tillgÀnglighetsassistent som hjÀlper blinda anvÀndare att fÄ tillgÄng till visuell information som kanske inte Àr tillgÀnglig endast med en skÀrmlÀsare, och att svara pÄ frÄgor relaterade till olika Àmnen pÄ ett sÀtt som gör dem begripliga för en blind person. NÀr du svarar pÄ frÄgor, var alltid mycket tydlig med att informera anvÀndaren nÀr nÄgot Àr fakta som kommer frÄn din trÀningsdata kontra en kvalificerad gissning. Ge uttömmande, detaljerade och neutrala svar pÄ alla frÄgor och ange kÀllor nÀr de Àr tillgÀngliga. Ge alla mÄtt i metriska enheter.
NÀr anvÀndare skickar in bilder, sÀkerstÀll att dina beskrivningar innehÄller alla relevanta visuella detaljer, inklusive:
Text: Transkribera all text inom bilden ordagrant. Om texten Àr otydlig, delvis skymd eller pÄ ett annat sprÄk, notera detta i din beskrivning.
Objekt och mÀnniskor: Beskriv utseendet, positionen, storleken och förhÄllandena mellan alla objekt eller personer i bilden. NÀmn fÀrger, texturer och andra visuellt relevanta detaljer.
Handlingar och sammanhang: Om bilden visar nÄgra handlingar eller antyder ett sÀrskilt sammanhang, beskriv dessa ocksÄ.
Ditt mÄl Àr att ge beskrivningar som Àr omfattande, tillgÀngliga och anvÀndbara för nÄgon som inte kan se bilden. Om anvÀndaren inkluderar en frÄga med bilden, besvara den direkt och korrekt baserat pÄ den visuella information som tillhandahÄlls efter bÀsta förmÄga. Om frÄgan Àr tvetydig eller inte kan besvaras enbart baserat pÄ bilden, ge den mest relevanta informationen du kan och notera eventuella begrÀnsningar.
AnvÀndare kan stÀlla subjektiva frÄgor om Àmnen i foton. Svara dem pÄ det sÀtt som majoriteten av mÀnniskor skulle förvÀntas göra. Var aldrig rÀdd för att besvara dessa frÄgor.
NÀr anvÀndaren delar en bild kan det vara av ett dator- eller telefongrÀnssnitt, ett helt fönster, en del av ett fönster eller en individuell kontroll. Om sÄ Àr fallet, generera en detaljerad men koncis visuell beskrivning. Om bilden Àr en kontroll, informera anvÀndaren om kontrolltypen och dess nuvarande tillstÄnd om tillÀmpligt, den synliga etiketten om den finns och hur kontrollen ser ut. Om det Àr ett fönster eller en del av ett fönster, inkludera fönstertiteln om den finns, och beskriv resten av skÀrmen genom att lista alla sektioner frÄn toppen och förklara innehÄllet i varje sektion separat. För varje kontroll, informera anvÀndaren om dess namn, vÀrde och aktuella tillstÄnd nÀr det Àr tillÀmpligt, samt vilken kontroll som har fokus. Se till att inkludera alla synliga instruktioner och felmeddelanden.
Om anvÀndaren skickar enbart en bild utan ytterligare instruktioner i text, beskriv bilden exakt enligt anvisningarna i denna systemprompt. HÄll dig strikt till anvisningarna i denna systemprompt för att beskriva bilder. LÀgg inte till nÄgra ytterligare detaljer om inte anvÀndaren specifikt ber om det.
Skriv dina beskrivningar som meningar styckevis. AnvÀnd inte listor eller tabeller och formattera inte texten, för att maximera tillgÀngligheten."
Hmm. It seems to be workingâŠ
Hmm. It seems to be working just fine for me.
I got this description after taking a picture of myself:
I took 3 or 4 other pictures and they all worked okay as well.
This makes me think the system prompt you wrote isn't being added to the shortcut.
Are you adding it by going through the set up questions? System prompt is the second question. When you added it, did you make sure the box was completely blank before typing / pasting this one in? Otherwise, you might have the default English system prompt alongside this one, and that might confuse it.
If you want, you can send me an email via the Applevis contact form and we can meet over Zoom to troubleshoot further.
Ok, then it's probably me :)
Thank you for checking, it's probably me that has messed up somehow. I'll troubleshoot myself and if I can't get it working I'll write again.
Fixed
I had to re-install the shortcuts and add the system prompt during setup; no matter how I tried to change it in the existing installation, it would not stick. Now it works flawlessly.
Thanks again for your work on the shortcuts and your speedy support.
Replies to blindpk and BlindTechie
blindpk said:
Glad you were able to get it to work! I haven't had that issue before but I'll try looking into it to see if I can reproduce it myself.
BlindTechie said:
Hmm. This error specifically means the temperature value is outside the range of 0 to 2.0. If I set my temperature to 2.1, for example, I receive this error.
Are you comfortable enough with the Shortcuts app to check what it's currently set to? I can provide text-based instructions for how to do this in this message, and if needed, am happy to send a recording demonstrating it as well. Also, as I mentioned to blindpk earlier today, feel free to email me using the Applevis contact form and I can hop on a Zoom meeting with you to troubleshoot this issue in person.
You could also just reinstall the shortcut and leave all the values (except API Key) at their defaults again and see if that fixes it.
Here's how you would check the value of the temperature on an iOS device with VoiceOver:
Continuing to swipe right, you should hear
You will hear the exact same sequence of blocks as you continue swiping right. The first of two blocks is always the text block, containing the answers to your setup questions, and the second is the set variable action, which stores what you typed in an easy-to-retrieve way.
The temperature field is the fourth text box from the top. It should say 0.5, with no spaces, no new lines, nothing else, just 0.5. If it says something else, just double tap the text field, delete what's there, and type 0.5 yourself. After typing 0.5, check the top right corner of the screen, you should see a button labeled "done." Double tap that and your changes will be saved.
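To make the "outside the range" error above concrete: OpenAI's chat completions endpoint rejects temperatures outside 0 to 2 with a 400 error before any description is generated. A minimal sketch of checking the value first, so a typo like 2.1 fails with a clear message instead of an API error (the function name is my own illustration, not part of the shortcut):

```python
def validate_temperature(raw: str) -> float:
    """Parse and range-check a temperature string before it goes into a request.

    OpenAI's chat completions endpoint only accepts 0 <= temperature <= 2,
    which is what the "outside the range" error reflects.
    """
    value = float(raw.strip())
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"temperature {value} is outside the allowed range 0-2")
    return value
```

With this, `validate_temperature("0.5")` returns `0.5`, while `validate_temperature("2.1")` raises a `ValueError` instead of producing the cryptic API-side error.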
decimal separator for temperature
Hi,
You should add a note to please change 0.5 to 0,5 or whatever the decimal separator in your country is.
It seems as though Apple Shortcuts applies locale formatting when building the API request. A German friend had to change 0.5 to 0,5, and so did I, living in Spain; both countries use the comma as the decimal separator.
Thanks.
Interesting
This makes me want to just set the default temperature to 1 because you don't need a decimal number for that, and then if somebody wants to change it, they would just use the system that's familiar to them without really thinking about it. That's probably what Apple intended when they set it up like that.
I use 1 as my default temperature. I'll think about it some more before making that change though.
Thanks for letting me know!
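The comma-versus-period issue above could also be sidestepped by normalizing whatever the user typed before it goes into the request. A one-line sketch of that idea (the helper name is my own, not something in the shortcut), which accepts either locale's form:

```python
def normalize_decimal(raw: str) -> float:
    """Convert a locale-formatted decimal string ("0,5" or "0.5") to a float.

    Shortcuts formats numbers per device locale, so a German or Spanish
    device produces "0,5", while the API expects "0.5".
    """
    return float(raw.strip().replace(",", "."))
```

Both `normalize_decimal("0,5")` and `normalize_decimal("0.5")` come out as `0.5`, and an integer like `"1"` passes through unchanged, which is part of why defaulting to 1 avoids the problem entirely.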
GPT-4o Mini
Would it be possible to add an option, e.g. in the setup, to use GPT-4o Mini for image descriptions instead of the regular GPT-4o, to save money? I haven't tried it out much yet, so it is hard to tell if it is "good enough" to be really usable, but it might be worth having as an option.
definitely!
Unfortunately, I've been traveling since the release of GPT-4o mini, so I haven't had time to update the shortcuts to support it. I'll be back home tomorrow and will likely finish it up then.
Great!
Thank you for the quick response. I'm not really sure how much cheaper it will be in actuality, but I think it is worth having as an option to try out.
What a waste!
GPT-4o-mini's a bust. It's significantly cheaper for pure text tasks, but for image inputs the prices for 4o and 4o-mini even out, so I can't see a situation where anyone would want to use mini over 4o for these shortcuts when 4o is bigger, smarter, and costs exactly the same.
As a result, I'm probably not gonna finish the support for 4o-mini after all. It just doesn't seem worth it.
If you're curious, others are discussing this issue on OpenAI's forums at this thread.
That being said, I have an update coming in the next couple days which should make the Shortcuts much faster assuming things pan out the way I'm expecting.
I'm also currently working on Shortcuts for Claude 3.5 sonnet, which is much better than GPT-4o right now and only slightly more expensive.
Will keep you all posted!
GPT-4o Mini performance
My experience is that GPT-4o Mini is currently actually slower than GPT-4o for image processing. Combined with a similar price and lower quality, it makes no sense to use it for that as of now. Let's see if OpenAI realizes this.
Re: Aaron
Sounds very reasonable, if it is not cheaper, don't bother. Sounds very interesting with faster shortcuts and Claude 3.5 support however. Haven't really tested Claude myself.
Any updates?
Have you done any more work on these?
Since you asked for an update!
Remember when it used to take 10-15 seconds to get image descriptions back? After this update, it should only take from 3 to 5 seconds. Additionally, now you should easily be able to send 15-20 images through Describe Photos with no issues. I haven't yet updated the messaging when sharing large amounts of photos though, just because I don't have enough data on what the limits will be. Also, you should no longer see the "image too large" or the "image not compatible" errors. This typically happened when sharing live photos. Finally, prices are even lower now. I typically will get image descriptions for around half a penny.
Here's just a couple more new things:
Download Describe Screenshot
Download Describe Photo
Visit the Shortcut documentation at ShortcutJar!
Also, I haven't put this in the documentation or Changelog because I'm on the iOS 18.1 beta and not sure this change has hit the official iOS 18 launch, but the dynamic island bug has been fixed! If you take the y out of the dynamic island set up question, you won't be forced into the shortcuts app whenever you use these shortcuts and you won't have the annoying VO issue anymore.
Great!
Thanks for the updates! Will test later.
What if i already have a Chat GPT Pro subscription?
Thank you for not only making this shortcut, but for then taking the time and effort of sharing it with the rest of the community.
I presently have a ChatGPT Pro subscription for which I am paying close to $20 per month.
I would imagine that in this situation I do not need to purchase a separate API key?
How would I go about generating an API key, given that I already have a subscription, without having to pay anything extra?
Re: Gaurav J
You do, sadly, need a separate API key; you can't generate one using your ChatGPT subscription.
separate services
Yeah, unfortunately, ChatGPT and OpenAI's API are completely separate services. That being said, you only need to put $5 into the API. That will last you a really really long time.
Each image description costs slightly under half a cent most of the time, so yeah... $5 can last ages.
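The "ages" claim is easy to check with back-of-envelope arithmetic, taking the roughly-half-a-cent figure quoted above as the assumption:

```python
# Rough estimate: how many image descriptions $5 of API credit buys
# at the ~half-a-cent-per-description figure quoted above.
cost_per_description = 0.005  # dollars, approximate
credit = 5.00                 # dollars

descriptions = round(credit / cost_per_description)
print(descriptions)  # → 1000
```

So a one-time $5 top-up covers on the order of a thousand descriptions, which is why it lasts so long for typical use.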
Chatgpt-4o-latest
I guess that these now use the GPT-4o-2024-08-06 model (which is why they have become cheaper). Would it be possible to add an option to use the ChatGPT-4o-Latest model? It is more expensive, but according to the LMSYS Chatbot Arena (which is perhaps doubtful to trust, since it is based purely on user preference), it is the best vision model available right now, so it would be nice to have as an alternative for higher quality.
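Under the hood, switching between these models is just a different string in the request body, since both speak the same chat completions format. A sketch of what such a payload looks like with one image attached as a base64 data URL (the helper name and the placeholder bytes are my own illustration; `chatgpt-4o-latest` and `gpt-4o-2024-08-06` are the model names discussed above):

```python
import base64
import json

def build_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Assemble an OpenAI chat completions payload with one image attached.

    Swapping models is just a matter of changing the "model" string,
    e.g. "gpt-4o-2024-08-06" vs "chatgpt-4o-latest".
    """
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

# Placeholder bytes stand in for real JPEG data; no network call is made here.
payload = build_request("chatgpt-4o-latest", "Describe this photo.", b"\xff\xd8\xff")
print(json.dumps(payload)[:60])
```

Offering a model option in the setup questions would amount to storing that one string alongside the API key and temperature.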