Update 09/17/2024
Remember when it used to take 10-15 seconds to get image descriptions back? After this update, it should only take from 3 to 5 seconds. Additionally, now you should easily be able to send 15-20 images through Describe Photos with no issues. I haven't yet updated the messaging when sharing large amounts of photos though, just because I don't have enough data on what the limits will be. Also, you should no longer see the "image too large" or the "image not compatible" errors. This typically happened when sharing live photos. Finally, prices are even lower now. I typically will get image descriptions for around half a penny.
Here's just a couple more new things:
- You can now use the /share command in a reply box to share the last photo or screenshot with a friend. Useful for getting sighted people to verify stuff for you.
- If you use comma as your decimal separator, you can now use these shortcuts without having to adjust anything.
Visit the Shortcut documentation. ShortcutJar!
Update 05/22/2024
As the subject says, the Shortcuts now use OpenAI's new GPT-4O model, which is 50% cheaper, a bit faster and more accurate than before!
Note: This isn't the same thing as the new voice / video chatting features that OpenAI recently announced. You cannot voice chat with these Shortcuts!
Anyway, apart from using the new model, the Describe Photo Shortcut also now lets you share multiple images from the share sheet to get described all at once!
This is useful when you want descriptions across multiple photos all at once. I love sending it 3-4 pictures of my dogs and having it try to tell a coherent story that follows what's happening in all four images.
Anyway, to visit the shortcuts' dedicated site, click Here!
Already familiar with the Shortcuts and just want to get going with the new versions?
Update 03/26/2024
I've just released what I'm tentatively calling Version 1 of both my Describe Screenshot and Describe Photo shortcuts.
They can both be found on their new dedicated site!
Some of the new changes include:
- conversations: You can now reply to the descriptions you are provided. to do so, press the okay button on the description's alert. Have nothing to say? No worries! Hit cancel, and the Shortcut will leave you in peace!
- Slash commands: When typing a reply, you can use /save with either Shortcut, and the last photo or screenshot taken will be saved to the photo album of your choosing. Additionally, Describe Photo also has /add, which will allow you to take another picture to accompany your replies.
- Describe Photo now supports the Apple Vision Pro! If you run the shortcut on Vision Pro, it will grab the latest photo from your camera roll rather than having you take one. This is because the Shortcuts app on Vision Pro doesn't support taking photos in shortcuts. If you intend to use this shortcut with other smart glasses or prefer to take your photos in the Camera app, you can make grabbing the latest photo the default behavior in the set up screen.
That's everything. Share and Enjoy! :)
Update
There are now two Shortcuts.
- Describe Screenshots: Can be found here: Describe Screenshots This one, after being assigned to a VoiceOver gesture, will take a screenshot when run and have GPT4 generate a description for you. It also gives you the opportunity to ask a question before sending your image.
- Describe Photo, which can be found here: Describe Photo This one can also be assigned to a VoiceOver gesture, and when run, it will pull up the iOS or mac OS camera interface for you to take a photo which will then be described for you. Additionally, you can share pictures to this Shortcut, either from the iOS and mac OS share sheets, or Mac OS's Quick Actions menu.
Setting both Shortcuts up is identical to before, though now, you will be able to configure the system prompt and other parameters from the set up screen if you so choose. I did this because I hate editing shortcuts directly and the set up screen can be brought back up whenever you want, even long after you've originally installed the shortcut.
On iOS, The set up screen can be reached by editing the shortcut, tapping Shortcut Info on the bottom right, then tapping set up on the top right (immediately beneath the done button.)
From here, you can tap the Customize Shortcut button and you'll be asked all the set up questions again.
Note: The API key field will be blank when setting up your shortcut again, but as long as you've entered it once before, you don't have to fill this field out again. The rest of the set up process and usage is identical, so I'll leave the original post as well.
Original Post
Hi all! The other day, it occurred to me that getting screenshots described is a pain with Be My eyes and / or the ChatGPT app. You have to take the screenshot, hit the screenshot button before it disappears, hit share, then hit describe with Be My AI which is far too many steps for me.
I've written a shortcut using the built-in Apple Shortcuts app that takes a screenshot and describes it using the same technology Be My AI uses. Best part is since it's a Shortcut, you can assign it to a Voiceover gesture. This works on both iOS and Mac OS. I just put it in the iOS forum because I figure more people are likely to see it here. Anyway! the Shortcut can be found right here! Unfortunately, I'm not rich and can't afford to pay for everyone's usage, so this does cost (two to three cents per image) and there is a bit of setup involved.
So how do I set this thing up?
I'm glad you asked! Before you install this Shortcut, you need to do a few things: 1. Create an OpenAI account. This can be done at platform.openai.com. If you have a ChatGPT account, you may skip this step. Otherwise, just head to that site, press the sign up button, and follow the instructions.
Sign into your OpenAI Account (if you're not already) and head to their billing page. Here, you'll follow their instructions to set up a billing plan with them. It's not as complicated as it sounds. You basically just load your account with money ahead of time, and every image you have described pulls a couple cents from that balance until it reaches 0, at which point you can refill it again or never use the account again. This is not the same thing as a ChatGPT Plus subscription. if you have a ChatGPT plus subscription, you still have to do this.
Acquire an API Key You can do this on their API Key page. Just hit the create button, type a name for it, and hit create. then a text box will appear with your key. Copy this key and save it somewhere safe. OpenAI will not show you this key again, so if you lose it, you'll have to create another.
Also, don't share this key with anyone. Anyone who has access to your key can use their services pretending to be you, which will cost you money. If somebody does get their hands on your key, you can delete it on this page.
Install the shortcut! Once again, the Shortcut can be found here. When you install the shortcut, it will ask for your API Key. Paste it into the box and tap Install. At this point, it should be ready to use.
Assigning it to a Voiceover gesture.
This part's pretty easy. Just go to settings, accessibility, VoiceOver, Commands, All Commands, Shortcuts, then select the Shortcut's name (describe screenshot.) Then you will be given the option to add a gesture or keyboard shortcut. Once you add either or both, any time you use that gesture or keyboard shortcut, the shortcut will run.
I've installed the shortcut and set up a VoiceOver gesture but how do I use it?
Pretty simple: Whenever you want your screen described, make sure screen curtain is off, then use your VO gesture to activate the shortcut. Your phone will take the screenshot, then open the Shortcuts app so you can include a question with the image. Type in your question (if you have one,) then tap done. Then you can return to what you were doing. The description will take somewhere between 10-30 seconds to come back, but you don't have to wait in the Shortcuts app. Just go back to your Youtube video or whatever. Once the description appears (the shortcut should play the Tri-tone notification sound to let you know the description's there,) after which you can feel around the top center of your screen until VO focus has landed on the description field. Once it's there, you can swipe through the description and hit the done button when you're done reading it. At the end of the description, you will be told exactly how much that description cost you, so if you're conscious about money, be sure to read through the end. If enough people want me to move the total cost to the top of the description, I can definitely do that.
I don't like how it talks! Can I change it?
yes, you absolutely can. If you go into your shortcuts app, find the Describe Screenshot shortcut, hit the edit action (using the rotor,) and the first 4 or 5 text fields of the shortcut are all parameters which you can modify to your heart's content. If you specifically want to modify the way it talks just edit the text in the system prompt field. There's a comment box immediately before it that will tell you which one it is.
Dude, you talk a LOT!
I know! I know! I hope this Shortcut is as useful to all of you as it is to me. Please let me know what you think, and if you like it, share it with your friends who might benefit! :)
Comments
Wish Granted!
Support for chatgpt-latest has been added as part of the 2024-09-29 update.
I've been playing around with it myself and it is loads better, but it does cost quite a bit more. I typically pay between 1 and 2 cents. Still much cheaper than the early days though, when things were 2-4 cents. Add a y to the second question's text box to try it out!
Also, Apple fixed the dynamic island bug in iOS 18, so the blind dynamic island user setting is defaulting to off. If you haven't updated to iOS 18 yet, you'll need to re-add the y to that text box if you're on a phone with a dynamic island.
have fun, and let me know what you think!
Greatly appreciated!
Thank you for implementing this so quickly. Will test more later, but works good so far.
do we have any podcast or audio tour for this great short cut
any good podcast for this?
should I have chat gpt account?
do I need to have the chat gpt account?
No podcast yet
I've been meaning to record something for months now but just haven't found the time.
You need an account with OpenAI, yes. Whether you create it through the ChatGPT site or the OpenAI platform site doesn't matter too much. Once you have an account, you can sign into it, put money into the account and generate the API key needed for the shortcuts.
Please see the documentation I wrote for some more specific information about the process.
This is, unfortunately, a slightly involved process, but you only have to do it once before setting up the shortcuts.
Please let me know if you have more questions. I'm happy to help!
Update 2024-10-11
Hi all!
With the new version, AI responses are automatically copied to the clipboard. You can turn this off in one of the set up questions. Additionally, the shortcut now takes prompt caching into account when telling you how much something cost.
If you don't know, prompt caching is something OpenAI introduced last week which makes long conversations significantly cheaper. Unfortunately, it only works with the default model and not the latest one.
The full changelog can be found here.