Hi all!
One day I realized that getting screenshot descriptions took too many steps: take a screenshot, manage to tap the "Share" button, select ChatGPT, write a prompt, wait for a response - far too much just to look at a meme a friend sent in a messenger :) There have long been add-ons for NVDA on Windows that do the same thing at the press of a button. If I understand correctly, the problem on iOS is that a third-party application simply cannot read the screen contents, show windows on top of the current application, and so on, which is why all similar solutions involve sending the original image through the "Share" dialog. Therefore, we are limited to what the iOS Shortcuts app provides.
Acknowledgements:
@aaron ramirez for an excellent Shortcut; studying it showed me that the capabilities of iOS Shortcuts are generally sufficient for this kind of task;
@alekssamos for a great NVDA add-on, which gave me the idea of using the free API and of adding some extra features, such as text recognition.
Quick Start
For those who don't feel like reading long manuals, installation instructions and feature descriptions, here is a link to the Shortcut itself; the initial setup should be intuitive. Simply assign a VoiceOver command to run this Shortcut, after which it will take a screenshot and offer a menu of available options.
Current functionality
Currently the Shortcut uses the Piccy Bot API to get free descriptions, but it is implemented in such a way that adding a new model with a ChatGPT-compatible API is not too difficult (a request sketch follows the feature list below). The functionality supported by this Shortcut includes:
Getting image descriptions using a given model;
Optical text recognition using the OCR engine built into iOS;
Follow-up questions about the image whose description the model has generated;
Using a screenshot as the input image, or sending your own image through the "Share" dialog;
Displaying lists, tables and other formatting elements in the generated image description;
Copying the last answer to the clipboard;
Optional sound notification upon completion of description generation;
An experimental feature for automatically turning VoiceOver off while taking the screenshot, so you don't have to worry about the current Screen Curtain state (requires some additional setup);
The ability to get answers in any language supported by the model (due to technical limitations of Shortcuts, the language itself must be specified manually).
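For those curious what such a ChatGPT-compatible request looks like on the wire, here is a minimal Swift sketch assuming the standard chat-completions format; the endpoint, model name and prompt are placeholders, and the exact payload the Shortcut assembles from its settings may differ slightly.

```swift
import Foundation

/// Minimal sketch of a ChatGPT-compatible vision request, assuming the standard
/// chat-completions format. The endpoint, model name, key and prompt are
/// placeholders -- the Shortcut builds an equivalent request from its settings.
func buildDescriptionRequest(imageData: Data, prompt: String, apiKey: String) throws -> URLRequest {
    let endpoint = URL(string: "https://api.openai.com/v1/chat/completions")!
    let dataURL = "data:image/png;base64," + imageData.base64EncodedString()

    // One user message carrying both the text prompt and the image.
    let body: [String: Any] = [
        "model": "gpt-4o-mini",          // whatever 'model_name' is configured
        "messages": [
            ["role": "user",
             "content": [
                ["type": "text", "text": prompt],
                ["type": "image_url", "image_url": ["url": dataURL]]
             ]]
        ]
    ]

    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}

// The generated description is normally found at choices[0].message.content in
// the JSON response; the Shortcut's default response path
// 'choices.1.message.content' points at the same field, since Shortcuts
// appears to index list items from 1.
```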
Setting things up
Install Shortcut by following the link.
A dialog will appear asking you to set some settings:
Play sound: Determines whether to play a sound after description generation is complete, enter 'y' or 'n';
Disable VoiceOver: experimental feature that disables VoiceOver before taking the screenshot and re-enables it afterwards; increases the delay before the options menu appears by about a second, enter 'y' or 'n';
Note: this only works if you run the Shortcut through the separate executor Shortcut (see below).
Description prompt: the prompt that will be sent to the model to get an image description; enter '/default' to use the preset prompt, or enter your own prompt;
Language: the language in which the model will generate responses, enter the full name of the language, for example, 'English'.
If you want to use the feature that temporarily disables VoiceOver, install this Shortcut as well.
Assign a VoiceOver command for running this Shortcut:
Go to Settings --> Accessibility --> VoiceOver --> Commands --> All Commands --> Shortcuts;
Depending on whether you have enabled the option to temporarily turn VoiceOver off, select CloudVision or CloudVisionExecutor.
Note: If you choose CloudVisionExecutor, after each image description, the Shortcuts app will launch on top of the current app. This is a technical limitation of the method used to temporarily turn VoiceOver off; if this is inconvenient for you, use the default Shortcut, first turning off the screen curtain manually.
Assign a gesture or keyboard command to this Shortcut.
Usage instructions
Perform a VoiceOver Command or select CloudVision in the "Share" dialog to get an image description.
A menu will appear with the following options:
Describe image: get an image description using the model selected in the Shortcut settings;
Recognize text: recognize text in the image using the OCR engine built into the system;
Cancel: quit.
After selecting one of the options, it will take some time to receive a response (in the case of description generation, this can be about ten seconds, while text recognition occurs almost instantly).
The results of the image analysis will appear in a separate window. Once you have read them, you can close the window using the button in the upper right corner.
After you close the description, a menu will appear with the following options:
Chat with model: ask an additional question about the image being analyzed;
Copy last answer: Copy the model's last answer to the clipboard;
Cancel: quit.
After selecting the "Chat with model" option, a dialog will appear asking you to enter your question.
Similar to the original description, the generated response will appear in a separate window after a while.
After viewing the received response, you can continue asking follow-up questions or end your interaction with the Shortcut.
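Under the hood, follow-up questions in a ChatGPT-compatible API are normally handled by resending the conversation so far with the new question appended. Here is a rough sketch of that pattern; the example texts are invented, and whether the Shortcut resends the image itself on every turn is an assumption on my part.

```swift
// Hedged sketch of how a follow-up question fits the chat-completions format:
// the earlier exchange is kept and the new question is appended as another
// user message.
var messages: [[String: Any]] = [
    ["role": "user", "content": [
        ["type": "text", "text": "Describe this image for a blind user."],
        ["type": "image_url", "image_url": ["url": "data:image/png;base64,..."]]
    ]],
    ["role": "assistant", "content": "A screenshot of a chat with a cat meme..."]
]

// The follow-up question entered in the "Chat with model" dialog:
messages.append(["role": "user", "content": "What breed is the cat?"])
// The whole messages array is then sent again in the next request.
```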
Adding your own models
The following instructions involve extensive interaction with the Shortcut implementation. If necessary, detailed instructions for creating Shortcuts can be found here.
Open the Shortcuts app, find the CloudVision Shortcut and, using the VoiceOver rotor, select the "Edit" action.
Find the `description_model` variable; in the text field located right before it, enter a human-readable name for the model you are adding.
Find the `description_model_api_key` variable; in the text field located right before it, enter your API key (if necessary).
Find the `description_models` variable; using the "Add New Item" button located right before it, create an entry for your model, selecting `Dictionary` as the value type.
For the key, enter the model name you specified for the `description_model` variable.
Click on the value to go to the dictionary editing screen.
Create the following entries with parameters for your model:
Required, type: `text`, key: 'url', value: the URL to which requests will be sent, for example 'https://api.openai.com/v1/chat/completions';
Optional, type: `text`, key: 'user_agent', value: the User-Agent with which requests to the model will be sent; if omitted, the default value is 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36';
Required, type: `text`, key: 'model_name', value: the value of the `model` field in the request;
Optional, type: `text`, key: 'request_messages_key', value: the key under which the request carries the array of messages; if omitted, the default value is 'messages';
Optional, type: `dictionary`, key: 'additional_parameters', value: a dictionary whose elements will be added directly to the request; can be used to specify parameters such as 'max_tokens' or 'temperature'; if omitted, the default is an empty dictionary, i.e. no additional parameters will be added to the request;
Optional, type: `text`, key: 'response_messages_key', value: the key (or path) at which the response contains the text of the received answer; if omitted, the default value is 'choices.1.message.content'.
Note: if you want to omit any of the fields marked as optional, simply do not create an entry in the dictionary for that field.
After filling in all the specified fields, you can finish editing the Shortcut.
To switch between already added models, simply assign the `description_model` variable a value that matches the corresponding key in the `description_models` dictionary.
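To make the structure concrete, here is roughly what a complete entry in `description_models` could contain, written out as a Swift dictionary purely for readability; the model name and values are hypothetical, and in the Shortcut itself you build this with the dictionary editor described above.

```swift
// Hypothetical "My OpenAI model" entry in the description_models dictionary.
// Only 'url' and 'model_name' are required; omitted keys fall back to the
// defaults listed above ('messages', no extra parameters,
// 'choices.1.message.content').
let descriptionModels: [String: [String: Any]] = [
    "My OpenAI model": [
        "url": "https://api.openai.com/v1/chat/completions",
        "model_name": "gpt-4o-mini",
        "request_messages_key": "messages",
        "additional_parameters": ["max_tokens": 1000, "temperature": 0.7],
        "response_messages_key": "choices.1.message.content"
    ]
]

// Switching models then just means matching the key:
let descriptionModel = "My OpenAI model"
```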
In conclusion
I hope you find this Shortcut useful. Feel free to leave any questions, bug reports and suggestions in the comments below.
Comments
Siri and Chat GPT
For those with a device supporting Apple Intelligence, you can simply ask Siri about what is on the screen. Siri will then ask if you are happy for it to send a screenshot to ChatGPT. You can then ask questions. As usual, any responses from ChatGPT will have a copy button included. The response is always very quick. No need to save a screenshot or remember gestures. Just ask Siri.
Re: Siri and Chat GPT
Unfortunately, Apple Intelligence is not available in all regions.
Also, I personally prefer the classic interface to voice assistants. And, in practice, asking Siri takes longer: the VoiceOver gesture is performed almost instantly, and then everything happens automatically - there is no need to save the screenshot somewhere, et cetera, et cetera.
Error
So I downloaded and installed the shortcut. Assigned a VoiceOver gesture to it. For me, I chose a triple back tap gesture. When I activate it, I get the following error:
Error: Description model should be specified
Re: error
Unfortunately, there was a bug in the original version that prevented the model name from being given a default value. The updated Shortcut can be installed from this link.
I also updated the original post, at the same time slightly simplifying the instructions for adding new models, making some fields optional.
PS. Apple has quite specific behavior for Shortcuts when they are imported onto other devices; for example, all top-level text fields are cleared. Unfortunately, I cannot test the correct behavior of the Shortcut when it is imported to another device, so I would be glad if you pointed out any errors you find during testing.
It still has the same error.
I installed the new shortcut you added by accessing the new link, and it still shows the same error saying that the model must be specified.
Re: It still has the same error.
I've updated the Shortcut again, sorry for the inconvenience. New link
Original post also updated.
It seems the Import Questions were bound to the incorrect fields; this should be fixed now.
The shortcut is working, but only in English.
When I leave the shortcut configured in English, it works normally. However, when I configure the shortcut and set the language to Portuguese, it stops working.
I am Brazilian and I would really like the shortcut to be created and to work in Portuguese. I have already tried setting Portuguese in all possible ways: with the first letter uppercase and lowercase, with and without accents, but the result is always the same.
Whenever I set the shortcut language to Portuguese and try to use it, an error appears saying that the language was not specified.
Re: The shortcut is working, but only in English
I just tried specifying 'Portuguese' as the language and everything seems to work fine -- the Shortcut provides the description in the specified language.
Please note that the language must be specified exactly as written (Portuguese), without any accented letters or similar. Shortcuts tries to find the language code in a `dictionary` whose keys are language names, so the name must match exactly.
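In other words, the lookup is a plain exact-match dictionary access, roughly equivalent to the following sketch (the language list here is hypothetical and shortened):

```swift
// Exact-match lookup: "Portuguese" resolves, "portuguese" or "Português" do not.
let languageCodes = ["English": "en", "Portuguese": "pt", "Spanish": "es"]

func languageCode(for name: String) -> String? {
    languageCodes[name]   // nil unless the name matches a key exactly
}

let found = languageCode(for: "Portuguese")    // Optional("pt")
let missing = languageCode(for: "Português")   // nil, which produces the "language was not specified" error
```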
Or we can wait like a couple months
Or, we could just wait a couple months and then we’ll be able to ask Siri directly what’s on the screen
Re: Or we can wait like a couple months
As I already stated, there are two main issues:
1) Apple Intelligence is only available in a fairly limited number of regions.
2) Personally (and I suspect many others feel the same), I don't really like voice assistants as a way of interacting with a smartphone.
The ideal solution would be to integrate image descriptions directly into VO, and considering that VO is already able to successfully read text from images and describe what is happening (the Surrounding Recognition function), it seems the technical part has already been implemented; it would be enough to allow descriptions to come not only from the device's camera, but also directly from the screen. However, since VoiceOver does not provide an API for writing custom scripts, we are limited to the capabilities provided by the Shortcuts app.
Changing a setting
Hi,
Never really used Shortcuts, so I may be being thick here. I want to turn on the option that disables VoiceOver, as suggested, so that Screen Curtain doesn't need to be turned off first. I didn't pick 'y' when setting up, and I can't figure out how to change this to yes instead of no. Any help appreciated.
The shortcut is working, but it fails when disabling and re-enabling VoiceOver
The shortcut is working, and the issue was the word “Portuguese,” which I had written incorrectly. However, when I set the shortcut to turn VoiceOver off and then turn it back on, it only turns VoiceOver off and does not activate it again.
Re: Changing a setting
You can reconfigure any setting at any time by simply going to the Shortcuts app, finding the CloudVision shortcut, selecting the Edit action in VO Rotor, clicking the Info button at the bottom of the screen, then Import Questions, and finally using the Setup Shortcut button at the bottom of the screen.
Re: The shortcut is working
That's exactly why I called this feature experimental and disabled it by default. The problem is that the screen curtain must turn itself off while the screenshot is being taken; it does this if you use the physical buttons to take a screenshot, but when a similar function is called programmatically from the Shortcuts app, the state of the screen curtain is not taken into account at all. Trying to temporarily disable VoiceOver is essentially a dirty hack. Initially, I tried to do this without any delay at all, but it turned out that by the time VoiceOver tried to turn back on, it had not yet been completely turned off, and as a result the system simply ignored the action to turn it on. I then tried adding a one-second delay before turning VoiceOver back on, which worked fine on my device, but the required delay seems to vary by model. Of course, I could update the Shortcut to increase the delay to, say, two seconds, but that seems to negate any benefit of automatically turning off VoiceOver -- performing a gesture to turn off the screen curtain is definitely faster than waiting for the system to allow VoiceOver to re-enable.
This is cool!
It's working here. I kept the default setting because changing it caused VoiceOver to turn off but not back on. I'm enjoying using this shortcut!
Update
The main issue with the feature for temporarily turning VoiceOver off was that, when invoked by VoiceOver, the Shortcut would stop working immediately after VoiceOver itself was disabled. You can get around this by adding an intermediate Shortcut whose only job is to run the main one. Because two Shortcuts have to run instead of one, this approach increases the delay before the initial menu appears, but given that you already have to wait for VoiceOver to restart, this may be acceptable. Here is the link to the intermediate Shortcut; just install it alongside the main one and reassign the VoiceOver command to run CloudVisionExecutor instead of CloudVision.
The original post has also been updated. Again, I recommend using the original Shortcut unless you need VoiceOver to be disabled temporarily: the additional Shortcut works, but it adds a delay before the initial menu is shown.
Everything is working.
Everything is working, including using the shortcut to turn VoiceOver off, turn it back on, and run the main shortcut. The only thing that is happening is that when I run the VoiceOver command, it opens the Shortcuts app, but that is not a problem for me. Congratulations on the shortcut — great work!
Re: Everything is working.
Thank you for your feedback.
It seems that this is an inevitable consequence of using a chain of two Shortcuts: in this case the Shortcuts app, rather than VoiceOver, runs the main command, so that disabling the latter does not interrupt the execution of the Shortcut. Apple doesn't provide an API for managing active apps from within a Shortcut, so we can't automatically switch back to the previously opened app. Personally, this method still seems less convenient to me than simply turning off the screen curtain before requesting an image description, but perhaps for some people this compromise will be acceptable.
Going back to the old shortcut.
I’m going back to the old shortcut because when I run the Cloud Vision Executor, it opens the Shortcuts app window and takes a screenshot with the Shortcuts app open, instead of the image I want to describe. The only issue with going back to the old shortcut is that I often forget to turn Screen Curtain back on after the screenshot is taken, but there is no other way.