Describing images using local LLMs and Shortcuts

By Diego, 11 May, 2026

Forum
iOS and iPadOS

Hey guys!
I would like to try using the iPhone to play games. Since Gemini uses up a lot of tokens quickly, I would like to ask how good local models are at describing images and whether it is possible to make a shortcut for this.
The idea would be the following:
I press a button on the controller. The option changes. I make a gesture on the iPhone screen with VoiceOver. Silently, it takes a screenshot of the screen, sends it to an LLM with a specific prompt, speaks the result, and deletes the prompt.
Do you think it would work? The goal is to find out which option is in focus, the player's status, and so on.
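
As a rough sketch of that flow (not a tested setup), here is what the "send the screenshot to an LLM with a prompt" step could look like in Python. It assumes an Ollama-style server is reachable on the local network; the URL, the model name `llava`, the prompt wording, and the helper names `build_payload` and `describe_image` are all placeholders for illustration.

```python
import base64
import json
import urllib.request

# Assumption: an Ollama-style server is running and reachable at this address.
# The URL and the model name are placeholders, not a recommendation.
SERVER_URL = "http://localhost:11434/api/generate"
MODEL = "llava"

# Hypothetical prompt matching the goal described in the post.
PROMPT = (
    "Describe this game screen briefly: "
    "which option is in focus and what is the player's status?"
)


def build_payload(image_bytes: bytes, prompt: str = PROMPT, model: str = MODEL) -> dict:
    """Build the JSON body: the prompt plus the screenshot as base64 text."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete answer instead of chunks
    }


def describe_image(image_bytes: bytes, url: str = SERVER_URL) -> str:
    """Send the screenshot to the server and return the text description."""
    body = json.dumps(build_payload(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"]
```

In the Shortcuts app the same steps would probably map to Take Screenshot, Base64 Encode, Get Contents of URL (a POST with a JSON body), and Speak Text, but whether this is fast enough to use while playing is something that would need to be tried.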
