As an app developer, I'm curious to know how much the AppleVis community is utilizing VoiceOver Recognition. It can be enabled in Settings > Accessibility > VoiceOver > VoiceOver Recognition, where you can turn on specific recognition features.
This feature enhances VoiceOver by detecting text in images or providing custom accessibility labels for elements that lack proper accessibility properties.
However, turning on Recognition sometimes introduces new issues, such as a label being announced twice. Have you faced similar issues? If so, kindly share your experience.
Comments
Saying it twice
It's not too much of an issue for me if it describes a properly labeled control because it comes after VO has spoken the label, and I can ignore it, or I don't even hear it because I've already swiped past it. However, I don't like the thought that recognition would be used by developers as an excuse to not label things for accessibility.
Thanks OldBear!
Yeah, I totally agree with OldBear. As developers, it’s our responsibility to provide meaningful descriptions for elements. However, we can’t prevent VoiceOver Recognition from announcing the label again, since it detects and reads content on the screen independently of its context.
I’m curious—how much of an issue is this for you? Does hearing the same label twice feel disruptive, or have you gotten used to it?
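For fellow developers following along, here's a minimal UIKit sketch of what "providing meaningful descriptions" means in practice. The view controller, button, and strings are hypothetical; the accessibility properties (`accessibilityLabel`, `accessibilityHint`) are the standard UIAccessibility API:

```swift
import UIKit

// Hypothetical screen with an icon-only button.
final class ArticleViewController: UIViewController {
    private let shareButton = UIButton(type: .system)

    override func viewDidLoad() {
        super.viewDidLoad()

        // An icon-only button has no text, so VoiceOver has nothing
        // meaningful to announce unless we label it ourselves.
        shareButton.setImage(UIImage(systemName: "square.and.arrow.up"), for: .normal)
        shareButton.accessibilityLabel = "Share"
        shareButton.accessibilityHint = "Shares the current article"

        view.addSubview(shareButton)
    }
}
```

With a proper label in place, VoiceOver Recognition has nothing to add for this control, which is exactly the goal.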
No.
Never use it.
Thanks Holger Fiallo!
By default, the Text Recognition feature is enabled, allowing VoiceOver to detect and read text within images. Are you currently using this feature?
I love it, it’s great for…
I love it, it’s great for images on Reddit and similar things like screenshots, it’s OK for inaccessible apps like what I use for my meal plan, and it’s overall really useful, I’d say I use it at least once a day, mostly for images, sometimes for apps.
Every day
I just keep this feature on. If I want to know what a photo is without putting it through a third-party app, it will just do it.
Praveen
It does not do a good job with pictures; I do not know why. JAWS does a great job of describing pictures.
I use screen recognition if the app or website is poorly coded
It is never a fix for improper coding. The app should always be coded following Apple's standards.
A Question for Praveen
Hi Praveen,
What is it you are looking to accomplish? I feel like if we have a better idea of what your question is, we would be able to provide you a more informed answer. Especially if you are developing apps, it would be great to be able to answer your underlying question directly. These recognition features are tools and nuanced, and they all have their place, but these are all supplementary to good accessible design. A VoiceOver-generated image description will never take the place of a properly tagged image in your app, Screen Recognition will never (and should never) take the place of an accessible UI, etc. etc. etc.
I agree with Michael
I agree with Michael. We would all love to help you. Below is the link to making your app accessible with VoiceOver. I am of course willing to beta test.
https://developer.apple.com/documentation/accessibility/supporting_voiceover_in_your_app
I use it but...
It's there as a tool to improve apps that might be inaccessible, sometimes it works and sometimes it doesn't.
Just because it exists does not mean it's going to make your app work with VoiceOver; you are going to have to put in the work if you want us to use your app or apps.
Answer to Michael Hansen
Thanks, Michael. I'm experiencing issues with Text Recognition: despite setting proper accessibility properties for an image, the feature still detects text overlaid on the image. According to an Apple framework engineer, this is a bug, as it shouldn't recognize independent text overlaid on an image. I was curious to know how many users actively use this feature and what challenges they face when it's enabled.
text overlaid on image
I'm having a difficult time understanding why independent "text overlaid on an image" should not be recognized, or why I would not want it to be recognized when I have recognition turned on. If this is something like Facebook attempting to generate alt text descriptions for images in posts, I very much welcome the additional descriptions of pictures and text to possibly fill in information that Facebook's inferior AI has missed. It doesn't bother me at all having two descriptions of an image or text, but if it did, I could turn recognition off for that app, or swipe away from the image before the Apple recognition kicks in.
* I read a little about what overlaid text might mean. If this is something like the HTML CSS effect with a text holding container visually placed on top of an image holding container by way of code, then I can see how this might be a bug. One case that might be a problem for me is if the image takes up the whole screen and all the text on the screen keeps triggering the recognition during swiping between elements. I've never come across that, but I would turn off recognition if it happened. Otherwise, it wouldn't bother me if text was repeated now and then due to recognition. I'm already used to it because of things like the auto generated alt text for images on certain sites.
Voice Over Recognition
I use the VoiceOver Recognition features, especially image recognition. I also use screen recognition to read Facebook images that are inaccessible with VO. As far as your issue with text overlaid on an image goes, VO should be able to read that text without enabling VO Recognition features like screen recognition.
OldBear
I honestly believe you just described somebody’s worst nightmare!
Forget all this technology stuff, I’m going back to my abacus. 😫
Reply to OldBear
Yeah, this is more of a dev-side thing. Like, every New Year we usually show some kind of offer or discount in the app using a nice promotional image. Since the offer changes every year, we don’t hardcode any text inside the image. Designers just give us a clean decorative image, and we overlay the actual offer text programmatically.
Now the issue is that iOS ends up detecting that overlaid text as part of the image (it should treat it as separate text). So VoiceOver ends up reading both the proper accessibility label and the detected text, which feels repetitive. Just wanted to share the context in case anyone else has run into something similar.
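For anyone curious, here's a minimal sketch of the pattern I'm describing (view names, image name, and offer string are all hypothetical). The artwork is kept out of the accessibility tree and the banner exposes a single label; the problem is that Text Recognition still OCRs the rendered pixels on top of this:

```swift
import UIKit

// Promotional banner: decorative artwork plus text drawn over it in code.
final class PromoBannerView: UIView {
    private let artworkView = UIImageView(image: UIImage(named: "newYearPromo"))
    private let offerLabel = UILabel()

    override init(frame: CGRect) {
        super.init(frame: frame)
        offerLabel.text = "20% off until January 5"
        addSubview(artworkView)
        addSubview(offerLabel)

        // The artwork is purely decorative; keep it out of the
        // accessibility tree so VoiceOver only reads the real offer.
        artworkView.isAccessibilityElement = false

        // Expose the banner as one element so the offer is announced once.
        isAccessibilityElement = true
        accessibilityLabel = offerLabel.text
        accessibilityTraits = .staticText
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
}
```

Even with this setup, Text Recognition OCRs what is actually drawn on screen, so the overlaid label's text gets announced a second time as "detected" text.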
Per app basis
For the most part I find these features very useful. I’d only turn on full screen recognition where an app is inaccessible or has an inaccessible screen, to get me out of a jam.
Image recognition is really useful to get a quick idea of what’s in a picture, and text recognition has its place too.
But it certainly can be too much in some places. For example, in the app Callsheet, the image descriptions on every tile became extremely irritating to me, and not really useful. Simple solution: using the VoiceOver quick settings or the rotor, simply switch off image descriptions for that app.
Dave
Yes and No
I have used the various features of VoiceOver Recognition on my phone with mixed results, and would appreciate an audio walkthrough. One of the apps that I've found to work particularly well on iOS with VoiceOver Recognition is eSpeak NG. However, it appears that these features are not needed anymore for that app. The Mac is a different story though, and I'm hoping Apple can make all these features available on that platform. The dedicated app for my hearing aids remained unchanged even with recognition turned on. But there seems to have been an update to that app this week, and I'm hoping my father can check that out this weekend.
That Praveen?
> What is it you are looking to accomplish? I feel like if we have a better idea of what your question is, we would be able to provide you a more informed answer. Especially if you are developing apps, it would be great to be able to answer your underlying question directly. These recognition features are tools and nuanced, and they all have their place, but these are all supplementary to good accessible design. A VoiceOver-generated image description will never take the place of a properly tagged image in your app, Screen Recognition will never (and should never) take the place of an accessible UI, etc. etc. etc.
This is probably a huge coincidence, I've never heard the name Praveen before, and the project manager at the company I'm currently working for has the same name, so it would be amusing if they turned out to be the same person. If that's the case then I can understand the secrecy, as our product is yet to launch, otherwise it's just a really interesting coincidence.
This is actually something that I have experienced myself as a developer, and it relates to the fact that, at least on iOS, VoiceOver attempts to read whatever graphical content happens to be displayed inside the bounding box of the screen reader's cursor. So, for example, if you make a view slightly transparent, users with Screen Recognition enabled will have whatever happens to be rendered behind that view announced in addition to the actual accessibility content provided by the view. Personally, I agree with the framework engineer who said this shouldn't be happening, as VoiceOver should be tapping into the content of the accessibility element that has the accessibility focus. However, since Apple's accessibility stack is completely orthogonal to their compositor and window server engine, I guess this would not be trivial for them to implement correctly.
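One partial mitigation I've tried, sketched below with hypothetical view names: the standard UIAccessibility properties `accessibilityViewIsModal` and `accessibilityElementsHidden` can keep obscured content out of the accessibility tree while a semi-transparent overlay is up. This helps VoiceOver's element navigation, though it doesn't necessarily stop Screen Recognition from OCRing whatever pixels are visible through the transparency:

```swift
import UIKit

// Show a semi-transparent overlay while steering VoiceOver away
// from the content rendered behind it.
func presentOverlay(_ overlay: UIView, over host: UIViewController) {
    overlay.backgroundColor = UIColor.black.withAlphaComponent(0.6)

    // Treat the overlay as modal: VoiceOver ignores its sibling
    // views while the overlay is on screen.
    overlay.accessibilityViewIsModal = true

    // Explicitly remove the obscured content from the
    // accessibility tree for good measure.
    host.view.accessibilityElementsHidden = true

    host.view.superview?.addSubview(overlay)
}
```

When the overlay is dismissed, both properties need to be reset, otherwise the underlying screen stays invisible to VoiceOver.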
Had to edit my post because somehow I submitted it accidentally while scrolling back to quote Praveen.