I’m looking for an OCR app that lets me read printed material more like I used to read when I was sighted. It seems like artificial intelligence should make this possible. Here are some examples.
Example 1: Receipts. I want to skip past the name and address, date and time, phone and fax numbers, web page URLs, and advertisements. I only want to read the itemized list of my purchases and their prices.
Example 2: Restaurant menus. I want to skip past entire sections like appetizers and entrees, find the sandwich section, read through the titles of each sandwich, then read the full description and price for the two or three sandwiches I’m interested in.
Example 3: Tax forms. As a sighted person, I can use cues like font sizes, boxes, and tables to get a feel for the layout of the form, skip past unimportant sections or instructions, read a few lines of text, and find the one box with the one number I’m looking for.
Example 4: Mailed package labels. When someone delivers a package, I want to glance at the label and see whether it’s for me or my spouse. I should be able to quickly ignore GUIDs and other information that is meaningful only to the shipper.
All of these examples have one thing in common. The printed material displays information organized visually in a way that is obvious if you’re sighted, making it trivial to disregard extraneous text and quickly find pertinent information. And, in all these examples, existing OCR apps completely fail to detect and preserve that organization.
Artificial intelligence should be able to detect and preserve the visual, often hierarchical, organization of printed information and present it to blind users in a well-organized manner that’s easy for us to navigate and comprehend. We should not have to try to zero in on key information manually, as with Seeing AI’s short text feature. Nor should we be forced to wade through line after line of flat text, as with KNFB Reader or Seeing AI’s document feature. Don’t get me wrong. I’m not complaining. These OCR apps are great. I just expect that artificial intelligence should do a much better job at processing and organizing scanned text before it presents it to blind users.
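To make the idea concrete for any developers reading this, here is a minimal sketch of what I mean by preserving structure. These types are pure invention on my part, not any existing app's API; they just show scanned text as a navigable tree instead of flat lines.

// Purely illustrative: none of these types exist in any real OCR library.
indirect enum DocumentNode {
    case section(heading: String, children: [DocumentNode])
    case table(rows: [[String]])
    case text(String)
}

// A receipt represented this way: a screen reader could jump straight to
// "Items" and skip the store info, the way a sighted reader's eye does.
let receipt: DocumentNode = .section(heading: "Receipt", children: [
    .section(heading: "Store info", children: [.text("Acme Market, 123 Main St.")]),
    .section(heading: "Items", children: [
        .table(rows: [["Milk", "$3.49"], ["Bread", "$2.99"]])
    ]),
    .section(heading: "Total", children: [.text("$6.48")])
])

Navigating by heading level in a tree like this would feel much closer to how I used to scan a page visually.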
If there’s a smarter OCR app out there that does what I want, please make my day and tell me about it.
Comments
OrCam
This is not an app, but if you could get the OrCam to work well, I believe it does something like this: you can give a voice command, for example, to find the phone number on something you are looking at. I haven't kept up with the OrCam, so maybe it has gotten better at this, or worse; I couldn't say. It's a very pricey device, though. Hopefully something like this will come to an app if it isn't around already. With the OCR in the iOS 15 native camera, and if Apple works on an AI, maybe some day you could just ask Siri the questions you posted above, but that's too many ifs.
Workaround
Hi Paul,
The best I've come up with is converting the results to a text document, opening it in TextEdit, Pages, or whatever, and conducting a Command-F search for the pertinent information, e.g., a dollar sign if you're looking at a receipt, or your last name if you're looking at a mailing label, etc.
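If you're comfortable running a tiny script, the same idea can be automated. Here's a rough sketch in Swift; "receipt.txt" is just a placeholder for wherever your OCR app exported its text.

import Foundation

// Keep only the lines of an exported OCR text file that contain a dollar sign.
// Swap the filter for your last name when you're looking at a mailing label.
let text = try String(contentsOfFile: "receipt.txt", encoding: .utf8)
let hits = text
    .components(separatedBy: .newlines)
    .filter { $0.contains("$") }
print(hits.joined(separator: "\n"))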
Hope this helps.
Bruce
Thanks
The OrCam has always sounded promising and seemed to perform very well in their marketing videos. Bruce, your workaround is pretty much what I do, if I have a keyboard handy. Still, neither of these suggestions is quite what I'm looking for. I'd like an OCR app that preserves the organizational structure of whatever text it scans and presents it with that structure, so I can skip over text I deem extraneous and focus on text I deem relevant. Isn't this how the human visual system works?
The history behind my post is that I've lost my remaining vision over the past few years, and I've been comparing how I functioned with some eyesight to how I now function with no eyesight. One huge difference I've noticed is that my current OCR tools don't allow me to parse text anything like I used to when I was sighted. My OCR tools present flat text but fail to preserve even obvious organizational elements like sections with headings, tables, or text boxes. This all seems totally possible.
If a tool like this exists, what is it? And if it doesn't, why not?
Oh god, this is the holy…
Oh god, this is the holy grail, isn't it? Skim reading, looking at something in a quick glance and taking in all of it, rather than having to get mired in all the details of stuff you don't care about. I've never been able to see, but I've spent a lifetime wishing I could do that, ☺️.
I think we'll get there one day, but I honestly have no idea when. I mean heck, we can read handwriting these days with OCR, as long as it's not too insane, and I never thought we'd get to that. I don't know why formatting is so much harder than text. I mean, I don't seem to be able to find an app that can even scan an image of a calendar and get anything useful. Why you can't scan something and get something closer to a web document, with headings and tables and stuff, I don't know.
As for AI, well maybe. I doubt it's as easy as you'd think, and it's the edge cases that I reckon would kill you every time. I mean it has to know *what* the document is as well as all the elements in it before it can give you information like that. Easy to do in a demo where you know all the types of documents you'll get, and harder in the real world where you have to try to scan a document and get a great scan, which for totally blind people is tricky, and also know it's a bill or a receipt or a menu or whatever.
Anyway, with all that said, one thing I've been experimenting with is Seeing AI's explore by touch feature. For example, I was able to read the picture of the calendar I mentioned by sliding my hand around the screen of my iPad. I could see the columns and the layout with some practice. Particularly in the case of a document that fits on one screen, this might be worth playing with, particularly on something like an iPad.
Working spatially like this isn't easy, and for me, who's been totally blind for more than 40 years, I think it's taken some kind of brain reorganisation from using an iPad all day every day for 5 years. But I'm finding myself thinking more about where stuff is, so for example I can touch on or very near the thing I want on my screen.
Agree with Yvonne but I’m confident we’ll get there
This really would be the holy grail, and I'm sure it will happen at some point: if a human brain can understand structure, meaning, and format, then it seems likely machine learning will be able to at some point. For the time being I get the best results by far with Voice Dream Scanner. It seems to have at least a rudimentary understanding of layout and formatting; nothing like we are all looking for, but the most advanced I've found so far. Along the same line of thinking, how long until a phone can do what a guide dog does? Self-driving cars are leading the charge on machines that are aware of the world around them and use that information to navigate it. It seems like a logical next step that the same awareness could be loaded into a phone. I don't think my phone will be as good at cuddles on the sofa or play fighting, though.
my thoughts
@Andy Lane, one thing you're not thinking of regarding guide dogs: if you had an open manhole cover, would your phone tell you there is one there? Would it tell you which way to go around it, or how far to go in one direction? Will your phone let you know a car is coming at you from the side when crossing a street? Will your phone help you walk down the sidewalk in a straight line? My point in asking all these questions is to get people thinking. Your phone can help you, but it won't replace a guide dog.
interestingly, they're…
Interestingly, they're having similar problems with self-driving cars that we've been talking about: edge cases. At least for now, that's how AI is. It gets like 95 percent of the way and then runs into things that require real judgement and creativity.
Not really concerned about edge cases
This is not the same as a self-driving car that must make split-second life or death decisions autonomously. It just needs to figure out that I've got a frozen dinner on the counter, identify the cooking directions, nutrition facts, and the marketing hype, and then give me a choice of which I want to read.
There are a ton of image recognition apps that come up with probabilities for what they see. If my hypothetical dream OCR app thinks there's an equal probability that my receipt is a piece of junk mail, I'll tell it the correct answer, and it can learn over time. I won't fall into an open manhole if it's temporarily confused.
We know OCR apps can differentiate between fonts. We know they can identify columns and text inset boxes. iOS already analyzes text to identify names, addresses, dates and times, phone numbers, links, email addresses, and more. All the capabilities are there. We just need a developer with enough creativity and imagination to provide a better interface than flat text.
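To illustrate how much of this is already built, here's a quick sketch using Apple's NSDataDetector, which ships with iOS and macOS. The sample string is made up; the API calls are real.

import Foundation

// NSDataDetector is Apple's built-in text analyzer. This pulls dates, links,
// and phone numbers out of a flat string of OCR output.
let ocrOutput = "Call 555-123-4567 by 04/15/2025 or visit https://example.com"
let types: NSTextCheckingResult.CheckingType = [.date, .link, .phoneNumber]
let detector = try NSDataDetector(types: types.rawValue)
let range = NSRange(ocrOutput.startIndex..., in: ocrOutput)
detector.enumerateMatches(in: ocrOutput, range: range) { match, _, _ in
    guard let match = match, let r = Range(match.range, in: ocrOutput) else { return }
    print(match.resultType, ocrOutput[r])
}

A structured OCR app could run exactly this kind of pass and offer "jump to phone number" instead of making me listen for it.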
Approach Developers With This
Why not approach Microsoft, whoever develops the Voice Dream apps, etc. with these thoughts and suggestions?
And yes, I've been wanting this sort of reading convenience my whole life as well. Imagine all the time and brain power we'd save if we weren't always doing extra listening, sorting, orienting...
Optacon as a solution
I still have the Optacon I got in 1975, and it can do some of the things you are looking for. The Optacon consists of a unit about the size of a cassette recorder and a little camera attached by a cable. You run the camera across a line of printed text with one hand, and the print letters are produced by an array of vibrating pins that you read with a finger of the other hand. I can easily zero in on the middle of an address label, read a receipt, and read directions on some packages.
Sadly, Optacons haven't been produced for quite a while. There was talk in the 80s of trying to shrink it down so you could slide the camera and read with the same hand. It should be possible these days if only someone would be willing to work on it.
Again, seeing ai has let me…
Again, Seeing AI has let me do some stuff like that with explore image. Sort of like an electronic Optacon.
the 80% rule
Bearing in mind that this sort of AI is pretty sophisticated, the fact that we probably could have it right now but don't comes down, IMO, to the 80% rule familiar to folks working in web accessibility. The top 20% of pages get 80% of traffic, so concentrate on those; and develop for the 80% of users, because pissing off the other 20% (like the disabled, or people not using Chrome...) makes more economic sense than investing in that last 20%. These sorts of AI applications are low return on investment compared to face recognition and the like. I'm really surprised by how far image recognition in Chrome and iOS has come, but I think those are piggybacking on things like Google Images technology.
That being said, examples 1-3 are happening now, because there's mainstream appeal in web-based restaurant menus and tax forms (just did mine). Most receipts get emailed now, and I can only hope that comes to a grocery store near me/you very soon. Package labels would be awesome. I suspect a bar code used by the postal service/UPS will be how that gets done.
Not saying way better AI for OCR wouldn't be great or that it won't happen. There just needs to be a business model for it. Disabled-only applications don't get very far or very fast. Hell, look what happened to Nearby Explorer and KNFB Reader, the two most expensive apps on my phone! Sadly, a common way that blind people's needs are addressed in the face of these headwinds, as we all know, is through a plaintiff law firm.
Boy, I seem to be in a cynical mood today. Probably because I owe on my taxes this year.
Re: 80% Rule
I get what you're saying, Voracious P, and you're not wrong. Several companies were going to be sued, and that's why accessibility was built into certain products. Have a good one.
I don't know that it's an…
I don't know that it's an entirely *bad* thing, really. Things specially designed for disabled people are often hideously expensive or don't get created in the first place.
On the other hand, if you can figure out a way mainstream people can use it, it's everywhere, it's cheap, and your aunt who's just lost her sight can probably find it.
Want ebooks? Or audiobooks? When they're just for blind people, the publishers scream murder about having to provide them, and most of what's available is 10 years old or more, or not what you want to read. Sighted people want to use them, and if you can afford it you can read practically whatever you want.
Want decent speech technology? Sighted people suddenly want their computers to talk to them so it's everywhere and decent quality.
Heck, even OCR is in that category. I can still remember when a scanner cost as much as a car, ☺️.
OmniPage
OmniPage Pro is available for Mac. It seems to do a better job of getting the reading order right than other software packages, though you might experiment with trial versions of both it and ABBYY FineReader. I personally use Kurzweil 1000 on Windows because it has tools that let me easily change a page's reading order, and it has ranked spelling, which works by frequency of occurrence rather than chronology. But I also use OmniPage; just be sure to get the Pro version, which is more expensive because it has the better document analysis.
OCR Apps
Hi Paul, Be My AI sounds like what you are looking for. If you have access to it, you could try taking a photograph of a document and then asking the AI for certain information.
I have yet to try this with restaurant menus and receipts, but I have done this with food packaging and it has worked well. The other day, I took a photograph of some food out of my freezer. I then asked the AI for the cooking instructions. It proceeded to give me the cooking instructions, apart from those that pertain to convection cooking. I then asked for the instructions for convection cooking, and it told me that I needed to send another photograph of the document as it couldn't quite see the information. I was then able to take a better quality photograph by moving the product slightly closer to my phone, and the AI proceeded to give me the information I had asked for. It was extremely easy to do. Bear in mind, I have no useful sight and I have been blind since birth. I am really impressed with the results I'm getting from the Be My AI product, and I feel that the developers of Be My Eyes have done a very good job. It sounds exactly like what you are looking for.
Be My Eyes is exactly what you need
I fed it a picture that I found in a news article. What it answered in return follows below.
The picture is a table with information about rainfall measurements in millimeters. The table is divided into two sections, one for the rainfall measurements during the period of "IANOS" (17/09-20/09/2020) and the other for "DANIEL" (04/09-07/09/2023). The table has a blue and white color scheme.
On the left side under "IANOS", there are eight locations with their respective rainfall measurements in millimeters. The highest is "Petroúli" with 317 mm, followed by "Mouzaki" with 274 mm, "Ithaki" with 227 mm, "Mpiriki Karpenisiou" with 224 mm, "Karitsa" with 213 mm, "Karpenisi" with 187 mm, "Valia Kalda Fthiotidas" with 183 mm, and "Askipiou Xanion" with 164 mm.
On the right side under "DANIEL", there are also eight locations with their respective rainfall measurements in millimeters. The highest is "Zavorda Pinioy" with 910 mm, followed by "Portaria Pinioy" with 885 mm, "An
You'll have to sign up for the beta of Be My Eyes.
I'm blind, so I'm used to the way we do things, but being able to skim read might be useful; it's not something I really need.
Be My Eyes can allow you to take a picture and tell you what it is; then you should be able to ask questions about the picture, like what's the number and so on.
Remember it can get things wrong though. I do think AI is the future for us and that we'll have something amazing one day. It's not exactly there yet, but hopefully one day we'll get something like a live AI feed that doesn't involve a huge laptop and battery thing.
It’s kind of amazing to read the comments above.
Only a year and a bit ago, an application with the capabilities of Be My AI was just something we were dreaming about. No, it's not there completely, but it's just a baby at the moment. It has years of growing ahead of it, and I'm going to bet the Be My AI we have in 3 years will do significantly more than the baby version we currently have. It's an exciting time.
it really is.
I'd not be surprised if in the future blind people would be able to see using AI.
It's a bit out there for now but give it another 20 or so years and who knows.
Be My AI
I've been beta testing Be My AI for a couple of months now; while it's path-breaking as far as skim-reading is concerned, what I've come to realise is that we need something more, let's say, physical. Like, for somebody with sight, it's looking at the print, zeroing in on the required info, and then comprehending it. Asking an AI tool to do that is useful, but not exactly there. What we need is something like the Optacon, but with AI built into it to make its working much smoother, which, I guess, is entirely possible at this point if someone were ready to invest time and money in it. And anyway, Be My AI doesn't exactly have OCR capabilities at the moment, in that as of now it just compares and contrasts the image, which is not exactly performing OCR, as the app itself has told me several times by now. And btw, the age when AI actually lets a blind person 'see' may not be as far as 20 years away...
Totally digging Be My Eyes
First, let me say that I agree. Be My Eyes was what I was looking for back when I originally posted this. I was granted beta access only recently, but have already found it incredibly useful. Some flaws, too, but the future looks bright.
This technology has so much potential beyond just smarter OCR. I was at a conference this past weekend, and I hired a professional sighted guide to assist me. With the recent advances in AI, we joked a bit about how my sighted guide might be replaced by AI in the future. Ha ha. But honestly, I could see AI providing much of the same services. An AI with access to a conference floor plan ought to be able to guide me from one meeting room to the next, and even identify people I know as I pass them in hallways.
The future is coming. Imagine an AI replacement for Siri to help deal with all those inaccessible websites. We are living in amazing times, people.
I agree.
An AI to guide the blind should be doable in a couple of years.
Imagine being able to leave your cane at home one day and not need it because of the AI; that would be scary but amazing!
What about 10 years from now? There could be AI eyes out there; if there are, I'd be so interested in those. Imagine being able to play any video game or, hell, get a job. These things that sounded like dreams in the 90s are becoming a reality now.
Identifying people
Theoretically, the identifying-people bit can be worked out even at this moment. As of now, Be My AI hides the faces of people for privacy reasons, but initially it was not like that. It used to describe faces, and even used to describe emotions and make subjective judgements like beautiful, etc. Now imagine being able to personalize the app and tie one's Be My AI account to one's, say, social media accounts, such that the database has access to the metadata available therein. Now, if you send the app a picture containing any of the people in, say, your followers list, it should theoretically be able to recognize that the picture has got that person in it. From that point, AI being used in a similar way to describe live video streams is just a matter of increasing internet speeds, which will happen anyway...
Could Point and Speak be a possible solution?
One of the problems with existing OCR apps, as discussed here, is that we cannot easily zero in on a piece of text, say, an address in the middle of a block of text. Now, if my camera could track my finger, as is done in the new Point and Speak feature, it could certainly track my finger moving across the text, making it possible to read along...
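Something like this actually seems buildable today with Apple's Vision framework. Here's a very rough sketch, not a complete app: just the core idea of matching a detected fingertip to the nearest piece of recognized text. Camera capture, threading, and error handling are all omitted, and the function name is my own.

import Foundation
import Vision

// Find the index fingertip with Vision's hand-pose request, recognize text in
// the same frame, and return whichever piece of text the finger is nearest.
func textUnderFinger(in image: CGImage) throws -> String? {
    let handRequest = VNDetectHumanHandPoseRequest()
    let textRequest = VNRecognizeTextRequest()
    try VNImageRequestHandler(cgImage: image).perform([handRequest, textRequest])

    guard let hand = handRequest.results?.first,
          let tip = try? hand.recognizedPoint(.indexTip),
          tip.confidence > 0.3 else { return nil }

    // Both the fingertip and the text boxes use normalized image coordinates,
    // so they can be compared directly.
    func distance(_ obs: VNRecognizedTextObservation) -> CGFloat {
        let c = CGPoint(x: obs.boundingBox.midX, y: obs.boundingBox.midY)
        return hypot(c.x - tip.location.x, c.y - tip.location.y)
    }
    let nearest = (textRequest.results ?? []).min { distance($0) < distance($1) }
    return nearest?.topCandidates(1).first?.string
}

Run on successive camera frames, this would read out whatever line your finger is pointing at as you slide it across the page.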