Something Different is Coming
A Progressive Web App for Blind and Visually Impaired Users | Works on All Smartphones
I need to tell you about something I built. Not because it's "the best" or "revolutionary" – everyone says that. But because it works in a way that genuinely surprised me when I tested it.
The Problem I Kept Running Into
You know the drill with most vision AI apps:
Point your camera → AI speaks a sentence → that's it.
"It's a living room with a couch and a table."
Cool. But where's the couch exactly? What color? How far? What else is there? Can you tell me about that corner again?
You have to point again. Ask again. Wait again. Listen again.
You're always asking. The AI is always deciding what matters. You never get to just... explore.
What If Photos Worked Like Books?
Stay with me here.
When someone reads you a book, you can say "wait, go back." You can ask them to re-read that paragraph. You can spend five minutes on one page if you want. You control the pace of information.
But photos? Someone gives you one description and that's it. Take it or leave it. They decided what's important. They decided what to mention. They decided when you're done.
We thought: What if photos worked like books?
What if you could explore them at your own pace? Go back to parts that interest you? Discover details the other person missed? Spend as long as you want?
The 6×6 Grid: Your Photo, Your Exploration
Here's what we built:
Upload any photo. Any photo at all.
The AI divides it into 36 zones – a 6×6 grid covering every inch of the image.
Now drag your finger across your phone screen like you're reading a tactile graphic.
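For the technically curious, here's a rough sketch of the mechanics (illustrative only; the names are made up and this isn't the app's actual code): the image is split into a 6×6 grid of cells, each cell carries its own pre-generated description, and whichever cell your finger is over gets spoken.

```typescript
type Zone = { row: number; col: number; description: string };

const GRID_SIZE = 6;

// Map a touch point (relative to the photo) to one of the 36 zones.
function zoneForTouch(x: number, y: number, width: number, height: number, zones: Zone[]): Zone {
  const col = Math.min(GRID_SIZE - 1, Math.max(0, Math.floor((x / width) * GRID_SIZE)));
  const row = Math.min(GRID_SIZE - 1, Math.max(0, Math.floor((y / height) * GRID_SIZE)));
  return zones[row * GRID_SIZE + col]; // zones stored row by row, 36 in total
}

// Speak a zone only when the finger crosses into a new one.
let lastZone: Zone | null = null;
function onFingerMove(x: number, y: number, width: number, height: number, zones: Zone[]) {
  const zone = zoneForTouch(x, y, width, height, zones);
  if (zone !== lastZone) {
    lastZone = zone;
    speechSynthesis.cancel(); // stop the previous description mid-sentence
    speechSynthesis.speak(new SpeechSynthesisUtterance(zone.description));
  }
}
```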
What This Actually Feels Like:
You're exploring a photo of your living room:
Start in the top-left corner – drag your finger there:
"Smooth cream-colored wall with matte finish, cool to imagine touching, painted evenly"
Slide your finger right:
"Large window with soft natural light streaming through, sheer white curtains that would feel delicate and silky between your fingers"
Down a bit:
"Polished oak coffee table, glossy surface that would feel smooth and slightly cool, rich honey-brown color"
To the left:
"Plush beige carpet, deep pile that looks like it would feel soft and springy underfoot, slightly worn in the center from foot traffic"
Wait, go back to that window – drag back up:
"Large window with soft natural light streaming through, sheer white curtains..."
You're in control. You decide what to explore. You decide how long to spend. You decide what matters.
Go to the bottom-right corner – what's there?
"Wooden bookshelf against the wall, dark walnut finish with visible grain, would feel smooth with slight ridges"
Move to the zone right above it:
"Books lined up on shelf, various colored spines, some leather-bound that would feel textured and aged"
This Changes Everything
You're not being told about the photo.
You're exploring it.
You can go back to that window five times if you want. You can ignore the couch and focus on the corner. You can trace the room's perimeter. You can jump around randomly.
It's your photo. You explore it your way.
And here's the thing: the information doesn't disappear. It's not one-and-done. It stays there, explorable, for as long as you want.
Now Take That Same Idea and Put It in Physical Space
You walk into a hotel room at midnight. You're exhausted. Strange space. No idea where anything is.
Usually? You either stumble around carefully, or ask someone to walk you through, or just... deal with it till morning.
New option:
Point your camera. Capture one frame. The AI maps it into a 4×4 grid.
Now drag your finger across your screen:
• Top-left: "Window ahead 9 feet with heavy curtains"
• Slide right: "Clear wall space"
• Keep going: "Closet with sliding doors 8 feet on the right"
• Bottom-left: "Clear floor space"
• Center-bottom: "Bed directly ahead 5 feet, queen size"
• Bottom-right: "Nightstand right side 4 feet with lamp and alarm clock"
You just mapped the entire room in 30 seconds. Without taking a step. Without asking someone. Without turning on any lights.
Want to know what's on the left side again? Drag your finger back over there. Want to double-check the right? Drag there.
The information stays right there on your screen. You can reference it. You can re-explore it. You can take your time understanding the space.
The Core Difference
Most apps: Point → Wait → AI decides what to tell you → Move on → Repeat
This app: Explore → Control the pace → Discover what matters to YOU → Information persists → Return anytime
That's not a small difference. That's a fundamentally different interaction model.
You're Not a Passive Receiver
You're an active explorer.
You don't wait for the AI to decide what's important in a photo. You decide which zone to explore.
You don't lose the room layout the moment it's spoken. It stays mapped on your screen.
You don't get one chance to understand. You can explore as long as you want, go back, re-check.
This is what "accessible" should actually mean: Not just access to information, but control over how you receive and interact with it.
I have big plans to expand this feature as well.
Oh Right, It Also Does All The Normal Stuff
Because yeah, sometimes you just need quick answers.
Live Camera Scanning
Point anywhere, AI describes continuously:
• Quiet Mode: Only speaks for important stuff (people, obstacles, hazards)
• Detailed Mode: Rich ongoing descriptions
• Scans every 2-4 seconds
• Remembers what it already said (no repetition)
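Under the hood, the "no repetition" part mostly comes down to remembering what has already been spoken. A loose sketch of that loop (the captureFrame and describeFrame calls are placeholders, not the real implementation):

```typescript
const SCAN_INTERVAL_MS = 3000;            // somewhere in the 2-4 second range
const spokenRecently = new Set<string>(); // what has already been announced

// Placeholders: the real capture and vision-model calls aren't public.
async function captureFrame(): Promise<Blob> { return new Blob(); }
async function describeFrame(frame: Blob): Promise<string> { return ""; }

async function scanOnce() {
  const description = await describeFrame(await captureFrame());
  if (description && !spokenRecently.has(description)) {
    spokenRecently.add(description);
    speechSynthesis.speak(new SpeechSynthesisUtterance(description));
    // Forget it after a minute so genuine changes get announced again.
    setTimeout(() => spokenRecently.delete(description), 60_000);
  }
}

setInterval(scanOnce, SCAN_INTERVAL_MS);
```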
Voice Questions - Just Ask
No buttons. Just speak:
• "What am I holding?"
• "What color is this shirt?"
• "Read this label"
• "Is the stove on?"
• "Describe what you see"
• "What's on my plate?"
Always listening mode – ready when you are.
Smart Search (Alpha)
"Find my keys"
AI scans rapidly and guides you:
• "Not visible – turn camera left"
• "Turn right, scan the table"
• "FOUND! On counter, left side, about 2 feet away"
⚠️ Alpha: Still being worked on.
Face Recognition: Alpha
Save photos of people → AI announces when seen:
"I see Sarah ahead, about 8 feet away"
Totally optional. Enable only if wanted.
Object Tracking: Alpha
Tell AI to watch for items:
"Keep an eye out for my phone"
Later: "Where did you last see my phone?"
→ "On kitchen counter, 22 minutes ago"
Meal Assistance
Food positioned using clock face:
"Steak at 3 o'clock, potatoes at 9 o'clock, broccoli at 12 o'clock"
Plus descriptions: portion sizes, cooking level, colors, textures.
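Clock-face positions are really just angles measured from the center of the plate. A tiny sketch of that conversion (illustrative, not the app's code):

```typescript
// dx is the item's offset to the right of the plate's centre, dy its offset toward
// the far edge of the plate (away from the diner), in any consistent unit.
function clockPosition(dx: number, dy: number): string {
  const degrees = (Math.atan2(dx, dy) * 180) / Math.PI; // 0° = 12 o'clock, clockwise
  let hour = Math.round(((degrees + 360) % 360) / 30);  // 30° per hour mark
  if (hour === 0) hour = 12;
  return `${hour} o'clock`;
}

// clockPosition(1, 0)  -> "3 o'clock"  (right of centre)
// clockPosition(0, -1) -> "6 o'clock"  (nearest the diner)
```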
Reading Mode: Alpha
Books and documents:
• Voice commands: "Next page", "Previous page", "Repeat", "Read left page", "Read right page"
• Speed controls: "Read faster" / "Read slower" (instant adjustment)
• "Check alignment" (ensures full page visible)
• Auto-saves progress per book
• Resume exactly where you stopped
Social Cue Detection: Alpha
Optional feature detecting if people are:
• Making eye contact with you
• Waving or gesturing toward you
• Trying to get your attention
Fully Customizable
Pre-set profiles or build your own:
• Scanning frequency (2-4 seconds)
• Detail level (Basic / Standard / Maximum)
• Voice speed (0.5× to 2×)
• Auto-announce settings
• Feature toggles
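If you're curious what a saved profile roughly boils down to, here's an illustrative sketch (the field names are assumptions, not the app's actual schema):

```typescript
interface AssistProfile {
  name: string;
  scanIntervalSeconds: number;                  // 2-4 seconds
  detailLevel: "basic" | "standard" | "maximum";
  voiceRate: number;                            // 0.5x to 2x
  autoAnnounce: { people: boolean; obstacles: boolean; hazards: boolean };
  features: { faceRecognition: boolean; objectTracking: boolean; socialCues: boolean };
}

// Example of a hand-rolled profile.
const quietCommuter: AssistProfile = {
  name: "Quiet commuter",
  scanIntervalSeconds: 4,
  detailLevel: "basic",
  voiceRate: 1.5,
  autoAnnounce: { people: true, obstacles: true, hazards: true },
  features: { faceRecognition: false, objectTracking: false, socialCues: false },
};
```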
Why This is a Web App, Not an App Store App
Honest reason: We want to ship features fast, not wait weeks for approval.
Better reason:
App stores are gatekeepers. Submit update → wait 1-2 weeks → maybe get approved → maybe get rejected for arbitrary reasons → users manually update → some users stuck on old versions for months.
Progressive Web Apps are different:
Bug discovered? Fixed within hours. Everyone has it immediately.
New feature ready? Live for everyone instantly.
AI model improved? Benefits everyone right away.
No approval process. No waiting. No gatekeepers.
Plus it works everywhere:
• iPhone ✓
• Android ✓
• Samsung ✓
• Google Pixel ✓
• Any modern smartphone ✓
Same features. Same performance. Same instant updates.
Installation takes 15 seconds:
1. Open browser
2. Visit URL
3. Tap "Add to Home Screen"
4. Appears like regular app
Done.
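For readers wondering how "Add to Home Screen" and instant updates work without an app store, this is the standard PWA plumbing in a nutshell (a generic sketch, not this app's code):

```typescript
// 1) A web app manifest (linked from the page with <link rel="manifest">) is what
//    lets "Add to Home Screen" install the site like a regular app.
// 2) A service worker is what makes updates instant: the browser re-fetches the
//    app's files on its own, with no store review and no manual update step.
if ("serviceWorker" in navigator) {
  navigator.serviceWorker.register("/service-worker.js").then((registration) => {
    registration.update(); // check for a newer version each time the app opens
  });
}
```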
Privacy (The Short Version)
• Camera images analyzed and discarded – not stored
• Voice processed only during active questions
• Face recognition optional
• Data encrypted
• Delete everything anytime
Critical Safety Disclaimer:
AI makes mistakes. This is NOT a replacement for your cane, guide dog, or O&M training. Never rely on this alone for safety decisions. It's supplementary information, not primary navigation.
When Does This Launch?
Soon.
Final testing in progress.
When we officially release, you'll have access to all features, even though parts of the app and some of its features will still be in beta.
The Real Point of All This
For years, accessibility apps have operated on this assumption:
"Blind people need information. we'll give it to them efficiently."
Fine. But also... what if I flipped it:
"Blind people want to explore. They want control. They want information that persists. They want to discover things their way."
That's what I built.
Not "here's a sentence about your photo" but "here's 36 zones you can explore for as long as you want."
Not "here's a description of this room" but "here's a touchable map that stays on your screen."
Information that persists. Exploration you control. Interaction you direct.
That's the difference.
One Last Thing
The photo grid gives you 36 descriptions per image. Detailed, sensory, rich descriptions.
So when it comes out, watch people explore single photos for 5-10 minutes.
Going back to corners. Discovering details. Building mental images. Creating memories of the image.
That's not just making photos accessible.
That's making photos explorable.
And I think that's better.
Coming Soon
Progressive Web App
Works on All Smartphones
Built for exploration, not just description
What do you think? Which feature interests you most? Questions? Thoughts? Comments below.
Comments
My thoughts
I had a quick play and thought I would give my first impressions. I think the touch grid thing is a genius idea and possibly once I get the hang of using the app it might prove to be very useful.
I did get a little stuck in the terms and conditions for a bit. There are a number of checkboxes and it was a bit laborious to find them. I kept going to the Accept button and it was dimmed, then had to go off looking for the next ones. That's probably just me though.
I also had a small issue with the location services. I was told that Safari was blocking the request, so I went into settings and Safari was already set to ask. So I ended up skipping the step. Again, probably something I was doing wrong, but I didn't notice an option to set it.
I went into this expecting an accessibility tool so was a bit confused when it started going on about social media things. Then when I got into the app my first impression was "am I using the wrong thing?". It initially made me a little less confident about using it because it felt like I was going to be sharing photos with the world which I definitely do not want to do. There were also a number of unlabelled buttons here too. I realise I am an anti-social old man and once I realised I could just ignore it, I was OK. I'm sure lots of people will like this though.
I must admit, I don't really like the built-in screen reader. For the most part it feels unnecessary. Maybe it doesn't help that I have an unnatural aversion to Daniel. It did occasionally use my Spoken Content voice, seemingly at random. I guess it does make sense with the grid because you want to be able to go up/down as well as left/right. But otherwise I didn't really appreciate it. I did find the option to change the voice, but it was quite a complicated page with lots of controls. Maybe it's not been finished yet. I could swipe through all the default voices including the silly ones like Bubbles. I thought I'd changed to Karen but it seemed to use Daniel still. Anyway I guess I will get used to it.
When I was on the grid it did work fine, except I wasn't then sure how to get out of the grid and go back. I guess there are probably other swipes to learn and maybe I didn't pay attention when it told me. But I just turned VoiceOver back on and tried to ignore the other voices talking over the top of it.
I tried uploading a photo. I'm guessing as a PWA there's not going to be a way to share photos, but I chose to open the photo library. I got the usual very slight VoiceOver image recognition thing and chose a photo I didn't recognise. I chose the default Bullseye option. It is quite cool being able to move around the photo and feel the different parts of it. But I did find double tapping a little clumsy. I tapped on a group of people as I was interested to know who they were and how they ended up on my phone. But I think in double tapping I must have reselected a different photo instead as it started describing a barista who was in another square. I was swiping around for a bit, then it all went quiet. I wasn't sure if it was busy and tapped and swiped a bit. When I gave up and turned VO back on, I had some popup menu then found I had somehow managed to go Home and had closed the app altogether. I guess it is just going to need a bit of practice. I will need to try the other option and see if that works better for me.
I haven't tried it on a desktop yet but noticed it uses a mouse or trackpad. I think it would be better and more natural for screen reader users if you could use the keyboard. Maybe I could use arrow keys to move around the grid and Enter to zoom in? Maybe Esc to go back? And in that case I think if that was doable then I would much prefer to do that without the built-in screen reader.
I think there is an awful lot going on in here and that it does risk becoming a bit overwhelming as a result. As far as I can tell there is, or plans to be, a social media platform, a tool to help me find stuff, a tool to help me explore photos, a web browser, an AI chat (?), some kind of location thing and all this with a different screen reader that I need to figure out. I am drawn to the idea of the photo grid. I think some of the other features feel a bit unnecessary. There is something a bit peculiar about going to a web site, logging in, and then from there going to another web browser to help me navigate the web. Particularly when it told me I could navigate with the mouse or trackpad. I don't really get why I would want to do that.
Anyway apologies for the big long post and it is probably sounding a bit negative. It is quite likely that with a bit of effort on my part it will all start to make sense.
Photo grid
Stupid question, maybe, but when am I supposed to be using the built-in screen reader? It seems that most of the time I need to use VO but when I get to the photo grid, for example, I need to disable it and use the built-in one. When I went through setup, I was using VO and it was fine except for Daniel talking from time to time. Maybe if I'd turned off VO it would have made more sense. But when I turn it off in the app I find I can't really do much until I get to the grid. So I need to keep turning it on and off depending on what I am doing. Is that how it is supposed to work?
I had another play with the photo grid. I think I misunderstood before - I thought I was being asked to choose between Bullseye and Progressive mode, but only bullseye was usable. Maybe the other option is coming later?
Anyway I tried bullseye again. The app thought about it and then the photo appeared. Except as I was moving around, all I heard were the sounds and no descriptions. It was kinda fun to play with but not helpful. Then I found I'd accidentally selected a square. I got a description of the scene this time as I swiped around, but I'd find it would sometimes go quiet and I wasn't sure why. Maybe it's my phone (iPhone 13 Pro Max with iOS 26.1) but sometimes I struggled to get anything to happen for a bit. Sometimes I would hear something like "Reset".
I managed to drill-down to the correct square once, and got the wrong one the second time I tried. I guess it's just a matter of practice, but a split tap would be a lot easier to execute.
After a while it all went quiet, and without it coming back I was a little unsure about how far I could drill down. It went so far then stopped. The level of detail I got to was more or less just telling me there was a face with blonde hair or something. I tried to go further but think that was as far as it went.
I tried to go elsewhere but it kept going quiet on me and I wasn't sure what was going on. Eventually, nothing, and again I found I'd fallen out of what I was trying to do and this time was back a few screens asking me to choose a photo or something.
One small suggestion is that any time the app is doing something, I would appreciate a little noise to know that I should just wait and not try moving about. This might alleviate some of my issues - maybe I was just too impatient.
One other small thing that was a bit annoying is that every time I went to the start page, it would ask me to grant access to the camera even when I wasn't using it.
I was comparing the grid idea to something like PiccyBot, which is my app of choice for describing what's in a photo. And I was thinking that the main thing this app gives me is an outline of what is there. So with PiccyBot I am going to get an awful lot of detail up front and I can ask follow-up questions. Now I could probably tailor the prompt to give me a specific list of things in the photo, but it is still going to reel them out.
I think this app has the potential to be a quicker way of finding the thing I want, but unfortunately it doesn't really work out that way. Firstly, it takes me a long time to get into it, find the right place, select the photo, turn off VoiceOver, wait for it to have a good long think, and only then do I get the top-level list of things for me to select from.
Whereas I think for this sort of thing I could probably just use other apps and use a better starting question, then just say "tell me about such and such".
Again I think this app is maybe just doing too much, whereas it might be better if it just concentrated on its USP, which I think is the photo grid, and make it really intuitive and nice to use. But I don't want another screen reader, I don't want to use a mouse or a trackpad, I just want to get to the answer.
But let's face it I am not everyone. I am less patient than most, maybe a bit too lazy when confronted with something new, and definitely less sociable.
Anyway, sorry again if I am just being overly critical. I love that you are pouring so much into this app and I hope you continue so that I can eat my words. You have done an incredible amount in a ridiculously short amount of time, so who knows where this is going.
I can't see myself using it as it stands but I will keep following along with this thread and see where it goes. I genuinely wish you luck with it and please don't let me dissuade you.
@mr grieves
Thank you for taking the time to provide such detailed feedback. It's incredibly valuable for us as we continue to refine the experience. I can definitely clarify some of the points you raised.
First, regarding the screen reader usage: You've hit on a core aspect of how our app is designed to provide a unique, immersive experience, especially in features like the photo grid and room exploration.
The Built-in Screen Reader and Why VoiceOver Needs to Be Off for Tactile Exploration:
• Custom Gestures and Spatial Interaction: Our app's tactile exploration modes (like the photo grid and room exploration) rely heavily on custom multi-touch gestures and spatial audio cues. We've designed a system that allows you to "feel" the layout of a scene or room, pinpoint specific objects, and zoom into them using your fingers directly on the screen.
• VoiceOver Interference: When an external screen reader like VoiceOver is active, it intercepts most of these multi-touch gestures. This means the app doesn't receive the direct touch input it needs to interpret your "explorations" (drags, double-taps for zoom, triple-taps for navigation, or multi-finger swipes to exit). VoiceOver tries to describe what's on the screen rather than allowing you to directly interact with the spatial layout.
• Split Tap Limitation: You suggested a "split tap" might be easier, but this would unfortunately remove the ability to perform the nuanced multi-finger gestures that allow for zooming and precise navigation. Our system is designed for a more direct, multi-dimensional tactile feedback, not just sequential item reading.
• Current Workflow: Yes, for now, the intended workflow is to temporarily disable VoiceOver when entering the tactile exploration modes (photo grid, room exploration) to fully engage with our app's unique interactive features. We understand this adds an extra step, and we're always looking for ways to streamline this, but it's a necessary compromise to deliver these specialized experiences. For all other parts of the app (menus, settings, etc.), VoiceOver can certainly remain on.
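For anyone curious about the mechanics, here's a rough illustration of why the explore modes need the raw touch events (a simplified sketch, not our production code; the element ID and helper are made up):

```typescript
// The grid listens to raw touch events; with VoiceOver on, most of these never
// reach the page, which is why we ask you to switch it off inside explore modes.
const grid = document.getElementById("photo-grid")!; // element ID is illustrative

grid.addEventListener("touchmove", (event) => {
  event.preventDefault(); // stop the page from scrolling while you explore
  const touch = event.touches[0];
  const rect = grid.getBoundingClientRect();
  announceZoneAt(touch.clientX - rect.left, touch.clientY - rect.top);
}, { passive: false });

function announceZoneAt(x: number, y: number) {
  // placeholder: look up the grid cell under the finger, then speak it or play its cue
}
```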
Your Experience with the Photo Grid ("Bullseye" Mode):
• "Bullseye" Mode: You're correct; "Bullseye" is our primary focus for photo exploration, with other modes being future developments.
• Sounds but No Descriptions: The sounds you hear are indeed our spatial audio cues. They're designed to give you a sense of where objects are in the grid and their importance, even before you explicitly select them. This helps you build a mental map of the scene. The detailed descriptions are triggered once you tap or select a specific area.
• Accidental Selection and Descriptions: You noted that accidentally selecting a square gave you a description – that's precisely how it's designed to work! When you explicitly touch and pause, or "select" an area, the app then provides a verbal description of what's there.
• Quiet Periods and "Reset": Quiet periods could occur if you move your finger quickly outside a selected area, or if the app is momentarily processing. The "Reset" sound you heard is an audio cue indicating that the exploration focus has been cleared, perhaps because a gesture was completed or canceled, or a timeout occurred. This is indeed to let you know a state change has happened.
• Drill-down Limits: You accurately observed that there are limits to how far you can drill down. The system processes the image into progressive levels of detail, and once the finest available detail is reached, it will stop. This is by design, as there's a computational limit to how finely we can analyze an image.
• Exiting Exploration: It sounds like you might have accidentally triggered one of the multi-finger exit gestures. For example, a three-finger swipe left is intended to close the exploration mode. We'll work on making these gestures more intuitive and providing clearer audio feedback when they occur.
Your Suggestions:
• Loading Indicators: This is excellent feedback! You're absolutely right that the app should provide more audio cues when it's actively processing something. We are actively working on implementing these kinds of "wait" or "processing" sounds to prevent users from feeling like the app has gone quiet or is unresponsive.
• Camera Access on Start: The app requests camera access upon startup to ensure all core functionalities are immediately available. We understand how this can be perceived as annoying when you're not directly using the camera; however, this is an iOS restriction. We are currently researching ways around this.
• Comparison to PiccyBot: You've made a great comparison that helps highlight the different approaches. PiccyBot excels at providing comprehensive, upfront descriptions. Our app aims for a different experience: interactive spatial discovery. Instead of getting a detailed list immediately, our goal is to empower you to actively explore a scene, decide what you want to focus on, and then zoom into that specific object or area for detail. It's about letting you drive the exploration rather than receiving a pre-defined description.
You're right that there's a learning curve with these new interaction models, but we believe the ability to spatially "feel" and interact with visual information offers unique advantages that complement traditional screen reader functions. Your feedback helps us immensely in making this powerful tool more user-friendly.
mr grieves
In response to your first post, I'm glad to hear that the "touch grid" concept resonated with you as a potentially useful and genius idea!
Let me address your points, as there are indeed some critical design decisions and functionalities that might not be immediately obvious.
On the Built-in Screen Reader and VoiceOver Interaction:
You've pinpointed a key area: the interplay between our built-in screen reader and external accessibility tools like VoiceOver. Our app is designed to offer a unique, immersive "spatial computing" experience, particularly in the photo grid and room exploration modes.
• Why VoiceOver Needs to Be Off for Explore Modes: The reason we recommend turning off VoiceOver for these specific "explore" modes is fundamental to their design. VoiceOver, while powerful, is designed to linearize and announce elements one by one. Our exploration modes, however, are built around direct, multi-finger gestures and spatial interaction.
• For example, dragging your finger across the photo grid isn't just about moving between "buttons"; it's about continuously sampling a visual space, hearing sounds that change in pitch and pan as you move, and triggering feedback that varies based on the object's texture or proximity (there's a rough sketch of that pan-and-pitch idea just after this list).
• If VoiceOver is active, it intercepts these gestures, interprets them as standard navigation commands, and prevents the app from receiving the raw touch input needed for the spatial feedback. It tries to describe the UI elements, whereas we want you to "feel" the image content itself.
• Split Tap vs. Spatial Gestures: You mentioned that a "split tap" might be easier. While split taps are great for standard UI elements, they wouldn't allow for the continuous, dynamic sampling of space that makes the grid unique. The multi-finger gestures (like double-tap to zoom, triple-tap to navigate, or three-finger swipe to exit) are designed to provide a rich vocabulary of interaction within that spatial context, allowing you to quickly delve deeper or navigate away using physical movements. It's a different paradigm than sequential navigation.
• "Daniel" and Voice Customization: I understand your "unnatural aversion to Daniel"! You're right that the voice settings page can be a bit overwhelming as it offers many advanced options. We are working on simplifying this. The app should generally respect your preferred voice settings. If it's reverting to Daniel or your Spoken Content voice randomly, that sounds like a bug we need to investigate, as it should consistently use your selected app voice.
Getting Out of the Grid and Unexpected Navigation:
• Exit Gestures: You're absolutely right, there are specific gestures to exit the exploration modes without needing to re-enable VoiceOver. For Room Exploration, a three-finger swipe left is designed to take you out. For Photo Exploration, the gestures are slightly different depending on the context, but the general principle is multi-finger swipes for broader actions. We recognize that these need clearer verbal instruction and practice. The "Reset" cue you heard might be related to gestures that clear the current focus within the grid.
• Accidental Home/App Closure: This sounds like a system-level gesture on your iPhone might have been triggered by accident while attempting app-specific gestures. We aim to keep our gestures distinct to avoid such conflicts, but even native apps can be closed with certain gestures. Just be sure to stay away from the very top and bottom of your screen, as swiping down or up from those points is a system-level command.
Terms & Conditions and Location Services:
• Checkbox Process: Thank you for this candid feedback. The "checkbox fatigue" and dimmed button scenario is a known accessibility challenge. We're actively looking into ways to make this process smoother and more intuitive, perhaps by providing better focus indication or summarization, but it is important for all of that information to be read and understood for legal reasons.
• Location Services: The Safari blocking message and the "ask" setting suggest a browser-level permission issue. Sometimes, even if set to "ask," certain browser privacy settings can be very strict or require a manual override for PWAs. This is a common hurdle with web-based apps accessing device features, and we're working on clearer in-app guidance for these situations.
Social Media and "Overwhelming" Features:
• Social Media as Optional: You hit the nail on the head: the social media aspects are completely optional. We included them because a significant portion of our early user base expressed a strong desire for a truly accessible platform to share visual experiences and connect with others. We recognize that not everyone wants this, and our goal is to empower choice. You can use the exploration tools purely for personal use and completely ignore the social features if you wish. We apologize if its initial prominence made you feel compelled to use it. The unlabelled buttons are a concern, and we'll fix those promptly.
• Why So Many Features? Our vision is to create a comprehensive assistive companion. Different blind and low-vision individuals have diverse needs. Some want to explore their physical environment, some need help reading documents, others want to browse the web safely, and yes, some want to connect socially. Instead of building many separate apps, we're trying to create a unified platform where these tools are available. You don't have to use them all.
• "Browser within a Browser" and Mouse/Trackpad: Your confusion here is understandable. The built-in web browser isn't just a regular browser. It's an accessible web browser designed specifically to tackle the visual complexity of the internet. It uses AI to interpret and summarize page layouts, extracts key information, and allows for tactile exploration of web content in a way standard browsers (even with screen readers) often cannot.
• Mouse/Trackpad for Explore Modes: For desktop users, the mouse or trackpad can offer a highly intuitive way to engage with the spatial exploration modes (photo grid, room exploration, and accessible web browsing). Just as touch gestures simulate "feeling" a space on mobile, a mouse or trackpad allows precise, continuous movement across a digital representation of that space, triggering audio feedback as you "hover" over or click on objects. It offers a different, yet equally rich, interaction model for spatial discovery than keyboard navigation. While keyboard navigation (arrow keys, enter, esc) is a valid suggestion for desktop, the mouse/trackpad provides a more direct analog to the touch experience, particularly for exploring visual layouts, which is a core tenet of the app.
Your feedback is precisely what we need to make this app better. It highlights areas where our design intent isn't translating clearly into user experience, and we'll be making improvements based on your valuable input. Thank you again for being a part of this journey!
Profiles not saving
Hi Stephen,
First, I absolutely love how you have organized the settings page. Kudos to you. I wanted to tell you about a small issue I am having. When I go to the profile settings and try to adjust my name, add a photo, and a little bio, I can do all of this of course, but when I go to save, it errors out. Every time. I will hear the built-in screen reader say "saving" or "now saving", something like that, then half a second later it says "save failed" or "failed to save".
All of the other settings seem to work fine. I was even able to set up Alex as my voice, purely through this app, without going to my spoken content settings.
One more issue, slightly off topic, is your built-in screen reader. I see now, when we first load the app, there's a little message about the screen reader, and a button to enable it. However, the screen reader seems to have some lag, or perhaps certain pop-ups are preventing it from working properly. For example, like Mr. Grieves stated above, I constantly get the allow-camera-permissions pop-up. This happens every time, even after granting permission.
Just some things I wanted you to be aware of. Overall, I am still digging this app.😊
@ Brian
Thanks for letting me know. I'll have this fixed hopefully in about 10 minutes here.
@Brian
Your profile saving issue should be fixed now! I'm so glad you're now able to set up Alex as your voice - not gonna lie, that was a tough one to implement!
As for camera permissions, that's an iOS-specific restriction I'm looking into. I'm working on ways to make the experience less intrusive - like maybe only requesting camera access when you enter Live AI mode or Conversation mode, rather than upfront. That way it's a little less disruptive, at least for now.
The screen reader is a work in progress, but right now it should work well in explore mode. This one is proving to be trickier than adding that Alex voice, lol.
@brian
This is a prime example of why I decided to go with a web app. :). I can push updates to you right away. It could have taken me 3 weeks or longer if it was a native app.
Re: Profiles
Profile saved successfully. Thanks Stephen!
Voice commands.
Is there a way to know the voice commands? Also, I don't think my microphone settings were on? I'm using Firefox and the mic settings are saved as on there.
@brad
Hey Brad, I was just thinking about you lol.
When you ask about voice commands, do you mean the web browser feature? If so, here's how it functions:
Our web browser is designed to be fully accessible. You can use your voice to search the web, ask questions about what's on a page, and even navigate through content. When you ask it to open a website, it first tries to load an accessible, AI-summarized version of the content. This means it reads out the main text, key headings, and helps you explore images with spatial audio, all optimized for accessibility.
We've made some big improvements recently. Now, if you want to go directly to a specific website, you absolutely can! However, you might notice that some popular sites like Reddit or YouTube will automatically open in a new tab in your device's default browser instead of appearing directly within our app.
This isn't an issue with our system, but rather a security measure used by those websites. They send out a signal, often called an "X-Frame-Options" header or a "Content Security Policy," which basically tells other applications, "Don't put me inside an iframe!" An iframe is like a window within our app that displays another website, and these sites block that for security and privacy reasons.
We want you to have the best experience, so when we detect these sites, we now intelligently offer to open them in a new tab so you can still access them without a hitch. However, photo explore mode does not work in these situations, since the site can't open inside the app. I'm actively looking into more seamless solutions for this iframe situation, and I'll keep you updated if and when I find them!
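For the technically minded: since X-Frame-Options and CSP frame-ancestors are response headers, one simplified way to check ahead of time is from a small server route, roughly like the sketch below. This is a generic illustration with assumed names, not our actual backend, and it glosses over details like frame-ancestors allow-lists or sites that reject HEAD requests.

```typescript
import express from "express";

const app = express();

// The PWA could ask this route before trying to frame a site. If the target sends
// X-Frame-Options or a CSP frame-ancestors directive, the browser will refuse to
// show it in our iframe, so we tell the client to open a new tab instead.
app.get("/can-frame", async (req, res) => {
  const url = String(req.query.url);
  const response = await fetch(url, { method: "HEAD" });
  const xfo = response.headers.get("x-frame-options") ?? "";
  const csp = response.headers.get("content-security-policy") ?? "";
  const blocked = xfo !== "" || csp.includes("frame-ancestors");
  res.json({ frameable: !blocked });
});

app.listen(3000);
```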
@Brian!
Perfect! :). Not a problem at all.
Wooph!
It looks like I can officially pull the room explore mode out of beta!!!!! Maybe I should get some sleep? Apparently we humans need that. I feel like I haven’t slept in days lol.
omg
Sorry guys, fixing the room explore feature... lol.
I had an idea!
Hang tight guys, I may do an overhaul of room explore mode... just you wait! If this works it is going to be epic. If this works like I'm hoping, I'm most certainly adding this to the photo exploration feature. Can any of you guess what I'm thinking?
Epic!
That it already is! I'm loving what I'm seeing here and I feel very happy that I'm being part of this genius thing that is happening.
@Stephen I have no idea what is coming from your side, but I quite trust it's going to be intuitive and useful. I do have a request though. Similar to the maps feature, it would be very useful to have a feature within the app where someone can create a layout of a place (say a venue or a restaurant) and share the link with a blind person, such that we can explore the place tactilely before we physically go there. Also, it'll be interesting to have a space where we can explore iconic stuff like, say, the Statue of Liberty, and also layouts of things like, say, a cricket field, or even things like the human body. I know I am going out of the picture exploration paradigm in the strict sense, but I am beginning to see this as the first baby step towards an AI-powered tactile revolution...
@Gokul
Great minds think alike! I'm already working on features like that on the backend!
Overhaul complete!
Just did a complete overhaul of the room exploration feature. Check it out. It might surprise you 😊. For the absolute best results use headphones. It does work without headphones as well, so please don’t think you have to use them. ❤️.
Also forgot to mention
You should be able to access most features efficiently without having your own device's screen reader on. Yes, there is still some tweaking to do in the back end, but you should be able to access most things without your screen reader.
Very good app, but the live artificial intelligence mode doesn't work
Hello, congratulations on the app, I see great potential in it. Something is happening here and I’m not sure if I’m doing something wrong or if the app has a problem. Here is what happens: when I tap to start the AI assistance camera and try to say something to it, I don’t get any response. The app only starts making some noises and saying the word “Listening,” even to the point of freezing the phone.
@ Guilherme
Hey there, thanks for letting me know :). Right now it won't function properly because I'm switching out how the code works on the back end. I'm going to revamp how that works. I was going to finish that tonight but I got busy working on the room explorer feature so users have a better experience. Alongside that I was working on the screen reader. I'll fix that in a bit, just going to get some rest :). I'll post an update here.
Room explorer
Very interesting how the audio now pans within the room explorer. Again, intuitive. I remember trying the vOICe app earlier; this takes the same approach and implements it in a more accessible way. I was wondering if there's a way by which one could capture multiple pictures of a room from different angles and combine them to get an overall view of the whole room?
A few more thoughts (please ignore if you've had enough of me)
Thanks so much for taking the time both to read through my rather long feedback and to provide such a detailed response.
I think maybe I didn't quite explain a couple of the problems I experienced yesterday very well.
At one point, as I was swiping around I was only hearing the sounds and not the spoken descriptions. So I couldn't tell what anything was. Normally swiping around will tell me the descriptions of things, but for some reason it wasn't doing that. I've only had that once so maybe it was just a quirk.
I think I understand your reasoning for using your own screen reader. I personally think as it stands right now I would still enjoy the app more if I could turn off the built-in one and just use VoiceOver. Touch to explore will work fine with a grid, and gestures like scrub can take me back. Maybe down the road, this choice will feel vindicated but right now it just makes the app very unintuitive and I need to keep switching between two things. I also suspect you will never get away from the need to use VoiceOver sometimes - whether it be to enable the camera, or to select a local file to view or whatever. And it's always going to be jarring. Maybe a sound effect that tells me when to turn VoiceOver on might help a little, or maybe I just end up getting used to it. I am just dabbling after all, so maybe a serious user of the app won't be bothered by these things. But as a new user it feels like an obstacle I am being asked to overcome.
Bear in mind that VoiceOver generally works great for me, I like it and have no particular need for anything else. A tool that understands this and works in tandem with it rather than trying to reinvent the wheel will always be a better experience for me personally.
Being an old cynic, I can't help but feel that this approach was taken by someone who doesn't use a screen reader themselves.
Similarly on a computer, there is no reason for a blind person to own a mouse unless they share a computer with a sighted person. Maybe a trackpad is different. I have never used mine and it seems a bit of an alien artifact to me. It's often not even within reach. The keyboard is a screen reader user's weapon of choice.
I noticed you've added a new onboarding thing which is good. However, at the end it got to a screen about pricing, and the built-in screen reader just kept repeating the heading over and over again. I couldn't read the text behind it with VoiceOver because the other voice was so loud. I eventually found the button to get past it but I have no idea what it was trying to tell me. Probably that the app is free right now but won't remain that way forever.
I was also pleased to hear the absence of Daniel. The other voice I was hearing is now the permanent screen reader voice as far as I can tell. It's not my Spoken Content voice, though, it's some American female voice, possibly Samantha but not sure. Anyway it is less grating.
I did find Daniel in one place. I thought I was doing a live scan of my surroundings but I just got these uncomfortable, crackling high pitch beeps and Daniel repeatedly saying "Listening" over and over.
I tried the new room explore feature. I did find the initial sound effects a bit ear-splittingly high pitched and was glad when I completed the tutorial and they stopped. I found when I double tapped on something to get more info I would get an error - something like how it couldn't download an external object or something.
I suspect maybe I am looking at this the wrong way. I usually approach this sort of thing specifically as a utility - how can it help me do something more efficiently than before. I had a similar problem with the Envision Ally app. I wanted the utility but just got wise cracks and jokes as it spouted wrong information at me.
I think maybe my brain isn't quite wired up to feel that exploring an image by touch just for the fun of it is actually something I want to do. The detail of the app isn't really enough that I feel connected to the image in any more of a way than I would with other tools, and being AI it's always going to be a little loose with the truth. (For example, I don't have steps in front of me, nor a telephone, nor a sub-woofer).
I like the suggestion on here about using this to explore an area and try to get some sort of spatial awareness of an environment. Whether a limited number of squares would be enough, I don't know, but it definitely feels like a much stronger case for this tech. And particularly if this could be generated without needing AI so I could rely on its accuracy.
Anyway please remember that I am just one person and the fact that this app may not be for me is absolutely fine and does nothing to diminish what you have done. Please feel free to disregard my ramblings if they come across negatively.
AI 1 - Mr Grieves 0
Oh apparently I do have a sub-woofer in front of me. How long that's been there I have no idea.
OK, AI - you win this time!
@ mr grieves
Hey so I will respond more thoroughly when I’m more awake but I wanted to touch on something. I am completely blind and yes I do use a screen reader. I’ll respond more thoroughly in a bit. Let’s not make assumptions :).
@Gokul and @Stephen
The thing Gokul is talking about is called, 'Panoramics', and if you can implement this properly, Stephen, it will be a true game changer. I mean that in the literal sense.
@Brian and @Gokul
Oh I'm totally with you both on that! I'm actually working toward it, but right now I'm still laying the foundation and building the house before I can think about decorating, you know? Don't want to put the cart before the horse 😄
The reality is, I'm also navigating some financial constraints here. Building this isn't cheap, and I'd honestly love nothing more than to quit my day job and work on this full-time so we could move faster. That's the dream! But I've got bills to pay and responsibilities - my two bunnies, my guide dog, and splitting finances with my spouse. So features like panoramics might be a little ways off for now.
But it's definitely on the roadmap! Just need to get the essentials rock-solid first, then we can start adding those really cool features.
Re: Assumptions
Apologies if I caused offence. You are entirely right, I made a stupid assumption and should not have done that. If someone had done the same to me I would have been a little insulted.
Considering that this seemed to come from nowhere and is being developed at a scary speed, you are obviously doing a great job and lots of people are enjoying it. I can tell you are putting your heart and soul into it, and I thoroughly commend you for that.
I think maybe it would have been better for everyone if I hadn't come on here and started spouting nonsense. It is a bit of a personality trait. Give me something that is almost perfect and I will probably complain about the almost and dismiss the perfect bit. Just be glad you don't have to live with me!
Anyway I will do my best to bite my tongue.
Also, as a blind developer myself I am in awe of what you are doing. Admittedly I am a bit of a dinosaur but since going blind, I can only dream of doing a fraction of what you are achieving with this. So please keep it up and don't let me dishearten you.
@mr grieves
Don't talk about yourself like that! I welcome anyone and everyone's opinions and feedback. Why? Because every opinion, every piece of feedback may spark an idea! You are more than welcome to go to town if you feel like it. That being said, this app may not be for everyone either, and that is absolutely OK, but that doesn't mean there shouldn't be some dialogue. I will always respond to you as long as it's respectful and constructive.
A question for you guys.
Ok, so I’m torn. Should I keep building out the Look & Tell and Live AI features, or should I drop them? There are already a lot of apps offering similar tools, so I’m not sure if it makes sense to invest more into that lane.
If I let those go, I can put more time into the things that really set this project apart like photo exploration, 3D audio panoramic maps, the ability to share those maps, and everything else on that side of the experience.
So I’m asking honestly: would you actually use Look & Tell or the Live AI features? I can spend the time and money on them, but if most of you don’t see yourself using it, then it might not be worth building. Totally up to you.
it's interesting.
Usually I'd not care about the look-around feature, but whatever picture thing you're using, it's really good. I do wish it would read text though, but like you say there's other apps. I think you're using a different engine?
@brad
That is interesting. If you like it I'll fix it and polish it up. I was going to scrap it, but if you like it and you feel you are gonna use it, I'll keep it. It is a little torn apart right now but I'll try to get that fixed later today then. :).
Photo exploration for the win!
Personally, I like the room exploration feature, but I absolutely love the photo exploration feature. If I had my way, I would ask that you invest your time and energy into that.
However, I realize I'm not the only user here, so go with what majority want, I guess. 😅
@ Brian
What would you like to see in the photo exploration feature?
My two cents
Well, if the Live AI feature works continuously, I mean without the need to keep asking it questions for more info, I'm all for it. Gemini and ChatGPT apps can't achieve that.
Only if you want too.
I'm not gonna give you guys a big spiel and ask for any donations. We get all that way too much. If you want to help the project out and it's within your budget, I have set up just a generic donation page. Don't feel like you have to though. Eventually, once we start making this perfect, I will however be giving away some free legacy member plans, and if you want and you're comfortable, I can put you in the credits of the app as a supporter when it's fully completed. I don't want anyone to feel any sort of obligation and I'm definitely not gonna treat you to corporate speak… I work in a corporate job and I hate it lol. I'll still be putting work into this regardless. You all have been amazing. I appreciate each and every single one of you.
https://www.paypal.com/ncp/payment/8RUHTTVFBJDCQ
Discord server
So I have a discord server up and running so I can chat with you guys in real time instead of constantly having to go here and scroll through all of these posts lol. Here’s the discord server link.
https://discord.gg/B22qDN8C2
Re: photo exploration
Honestly, as much detail as can be provided. And I know this is going to be tricky, but actual details on facial features. I don't need the application to tell me whom the person is, necessarily, but details on their features would be amazing.
@Brian
I love tricky! Let me see how much detail I can get it to describe to you. Let me just finish with the start camera and Live AI features, as I broke them yesterday, and then I'll start toying around with that.
hmm
Stephen Stephen Stephen! Nice job!
@Dominique
haha thanks so much :).
Number of users report!
I love you all so much! We are officially at 122 users!
Technical Update: What was wrong
So, the app was experiencing two major iOS-related issues:
1. Camera Viewer on Homepage: The original homepage displayed a live camera feed even when you weren't using AI features. iOS security requires apps to request camera permission every time the camera is accessed. This meant every time you opened the app, iOS would ask for camera permission - even if you just wanted to check settings or navigate to a different feature.
2. Voice AI Microphone Conflicts: When trying to use Voice AI to ask questions, iOS was getting confused because the app was already holding onto the camera permission. This created a conflict when trying to request microphone access simultaneously, causing the voice recognition to stutter, restart unexpectedly, or fail to activate properly.
What We Changed:
1. Removed Camera from Homepage: We completely redesigned the homepage so the camera feed is NOT active when you first open the app. Instead, you now see a clean, organized menu of features. The camera only activates when you tap "Start AI Vision" or enter a feature that actually needs it (Room Explorer, Photo Explorer, etc.).
2. Separated Camera and Microphone Access: We restructured how the app requests permissions. Now when you want to use Voice AI, you tap the "Ask a Question" button, and the app specifically requests microphone access at that moment - without the camera interfering.
Why iOS Permissions Work This Way:
iOS has strict privacy and security policies. Every time an app wants to access your camera or microphone, iOS requires explicit user permission. This is by design to protect your privacy - Apple doesn't allow apps to bypass this.
The key is timing: We can't prevent the permission prompts entirely, but we can control when they appear. By only activating the camera/microphone when genuinely needed, you'll see far fewer prompts.
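In web terms, the change boils down to when getUserMedia gets called. Here's a simplified sketch of the pattern (illustrative only, not the app's actual code):

```typescript
let cameraStream: MediaStream | null = null;

// The camera permission prompt appears here - only when the user starts a vision feature.
async function startAIVision() {
  cameraStream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },
  });
}

// Releasing the tracks hands the camera back to iOS, so a later microphone
// request doesn't collide with it.
function stopAIVision() {
  cameraStream?.getTracks().forEach((track) => track.stop());
  cameraStream = null;
}

// The microphone is requested separately, at the moment you tap "Ask a Question".
async function askAQuestion() {
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  // ...hand the stream to speech recognition, then stop the tracks when done...
  mic.getTracks().forEach((track) => track.stop());
}
```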
What This Means for You:
• Homepage: Opens instantly without camera permission prompts. You can browse features, adjust settings, and navigate freely.
• Voice AI: Works reliably when you tap "Ask a Question" - you'll see a microphone permission prompt (only once, when navigating to that page), and voice recognition will function smoothly without conflicts.
• PWA Support: iOS users can now add the app to their home screen as a Progressive Web App without constant interruptions.
• Permission Prompts: You'll still see them when using camera/microphone features - but only when actually needed.
Bottom Line: We didn't "fix" iOS permissions (we can't change Apple's security model), but we optimized when and how the app requests them to create a smoother, less intrusive experience.
I think I already asked this but.
I think an NVDA add-on would be nice, so if I'm on Reddit, I can just press a button and woosh, the text of a meme is read out to me, and I get to control how it's read. For example, do I want to know about the picture, the text, how much info, strip away the "this post has 200 upvotes" and so on, or do I want all that?
In other words: I control the AI and how it responds to me. Ooh, voice control might be even nicer, just say something like, "Oy! Grab the pic from this page and tell me the text with the least amount of detail." OK, maybe not "Oy!" The major issue I see with that is that Reddit might have a page with multiple pictures, like if I'm on r/shitamericanssay and find a post I like so press enter on it, and it grabs a picture, there might be multiple for the joke or just in general on the page, do you see?
Also, this is probably not what you're going for, but if you've ever used something like redditforblind, I'd love something like that but without signing in, so a way to browse Reddit accessibly but without an account?
I had accounts on and off but would just like to browse.
If this doesn't make sense, let me know 'cause I did just write it down without much structure.
@Brad
Oh, you did; that is my bad. With so much happening over here I thought I had already replied to you. Forgive me lol :). As for the NVDA add-on, I am looking into how I can make that possible for you. I'll let you know what I find out and how or if I can get that all set up. As for accessing websites through, let's say for example, the accessible web browser thing I set up, the biggest issue is that whole iframe situation I was speaking about earlier. It is a complicated one and I'm trying to see how I can work around it legally. Hang tight buddy :).
sure.
If you can't, it's no issue; if there are more important things to work on, work on them. The web browser is an interesting idea but I honestly don't see many people using it, as we already have web browsers that work.
@Brad
The point originally was to grab, let's say, Google Images and explore them, but due to security reasons it won't let me. I kept it up because I've actually found it handy for getting information much more quickly than using Google, for example. It skips all those ads and irrelevant results. I also get summaries of pages before I even go to them to make sure it is the one I want. So I left it up. I do see on the back end a handful of people using it, so it doesn't hurt.
fair enough.
Have fun making more bits!
Did I just put out something?
Check your apps! This should help in indoor spaces.
more problems with the app
Hello, I’m having some more problems with the app. The AI live mode is not returning the captions in Portuguese, which is my language — it only returns them in English. Another issue is that when I select to explore a photo, it only tells me one object that is in the picture, and when I try to zoom in on this object, it says it found the number of parts, but it stays silent when I slide my finger across the screen to explore those parts.