Listen2 Developer Here - Introducing myself and Happy to Hear Feedback

Listen2 Reader app submission coming soon

Hello everyone,
Daria here. I was asked to enter an app submission for Listen2 after Zach and I spent so much time working on it together over the last few weeks. That submission is coming soon, expected in the next day or two.
Thank you.

Welcome and some early thoughts

Hi Zach,

Welcome to AppleVis, and thank you so much for taking the time to introduce yourself here. It's always wonderful to see a developer who is genuinely committed to improving the accessibility of their app.

I've been trying out Listen2 and wanted to share a few observations from my initial experience.

The first is a performance issue I've noticed when listening to any EPUB file on my iPhone 15 Pro running iOS 26.3.1. The phone becomes very warm quite quickly and the battery drains noticeably faster than when I use comparable apps such as Voice Dream Reader. I've tested both the App Store release and the current TestFlight build and experience the same behaviour on both, so it doesn't appear to be specific to one version. It also doesn't appear to be specific to the voice being used.

My second point is a feature request: I'd love to see the maximum playback speed increased. Many members of the blind and low-vision community are accustomed to consuming spoken content at quite high speeds, and I suspect this would be a popular addition with a good number of AppleVis members.

An issue I wanted to flag concerns the playback progress slider that appears on the lock screen. When using VoiceOver, rather than announcing the current playback position, the slider always reads as zero percent regardless of where you are in the audio. This means it isn't possible to get a sense of your progress through the content, or to use the slider to navigate, without unlocking the device and going to the app.

Finally, I'd love to see the app display the total listening duration of a document. When opening a new book, it would be really useful to know upfront whether it's a five-hour or ten-hour listen. Building on this, it would be even better if the app could also show the amount already listened to and the time remaining — and ideally both figures would automatically reflect the currently selected playback speed, so that they remain accurate however fast or slowly you choose to listen.
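
To make the arithmetic behind speed-adjusted estimates concrete, here is a minimal Python sketch. It is not Listen2's code: the helper names are hypothetical, and it assumes the app already knows a document's duration at 1.0x and the fraction that has been played.

```python
from typing import NamedTuple


class ListeningTimes(NamedTuple):
    total: float      # seconds for the whole document at the chosen speed
    elapsed: float    # seconds already listened to at the chosen speed
    remaining: float  # seconds left at the chosen speed


def listening_times(duration_at_1x: float, fraction_done: float, speed: float) -> ListeningTimes:
    """Scale a 1.0x duration by the playback speed (hypothetical helper)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    total = duration_at_1x / speed
    elapsed = total * fraction_done
    return ListeningTimes(total, elapsed, total - elapsed)


def format_hms(seconds: float) -> str:
    """Render seconds as H:MM:SS, rounding to the nearest second."""
    whole = int(round(seconds))
    hours, rest = divmod(whole, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# A ten-hour book (at 1.0x), a quarter of the way through, played at 2.0x.
times = listening_times(10 * 3600, 0.25, 2.0)
print(format_hms(times.total), format_hms(times.elapsed), format_hms(times.remaining))
# prints: 5:00:00 1:15:00 3:45:00
```

Recomputing all three figures whenever the speed changes keeps them consistent with one another, which is the behaviour requested above.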

Thanks again for reaching out and for your willingness to listen to feedback. I'm looking forward to seeing Listen2 continue to develop.

VoiceOver tips and tricks for battery drain Control

Hello,
I thought I would post to clarify a couple of things for VoiceOver users. There are a couple of tricks to help with the battery drain. First, turn off highlighting altogether or switch it to sentence mode; by default the app uses word-level highlighting. Second, unless you absolutely require your screen to be active while reading, use Screen Curtain, which will decrease battery usage. As to the progress bar comment, that progress bar is not currently set up for navigation purposes with VoiceOver. I just checked in the app to be sure of this, and the progress bar does not read as anything other than an indicator of the time through the document. For lock screen navigation, use your previous and next buttons: if your navigation setting is paragraph, they will move forward and back by paragraph; if it is sentence, they will move forward and back by sentence. Hope this helps.

Re: VoiceOver tips and tricks for battery drain Control

Thank you for these suggestions, they are very much appreciated.

My early testing does suggest that turning off highlighting makes a difference. The phone still gets a little warmer than when using Voice Dream Reader, but nowhere near as warm as it did before I made the change. I have to admit I was somewhat surprised by this result, as I was experiencing this behaviour even when my phone was locked with the screen off. I hadn't expected that word-level highlighting would seemingly still be doing something in the background consuming power under those circumstances.

Regarding the progress bar, you mention that it reads as "an indicator of the time through the document" — could you clarify whether it is literally reporting time for you? On my phone, VoiceOver only announces a percentage value rather than a time. I'm wondering whether you are seeing different behaviour somewhere different to where I am looking, or whether something in the app settings needs to be changed to achieve time-based reporting. Any clarification on this would be very helpful.

Progress and highlighting

No problem. The progress bar does only show the percentage through the document at this point. I have raised the issue with the developer and I’m aware that he’s working on it. Something I do to get around this for the time being, until it is dealt with, is to import the document I’m reading into another app that does show time codes, get a read of it, and then go back to the document in the app I prefer. It’s a little bit roundabout, but it does work, particularly if you seek to the same position in the document with the app that shows the time code, whether you use Voice Dream or Speech Central or something else.
I think the difference between this app and most traditional apps like Voice Dream or Speech Central is that those apps don’t use larger speech models. I don’t know if you are an NVDA user or what version of it you’re running if you are, but if you have access to someone’s computer with NVDA running, you can try installing the Sonata neural voices add-on for NVDA to get a better understanding of how the Piper TTS model in particular works. There’s only one other app right now that uses the Supertonic TTS, and that one is called Page echo. Regardless, this is the best one I’ve found for on-device neural TTS that I don’t think either eats up too much of my battery or makes me want to cringe because of the strange inflections. It helps that Zach opened up the phoneme and expressiveness controls for Piper and did something similar for Supertonic in the beta.
The highlighting does tend to nibble on the battery though, more so with word-level highlighting than with sentence-level. I don’t use either one, so having it completely turned off does not bother me in the slightest. For anyone using this app who requires highlighting: unless you need the word-level highlighting or just don’t care about battery drain, sentence-level works just fine. There are a few bugs with the highlighting from what I’m aware of, but they will likely be fixed in the next few versions.

Pete feedback

Pete,

Thanks for trying the app out and for the feedback.

Regarding the heating up: Daria already mentioned what I would say for the current version of the app. The word highlighting engine and the voice model are 2 separate machine learning models running on the device, so turning text highlighting to sentence or off will effectively reduce the power drain.

Think of this like running a gaming laptop or anything else that makes heavy use of the hardware: it will definitely heat up. That is a trade-off to be aware of for an offline TTS app running neural voice models. I’ve had one user test it against a few other offline apps, and he reported that Listen2 drained the battery a bit less than the other apps.

I’m always looking for ways to improve the efficiency and am experimenting with alternative highlighting systems and voice models.

The newest version that I’ll submit to TestFlight later today has Apple system voices as a new engine alongside Piper. If you want the best battery life and the least heat, those would be the best option, as they’re fully optimized by Apple and I believe they’re not using ML technology like Piper (I could be wrong for the premium voices).

Regarding the progress bar: I actually wasn’t aware that the lock screen always showed 0% progress. I’ll get that fixed. And I’ll also prioritize getting the time estimates implemented.

The playback rate can go a bit higher. Pete, how fast would you expect to see for the max playback rate?

-Zach

Re: Pete feedback

Thanks for asking, Zach. Personally, I suspect my own sweet spot would be around 3.0x, though I'm aware that many others in the community would still consider that relatively slow! I know from other discussions here on AppleVis that pushing these synthesised voices to higher speeds can introduce undesirable side effects in terms of audio quality and behaviour, so there may be a practical ceiling to what is achievable, but even a modest increase would be welcome.

I also wanted to mention one further issue I've noticed. I have Siri configured to announce notifications on my AirPods Pro 3, and my experience with Listen2 is that although playback correctly pauses when a notification is being read out, it doesn't automatically resume once the announcement has finished. Resuming after a notification announcement is expected behaviour in other audio apps, so this is something that may be worth investigating.

playback speed and resume after siri announcement

Pete, noted on the playback speed. Thanks for giving me your sweet spot and your sense of what others might expect.

Good catch on the playback staying paused after the Siri announcement. I fixed a bug a few weeks ago where getting a phone call would start Listen2 playback, and I may have broken the resume after interruption in the process. I'll look into this too. I know how annoying that is.
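
For anyone curious what "resume after interruption" involves, here is a rough Python sketch of the decision logic. It is not Listen2's code, and the class and method names are made up for illustration. The only thing borrowed from iOS is the general shape of audio interruptions: the system signals a begin and an end, and the end can carry a hint that resuming is appropriate.

```python
class PlayerState:
    """Toy model of a player reacting to audio interruptions (hypothetical)."""

    def __init__(self) -> None:
        self.playing = False
        # Remembers whether we were playing when the interruption started.
        self._was_playing_before_interruption = False

    def play(self) -> None:
        self.playing = True

    def pause(self) -> None:
        self.playing = False

    def interruption_began(self) -> None:
        self._was_playing_before_interruption = self.playing
        self.playing = False

    def interruption_ended(self, should_resume: bool) -> None:
        # Resume only if we were playing beforehand AND the system hints that
        # resuming is appropriate. This avoids starting playback after a
        # phone call when the user never pressed play.
        if self._was_playing_before_interruption and should_resume:
            self.playing = True
        self._was_playing_before_interruption = False


# Case 1: a notification is announced during playback, then playback resumes.
player = PlayerState()
player.play()
player.interruption_began()
player.interruption_ended(should_resume=True)
print(player.playing)  # prints: True

# Case 2: a call arrives while nothing is playing, so nothing should start.
idle = PlayerState()
idle.interruption_began()
idle.interruption_ended(should_resume=True)
print(idle.playing)  # prints: False
```

Tracking the pre-interruption state is what lets both behaviours coexist: the earlier phone call fix and the resume after a Siri announcement.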

A question

Why do so many accessible reading apps seem to assume that blind readers only want to read through the speech controls built into the app itself? Sometimes it feels as if developers think accessibility begins and ends with pressing play and listening.
But that is not how many of us actually read.
I want to read using my own VoiceOver controls. I want to move through text the way I normally do: by paragraph, sentence, word, or even character when needed. I want to use my braille display naturally. I want to highlight passages, add bookmarks, write notes, export those notes later, and return exactly to where I stopped reading. Sometimes I need to stay on one section and read it several times because I am studying, reviewing, or working—not simply listening to a novel for entertainment.
Apps like Speech Central and Readify still miss much of this. They often lock the reading experience inside their own controls, as if built-in speech is the only thing that matters. But reading is much more than listening. Sometimes I need precision, annotation, and full control over navigation.
Not every reading session is about casually consuming a book. Sometimes I need to examine a paragraph carefully, compare sections, mark important points, and come back to them later. That is difficult when the app limits how text can be explored.
Accessibility should not mean replacing system accessibility tools with something narrower. It should mean giving users the freedom to read in the way that works best for them.

Alternative to listen to for individuals who require braille dis

Hello,
I very much appreciate what you are saying. I am actually DeafBlind, so there are times when I definitely need to review text right along with listening to it. When I need to do this, I have started using the app Vox libri. It does have speech controls built into the app, but it also has settings that let you put it into what is essentially a braille display mode. We all have different tools for different things, and we alternate those tools to meet the needs of the moment. I find Vox libri useful for when I need to use a braille display and want to be able to control my text through VoiceOver exclusively. By contrast, I find Listen2, Readify, or Speech Central useful for when I need to be able to understand what I’m reading but don’t absolutely require text review. To each their own. The other alternative to Vox libri would be the Dolphin EasyReader application. Like Vox libri, it also has a screen reader mode for use with a braille display, so it combines both. Dolphin EasyReader is free and Vox libri is $5 upfront. I believe there is a submission for Vox libri in the directory. I know there is one for Dolphin EasyReader.

Version 1.6.0 is live and improves on the heat issue

Pete,

I just wanted to give you an update that after spending a couple weeks training some lightweight word alignment models, the burden on the phone's resources is much less. In my tests, with word highlighting on and having a book playing back for over an hour, I didn't notice any excessive heat on my iPhone 15 Pro Max. Now I realize word highlighting may not be a feature you'd make use of, but I just wanted to let you know and anyone else that was concerned by the heat and battery issue, that I've made a significant stride toward making this more efficient.

Now, I've also logged your other requests regarding the playback speed, the Now Playing progress indicator, and the Siri announcement resume issue. I'll make sure these get proper attention and will keep you posted when those fixes ship.

-Zach

Playback duration, elapsed time now in v1.7

Pilgrim Pete,

Just FYI, version 1.7 of the app was released today and includes playback duration and elapsed time estimations. It also allows you to scrub through the progress bar and start playback from that location. This works on the Now Playing on lock screen or in the reader view progress bar. There are labels under the progress bar for elapsed time and remaining time.

I still have the playback speed and Siri interruption issues in the backlog.

TL;DR Observations

Kudos, Zach, for such a brilliant app: great interface, great parsing of documents.
I won't be able to purchase it, though; if the trial resets when I delete all the info and move to whatever my next phone is, I'll definitely try it again.
The neural engine is apparently too much for my iPhone 16e. When reading any file, including the intro PDF, there's no speech for between 7 and 14 seconds, either when pressing play or using skip. I did experience instant pause/resume in a book-length Word document a few times, but have no idea what conditions brought that about and couldn't repeat them. Skip re-introduced the lag, as does navigating to another chapter (much longer lag).
On book-length Word documents, as well as the provided Alice EPUB, sometimes playback doesn't begin at all until I jigger with it a bit. I suspect this might be while it's still parsing the document, but there's no message to indicate when the document is ready. Sometimes, it just stops and I can't get it to restart until I tap in the visual document display. I've also had it eject me from the app or lag for a couple of literal minutes before it began speaking (I'd put the phone down and begun doing something else). Those last issues were when I had a couple of other typical apps running in the background; the 7-14 second lags were with no other apps running.
The engine is not responsive to question marks or exclamation points at all, and can't yet interpret context-dependent pronunciations (technically, heteronyms; "content" with accent on the first versus second syllable, for instance). Finally, the expressiveness seems to drop out of the voice after a few pages. It becomes flat, introduces awkward pauses (probably at lines), and becomes worse at responding to punctuation.
I don't know if Kokoro is still a Piper engine, but I'll say the premium voices are really improved over the last time I tried them. These are still early days, but time is flying.
Feature suggestions:
- VoiceOver frequently talks over the app voice when focus is inadvertently placed on the document area when playback begins. It did this a lot for some reason. 3 out of 10 on the nuisance scale.
- Given my problems with the neural engines, I would like the option of not using a voice at all and relying on VoiceOver instead. It has access to Siri voices that are far better than the "medium" Piper engine.
- If the Kokoro voices work smoothly on higher-end processors, I think a note in the App Store description that an M-series chip or whatever is recommended for best performance would help. Otherwise, a lot of potential users will be in my shoes: the trial will run out and we won't be able to try again when a significant upgrade happens.
- I think there should be a free version without access to the premium voices. It creates a great hook for a broader base who then will be pulled in if the app works for them. Frankly, I have regretted every single high-dollar app purchase I've ever made.
- The time selector appears not to work with VoiceOver. I crank the rotor to "adjust value" and cannot adjust the value.
- I would benefit greatly from swipe gestures to skip by sentence and paragraph (ideally, one each). Button-pressing overlays reading with VO speech and gets in the way. It'd be great to have a truly seamless experience; readers sometimes need to skim by paragraph and at other times just reread a sentence, and they need to do both regularly in the same session.
- I despise library imports in all their forms. I don't even use them in my music player. I was thrilled to be able to immediately import documents from OneDrive. Not sure why I can't just open them for reading directly: my OneDrive *is* my library. Maybe there's a technical reason, since all apps seem to do it this way.
Best of luck with this, and I look forward to trying it again. It has the potential to be the greatest thing since sliced bread, as blind-friendly reading tools go. The app design itself is almost there already.

A couple of thoughts

You can always sign up for the TestFlight beta of the app. It will ask you to pay, but you won’t actually be charged. No trials. Some of the things you mentioned have been addressed in the most recent beta. Most of the text skipping options are controlled by rotor actions. I usually turn my speech off when I am reading with the app, or just out and out lock my screen. I don’t use the Kokoro engine unless my phone is plugged in, but you can use the Apple voices for your reading if you want to. Just go into the voice library and sort and filter the voices to show only the Apple system voices if you don’t want to use the neural text-to-speech built into the app.
If you’re talking about the percentage selector, you don’t have to use the value adjustment unless you just want to. There is a button labelled jump to percentage. That will open a text box where you can enter whatever percentage you want. That way you can end up wherever you please in your document.
The Kokoro engine is not the same as the Piper engine, and there’s also the Supertonic engine, which is slightly more expressive but is still in development. Remember that all three neural engines built into this application are open source and are under development separately from this application. Did you try the Neuvoice set as the default? Also, did you try adjusting the expressiveness and phoneme duration variation to see if that helped with your expressiveness issues? I keep mine at 100% personally. The default expressiveness is set to 67%, and the phoneme duration is, I think, set to 80. Supertonic has a bit more wiggle room in the expressiveness department but does tend to skip words on occasion. Hopefully, as that model is developed, that will change or at least improve. Hope this helps.
I will also point out that the Piper and Supertonic engines don’t tend to use as much processing power as the Kokoro engine. For anyone who wants a basic ranking, I would say Piper uses the least, then Supertonic, then Kokoro, though Supertonic and Piper might be only incrementally different.

Good Observations!

Voracious P. Brain,

If you send me a support email in the app, I'll send you a promo code so you can continue testing the app. And as already stated, the TestFlight beta is a good way to try it out for free as well. Let me know if you need help getting that set up. Here is the link: https://testflight.apple.com/join/uRrWXCba

Thanks for taking the time to write out these observations. I know blind harper already gave quite a few valid tips and tricks, but I can speak to some more of the issues as well.

There are currently 4 voice engines you can use in the app.
- Apple "System" voices - These are the ones you love already and you're right that the enhanced and premium voices sound awesome and are very fast and efficient on any iOS device.
- Piper voices - These are the next most efficient voices in the app. As you alluded to, the prosody on questions and exclamations will vary quite a bit from one voice to the next. These voices are open source, and the training data for each was different: some include more of those types of sentences, and some models learned those inflections better. You may or may not find one suitable for you.
- Supertonic - These are very experimental still and use a totally different architecture than Piper. The voice audio quality is very good and realistic, but the models are prone to skipping words and even full phrases. It's a known issue with the model architecture and training strategy.
- Kokoro - These are the biggest, heaviest, but most awesome sounding of the engines available. Your phone will get hot and the battery will drain, but that's the price you pay for doing that level of text to speech on your phone hardware.

When I need to preserve battery, I usually just set the voice to the "Zoe" system voice.

Regarding the cold start time: There should be essentially no cold start lag time with system voices. But for the rest, Listen2 synthesizes the text on the fly. I have the text to speech engine build up about 7 sentences worth of audio in a queue. Once that queue is about 30% full, playback can start. Until then, the app will just show the audio loading message. Now, to keep everything in sync (the document's current text window, the processed audio, the playback position, the scroll position), I have the audio queue cleared whenever you jump to another location, including skipping sentences or paragraphs. That's a trade-off to be aware of. But the skipping pause isn't an issue with system voices.
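
The queueing behaviour described above can be sketched roughly as follows. This is an illustration in Python rather than the app's actual code: the class and method names are hypothetical, strings stand in for synthesized audio clips, and the exact rounding of "about 30% full" is an assumption. Only the capacity of about 7 sentences, the 30% threshold, and the clear-on-jump rule come from the description.

```python
from collections import deque


class SentenceAudioQueue:
    """Toy prebuffer queue: start at a fill threshold, clear on seek."""

    def __init__(self, capacity: int = 7, start_fraction: float = 0.3) -> None:
        self.capacity = capacity
        self.start_fraction = start_fraction
        self._clips = deque()

    def __len__(self) -> int:
        return len(self._clips)

    def is_full(self) -> bool:
        return len(self._clips) >= self.capacity

    def push(self, clip: str) -> None:
        """Add one synthesized sentence; the synthesizer should wait when full."""
        if self.is_full():
            raise OverflowError("queue is full")
        self._clips.append(clip)

    def can_start_playback(self) -> bool:
        # Assumed reading of "about 30% full": fill ratio at or above 0.3.
        return len(self._clips) / self.capacity >= self.start_fraction

    def pop(self) -> str:
        """Hand the next buffered sentence to the audio player."""
        return self._clips.popleft()

    def seek(self) -> None:
        """Jumping elsewhere invalidates buffered audio, so drop it all."""
        self._clips.clear()


queue = SentenceAudioQueue()
queue.push("sentence 1 audio")
queue.push("sentence 2 audio")
print(queue.can_start_playback())  # prints: False (2 of 7 is below 30%)
queue.push("sentence 3 audio")
print(queue.can_start_playback())  # prints: True (3 of 7 is above 30%)
queue.seek()                       # user skips to another paragraph
print(queue.can_start_playback())  # prints: False (buffer must refill)
```

The last step shows why every skip brings the delay back with the neural engines: the buffer is emptied, so synthesis has to reach the threshold again before audio can play.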

The iPhone 16e has plenty of resources to run the app. I think it was the cold start of the playback that was confusing. Even on my iPhone 15 Pro Max, I have those same cold start times.

For your idea of a "VoiceOver-only" mode: you can already do this. Go to the library view, then settings, then voice library, then filter. Toggle off all the voice engines and turn on "system". Then you will only see the Apple system voices you've downloaded. As you know, you'll have to download any additional ones in iOS system settings rather than in the Listen2 app settings. Once you set one of those voices as active, when you play a document, you'll hear that voice.

I do hear your suggestion about making the free tier include the Apple system voices. I've considered this and may make that change at some point.

Regarding OneDrive: Yes, since Listen2 extracts the document text, parses the table of contents, and does all the other document handling on your iOS device, the document does have to be downloaded from OneDrive and imported into Listen2.

Regarding the line break pauses: yes, this is likely still an issue in some PDF files. PDF parsing is all about getting thousands of edge cases handled without letting any of them slip through. That being said, in the latest release I did just fix an issue where, when line spacing was large, each line of a PDF was treated as a paragraph, which caused the awkward prosody-killing pauses. Hopefully that fixes many issues of that class.

Regarding homonyms: I hear you on this. For most voice engines, there's not a way to specify which sense of the word to pronounce. I haven't explicitly tested this with supertonic or Kokoro, but I think those models might just handle it better because of their architecture - but I could be wrong. I've looked into several ways one might achieve homonym distinction and none of the solutions are elegant and all would require some type of sentence context understanding. It's definitely on my radar of things to continue looking into, but I don't have a good solution yet.

I suspect that when you had immediate playback, you may have had a system voice active.

I have experienced the issue with VoiceOver where the play button just doesn't work. I usually try skipping a sentence or paragraph, and if that doesn't fix it, going back to the library and then back to the book usually does. I know that's a lame workaround, but I can't reproduce it consistently, so I haven't been able to find the root cause.

You mentioned suspecting the document is still being parsed. Just to clarify this so you have an accurate mental model: once you can open a document, it's already imported and parsed into plain text so the reader can process it. The lag is simply the neural models synthesizing the audio queue.

Like I said, feel free to contact me via the support option and I'm happy to send you a promo code via email.

Thanks again for the feedback.

-Zach

Thanks

It's a pleasure conversing with anyone who, like me, isn't subject to the TL;DR plague. I'll eventually hit you up for that promo code if necessary, and thanks for all the clarifications. I just noticed the filter in the voices library. With the system premium voices, the app clearly compares well to Voice Dream, particularly for anyone who doesn't have an old license for it. I think heteronyms (it's not homonyms, btw) are still a considerable ways out for any neural engine on its own; that would require a chatbot combo.
At least for Windows users, your competition is with Microsoft 365 Immersive Reader and Narrator, which now has access to equally amazing neural voices on computers that have an NPU. Neither of them lags, except when Immersive Reader pauses between paragraphs, and they remain expressive. So, for now, I'll stick with using two hands at no cost when I need to get away from formant and concatenative synthesizers. Neither of the above options filters in-line citations or parses PDFs, though (although opening PDFs in Word has gotten pretty good), so I'll be watching. The holdup is the hardware, seems like.

Thought on hardware

It might be the difference in chipset between the two phones: the iPhone 16e uses the A18 chip, while the iPhone 16 Pro Max uses the A18 Pro chip. Regardless of what type of phone you’re using (and this app does work on earlier phones, by the way), you’re going to run into these hardware constraints whatever app you use, at least if it runs offline neural TTS. I’ve tried a couple of others that run similar engines, but I typically find that Listen2 does not cause as much battery drain as those other applications. And in this particular class of application, this is the most accessible one I’ve found so far.
