Welcome to AppleVis Extra 101, where Dave Nason is joined by Xiaoran Wang and Huasong Cao from Agiga, the team behind the upcoming Echo Vision smart glasses. Check out some early demos, with more to come, on their YouTube channel at: https://www.youtube.com/@AgigaAi/videos And learn more on their website at: https://echovision.agiga.ai/ The team would love to hear your feedback, so please comment below, or contact them through the website.
Transcript
Disclaimer: This transcript was generated by Aiko, an AI-powered transcription app. It is not edited or formatted, and it may not accurately capture the speakers’ names, voices, or content.
Hello there and welcome to the AppleVis Extra.
This is episode number 101.
My name is David Nason and I'm delighted to be joined by two fantastic guests from Agiga.
We have Xiaoran Wang and Huasong Cao.
Is that correct, guys?
Thank you, David.
Thank you for having us.
So yeah, delighted.
And the product you're going to talk to us about is the Echo Vision.
So before we get into that, though, do you want to tell us a little bit about yourselves, the company, what you do, really, and how it all came about?
Sure.
My name is Xiaoran.
I'm the CEO, and before doing this company, I had long experience building and shipping intelligent devices.
I started my career at Amazon Lab126.
That's the birthplace of the world's first Kindle and also the world's first Alexa.
And I was actually in the early team that developed Alexa devices.
And this experience gave me good exposure and enough experience in how to build a good intelligent device.
I think that's part of the confidence I brought in when starting this company, that we can build a great product for the community.
Brilliant, and obviously Alexa, being a voice-first product, is very popular in the blind and visually impaired community.
So yeah, that's really cool.
And tell us about yourself, Huasong.
Yeah, thanks, David.
So my name is Hua Song, and I'm an engineer by training.
Before I started this company with Xiaoran, I was with Google for about nine to ten years.
I did various software projects, and the latest one was Google Assistant, the voice assistant similar to its Amazon counterpart.
Yeah, I've been enjoying doing software, building stuff, both hardware and software.
And with Agiga, what we are trying to do is really to use the expertise we built prior to this company and apply it to something that can really help everyone.
That's brilliant.
So do you want to quickly tell us what the product is, and then we can delve into a bit more detail.
Yeah, so the product is called EchoVision.
At first glance, it looks like a normal pair of glasses.
The key feature is to articulate visual information into voice.
Think about it.
If you can't see, and there's someone next to you, how would this person help you?
So basically, he or she is going to describe it for you, like read it out for you.
And that's how we envision our product, basically.
It's like an assistant, a virtual assistant that does this work for you.
Well, I was going to say, I suppose we've had apps like this on our phones for a while, the likes of Seeing AI and Be My AI and those kinds of apps.
This is obviously wearable, which is, I think, one of the key factors because walking down the street or even just in your own home, having to use your phone or hold your phone out is maybe an inconvenience.
So I assume that's a big part of the idea is that it's wearable.
Yes.
Yeah, we talk to lots of blind people, and people love Seeing AI, people love Be My AI.
However, being able to do it hands-free is a demand we hear from lots of people.
And for example, if you are in the middle of cooking, you have a question, and your hands are dirty; do you want to stop everything, wash your hands, take out your phone, take a picture and do all this, or can you just use the smart glasses and ask a question?
Or another scenario: if you are traveling, say you're walking through the airport with luggage and a cane in your hand, but you want to get help from a remote assistant, how can you have a spare hand to hold the phone and do all this?
But now you do not have to worry about it; since it's glasses, you can do all this hands-free.
Absolutely.
Yeah.
And does it connect to your phone, or is it the device itself, or how does that work?
Yes, it's a good question.
It does connect to your phone; it leverages the connectivity your phone has, and it uses the phone's cellular data if it's in a non-Wi-Fi environment.
Great.
Huasong, I think you were about to say something, sorry?
Yeah, I was going to add that we've been hearing that hands-free is the top, top ask from the users of Aira and Be My AI, so hands-free seems to be a must-have when people are using these types of features.
And also, adding to the connectivity, the phone works as the hotspot and enhances the functionality of the glasses, but in cases where the phone is not around, if there's Wi-Fi, the glasses can still connect to the Wi-Fi and be useful.
Oh, fantastic.
That's good to hear.
And is it running your own app, essentially, or suite of software, or are you running other things?
Do you work with the likes of Be My Eyes or Microsoft to put their apps on your device, or are you using your own stuff?
Yeah, this is also a good question.
So, while we developed this, we did notice there are so many apps that are already loved by the community, and they work pretty well.
And what we're trying to do is leverage their capabilities.
For example, Seeing AI is the most popular app for people to do OCR.
And in this case, it doesn't make sense for us to develop our own software for OCR.
We would work with Seeing AI to get this.
We're still working on that, but that is a goal.
And similarly, the remote assistance services provided by Be My Eyes or Aira, we want to leverage those as well.
So basically, I think one key differentiator of our product is that we try to serve as an open platform that can integrate the community's favorite applications.
Even now, we have a few already integrated, a few in progress.
But also, in the future, we're going to have a leaderboard, and we'll ask you to vote for the apps you use most, or the apps you most love to use with the smart glasses.
And based on this, we'll decide which ones to put on EchoVision.
I see, you very much want community involvement, as well as feedback and ideas for this.
Yes.
If there's something we can already use that's loved by the community, we'd really love to use it.
So we are already talking to a few partners.
We'd like to talk to more partners, so please talk to us.
And we do have something native that we built.
This is to basically make the experience seamless.
So there is a native experience there as well that does the AI and that kind of thing, is there?
So the native experience helps with, for example, from the beginning, setting up the device, setting up the connectivity, some basic voice interaction.
And if there is a better choice to do, say, scene description, we would leverage that.
But if there is no option for certain use cases, we have this native application that does that.
So speaking about today, I know you said pre-orders, and we'll come back to this later, I guess, but pre-orders will be going out in the next few months or whatever it may be.
What would people get in terms of AI and functionality today versus what you hope to add over the next couple of years?
So at the beginning, there are two stages.
The first stage is to allow people to have easy access to AI.
So the goal is mainly to have a smooth experience end-to-end and a great device.
In terms of AI functionality, we focus on the most frequently used features, for example, scene description, live description of the environment, and reading text.
So that's the first stage of the product.
And as we pass that, we really want to leverage the capabilities we have on the device, which means it has a camera, it has a mic, and it has access to the advanced AI models.
We can do so many more, so much smarter things with this.
So one feature I always think people would love is that we can record a voice memo and also ask the assistant questions about a memo made earlier.
For example, I said, OK, write down Ryan's phone number.
And later on, I can ask, oh, what was Ryan's number?
And another example is, I say, I just found a new recipe, and this is it.
And later on, I can ask, OK, what was the recipe and what ingredients are there?
So this is one thing I think would really help people.
Yeah, that's cool.
It's kind of like advanced voice memos on a wearable, so you don't even have to take your phone out.
So yeah, that does sound like a cool idea.
Which AI models is it using?
Is it OpenAI or Claude, or are you able to access all of them, or does the user select those, or what happens?
Yeah, that's another interesting topic.
So for us, we definitely could start with one and also we may have some in-house optimization on the models.
But as we talk about this, I kind of want to hear, how do you think of the idea of having this opportunity to have this privilege to be able to select from the models you want to use without smart glasses?
It's an interesting question, isn't it, because I suppose on our website there'll be a comments area underneath this podcast, and you guys can come back and see if people listening comment and give their opinions on this. Because you're trying to balance giving people choice, for those who are technical and really into their AI and feel they want to use a different AI for a different task, versus keeping it as simple as possible for somebody to just pick up and use.
So I suppose it's probably a case of giving people the option to choose, but not requiring them to choose.
Is that probably what you're aiming for?
Yes, I think we will have a default model for everyone to use.
I guess most people, they don't really care.
But for example, when I use large language models to help with my work, sometimes I play around with different models and see which one gives me the best answer.
I want the blind and low vision community to have the same opportunity to do these things.
So it really depends on whether people are interested; if they are, I think one thing we could do is allow people to choose which model they want to use.
Yeah, that definitely makes sense, being able to switch for different tasks, like you say, if that's something that somebody wants to do, and then some people might prefer to just stick with one all-purpose model that they're happy with.
So yeah, I understand definitely.
Yeah.
Adding to that, we did notice that some models have limitations, some models perform better in certain use cases.
So we will need to at least evaluate some of these and make sure that we are satisfied with our choices.
Yeah, definitely.
And I don't know, how long have you guys been working on this project, and how much change and development have you seen in the models in that time?
Yeah, we've been working on it since the beginning of the year.
I have to say this is an area where there's lots of exciting progress in academia as well as in industry.
You've probably seen it in the news as well, with the latest releases of GPT-4o and o1, and the same with Claude and Llama and all those; it's a very exciting area.
And actually, it's really hard to say what's going to happen in the next six or 12 months because everything is progressing so fast.
And I'm so happy that we can bridge the gap and bring this advanced technology to this community to the people who really need it most.
And I think for the blind and low vision people, they really deserve to enjoy the latest of the technology.
Yes.
On some high level, I think people already know some of the milestones the large language models have made: previously we were only able to do text, then we started to be able to do vision, and very recently there is more logical understanding built into the models, so there's more reasoning in the model.
And I think what might still be missing, from our experiments and also from the feedback we get from users, is that when we say something, the relative positioning between objects in the scene can still be not accurate enough.
And another area might be distance.
So when it says how far the monitor is from me, that distance estimate can still be wrong.
But those areas are still something to be improved.
Makes sense.
Yeah.
And like we saw, a lot of people will have seen the video Be My Eyes put out a couple of months ago showing GPT-4o, which you mentioned, and an almost live, ongoing conversation.
So it wasn't just sending pictures, it was ongoing video.
This isn't in the hands of users yet, but I assume you're aiming to go that direction as well.
So you could be walking down the street and it could be telling you there's a car on your right-hand side, or there's a pole on your left-hand side in 10 meters, and that kind of thing is, I guess, part of what you're aiming to do, is it?
Yes, that's right.
So definitely we need more advancement from the base models provided by OpenAI and other companies.
But yes, that's our goal.
We are able to achieve some of this, still not through streamed video, but through consecutive pictures.
And we are very hopeful that some of the advanced models would help us achieve that sooner.
That's cool.
I suppose that relies on these models getting a lot faster, which is key.
So you're no longer waiting 10 to 20 seconds for the scene description.
You need it to be a very fast scene description.
Yeah, definitely.
There's a limitation from the model latency, but there's also something we can optimize on the user experience side.
Yeah, actually one piece of feedback we constantly get from the beta users is really how fast it responds.
Yeah, which has drawn so many wows from users, saying, wow, this is so fast.
That's brilliant.
Because yeah, that's definitely going to be one of the biggest differentiators, I think, if it can be fast, because that's what makes it really useful, is when it just works when you need it straight away.
And I assume that's one of the big advancements, is they're getting faster.
So that's great.
Can I ask a bit about the physical hardware as well?
What do they look like?
Because even as blind people, we still care how we look, and we care what things look like, and whether they're comfortable, and that kind of stuff.
So what kind of work has gone in on that side, and can you describe what they're like physically?
Well, let me think about it.
How do you want to take this?
Yeah, let me try it.
We're still hearing feedback, and our final design may be a little bit different from what we describe today.
So please continue to send us feedback.
Right now it looks like a regular pair of glasses.
So it's not sporty.
It's something that we can wear inside, outside.
In addition to the glasses frames, we are getting feedback regarding the lenses, how dark they can go, whether they transition; and the arms, whether they wrap around the back so they're more secure and don't fall off easily.
We're also hearing feedback about the side panels on the arms, whether they should be a little bit wider so that they shade the light.
All those we are taking into consideration.
I think the goal is to make it a regular pair of glasses so it doesn't stand out.
In the meantime, it fulfills those requirements we just talked about.
So it's a balance we have to strike, but we're trying to make that happen.
Cool, yeah.
The lens is an interesting one because I myself have some usable vision.
So dark lenses or very dark lenses don't necessarily appeal to me, but then to other people they're fine or they don't mind at all.
You know what I mean?
A kind of transition lens is probably what makes sense to me, but something else will make sense to somebody else.
So specific to that, what we are designing right now is a type of clip-on, something you can attach and detach as darker lenses, so that as soon as you go outside you can attach it to the glasses as additional darker lenses.
And do they have a speaker or do you connect Bluetooth earphones to your phone to get the sound, to get the speech from them or how does that work?
So it's a so-called open-ear headphone.
It doesn't go into the ears; the speaker is embedded in the arms of the glasses.
It's Bluetooth-connected to the phone, so it kind of works like a regular Bluetooth headset.
We've been hearing some feedback regarding people wearing hearing aids, whether the audio can actually go over Bluetooth to the hearing aid instead of going through the air.
It's something we are still investigating, so hopefully we can have an answer to that soon.
Cool.
That would be good to know.
I think even for people who aren't, who are just using regular Bluetooth earphones, maybe from a privacy point of view or that kind of thing, if you're around people, you might prefer to have in-ear rather than these.
But I assume there's not much sound leakage from them, is there, so that people around you don't hear too much?
Yes.
Okay, very good.
And you could use them just for listening to music and that kind of thing?
Yes, music and podcasts like this.
Yeah.
Sounds good.
Sounds good.
Can I ask you, actually, just a change of topic a little bit: where did the idea come from? Do you guys, if you don't mind me asking, have connections to the blind community, or what was it that made you decide to build something for this particular group of customers?
Yeah, it's a fair question, because people often ask, like, you are not blind, Huasong is not blind.
Why do you guys want to help us?
Right.
Why do you guys do this?
I think, let me talk about what I thought and you can share your idea.
So for me, I spent a long career building all these intelligent devices, as I just mentioned, right?
Alexa is one of the successful ones, and there are a few others: I have done a fashion assistant, I have done a smart refrigerator, I have done smart cameras, I have done smart locks, and I have done different things.
And I have done autonomous driving, yes. And doing that for over 10 years of my career, even shipping all these products, I felt I hadn't made a big enough impact.
I was still looking for how I could make more impact.
For example, Alexa is a great product.
People love it and it also helps the community.
However, it's more like an add-on, more of a convenience, right?
I say, oh, Alexa, turn on the light; Alexa, what's the weather today?
I mean, it's great, but I'm looking for something bigger.
I want to do something that has a bigger impact.
And I used to work with Dr. Mark Humayun when I was at school.
He is the inventor of the first artificial retina.
So that's when I started getting in touch with this community and to see their pain and to see how technology could help them.
However, there are so many challenges in trying to understand how our brain works, how the optic nerve works, how the retina works.
And it would be a much harder path to get that out.
And as my experience was in AI, I'm thinking, okay, what can we do? Can we leverage AI to help this community?
And the answer is yes.
Maybe from my side of the story: I didn't think I knew this community well enough.
So when we started, Xiaoran shared this idea and I wasn't fully convinced.
So what we started with was actually talking to local groups, local groups of blind and low vision people.
So I went to some of these and connected with them.
We actually later started to serve the community by providing free WhatsApp calls, so that people can call us and we do visual interpretation over the call.
And I was one of the people behind the phone answering calls.
Through those, I felt there's actually a lot of need in the community. People ask for help, but they also support us, give us feedback, and really want us to succeed in making a new product that can help them.
And through that experience, I felt I was actually helping people, making use of what I know in this field to make an impact.
And it feels very rewarding every time I actually help someone do something.
That's brilliant.
It's great to hear that.
I would always say, and I think most people listening to this would subscribe to the idea, that technology is one of the biggest ways we bring down barriers, that as blind people we gain the opportunity to access the world the same as everybody else and get rid of those barriers that get in our way.
So yeah, it's great to have stuff like this coming along, and also to hear that you're speaking to the community a lot, getting that feedback constantly and getting ideas, because it can be easy to come up with ideas without doing that piece and then create something that doesn't work, you know what I mean?
That doesn't do what people actually need.
So it's great to hear that you're very much engaging with the community on this to make sure that it actually does something that we need it to do.
Yeah, definitely.
Yeah, we actually served over 100 people through these services.
And the reason why we offer this is not to try to compete with someone else or to try to make money.
It's mainly to understand the needs of the community, how technology is getting used, and what other pains you have in your life.
Definitely.
Definitely.
That's cool.
How do you think your product compares to things like the Meta Ray-Bans or the Envision glasses or Celeste, some of the other ones that have come along in the last year or so in particular?
Yeah, that's a good list.
I would say it's so much better.
Our glasses may not be the best for everyone, but they're so much better for this community compared with the Meta Ray-Bans, I have to say.
A few things. First, I don't think anyone else has listened to the community as much as we have.
We heard people saying, okay, first, they want to read text, to do OCR as is, not a summary, not the high-level thing, but reading line by line, word by word.
That's one.
Second, people say they want to do hands-free remote assistance, to be able to use Aira and Be My Eyes volunteers.
Third, people feel the Ray-Ban glasses are not the most comfortable to wear.
So we hear all this and lots of other feedback and incorporate that into our products, the Echo Vision smart glasses.
And we are constantly listening to feedback as we build this.
So I think this is a key differentiator for us compared with the Meta Ray-Bans, because we really care about this community.
These smart glasses are built for blind and low vision people.
And so as I mentioned a few times, if you have any feedback, please, please let us know.
You can email us at contact at agiga.ai, or you can comment down below the podcast or on other AppleVis posts.
We really, really want to hear this, and that's how we can make this a great product for you.
Adding a little bit more to what Xiaoran already mentioned, I think one of the advantages we have is that we are new.
So thanks to all those competitors being out there in the market, people are already saying good or maybe not so good things about them.
And we get to hear that so we can improve on those points.
And because we are new, we have the opportunity to use the latest software and hardware technologies, so we can build something based on all those latest things.
So potentially we can offer the latest of the latest.
Yeah, makes sense, actually.
It's always a good way to.
To add on that, I forgot to mention, I think one differentiator from Celeste and Envision is that we're actually able to build our hardware from scratch, tailored for this community.
For example, Envision was built based on Google Glass.
And I think everyone knows, right, that though Google is a great company and Google Glass is a good product, it's not designed for blind or low vision people.
Google Glass is designed for other purposes.
It even has a huge component on it that's for showing text, which is definitely not needed by this community.
But you have to carry that extra weight, like half an ounce, on your face every day, because those glasses are not made for you.
Yeah, I got you.
So it's very much a product that's built from the ground up for a specific group, as opposed to a mainstream product that's being utilized or adapted.
Right.
Where can people try it?
I think you've been to some of the conventions and things, have you?
Are there any more places coming up where people will be able to get their hands on a pair and actually, you know, try them out or see a demo?
So let me start, and Xiaoran, please add.
We have limited time, a limited number of devices, and limited capacity to travel.
But we are thinking of CSUN next year, 2025, as the one where we definitely demo to a wider audience.
Before that, we are going to local groups that we can reach.
And also, we are definitely shooting some more demo videos and posting them on our YouTube channels.
Brilliant.
Is the YouTube channel called EchoVision, or is it Agiga?
Yes.
We're going to post the link down below this podcast and people can follow it from there.
OK, great.
Pass that on.
We'll do that.
Do you have any idea around, you know, pricing yet?
I know that's something people are always keen to understand.
And is it an upfront payment or is it a subscription or both?
Yeah.
Do you have a sense of that yet?
Yeah, this is a great question.
And I've got to mention that I think one big complaint I constantly hear is that the price of assistive devices is so high, it's just not affordable for lots of people.
You can easily spend a few thousand dollars on a single thing, and coming from a background of building and shipping so many intelligent devices, I don't think this is right.
And bringing the costs and the price down is one big goal for us.
And we're happy to announce the list price of the product will be $599.
And for a limited time right now, we're doing pre-orders: people who pay a deposit can get the promotional price of $449.
OK. That is very affordable by assistive technology standards, like you say.
Right.
We even heard some people say, oh yes, please make a product for the mass audience, not just for our community, otherwise it will be so expensive.
I know what you want, I know you want affordable devices, don't worry about that.
Is there a subscription for the AI, or is everything covered in the upfront cost?
Yes.
So to be able to use the latest advanced AI, there are things that have to be processed in the cloud, which incurs an extra cost.
Right now, we will not charge people to use the device. So, two things.
First, for people who pre-order, there is a limited promotion right now.
Not only do you get the discounted price, but you will also enjoy lifetime free service, meaning you never have to pay for a subscription.
And for people who buy at a later time, we will not charge for the subscription for the first year.
We really want you to enjoy using the device, interact with it in every moment of your life, and really have an easier, more independent life.
Right.
Yeah, people will definitely be keen to hear that and to know that hopefully it remains affordable for them.
And I guess hardware upgrades are where you would be paying again, but that's only if you choose to get a newer model five years down the line, that kind of thing.
But otherwise, your glasses will keep working for as long as you have them if you pre-order.
Is that right?
So yeah, software upgrades are definitely free.
The subscription charges may come after the first year, but I think what's more important is that we are trying to bring some value that makes it worth it for people to pay for that subscription.
If there is nothing value-wise we are bringing to the users, then there's no point in charging anything.
Very fair. And for the pre-order, at the moment, is that international, or is it US only, or where are we at?
Pre-orders are open all over the world, but at the very beginning we'll only ship to US addresses.
We look forward to expanding to other parts of the world.
We'll let you know when that happens.
Okay, great.
And language, is it English only at the moment or are other languages supported?
For launch, it would be English only. But were you about to say something?
No, no, you answered it well.
And do you have an idea, obviously you're taking pre-orders now, of when somebody who pre-ordered today would expect to get the product?
Yes, so the target shipping date is Q2 next year.
So that kind of coincides nicely with the target of getting to CSUN, I guess?
Yes.
Maybe around CSUN time we would have a few more devices to give to our beta users slash pre-order customers to test them out.
Brilliant.
Absolutely brilliant.
Yeah.
That's all really good.
Thanks guys.
Do you guys have anything else you'd like to discuss about it or mention about the product or what's coming up?
I don't know if you want to mention our website, which is echovision.agiga.ai.
It has a detailed description of the product, and also, if you're interested in pre-ordering, you can order from the site, or you can join our email list to hear more updates about the product.
Great stuff.
Yeah.
Other than that, thank you, David, for your time and having us.
Thank you both so much.
Thank you, Xiaoran, and thank you, Huasong; delighted to have you on, and I really enjoyed the conversation.
I think there are really exciting opportunities for us in this AI stuff.
So I think, yeah, people are really interested to hear about it, and I really appreciate your time, and I'm looking forward to hopefully getting my hands on a pair next year and trying them out for myself.
So thank you.
Thank you.
Thank you, David.
Thank you everybody for listening.
Bye-bye.
This podcast was brought to you by the community at applevis.com.
Applevis, empowering people who are blind, deafblind, or who have low vision to get the most from Apple products and related technologies, a proud member of the Be My Eyes family.
For more information, visit our website at www.applevis.com.
Bye.