OK so I put this under accessible technology for reasons that you will learn.
I have recently been playing around with the new ChatGPT AI agent and, my fellow blind folks, this is something else entirely! I am having it use the back end of Wix to design and format an entire website that I've been wanting to build. It's created the text, it's created the images, it's placed things in the proper location, it is uploading audio files to the website for me, it's changing all of the fonts to make it look appealing to the eyes, it's changing layouts to what I've prompted it to do. This is nuts. This pretty much makes every single web design platform or website accessible, if it can take the actions on your behalf.
Comments
is it free?
Hi, is it free or not? And can it work with, say, emailing a friend?
I'd love a podcast on this
I've tried creating websites before with minimal success, so I would really love to know how you get ChatGPT to communicate with Wix.
@ JC
No, you get it with either the Plus or Pro plan of ChatGPT.
@ xenacat3
I wish I had time to do podcasts because this one would've been a great one!
So what happens is ChatGPT opens up its own internal computer, and you tell it what you want it to do. You do have to provide login information so ChatGPT can go log in for you and do what you've asked it to do. For example, I've been giving it quite extensive prompts in regards to color schemes for the website, creating search engine optimization, writing and inserting a terms of service, adding files to the website, and my goodness it has been 6 hrs of omg lol. It will also tell you what it is doing and how it is doing it as it navigates webpages, clicks on buttons, and makes selections based off of your instructions. There's way too much to put in here...
@ xenacat3
It will also go through and see if there are any design flaws. It will give you feedback, and then you can tell it whether you want it to make the improvements it suggested or whether you've changed your mind and want to do something else completely.
@ xenacat3
The coolest thing is watching the system problem solve. It does move slower, but this only came out about a week ago so I don't expect it to be as fast as the latest ChatGPT model, but this thing has reasoning, problem-solving, navigation abilities, etc.
My initial thoughts
I had access to the ChatGPT Agent before my plan expired, and honestly, at first it was difficult to even log into the portal. I remember trying to use it literally the second day it launched, and I needed extra help to get into the interface. I'm hearing a lot of mixed reviews: some users say it's accessible, others say it isn't. When I started using it, I went in with the expectation that I'd be able to book flights, pay the bills, schedule meetings, do a bit of shopping, maybe even order food. But in reality, it wasn't able to handle most of those things. To be fair, I haven't tested it extensively, especially since my plan has expired now. I keep meaning to renew it and mess around with it more once I have the free time. From what I experienced, it mostly felt like an upgraded version of the memory feature. That seems to be the main use case right now: handling files like documents, scheduling reminders, creating calendar events. But I'm wondering: can it do anything with media, like videos?
One important thing to remember is that when it comes to ChatGPT Agents, a lot depends on which websites are integrated or onboard. For example, I thought I'd be able to order groceries from Amazon through the agent, but that didn't happen. Amazon, being the giant that it is, doesn't really want third-party systems getting involved in their operations. So because of that limitation, those kinds of shopping features aren't available, at least not with the bigger names like Amazon, Walmart, or Target. That said, I think there's still potential here. If smaller independent retailers and grocery stores are smart, they could partner or integrate with systems like this and carve out a space. That might be the route forward. I've seen people doing some impressive tasks with the Agent: writing code, designing flyers, formatting documents, creating spreadsheets, prepping PowerPoint presentations. I have no idea how they're doing all that, but it looks like there's a lot of potential for power users. Still, for me personally, I found it a little underwhelming compared to what I was hoping for. It's also a bit tricky to get the hang of.
I would love to see a full walkthrough or demo of what this thing can actually do. I'm surprised they haven't done one yet, but maybe it's because it's still early. It's only been out for a couple of weeks, if I remember right. Ultimately, I think it needs more time to mature and really live up to the hype. But if someone wants to try it now just to get a sense of where it's starting and compare that to where it's heading later on, $20 isn't a bad price, if you can afford it.
@ Winter Roses
Honestly, it just uses its own internal computer and accesses the website like you would if you were accessing it yourself. I mean, it works with files, audio, and video, but the coolest thing is it navigating the web and taking actions on your behalf. The actions it takes are obviously based on your instructions. Using it is fully accessible. I use NVDA. To be fair though, I have not tried it with JAWS. It's as accessible as a regular ChatGPT window. I've posted it here because I've just been using it to completely design an entire website from scratch. You do have to ensure you're prompting properly and your prompts are as detailed as possible.
@ Winter Roses
If for some reason it can't access the website, it does problem-solving, works around that, and finds a solution to access the site.
I'm recording a little demo for you guys now.
I'm going to do my best at recording a little demo. I'm not the most entertaining so bear with me. lol.
What ChatGPT AI agent can do for you
ChatGPT AI agent can assist with a wide range of tasks. It can generate text, answer questions, write and rewrite copy, translate languages, summarize articles, brainstorm ideas, produce outlines or scripts, and even help with coding tasks. It can interact with websites and applications on your behalf, reading content aloud or filling in forms, making technology more accessible. Because it's conversational, you can refine the results by asking follow-up questions until you get what you need. This makes ChatGPT an extremely versatile tool for productivity and accessibility.
don't mind that last comment
I actually was doing an audio demonstration for you guys of how the ChatGPT AI agent can work. I just essentially had it navigate to the AppleVis website using my login information, and it posted that comment under my account. Here is the audio file.
https://www.dropbox.com/scl/fi/y0ah9bogluanro9yh8vkm/audio1798085360.m4a?rlkey=j0es6fliaa6oq3u8bgnis3c99&st=55htxlsd&dl=0
Winter Roses
I've been playing around with it for a little while, and I've made it add stuff to my Amazon shopping cart and book me movie tickets on BookMyShow, so... I haven't tried booking a flight because I haven't had to yet.
And @JC, certainly you can make it email your friend, so long as you are ready to give it your email credentials.
Generally speaking, it's truly a game-changer as far as web accessibility is concerned... Should try it with, say, wordpress or youtube...
Happy path
One of the problems with current large language models is that they get relatively good at tackling the so-called happy path in coding problems, which is when all the interactions are easily predictable, but sometimes fail to tackle even trivial edge cases, potentially resulting in security problems. Furthermore, they are also prone to hallucinate, and this problem has actually been getting worse lately, with consequences like naming dependencies that don't really exist, opening a window of opportunity for bad actors to register them and perform supply chain attacks similar to typosquatting for humans. Beyond this there are also code quality problems, in which the AI tends to generate extremely verbose solutions to problems that experienced programmers can solve a lot more efficiently, which makes the generated code unnecessarily hard to reason about.
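To make the dependency point a bit more concrete, here is a minimal Python sketch of one possible guard: checking the packages a generated project declares against a list you have vetted yourself, and flagging any unknown name that looks suspiciously close to a vetted one. The `VETTED` set, the `audit_dependencies` helper, and the sample names are all made up for illustration; only `difflib` from the standard library is real.

```python
import difflib

# Hypothetical allowlist: packages this project has actually vetted.
VETTED = {"requests", "numpy", "pandas", "flask"}


def audit_dependencies(declared, vetted=VETTED, cutoff=0.8):
    """Return (unknown, lookalikes) for the declared dependency names.

    unknown: declared names that are not on the vetted list.
    lookalikes: unknown name -> closest vetted name, when the two are
    similar enough that a typo or a hallucinated variant is plausible.
    """
    unknown = sorted(name for name in declared if name.lower() not in vetted)
    lookalikes = {}
    for name in unknown:
        close = difflib.get_close_matches(
            name.lower(), sorted(vetted), n=1, cutoff=cutoff
        )
        if close:
            lookalikes[name] = close[0]
    return unknown, lookalikes


# Illustrative input: two real names, one transposed spelling, one invented name.
unknown, lookalikes = audit_dependencies(
    ["requests", "reqeusts", "numpy", "fastjsonify"]
)
print("not vetted:", unknown)
print("possible lookalikes:", lookalikes)
```

A check like this only tells you a name is unfamiliar; someone still has to confirm that the package exists and is the one intended before installing it.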
All of the above combined results in a huge pile of bloated code with lots of technical debt and skyrocketing costs from token usage. Since the time and memory complexity of context windows increases quadratically with their size, it's not even that hard for a medium-sized codebase to hit resource limits, so the whole thing is extremely unsustainable. While I think it's perfectly possible to build hybrid models that take as much advantage of existing algorithmic solutions as possible to significantly improve their efficiency, and I have my own theories about them that I will start experimenting with soon, I think that doing so will require a huge paradigm shift that may not happen before the current AI hype bubble pops.
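As a rough illustration of the quadratic growth mentioned above, the sketch below counts the pairwise scores in a single plain self-attention matrix, where every token is compared with every other token. This assumes naive full attention; production systems use various optimizations, so treat the numbers as a scaling illustration rather than a measurement. The `attention_score_entries` name is made up for this example.

```python
def attention_score_entries(context_tokens: int) -> int:
    """Number of pairwise scores in one full self-attention matrix."""
    return context_tokens * context_tokens


# Doubling the context length quadruples the number of pairwise scores.
for tokens in (8_000, 16_000, 32_000):
    entries = attention_score_entries(tokens)
    print(f"{tokens:>6} tokens -> {entries:,} pairwise scores")
```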
One potential time-bomb issue for this technology is that training new AI models on the output of other AI is known to lead to model collapse, due to a not yet understood increased tendency to hallucinate. This is becoming a problem given the proliferation of AI-generated content on the Internet, and might already be adversely affecting the latest frontier models significantly. This content is often called AI slop, mostly because it's easy to generate without providing much in terms of actual value.
Login sessions?
If using this requires me to share my login credentials with ChatGPT, then that's a definite no from me. Even if there's a way to do it manually, I'd want to know if the process is accessible.
Until those concerns are addressed, I think I'll be steering clear of it.
Also, once you're logged in to a service through ChatGPT, how do you end that session? Is it as simple as deleting the conversation?
Using ChatGPT agent on iPhone
Hey guys, has anyone used ChatGPT agent on the iPhone, and if so, what has your experience been?
Is it easier to use on the computer, or can you use it on the iPhone too?
Other things that people have tried with ChatGPT agent
Hi guys.
So with ChatGPT agent, what else have people tried to do with it?
Using ChatGPT agent on the iPhone
Hi guys, so I wanted to let you know that I finally activated a ChatGPT plan to try out the Agent. What I attempted to do was to log into this website to post a comment, similar to the example shown above. Unfortunately, when using the iPhone, that doesn't seem to be fully possible. I turned on Screen Recognition and was able to confirm that the username and password fields were on the screen, but they weren't accessible with VoiceOver. If this is a bug or an accessibility oversight, it needs to be reported to OpenAI so it can be addressed as soon as possible.
On the bright side, ChatGPT did successfully manage to navigate to the website and locate the login page, which worked well. I was also able to type my username and password directly into the chat, and the agent was able to enter those details and log me in. That said, from what I'm seeing so far, you have to be extremely specific with your instructions. In some cases, you need to know exactly what you're looking for in order to get the results you want. For example, I wanted to post a comment on this specific post, but I couldn't remember the exact name or title. That ended up confusing the model a bit, so maybe websites with a clearer structure or layout might work better. I realized that ChatGPT doesn't automatically recognize that a post is about itself, which makes sense, but it means you'll need to be extra clear when giving instructions on sites with dynamic content.
Let me see if I can explain this a little clearer. So imagine you're on a virtual supermarket website. You decide that for breakfast today, you want a box of Cocoa Puffs, a bottle of Pepsi, and a loaf of bread. Now, on these virtual supermarket shelves, ChatGPT is scanning through categories like "Cereals," "Beverages," and "Bakery." If the Pepsi is sitting in the "Refrigerated Drinks" section or the bread is in "Bakery," then ChatGPT will likely find those items pretty quickly because it knows where to look and what those categories typically mean. But let's say there's another person who owns a completely different website, like Mary, who runs a baking site. She sells chocolate chip cookies. Now you say, "ChatGPT, order me a box of chocolate chip cookies and a sugar-free glazed blackberry doughnut." If the doughnut section is clearly labeled or easy to access, the model might find it right away. But if Mary filed her cookies under something more abstract like "Mary's Confectionaries" or "Sweet Bites," ChatGPT might still be able to get there; it'll just take a bit more time and work. That's the part I'm trying to highlight. For the model to be most effective, you need to be specific. The reason I couldn't post my comment on the site was literally because I didn't remember the title of the post, and I couldn't recall which section it was under. If you don't have a good mental layout of the website, it can be much harder for the model to perform the task, even if it gets you in the right general area.
It was able to locate the username and password fields easily because those are common across websites and clearly labeled. ChatGPT understands those elements well. It knows, "This is the login box, and this is where I need to input credentials." But if something is tucked away under an unusual label or section that isn't visible on the screen directly, I don't know how many places the model actually searches before it gives up or times out. Unfortunately, I didn't get to explore that part much because, like a lot of people are discovering, there's a time limit. Once you hit it, you're no longer able to interact with the agent for the rest of the day, and I had already used up my window.
Right now, many of the more advanced features are limited. It looks like you only get 15 minutes per day, or maybe per session, with the browser, though I'm not entirely sure yet. I assumed I'd be able to talk to the agent hands-free in voice mode and have it carry out the tasks for me, but that doesn't seem to be possible. I noticed that when the task is completed, my phone vibrates and I get a notification, which is a nice touch. It's definitely a bit slow, but that's expected given that we're still in the early stages. If someone were going to do a full review of the product, I imagine they'd need to edit out the pauses while the model processes everything in the background. Anyway, I couldn't get it to post the comment, but this is only my first time using it. I'm assuming things will improve in the future as they continue building it out.
not too impressed with this
Hi,
So when using this on Windows, both through my browser and the desktop app, the virtual browser (the one you use to enter your username and password, or to take over from the agent in general if you need to click something the agent won't, like a Captcha) is totally inaccessible. I've tried with JAWS and NVDA, NVDA object navigation and OCR, and the JAWS cursors and OCR, but nothing works. And it won't go to amazon.co.uk or amazon.com at all, even if I tell it to go to the page without completing any task. There's a checkbox on audiogames.net that it won't click because it's a Captcha, and if I take over from the agent, I can't access the checkbox no matter what JAWS or NVDA commands I try. I mean, I could give it my username and password for something, but I'd have to keep changing the password just in case it stores it and my security is compromised. I'd only give it my credentials to log into something if something was really inaccessible, but I'd be changing my password after logging out, that's for sure.
@ Winter Roses
You only have it for 15 minutes? That's strange, because I was using it for 6 hours yesterday editing my website and still have time left, and I'm using the $20 a month plan, but I might go to Pro now. I'm loving this thing because it can work on my business while I work my regular job.
Pro and Accessibility
I was wondering too about how accessible interactions are, as it is using a VM. So it seems Stephen is doing tasks that do not require him to interact with the virtual browser?
As for usage, according to a chatgpt.com page, the Pro plan allows you 400 messages a month. So I guess try to pack a lot into those messages?
It is an interesting project for sure and I will keep monitoring it, but it needs some more advances before it can help me with my job.
By the way, Claude has a similar agent, but it's not been in the news lately.
Answers and clarifications
When I was using the ChatGPT agent this morning, it disconnected, and I couldn't get it to reconnect again. I'm pretty sure I saw a time and date saying when it would be working again, though I could be totally wrong about that. But the second I saw the message, I instantly assumed the product was limited in some way, kind of like how the advanced voice feature is restricted. A lot of the more advanced features with ChatGPT seem to come with limitations, which makes sense. I mean, with the agent especially, it's pretty obvious why: many members are trying to use it, and the system needs to keep up and handle all those tasks efficiently. I don't even think anyone using the free version is going to get access to the agent. If they do, it's gonna be extremely limited. So if I want to explore more of what it can do, I'm gonna need to play around with it some more when I have the time.
Now, regarding Amazon and shopping: based on what I've been reading online, Amazon is not one of the supported shopping websites you can use through the ChatGPT agent. And again, this isn't that surprising. Amazon has worked hard to become one of the biggest names in online shopping, and the last thing they want is some third-party AI stepping in and acting as a middleman. They're not going to give that kind of access freely. My thinking is this: smaller businesses, if they're smart, will absolutely jump on this opportunity. If they can integrate with the agent, lower their prices, maybe offer free delivery or other perks to shoppers, then I could see customers choosing to shop with them instead. This could be a major advantage for smaller vendors looking to grow. As for whether there's an official list of supported shopping partners, I'm not sure we have this feature as yet, but it certainly seems like the next logical step in the chain of evolution based on current trends.
I haven't played around with the agent enough to speak definitively on everything. But I do think it depends on what you already know. ChatGPT can browse the internet and get relevant info, sure, but the more you understand about the site you're trying to use, how it works, and what to look for, the more effective it seems to be. Some tasks are always gonna be easier because they're direct and straightforward. Others, though, are going to be more obscure or ambiguous, and that's probably where a lot of the confusion and inconsistency comes in.
I didn't know that Claude had an agent-style product of its own. I might have to subscribe and check it out. I've never subscribed to any of Claude's plans, and that's mostly because I know the context window (like how many messages you can send in a chat) is limited. Even on the paid plan, I've heard it fills up quickly. And instead of starting a new thread when you hit the limit, you only find out when your message doesn't go through. Another thing I don't like about Claude is that if I'm typing a message and I accidentally close the app or something interrupts me, the entire message disappears. It's not like ChatGPT, which keeps the text in the box, so when you reopen it, your content is still there. That's one of those little actions that makes a big difference.
Don't get me wrong, Claude gives grounded, logical responses. It's more human than ChatGPT in certain ways. But because of those limitations, I've been hesitant to give it a serious try. I'm going to take a closer look and do some research myself. My biggest issue with Claude has always been the censorship and restrictions; it's more limited than ChatGPT in that sense. They're trying to be that "ethical, moral" AI, but in doing that, they might be missing the mark a bit. Not trying to knock them too hard, they do have a solid product. It just needs a bit of refinement... or loosening up.
A couple of things.
First off, I haven't noticed any time limits per session as such. The limitation, however, is that for Plus users there are 40 chats using agent per month. That's like 40 tasks. Also, the virtual browser, as some of you mentioned, is inaccessible. I guess for it to work, the screen reader providers will have to work with OpenAI to implement a solution. That's why, as of now, we will have to provide the login credentials to the agent. What Claude has is Claude Compute, which is architecturally different from the GPT agent. Agents generally create a VM in the cloud, whereas what Claude Compute does is take over your computer, which means it can also access your files etc. on the computer.
I request that the thread title be changed.
This has explicit references to Christianity but I consider the expression to be not only a baseless and flawed assumption but also an intolerable accusation as a Muslim and ask that it be changed to something else regardless of my own stance but due to the fact that one has to either let others use such phrases like "Allahu akbar! I can't believe this or that happened!" or avoid using such phrases himself/herself.
Claude Compute
Hi,
If people want to try Claude Compute, the Guide AI Assistant for Windows uses it as their model.
https://www.guideinteraction.com/
It's about $8 a month at the moment, which I imagine is cheaper than Claude.
Replying To @Enes Deniz
Hi Enes,
We appreciate your feedback.
We want AppleVis to be a place where everyone, no matter their religious beliefs (including having no beliefs at all), is welcome. This includes allowing discussion of apps and technologies related to one's faith and ensuring that those discussions are free from harassment.
You gave the example of including a Muslim-specific reference in a post. Our position is that the use of phrases like "Allahu akbar! I can't believe this or that happened!" that you gave as an example, would be perfectly allowable. Were we to disallow all types of casual religious references in posts, this would set us on a very slippery slope.
Thanks,
Michael
Agree with your stance Michael
Hi,
I have no problem with people putting things like "Allahu akbar!" in their subject lines or posts as long as it relates to tech. If we can put things like "holy mother of Mary" as part of a subject line, then we should be able to put "Allahu akbar!"
What would Batman do?
The subject line is my official stance on official subject lines.
Thank you, that is all.