Happy Friday!
I'm excited to announce the release of VOLlama v0.1.1, an open-source, accessible chat client for Ollama. It leverages open-source large language models to enable conversations that run locally, without an internet connection, for privacy.
Many user interfaces for open-source large language models are either inaccessible or annoying to use with a screen reader, so I decided to create one for myself and others. I hope that ML UI libraries like Streamlit and Gradio become more screen-reader friendly in the future, so that making apps like this is no longer necessary!
Running an open-source model locally requires a lot of computing power. I recommend at least 16GB of RAM and a Mac with an M1 chip or later.
However, it doesn't require much computing power if you just want to use OpenAI GPT models or Google Gemini models with API keys.
To install Ollama, you'll need to use the Terminal, but chatting does not require the terminal. The app is not notarized by Apple, so you need to allow it to open from System Settings > Privacy & Security. Unfortunately, the app takes a little while to launch, so you'll need to wait after opening it. I'm looking into improving the launch time.
It has various features, including generating image descriptions with a multimodal model like Llava and the ability to process and query long documents with its RAG feature. There are numerous settings available for power users as well. It also supports models from OpenAI and Google Gemini if you have an API key.
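For the technically curious: chat clients like this talk to Ollama's local HTTP API, which you can also call yourself. Here is a minimal sketch in Python, assuming Ollama is running on its default port and you've already pulled llama3 (the requests library and the prompt are my own example choices, not part of VOLlama):

```python
import requests

# Minimal sketch: one chat turn against a local Ollama server.
# Assumes `ollama serve` is running on the default port 11434
# and `ollama pull llama3` has already been done.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello! Who are you?"}],
        "stream": False,  # ask for one complete reply instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

Nothing leaves your machine; the request goes to localhost, which is the whole privacy point.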
If it sounds interesting, please download VOLlama and follow the instructions.
I hope you enjoy it, and please spread the news!
Comments
This is very cool! Thank you for making this
I'm unfortunately on an Intel Mac, but I'll see if I can give it a spin.
Strange OpenAI model behavior
This is very neat! Thank you so much for creating it. Running the models locally seems to work as expected; however, I get this error whenever I try using GPT.
Is there something I'm not doing correctly?
To clarify, I've provided my API key.
Error code: 400 - {'error': {'message': 'max_tokens is too large: 8192. This model supports at most 4096 completion tokens, whereas you provided 8192.', 'type': None, 'param': 'max_tokens', 'code': None}}
Traceback (most recent call last):
  File "Model.py", line 162, in ask
  File "llama_index/core/llms/callbacks.py", line 150, in wrapped_gen
  File "llama_index/llms/openai/base.py", line 439, in gen
  File "openai/_utils/_utils.py", line 277, in wrapper
  File "openai/resources/chat/completions.py", line 581, in create
  File "openai/_base_client.py", line 1232, in post
  File "openai/_base_client.py", line 921, in request
  File "openai/_base_client.py", line 1012, in _request
openai.BadRequestError: Error code: 400 - {'error': {'message': 'max_tokens is too large: 8192. This model supports at most 4096 completion tokens, whereas you provided 8192.', 'type': None, 'param': 'max_tokens', 'code': None}}
disregard my last comment
I apologize. After playing around a bit, I was able to adjust the correct parameter to get GPT to work correctly.
Haven't tried VOLlama, but Ollama
And I thank the author for getting me interested in local AIs again. It actually works great in the terminal, and I love Ollama; it's so much simpler than the gist I was following back then, with a couple of compilation steps with make and weird stuff like that... Great post and great app!
Cool.
Does it work on a Windows PC?
How do we get the higher quality voices? All the options in the list seem to be compact.
I'm also having issues with OpenAI. I'm getting an error. I've got an API key in there; is that all I need to do, or are there further steps?
Works with Windows
Ming, it works with Windows as well.
Ollie, most likely you're using a GPT model that doesn't support the 8192 context length, which is the default for the llama3 model. Try going to Advanced menu > Generation parameters and setting num_ctx to 4096.
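For anyone curious what's happening underneath: that setting ends up being sent to OpenAI as max_tokens, and OpenAI rejects any request asking for more completion tokens than the model allows. A minimal sketch of the same constraint using the openai Python package directly (the model name and prompt are just examples):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Many chat models cap completion tokens at 4096; requesting 8192
# produces exactly the 400 BadRequestError quoted above.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example of a model with a 4096-token completion cap
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=4096,  # keep this at or below the model's completion limit
)
print(response.choices[0].message.content)
```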
That's got it, thank you.
Describing images
This app seems really interesting. Thanks very much for sharing it with us.
I noticed I can copy/paste an image into the text area (or use the attach command), but I am told that it can't describe images. If I use nomic-embed-text, then I get a 404 error. I did follow the instructions and installed both models. Is the second one responsible for images, or do I need something else?
I often get sent screenshots at work which I don't really want to send over the internet. I tend to use the OCR built into Smultron, which is sometimes helpful, but it would be great if I could query an AI about them locally.
For Image Description
For image description, you need to download a different model called llava. It has three different variants. The command "ollama pull llava" downloads the 7-billion-parameter model. Then there are llava:13b and llava:34b. The higher you go, the better the accuracy, but the model takes up more storage and computing power, and response speed gets slower.
Once you download it, choose the model from the toolbar inside VOLlama (or Command+L).
Then attach an image from the chat menu (or Command+I), and just ask a question like "Can you describe the image?"
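(If you ever want to do the same thing outside VOLlama, here's a rough sketch against Ollama's HTTP API; it assumes llava is already pulled, and "photo.png" is just an example path:)

```python
import base64
import requests

# Rough sketch: ask llava to describe a local image through Ollama's API.
# Assumes `ollama pull llava` has been done; "photo.png" is an example path.
with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llava",
        "prompt": "Can you describe the image?",
        "images": [image_b64],  # vision models accept base64-encoded images
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```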
Also, it has very limited OCR capability. It's more for scene description.
If you need OCR, I recommend VOCR, another app I developed specifically to process screenshots with OCR.
https://github.com/chigkim/vocr/releases
Lastly, nomic-embed-text is an embedding model used by the RAG feature to process documents. You can't chat with an embedding model directly.
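(You can see why if you call one directly: an embedding model returns a vector of numbers rather than text. A quick sketch, again assuming the model is pulled:)

```python
import requests

# Quick sketch: embedding models return vectors, not chat replies,
# which is why trying to converse with one fails.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "A sentence to embed."},
)
vector = resp.json()["embedding"]
print(len(vector))  # the vector's dimensionality, used by RAG for similarity search
```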
Hope that helps.
Wait! You're the one who made VOCR!!! I love you!!!
Seriously, I don't know where I'd be without VOCR. I use it all the time for my 3D printing and the myriad of other apps that aren't accessible. Thank you so much for your work! Apple needs to hire you and put you in charge, and give you a gold hat.
Thanks for the kind words! I'm glad that you find it useful!
This year I added a bunch of features to VOCR, so if you haven't tried the beta version, you should try it! :)
It has a new menu, real-time OCR, object detection, AI image description through Ollama, OpenAI, etc.
Honestly, VoiceOver should have these features built in, so I don't have to work on them.
If anyone has a connection to the Apple accessibility team, please tell them about it. lol
Hope it can have new features that help us play games
I hope in the near future it can help us play games, or describe the menus and all that.
Massive newb question here. I want to try this out and can't install the models! I put VOLlama in the Applications folder, opened Terminal, and typed "ollama pull llama3", and it says command not found. Feeling a bit silly here, haha. Help!
You have to install Ollama itself first before you can use VOLlama; installing the VOLlama app alone doesn't give you the ollama command.
Yes, I'm all over the 2.0 beta. It's very slick. If you're happy for me to do so, I'll drop Apple accessibility a line; it's exactly what they should be doing. Trouble is, anything like this partially admits that it's needed, i.e., their accessibility framework has gaping holes.
Any way you know of getting the higher quality Siri voices in this? I'm not sure if there's a limitation on third-party apps accessing the built-in voices.
@Chi
Thanks for clearing those things up. For some reason I never thought to use VOCR to read screenshots, but presumably I could ask it to take a grab and then interact with it. I think there are some options to view images in Finder, but mostly on my Mac they would be images in emails or Jira tickets so it's a little easier if I don't have to save it somewhere first. I'll give that a try next time, thanks. I know there is also AI integration with VOCR which I've still not really played with yet (but should do).
VOCR is one of those apps that I don't use all that often, but when I do it's such a godsend. So thanks very much for both apps.
Mr Grieves, with the new beta VOCR, you don't need to save the screenshot to a file. You can just do it from the browser: move your VOCursor to the target image and run OCR on the VOCursor with Control+Command+Shift+V.
People are doing all kinds of things with this, like extracting text from video, asking what's going on in a YouTube video, etc.
With real-time OCR, you can even read live captions in real time without scanning over and over.
Ollie, unfortunately I haven't found a way to access the higher quality voices from Python. Also, feel free to drop the Apple accessibility team a line. They already have Screen Recognition on iOS, so they just need to port it over to macOS!
@Chi
Ah yes I did see something like that. I don't think at the time I quite appreciated what I was going to use it for.
I will definitely give that a try next time I come across some random image that I can't make sense of. Thanks very much for pointing it out. Oh and developing it too of course!
Aaaaaa, yep. Definitely a newb, haha. For some reason I had it in my head that Terminal would install it since I had VOLlama installed, lol. Thank you!
any plans for this to come to the iPhone
Any plans for this to come to the iPhone?
How can I get VOLlama
Hi!
Can someone send me the link for VOLlama ...
I just got the chat client, but it doesn't seem to work.
AMD computer in 2021
Meanwhile... if I'm using an AMD computer that I bought in 2021, does it still work?
I see that this is made in Python. Any chance we might see ElevenLabs support?
You could use this ElevenLabs wrapper:
https://github.com/lugia19/elevenlabslib
Image generation
So this is bloody cool! Do any of the models generate images? I couldn't see an obvious one in the list.
For image description, Ollama supports two multimodal (vision-language) models: Llava and Moondream, which was added yesterday.
No image generation. Sorry, I didn't read it carefully. Ollama doesn't support any model that can generate images.