I really like VoiceOver image descriptions. When it described a picture of me with a miniature horse at a farm, it said: "a person standing next to a pony on a leash." But I found an image describer that blows VoiceOver image descriptions and JAWS Picture Smart out of the water! It is called MiniGPT-4. No, it is not based on ChatGPT; it is based on an open-source model called Vicuna. To show you what it can do, let's compare the description above with MiniGPT-4's enhanced description below, so you can see and judge for yourself. For this demo, I set the temperature (creativity) to 2 and beam search to 5, which are the maximum values for those sliders.
I ask MiniGPT-4: Describe this picture in detail.
MiniGPT-4 responds: The image shows a man standing next to a small black pony. The man is wearing a green shirt and has his arms around the pony’s neck. The pony has a black coat and a white patch on its forehead. The man is smiling and the pony looks content. The background is a dirt road with some rocks and gravel. There are some trees in the background and the sky appears to be cloudy.
I want more details, so I ask for them. Continuing the conversation, I write: Describe the pony or miniature horse in more detail.
MiniGPT-4 responds: The pony in the image appears to be a small, black animal with a smooth, glossy coat. It has a white patch on its forehead and a small, pointed nose. Its eyes are large and brown, and it has small, pointed ears that are alert and attentive. The pony has a small, compact body with a short, stocky neck and a broad, muscular chest. Its legs are short and sturdy, with small hooves that are well-groomed. It has a friendly and content expression on its face, and appears to be well-cared for.
Wouldn't it be great if VoiceOver, or an iOS app, could do this and let us ask follow-up questions to get more details about our photos?
Comments
It’s on its way.
Look at Be My Eyes Virtual Volunteer and the recently announced Google Visual Q&A. Be My Eyes Virtual Volunteer is very similar to what you’ve described here. How do you access MiniGPT-4 in the most accessible way?
Accessing MiniGPT-4
You can access it at the link below. It should also work in Safari, though I have been using Google Chrome on my PC.
https://huggingface.co/spaces/Vision-CAIR/minigpt4