Using a Mac Studio headless with VoiceOver to accomplish heavy GPU tasks

By Moopie Curran, 9 February, 2026


Hi,
I might be getting a Mac Studio to run GPU-hungry AI tasks. I'm totally blind, so I don't need a monitor. I know some of you have headless Macs (Mac minis or other desktops without monitors). Will the Mac Studio work without a monitor, especially if I use a headless fit display emulator? And will I be able to run tasks that use the GPU despite not having a real monitor?

Comments

Perfectly fine

I'm typing this on the M4 Max Mac Studio with 128GB of unified memory and 2TB of storage that I bought in June last year. I have never really attached a monitor to it for anything other than testing purposes, and it has always worked just fine with macOS 15 and 26 without needing any kind of emulation whatsoever. The firmware already emulates a 30" 1920x1080 monitor when the computer is running headless, so it works perfectly fine even during installation. When I definitely need to read the screen to scan QR codes to authenticate some applications, I just mirror it to my iPad or Apple TV. Alternatively, I have a 15" Raspberry Pi HDMI monitor that I can plug in if a firmware problem ever prevents the system from booting and is not properly conveyed through the screen reader, though this has never happened in 8 months of use.

As for GPU-hungry tasks, let's just say that ever since OpenAI released their 120 billion parameter text-only quantized reasoning model to the public, which according to benchmarks has roughly the same accuracy as ChatGPT 4o, I haven't felt the need to use any large language models online. This model is close to 66GB in size. It's a mixture-of-experts model, so only a fraction of its parameters is active for each token, but all of the weights still have to be loaded, so it requires at least that much GPU memory (which is the same as CPU memory on Apple's M-series chips). After converting it to Apple's MLX framework, it has a pretty stable generation performance of about 90 to 100 tokens per second on this Mac, generating 10,000-character essays in about 30 seconds. Although the NVIDIA RTX 5090 still leaves this GPU in the dust, generating over 300 tokens per second with the same model, it only has 32GB of RAM, so you'd need 3 of those cards to hold this model in GPU memory. At that point you're already surpassing the cost of this Mac, which has a whole beefy computer attached to its GPU, consumes a lot less power, and can also be connected to other Macs like this one through Thunderbolt for extra memory and parallel compute performance.
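
As a rough sanity check of the figures above, here is a minimal Python sketch. It only uses the numbers quoted in this post (66GB model, 32GB per card, 90 to 100 tokens per second, 10,000 characters in 30 seconds); the function names are made up for illustration, and the card count ignores the KV cache and framework overhead, so treat the result as a lower bound.

```python
import math

# Figures quoted in the post; all of them are approximate.
MODEL_SIZE_GB = 66          # reported size of the quantized model
CARD_MEMORY_GB = 32         # memory on a single RTX 5090
TOKENS_PER_SEC = (90, 100)  # reported generation speed on the Mac
ESSAY_CHARS = 10_000
ESSAY_SECONDS = 30


def cards_needed(model_gb: float, card_gb: float) -> int:
    """Smallest number of cards whose combined memory holds the weights.

    Ignores KV cache, activations and framework overhead, so the real
    requirement can only be higher.
    """
    return math.ceil(model_gb / card_gb)


def implied_chars_per_token(chars: int, seconds: float, tok_per_sec: float) -> float:
    """Characters per token implied by an essay length, a duration and a speed."""
    return chars / (seconds * tok_per_sec)


print("cards needed:", cards_needed(MODEL_SIZE_GB, CARD_MEMORY_GB))
for tps in TOKENS_PER_SEC:
    ratio = implied_chars_per_token(ESSAY_CHARS, ESSAY_SECONDS, tps)
    print(f"{tps} tok/s -> {ratio:.2f} characters per token")
```

The implied ratio comes out between 3 and 4 characters per token, which is in the usual range for English text, so the essay timing and the tokens-per-second figure are consistent with each other.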

Although I'm pretty happy with this computer, the M5 Max and Ultra are probably just around the corner with more impressive specs, so I'd wait for those if I were you.

I thought the 120b required the 512gb UM model

But I have never really gotten into the calculation of these things, so that's nice to know! For local model use, a Mac Studio is very tempting, maybe in about 4-5 years when I can actually have a real job in the industry too.

Some examples

Below are two examples that I just copied from a Terminal session with me prompting the local GPT model that I mentioned earlier, one as a chat session with the model and another as a single prompt with some statistics at the end:

jps@alpha ~ % mlx_lm.chat --max-tokens=65536 --model models/gpt-oss-120b
/Users/jps/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
[INFO] Starting chat session with models/gpt-oss-120b.
The command list:
- 'q' to exit
- 'r' to reset the chat
- 'h' to display these commands
>> Explain stencil shadows in the context of conputer graphics. Avoid code examples and be terse.
<|channel|>analysis<|message|>The user asks: "Explain stencil shadows in the context of computer graphics. Avoid code examples and be terse." So they want an explanation of stencil shadows, presumably the stencil shadow volume technique used in real-time rendering, also known as stencil shadow volumes or shadow volumes using the stencil buffer. They want a concise explanation, no code examples, and be terse (i.e., brief). So we need to give a short but clear description: what stencil shadows are, how they work, the steps: render scene, render shadow volumes to stencil buffer, increment/decrement based on front/back faces, then render lit pixels where stencil is zero, etc. Also mention advantages, limitations, and typical use cases. Keep it concise, no code. Provide a clear explanation.

We should produce a short paragraph or bullet points. Avoid code. Use plain language. Provide context: stencil buffer, shadow volumes, silhouette edges, etc. Provide maybe a short example scenario. Keep it terse.

Let's produce final answer.<|end|><|start|>assistant<|channel|>final<|message|>**Stencil Shadows (Stencil Shadow Volumes)**  

A stencil‑shadow technique uses the stencil buffer to determine which pixels lie inside a shadow volume—an extruded shape that represents the region occluded from a light source. The process, in a nutshell, is:

1. **Render the scene normally** (depth‑only) so the depth buffer knows where geometry is.
2. **Create a shadow volume** for each occluder by extruding its silhouette edges away from the light. The volume is a closed mesh that surrounds the space that the light cannot reach.
3. **Render the volume to the stencil buffer only** (no color output).  
   - When a front‑facing polygon of the volume is drawn, increment the stencil value.  
   - When a back‑facing polygon is drawn, decrement the stencil value.  
   This “counters” the number of times a ray from the eye enters and exits the volume.
4. **Apply the stencil test** when drawing the lit pass: only pixels whose final stencil count is zero (i.e., not inside any shadow volume) receive the light contribution; all others stay in shadow.

**Why it works** – The stencil buffer acts as a per‑pixel counter. A pixel that the view‑ray passes through an odd number of volume faces is inside a shadow, while an even count (including zero) means it is unshadowed.

**Pros**  
- Works with arbitrary convex or concave geometry (as long as the volume is correctly built).  
- No need for expensive per‑pixel ray tests; the heavy lifting is done by the GPU’s stencil operations.  
- Works with multiple lights by repeating the stencil pass per light.

**Cons**  
- Requires extra render passes (depth, stencil, lit).  
- Sensitive to precision and winding order; incorrect volume construction can cause artifacts.  
- Not as fast as newer screen‑space methods (e.g., shadow maps) for very complex scenes.

In short, stencil shadows are a classic real‑time shadowing method that leverages the stencil buffer to mask out pixels lying inside shadow volumes, producing hard‑edged shadows without needing full‑screen ray tracing.
>> thanks! Bye!
<|channel|>analysis<|message|>User says thanks and bye. We should respond politely, maybe a short farewell. No extra.<|end|><|start|>assistant<|channel|>final<|message|>You’re welcome! Take care and have a great day. Goodbye!
>> q
jps@alpha ~ % mlx_lm.generate --max-tokens=4096 --model models/gpt-oss-120b --prompt 'Explain stencil shadows in the context of computer graphics. Be terse and avoid code examples.'
/Users/jps/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
==========
<|channel|>analysis<|message|>The user asks: "Explain stencil shadows in the context of computer graphics. Be terse and avoid code examples."

We need to give a concise explanation of stencil shadows, what they are, how they work, typical steps, use cases, advantages, limitations. No code. Terse but clear. Should be short but cover key points: stencil buffer, shadow volume, rendering passes, etc. Avoid code examples. Provide a brief description.

We'll produce a concise answer.<|end|><|start|>assistant<|channel|>final<|message|>**Stencil shadows** are a classic real‑time shadow‑casting technique that uses the stencil buffer to determine which pixels lie inside a shadow volume (the region occluded from a light source). The basic idea is:

1. **Render the scene normally** – fill the color/depth buffers with the visible geometry.
2. **Create a shadow volume** – a closed set of polygons (often extruded from the silhouette of an object toward the light) that encloses the space that should be in shadow.
3. **Stencil pass** – render the shadow volume with depth testing on but color writes disabled. As each front‑facing polygon of the volume is rasterized, the stencil buffer is incremented; as each back‑facing polygon is rasterized, the stencil buffer is decremented. After this pass, any pixel whose stencil value is non‑zero lies inside the shadow volume.
4. **Apply the shadow** – draw a full‑screen quad (or the lit geometry again) with a darkening shader, but only where the stencil test passes (i.e., where the stencil value ≠ 0). The result is that only the pixels inside the shadow volume are darkened, producing crisp, hard‑edged shadows.

**Key points**

- **Stencil buffer**: an 8‑bit per‑pixel integer buffer used as a counter for entering/exiting the shadow volume.
- **Shadow volume**: a geometric representation of the occluded region; can be generated from mesh silhouettes or from simple primitives (e.g., capsules for point lights).
- **Advantages**: works with arbitrary geometry, produces accurate hard shadows, no need for texture lookups, and integrates well with existing depth testing.
- **Limitations**: expensive for many lights or complex volumes (multiple passes per light), only produces hard edges (no soft penumbra), and can suffer from precision issues if the stencil buffer overflows.

In summary, stencil shadows leverage the stencil buffer as a per‑pixel mask that flags which fragments lie inside a shadow volume, allowing the renderer to darken exactly those fragments and thus create realistic, geometry‑aware shadows without resorting to shadow maps.
==========
Prompt: 84 tokens, 51.600 tokens-per-sec
Generation: 543 tokens, 97.397 tokens-per-sec
Peak memory: 65.882 GB

Note that the OpenSSL warnings above are generated by a Python library that Apple's MLX Python bindings depend on for downloading models from places like Hugging Face, and are completely irrelevant in this context since the model was already on my computer. I did not edit anything in the previous output; everything was pasted here exactly as it appeared in my Terminal window, including my misspelling of the word computer in the first prompt. I had to ask the model to be terse since this particular model tends to be very chatty when it's free to decide.
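
For anyone who would rather have a screen reader skip the model's reasoning and go straight to the answer, the raw output above can be post-processed. The following is a minimal Python sketch, assuming the output always uses the `<|channel|>final<|message|>` marker and the `Prompt:` / `Generation:` statistics lines exactly as they appear in the transcript; the helper names are hypothetical and not part of mlx_lm.

```python
import re

# Marker that precedes the user-facing answer in the transcript above.
FINAL_MARKER = "<|channel|>final<|message|>"

# Matches the statistics lines printed at the end of the second example.
STATS_LINE = re.compile(
    r"^(Prompt|Generation): (\d+) tokens, ([\d.]+) tokens-per-sec$",
    re.MULTILINE,
)


def final_answer(raw: str) -> str:
    """Return the text after the last final-channel marker.

    If the marker is absent, the input is returned unchanged (stripped).
    """
    _, marker, tail = raw.rpartition(FINAL_MARKER)
    return tail.strip() if marker else raw.strip()


def phase_seconds(stats: str) -> dict:
    """Map each reported phase to its duration: tokens / tokens-per-sec."""
    return {
        name.lower(): int(tokens) / float(rate)
        for name, tokens, rate in STATS_LINE.findall(stats)
    }


sample = (
    "<|channel|>analysis<|message|>User says thanks and bye.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>You're welcome!"
)
print(final_answer(sample))

stats = (
    "Prompt: 84 tokens, 51.600 tokens-per-sec\n"
    "Generation: 543 tokens, 97.397 tokens-per-sec\n"
)
for phase, seconds in phase_seconds(stats).items():
    print(f"{phase}: {seconds:.2f} s")
```

With the statistics from the second example, this works out to roughly 1.6 seconds for reading the prompt and 5.6 seconds for generating the answer, reasoning included.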

Re: I thought the 120b required the 512gb UM model

Nope, since it's quantized to FP4. It was discovered some time ago that transformer-based models don't require a lot of precision. Quantizing from FP16 to FP4 reduces memory requirements during inference by roughly 4 times while reducing the accuracy of the models by only roughly 5%, so it's a tiny loss of precision for a huge reduction in memory usage. And since modern GPUs are also optimized to run FP4 roughly 4 times faster than FP16, quantizing offers a huge performance boost as well.
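
The memory side of that argument is simple arithmetic, sketched below in Python. It assumes a nominal 120 billion parameters (taken from the model's name) and counts only the bits needed for the weights themselves; quantization scaling data, any tensors kept at higher precision and the KV cache are ignored, which is presumably why the peak memory reported earlier is a few gigabytes above the FP4 figure.

```python
PARAMS = 120e9  # nominal parameter count, taken from the model's name

# Bits used to store one weight at each precision discussed above.
BITS_PER_WEIGHT = {"FP16": 16, "FP4": 4}


def weights_gb(params: float, bits: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weights_gb(PARAMS, bits):.0f} GB")
```

At FP16 the weights alone would need about 240GB, which does not fit in 128GB of unified memory, while at FP4 they need about 60GB, which does.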