Installing Gemma 4 on an M4 Mac Mini 32GB

Intro.

Gemma 4 was released recently, so I took the opportunity to install a local LLM for the first time. I chose Ollama to manage the model because it seems to work well with my coding tools.

Ollama Gemma 4 Page

Ollama Installation

I installed Ollama on my Mac from the DMG file, which is the recommended method:

"The preferred method of installation is to mount the ollama.dmg and drag-and-drop the Ollama application to the system-wide Applications folder."

After installing Ollama, check the version in the terminal.


myuser@my-Mac-mini ~ % ollama -v
ollama version is 0.20.3
Ollama for Mac
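
Models can declare a minimum Ollama version (the `ollama show` output later in this post lists `requires 0.20.0`), so it can be worth comparing that against the installed version. Below is a minimal sketch of a dotted-version comparison in POSIX shell. `version_ge` is a hypothetical helper, not part of Ollama, and it assumes purely numeric fields such as `0.20.3`.

```shell
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Assumes every field is a plain non-negative integer (e.g. 0.20.3).
version_ge() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    # Take the leading field of each version; a missing field counts as 0.
    x=${a%%.*}
    y=${b%%.*}
    x=${x:-0}
    y=${y:-0}
    [ "$x" -gt "$y" ] && return 0
    [ "$x" -lt "$y" ] && return 1
    # Drop the field just compared, or empty the string if it was the last one.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# Usage sketch (assumes `ollama -v` prints a single line ending in the version):
#   installed=$(ollama -v | awk '{print $NF}')
#   version_ge "$installed" 0.20.0 && echo "new enough" || echo "please update"
```

With the values from this post, `version_ge 0.20.3 0.20.0` succeeds, so the installed version meets the model's requirement.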

Gemma4:26B Installation

Gemma4:26B is a workstation-class mixture-of-experts (MoE) model with 4B active parameters. I worried that 32GB of memory would be too small, but 26B appears to be the smallest of the Gemma 4 workstation models, so I went with it.


myuser@my-Mac-mini ~ % ollama run gemma4:26b
pulling manifest 
pulling 7121486771cb:   9% ▕█                 ▏ 1.6 GB/ 17 GB   63 MB/s   4m16s

The download failed over Wi-Fi with a 'read operation timed out' error. It succeeded after I switched to a wired connection.
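
If a wired connection is not an option, a retry loop is one way to work around a flaky network. Below is a minimal sketch. The `retry` helper, the attempt count, and the delay are all hypothetical and not part of Ollama. It also assumes that re-running `ollama pull` continues a partial download rather than starting over, which is worth verifying on your setup.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are used up.
# DELAY is the pause in seconds between attempts.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Run the command as given; stop as soon as it succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Usage sketch (hypothetical values: up to 5 attempts, 10 seconds apart):
#   retry 5 10 ollama pull gemma4:26b
```

`ollama pull` only downloads the model, so it suits a retry loop better than `ollama run`, which also starts a chat session.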

Gemma4:26B GPU Usage

Let's check how much memory the model uses while running. First, I ran a simple prompt to confirm that it works.


myuser@my-Mac-mini ~ % ollama run gemma4:26b "Hello"
Thinking...
The user said "Hello".
This is a standard greeting.
Acknowledge the greeting and offer assistance.

    *   "Hello! How can I help you today?"
    *   "Hi there! Is there anything I can assist you with?"
    *   "Hello! What's on your mind?"

"Hello! How can I help you today?" (Simple, polite, and open-ended).
...done thinking.

Hello! How can I help you today?

myuser@my-Mac-mini ~ % ollama ps
NAME          ID              SIZE     PROCESSOR    CONTEXT    UNTIL              
gemma4:26b    18712148f3a52    20 GB    100% GPU     32768      4 minutes from now 

myuser@my-Mac-mini ~ % ollama show gemma4:26b
  Model
    architecture        gemma4    
    parameters          25.8B     
    context length      262144    
    embedding length    2816      
    quantization        Q4_K_M    
    requires            0.20.0    

  Capabilities
    completion    
    vision        
    tools         
    thinking      

  Parameters
    temperature    1       
    top_k          64      
    top_p          0.95    

  License
    Apache License               
    Version 2.0, January 2004    
    ...                          

According to `ollama ps`, the model uses about 20GB of memory with a 32768-token context, which is quite a lot. A 32GB machine is therefore close to the minimum for this model: only about 12GB is left for macOS and everything else, so other applications may not run smoothly while the model is loaded. I will see how it performs under different scenarios.
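
To put a number on how much memory is left while a model is loaded, the `ollama ps` output can be piped through a small filter. Below is a minimal sketch. `headroom_gb` is a hypothetical helper name, and the script assumes the column layout shown above (size number in the third field, unit in the fourth). It counts only rows reported in GB and treats the result as a rough estimate.

```shell
# headroom_gb TOTAL_GB: read `ollama ps` output on stdin and print how many GB
# remain after subtracting the SIZE of every loaded model from TOTAL_GB.
# Assumes the layout "NAME ID SIZE UNIT ..." and skips rows not reported in GB.
headroom_gb() {
  awk -v total="$1" '
    NR > 1 && $4 == "GB" { used += $3 }    # NR > 1 skips the header row
    END { printf "%.1f\n", total - used }
  '
}

# Usage sketch on a 32GB machine:
#   ollama ps | headroom_gb 32
# On macOS, total memory in bytes is available from `sysctl -n hw.memsize`:
#   ollama ps | headroom_gb $(( $(sysctl -n hw.memsize) / 1073741824 ))
```

For the `ollama ps` output in this post (one model at 20 GB) and a total of 32, the function prints `12.0`.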