TL;DR
A software engineer demonstrates running smaller local AI models, such as Qwen 3.5 9B, on a MacBook Pro with 24GB RAM. While these models do not match state-of-the-art systems, the setup offers a practical, fully offline AI experience. It involves specific configuration choices and trade-offs, and its day-to-day usability is still being evaluated.
A software engineer has demonstrated that certain smaller AI models can run locally on a MacBook Pro equipped with 24GB of memory, without relying on internet connectivity. This enables offline AI use, reduces dependence on large cloud services, and opens new possibilities for privacy-conscious or resource-constrained users.
The engineer tested various models, ultimately finding that Qwen 3.5 9B (Q4) can run effectively on a 24GB MacBook Pro using LM Studio with specific configuration tweaks. While the model does not match the capabilities of state-of-the-art (SOTA) models in complex problem-solving or long-term reasoning, it performs well enough for basic tasks, research, and planning. The setup requires choosing compatible tools like Ollama, llama.cpp, or LM Studio, and adjusting parameters such as temperature, top_p, and context length.
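As a concrete illustration of those knobs, here is a minimal sketch of querying a locally served model through the OpenAI-compatible HTTP API that LM Studio exposes (on localhost:1234 by default). The model identifier and prompt are illustrative assumptions, not details from the original setup; your local server will report its own model names.

```python
# Minimal sketch: query a model served locally through an OpenAI-compatible
# API (LM Studio exposes one at http://localhost:1234/v1 by default).
# The model id below is illustrative; use whatever your server reports.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "qwen-9b-q4",  # illustrative local model id
        "messages": [
            {"role": "user", "content": "Suggest a refactor for a 200-line function."}
        ],
        "temperature": 0.7,  # the sampling knobs mentioned above
        "top_p": 0.9,
        "max_tokens": 512,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The sampling parameters the article mentions (temperature, top_p) are plain request fields here, which is what makes them easy to tweak per task.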
For example, with Qwen 3.5 9B, the engineer achieved around 40 tokens per second with a 128K context window, enough for tasks like code suggestions and simple research. The model’s limitations include occasional distraction, looping, and misinterpretation, shortcomings typical of smaller models relative to larger ones but acceptable for certain workflows. The process involves enabling the model’s ‘thinking’ mode, which improves its reasoning, and setting up the environment through JSON configuration files.
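To sanity-check a throughput figure like the ~40 tokens per second quoted above, one can time a streamed response. This is a rough sketch assuming an OpenAI-compatible local server on LM Studio’s default port; it counts streamed chunks as a proxy for tokens, so the result is an estimate, not a benchmark.

```python
# Rough throughput estimate against an OpenAI-compatible local server.
# Each server-sent-event chunk is counted as a proxy for one token, so
# treat the result as an estimate rather than a benchmark.
import time
import requests

start = time.monotonic()
chunks = 0
with requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "qwen-9b-q4",  # illustrative local model id
        "stream": True,
        "messages": [{"role": "user", "content": "Explain KV caches briefly."}],
    },
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line.startswith(b"data: ") and line != b"data: [DONE]":
            chunks += 1

elapsed = time.monotonic() - start
print(f"~{chunks / elapsed:.1f} chunks/sec (proxy for tokens/sec)")
```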
Why It Matters
This development matters because it demonstrates that smaller, less resource-intensive models can be practically used offline on consumer hardware, expanding accessibility for developers, researchers, and hobbyists. It also offers a way to reduce reliance on cloud-based AI services, addressing privacy and cost concerns. Although these models cannot replace SOTA AI in complex tasks, they provide a meaningful step toward more democratized AI access and experimentation on personal devices.
Background
Recent years have seen rapid growth in AI model sizes, with state-of-the-art models requiring extensive computing resources and cloud infrastructure. Smaller models like Qwen 3.5 9B have been developed to run on consumer hardware, but their performance often lags behind larger models. Prior efforts focused on cloud deployment, with local use limited to niche or highly optimized environments. Experiments like this one mark a shift toward practical local deployment, driven by improvements in model efficiency and configuration tooling.
“It’s surprisingly good for something that can run on a 24GB MacBook Pro while leaving space for lots of other things running too.”
— Johanna Larsson, Software Engineer
“While it’s not matching SOTA models in complex reasoning, it encourages more active engagement and step-by-step interaction.”
— Johanna Larsson, Software Engineer
What Remains Unclear
It is not yet clear how well these models will perform across a broader range of tasks or in more demanding use cases. The long-term stability and scalability of running multiple models simultaneously on consumer hardware remain untested, and configuration optimizations may vary depending on user setups.
What’s Next
Next steps include further testing of different models and configurations, benchmarking performance across various tasks, and developing more streamlined setup procedures. Monitoring community adoption and feedback will help determine the practicality of wider deployment and potential improvements in model efficiency for local use.
Key Questions
Can I run larger models on my MacBook Pro with 24GB RAM?
Currently, models substantially larger than Qwen 3.5 9B are generally not feasible to run locally on a 24GB MacBook Pro: even quantized, their weights can exceed the memory the machine can spare alongside the OS and other applications. Larger models require more RAM or specialized hardware; a rough back-of-envelope estimate is sketched below.
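A hedged way to see why: at 4-bit quantization each parameter costs about half a byte, plus format overhead. The 15% overhead factor below is an assumption, and the estimate ignores the KV cache, which grows with context length.

```python
# Back-of-envelope weight-memory estimate for quantized models. The 15%
# overhead factor is an assumption, and the figure ignores the KV cache
# (which grows with context length) and runtime overhead.
def q4_weight_gb(params_billion: float, bits: int = 4, overhead: float = 1.15) -> float:
    bytes_per_param = bits / 8  # 4-bit quantization -> 0.5 bytes/param
    return params_billion * bytes_per_param * overhead  # GB, since 1e9 params * bytes = GB

for size in (9, 14, 32, 70):
    print(f"{size}B @ Q4: ~{q4_weight_gb(size):.1f} GB of weights")
# A 9B model fits comfortably in 24GB with room to spare; a 70B model
# needs ~40GB for weights alone, beyond what this machine can hold.
```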
What are the main challenges in setting up local models?
The process involves selecting compatible tools, adjusting complex configuration options, and managing trade-offs between speed, memory, and functionality. It can be time-consuming and requires technical knowledge, although the major local runtimes expose broadly similar HTTP APIs, as the sketch below illustrates.
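For instance, switching from LM Studio to Ollama mostly means changing the endpoint and option names. The sketch below assumes Ollama is running on its default port (11434) with a Qwen model already pulled; the model tag is illustrative.

```python
# Same idea via Ollama's local API instead of LM Studio's. Assumes Ollama
# is running on its default port (11434) and a Qwen model has been pulled;
# the tag below is illustrative (check `ollama list` for real names).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen:latest",  # illustrative tag
        "messages": [{"role": "user", "content": "Outline a test plan for a CLI tool."}],
        "options": {
            "temperature": 0.7,
            "num_ctx": 8192,  # a smaller context saves memory: one of the trade-offs
        },
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```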
How does the performance of these local models compare to cloud-based SOTA models?
Local models like Qwen 3.5 9B are less capable than SOTA cloud models at complex reasoning, long-horizon tasks, and multi-step problem solving. However, they are usable for basic tasks and research.
Is this setup suitable for production or critical tasks?
No, these models are primarily for experimentation and personal use. They lack the robustness and advanced capabilities needed for production environments.