Large language models with 400 billion parameters demand exceptional hardware and enormous amounts of memory: even heavily compressed or quantized versions typically need a minimum of 200 GB of RAM.
Given these imposing requirements, an ordinary smartphone would seem among the least likely devices capable of sustaining such a load.
Yet a recent experiment has shown Apple's current iPhone 17 Pro accomplishing this apparently impossible feat.
An open-source project named Flash-MoE has allowed the smartphone to run such a colossal model. The milestone rests on careful software optimization and shrewd use of the device's internal resources.
The main obstacle lies, of course, in the smartphone's physical specifications. The iPhone 17 Pro is equipped with 12 GB of LPDDR5X RAM, a capacity far too small to load a language model of that scale in its entirety.
The solution implemented by the Flash-MoE developers circumvents this obstacle by intensively leveraging the device’s solid-state storage.
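One common technique for reading weights on demand from flash storage instead of holding them all in RAM is memory-mapping, where the OS pages in only the slices actually touched. A minimal sketch in Python (the file name and layout are illustrative assumptions, not Flash-MoE's actual format):

```python
import numpy as np

# Simulate model weights stored on flash with a small dummy file.
weights_on_disk = np.arange(1024, dtype=np.float32)
weights_on_disk.tofile("weights.bin")  # hypothetical weight file

# Memory-map the file: nothing is read into RAM up front; the OS pages in
# only the regions that are actually accessed, so resident memory stays far
# below the file's total size.
mm = np.memmap("weights.bin", dtype=np.float32, mode="r")

# Pull in just one slice, e.g. the weights of a single "expert".
expert_slice = mm[256:512]
print(expert_slice.sum())
```

Real inference engines stream these pages onward to the GPU, but the principle is the same: the model lives on disk and only the currently needed fraction ever occupies memory.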
Instead of routing everything through the limited RAM, weights are streamed continuously from internal storage directly to the GPU. The operation is also helped by the model's own nature: the acronym MoE stands for “Mixture of Experts”.
Thanks to this architecture, the software does not have to query all 400 billion parameters at once, but activates only a small subset of ‘experts’ to generate each individual token.
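The sparse activation described above can be sketched as a top-k router: a small gating network scores every expert, and only the k highest-scoring experts are actually evaluated for a given token. This is a generic illustration of the MoE idea in Python/NumPy, not Flash-MoE's implementation; all names and dimensions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through only the top_k experts (sparse activation).

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weights
    experts : list of n_experts weight matrices, each (d, d)
    """
    logits = x @ gate_w                   # router score for every expert
    top = np.argsort(logits)[-top_k:]     # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts
    # Only top_k expert matrices are touched; the others need never be
    # loaded from storage for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)  # (8,)
```

With 16 experts and top_k=2, only an eighth of the expert parameters are consulted per token, which is why a 400-billion-parameter MoE model can get away with streaming a fraction of its weights at a time.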
The success of this procedure comes with some inevitable trade-offs. The user @anemll, who documented and shared the results of the experiment online, has highlighted that performance is currently extremely limited.
The model manages to generate just 0.6 tokens per second, translating to roughly one word every one and a half to two seconds on the display.
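The reported figure is easy to sanity-check: at 0.6 tokens per second, each token takes roughly 1.7 seconds, which matches the "one word every one and a half to two seconds" observation (assuming a token corresponds to roughly one word):

```python
# Convert the reported throughput into per-token latency.
tokens_per_second = 0.6
seconds_per_token = 1 / tokens_per_second
print(round(seconds_per_token, 2))  # 1.67
```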
Such slowness would test anyone’s patience during daily use. In addition to this lack of fluidity, there is a severe impact on the iPhone 17 Pro’s battery life, as the constant reading from disk and the graphics calculations drain the battery quickly.
However, there is an extremely important advantage that justifies such experiments. Running a model locally guarantees absolute privacy, allowing the user to process complex queries and obtain detailed responses without using any active internet connection, ensuring that no information leaves the smartphone.
Although the current demonstration makes querying tedious and far from ready for everyday use by the general public, it confirms that high-end artificial intelligence can run locally on an ordinary smartphone. It also suggests that further optimizations could soon close the gap between merely launching the software and genuinely practical usability.