Large language models with 400 billion parameters demand exceptional hardware and enormous amounts of memory: even heavily compressed or quantized versions typically need a minimum of 200 GB of RAM.
Given these imposing requirements, an ordinary smartphone would seem among the least likely devices capable of sustaining such a load.
Yet a recent experiment has shown Apple's current iPhone 17 Pro accomplishing this apparently impossible feat.
An open-source project named Flash-MoE has allowed the smartphone to run such a colossal model. The milestone rests on careful software optimization and shrewd use of the device's internal resources.
The main obstacle lies, of course, in the smartphone's physical specifications. The iPhone 17 Pro is equipped with 12 GB of LPDDR5X RAM, a capacity far too small to load a language model of that scale in its entirety.
The solution implemented by the Flash-MoE developers circumvents this obstacle by intensively leveraging the device’s solid-state storage.
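One common technique for reading weights on demand from flash storage instead of holding them all in RAM is memory-mapping, where the OS pages in only the slices actually touched. A minimal sketch in Python (the file name and layout are illustrative assumptions, not Flash-MoE's actual format):

```python
import numpy as np

# Simulate model weights stored on flash with a small dummy file.
weights_on_disk = np.arange(1024, dtype=np.float32)
weights_on_disk.tofile("weights.bin")  # hypothetical weight file

# Memory-map the file: nothing is read into RAM up front; the OS pages in
# only the regions that are actually accessed, so resident memory stays far
# below the file's total size.
mm = np.memmap("weights.bin", dtype=np.float32, mode="r")

# Pull in just one slice, e.g. the weights of a single "expert".
expert_slice = mm[256:512]
print(expert_slice.sum())
```

Real inference engines stream these pages onward to the GPU, but the principle is the same: the model lives on disk and only the currently needed fraction ever occupies memory.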
Instead of routing everything through the limited RAM, weights are streamed continuously from internal storage directly to the GPU. The operation is also helped by the model's own nature: the acronym MoE stands for “Mixture of Experts”.
Thanks to this architecture, the software does not have to query all 400 billion parameters at once, but activates only a small subset of ‘experts’ to generate each individual token.
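The sparse activation described above can be sketched as a top-k router: a small gating network scores every expert, and only the k highest-scoring experts are actually evaluated for a given token. This is a generic illustration of the MoE idea in Python/NumPy, not Flash-MoE's implementation; all names and dimensions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through only the top_k experts (sparse activation).

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weights
    experts : list of n_experts weight matrices, each (d, d)
    """
    logits = x @ gate_w                   # router score for every expert
    top = np.argsort(logits)[-top_k:]     # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts
    # Only top_k expert matrices are touched; the others need never be
    # loaded from storage for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)  # (8,)
```

With 16 experts and top_k=2, only an eighth of the expert parameters are consulted per token, which is why a 400-billion-parameter MoE model can get away with streaming a fraction of its weights at a time.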
The success of this procedure comes with some inevitable trade-offs. The user @anemll, who documented and shared the results of the experiment online, has highlighted that performance is currently extremely limited.
The model manages to generate just 0.6 tokens per second, translating to roughly one word every one and a half to two seconds on the display.
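The reported figure is easy to sanity-check: at 0.6 tokens per second, each token takes roughly 1.7 seconds, which matches the "one word every one and a half to two seconds" observation (assuming a token corresponds to roughly one word):

```python
# Convert the reported throughput into per-token latency.
tokens_per_second = 0.6
seconds_per_token = 1 / tokens_per_second
print(round(seconds_per_token, 2))  # 1.67
```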
Such slowness would test anyone’s patience during daily use. In addition to this lack of fluidity, there is a severe impact on the iPhone 17 Pro’s battery life, as the constant reading from disk and the graphics calculations drain the battery quickly.
However, there is an extremely important advantage that justifies such experiments. Running a model locally guarantees absolute privacy, allowing the user to process complex queries and obtain detailed responses without using any active internet connection, ensuring that no information leaves the smartphone.
Although the current demonstration makes querying tedious and far from ready for everyday use by the general public, it confirms that high-end artificial intelligence can run locally on an ordinary smartphone. It also suggests that further optimizations could soon close the gap between merely launching the software and genuinely practical usability.