At Google I/O 2025, the company had already previewed this vision with Project Astra, showing how artificial intelligence could not only “see” what happens on the smartphone screen but also physically interact with it, swiping through pages and pressing buttons.
Today, thanks to an in-depth analysis of beta versions of the Google app, concrete details are emerging on how this functionality will take shape on Android devices.
Gemini will be able to control your smartphone, here’s how

Recent digging into the code of the Google app for Android (specifically beta version 17.4.66) has brought to light new text strings that outline how a feature internally codenamed “Bonobo” works.
The public-facing terminology, however, seems to settle on “screen automation”.
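For context, teardowns of this kind typically work by unpacking the APK and scanning its compiled resources for newly added user-facing strings. Below is a minimal Kotlin sketch of the idea, assuming a locally downloaded APK file and a search keyword (both hypothetical; this is not necessarily how the original findings were produced):

```kotlin
import java.util.zip.ZipFile

// Scan an APK's resource entries for printable ASCII runs containing a keyword.
// Compiled resources (resources.arsc) keep strings in string pools, so a crude
// byte scan can surface new UTF-8 user-facing text without full decompilation.
fun findStrings(apkPath: String, keyword: String): List<String> {
    val hits = mutableListOf<String>()
    val printable = Regex("[\\x20-\\x7E]{8,}") // runs of 8+ printable characters
    ZipFile(apkPath).use { zip ->
        for (entry in zip.entries()) {
            if (!entry.name.endsWith(".arsc") && !entry.name.endsWith(".xml")) continue
            val bytes = zip.getInputStream(entry).readBytes()
            val text = String(bytes, Charsets.ISO_8859_1) // lossless byte-to-char view
            printable.findAll(text)
                .map { it.value }
                .filterTo(hits) { it.contains(keyword, ignoreCase = true) }
        }
    }
    return hits
}

fun main() {
    // Hypothetical file name; 17.4.66 is the beta version discussed above.
    findStrings("google-app-17.4.66.apk", "automation").forEach(::println)
}
```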
This technology promises to transform Gemini into an operating agent capable of carrying out complex actions on behalf of the user. The descriptions found in the software explicitly indicate the AI’s ability to handle practical tasks such as placing online orders or booking rides on platforms like Uber or Lyft.
It is no longer a matter of tapping through links to complete an action: the assistant itself navigates the app’s interface, selects options and finalizes the request.
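Google has not disclosed the mechanism behind this, but Android already exposes the kind of plumbing such an agent would need through its accessibility APIs, which can read the view hierarchy and perform taps programmatically. A purely illustrative Kotlin sketch of that primitive (the service name and flow are hypothetical, not Gemini’s confirmed implementation, and a real service must be declared in the manifest and enabled by the user):

```kotlin
import android.accessibilityservice.AccessibilityService
import android.view.accessibility.AccessibilityEvent
import android.view.accessibility.AccessibilityNodeInfo

// Hypothetical agent service: finds a button by its label and taps it,
// the kind of step an "order this item" flow would chain repeatedly.
class AgentService : AccessibilityService() {

    // Walk the active window's node tree and click the first clickable match.
    fun tapButtonLabeled(label: String): Boolean {
        val root: AccessibilityNodeInfo = rootInActiveWindow ?: return false
        val matches = root.findAccessibilityNodeInfosByText(label)
        val target = matches.firstOrNull { it.isClickable } ?: return false
        return target.performAction(AccessibilityNodeInfo.ACTION_CLICK)
    }

    override fun onAccessibilityEvent(event: AccessibilityEvent) {
        // A real agent would react to screen changes here, e.g. waiting
        // for a checkout screen to appear before tapping "Place order".
    }

    override fun onInterrupt() = Unit
}
```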
The responsibility remains with the user
Despite the promise of autonomous assistance, Google seems to be taking an extremely cautious approach to responsibility for the actions the AI performs.
The warning messages built into the system emphasize that Gemini can make mistakes and that the user is expected to supervise operations closely. One of the text strings states clearly that the user is responsible for what the agent does on their behalf, inviting them to monitor progress and intervene manually if necessary.
This introduces a curious contradiction in the user experience: the point of an autonomous agent is to free the user from repetitive tasks by reducing human intervention. The need for constant supervision, however, may, at least in an initial phase, do little to reduce the cognitive load involved, turning the user from actor into controller.
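In practice, the supervision model these strings describe maps naturally onto a confirm-before-commit loop: the agent executes low-risk steps freely but blocks on explicit user approval before anything irreversible. A hedged Kotlin sketch of that pattern (all names are hypothetical, not drawn from the app’s code):

```kotlin
// Hypothetical step model: the agent classifies each action by risk
// and asks the user before committing to irreversible ones.
data class AgentStep(val description: String, val irreversible: Boolean)

class SupervisedRunner(
    private val askUser: (String) -> Boolean,  // UI prompt: approve or deny
    private val execute: (AgentStep) -> Unit,  // performs the actual UI action
) {
    fun run(plan: List<AgentStep>) {
        for (step in plan) {
            if (step.irreversible && !askUser("Allow: ${step.description}?")) {
                println("Stopped by user at: ${step.description}")
                return // hand control back, as the warning strings suggest
            }
            execute(step)
        }
    }
}

fun main() {
    val plan = listOf(
        AgentStep("Open the ride-hailing app", irreversible = false),
        AgentStep("Select destination", irreversible = false),
        AgentStep("Confirm and pay for the ride", irreversible = true),
    )
    // Console stand-in for a real approval dialog.
    SupervisedRunner(
        askUser = { prompt -> print("$prompt [y/N] "); readLine()?.trim() == "y" },
        execute = { step -> println("Executing: ${step.description}") },
    ).run(plan)
}
```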
Privacy and management of sensitive data
Another fundamental chapter concerns privacy and data security during the use of screen automation. The informational notes discovered in the code warn that, when Gemini interacts with an application, screenshots of its activity may be analyzed by human reviewers to improve the service if the option to save activity is enabled.
Google therefore strongly advises against entering login credentials or payment information directly in chats with Gemini, and against using the automation for tasks involving highly sensitive data.
It also remains to be clarified how the system will handle the payment step inside third-party apps: the strings do not specify whether the AI will pause to let the user enter credit card details, or whether dedicated security protocols will apply.
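One plausible safeguard, consistent with both the screenshot warning and the open question about payments, is to detect sensitive input fields on screen and hand control back to the user instead of capturing or filling them. A speculative Kotlin sketch using node properties Android already exposes (again hypothetical, not Gemini’s confirmed behavior; getHintText requires API 26+):

```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Speculative guard: treat password fields and payment-looking inputs
// as off-limits, so the agent pauses rather than screenshots or types.
fun containsSensitiveField(root: AccessibilityNodeInfo): Boolean {
    if (root.isPassword) return true
    val hint = root.hintText?.toString()?.lowercase().orEmpty()
    if ("card" in hint || "cvv" in hint || "password" in hint) return true
    for (i in 0 until root.childCount) {
        val child = root.getChild(i) ?: continue
        if (containsSensitiveField(child)) return true
    }
    return false
}
```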
Finally, the integration of these features could bring visible changes to the user interface. References to a “My Orders” or “Purchases” section within the user profile have been identified, suggesting that Google intends to centralize the history of actions performed by the assistant, offering a single hub for monitoring transactions carried out via automation.
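If such a hub ships, each automated transaction would presumably be logged as a structured record the user can audit later. A minimal sketch of what such a record might contain, purely as illustration (the field names are guesses, not from the teardown):

```kotlin
import java.time.Instant

// Hypothetical audit record for the rumored "My Orders" / "Purchases" hub:
// enough to reconstruct what the agent did, where, and with what outcome.
data class AutomationRecord(
    val id: String,        // unique identifier for the run
    val targetApp: String, // e.g. a ride-hailing or shopping app
    val summary: String,   // human-readable description of the action
    val startedAt: Instant,
    val status: Status,
) {
    enum class Status { COMPLETED, CANCELLED_BY_USER, FAILED }
}
```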
Although there is no official release date yet, it is clear that the shift from simple chatbot to true operating agent is imminent.


