Alibaba Researchers Introduce Mobile-Agent: An Autonomous Multi-Modal Mobile Device Agent
Mobile device agents built on Multimodal Large Language Models (MLLMs) have gained popularity as MLLMs have advanced rapidly and shown notable visual comprehension capabilities. This progress has made MLLM-based agents viable for diverse applications. Mobile device agents are one such novel application, requiring these agents to operate devices based on screen content and…