Google is building all the components of the AI technology stack, from custom chips to data centers to frontier models. As a result, our new Gemini 2.0 models are more capable, faster and more efficient than previous versions. These models are natively multimodal: they can process text, images, audio and video, and they can also generate images and steerable text-to-speech audio. With long context windows of up to 2 million tokens, Gemini can power advanced applications that require deep understanding and memory.
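As a minimal sketch of what a multimodal request looks like, the snippet below sends text and an image in a single call. It assumes the google-genai Python SDK (`pip install google-genai`); the model name, file path, and API-key handling are illustrative, not prescriptive:

```python
from google import genai
from google.genai import types

# Assumes the API key is set in the GOOGLE_API_KEY environment variable.
client = genai.Client()

# Load an example image; the path is a placeholder.
with open("chart.png", "rb") as f:
    image_bytes = f.read()

# A single request can mix text and image parts, since the model
# accepts multimodal input natively.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        "Summarize the trend shown in this chart.",
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
    ],
)
print(response.text)
```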
Additionally, our Thinking model can show its reasoning as it works through complex problems, which is especially useful in math and science. Gemini can also natively use tools like Google Search to access real-time information, and DeepMind’s Project Mariner has demonstrated that agents built with the Gemini model can complete tasks using a web browser. Conversational experiences can now be built with the Gemini Multimodal Live API, which accepts streaming audio and video input; a sketch of the simpler built-in tool use follows below. The combination of these capabilities enables a new class of agentic experiences, and we’re excited to see what startups build with Gemini in 2025.
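The request below enables Google Search as a tool so the model can ground its answer in real-time results when it chooses to. It again assumes the google-genai Python SDK; the model name and prompt are illustrative:

```python
from google import genai
from google.genai import types

client = genai.Client()  # API key from the GOOGLE_API_KEY environment variable

# With Google Search enabled as a tool, the model can decide to
# issue searches and ground its response in current information.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What are the top AI headlines this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```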