Google Gemini: Everything You Need to Know About the New Generative AI Platform | TechCrunch

Google is trying to make a splash with Gemini, its flagship suite of generative AI models, apps and services.

So what is Gemini? How can you use it? And how does it compare to the competition?

To make it easier to stay up to date with the latest Gemini developments, we’ve put together this handy guide. We’ll keep it updated as new Gemini models, features, and news about Google’s plans for Gemini are released.

What is Gemini?

Gemini is Google’s long-promised next-generation GenAI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the most powerful Gemini model.
  • Gemini Pro, a “lightweight” Gemini model.
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models have been trained to be “natively multimodal” – in other words, they are able to work with more than just words. They were pre-trained and fine-tuned on a variety of audio, image and video files, a large set of codebases, and text in different languages.

This sets Gemini apart from models like Google’s own LaMDA, which was trained solely on text data. LaMDA cannot understand or generate anything other than text (e.g. essays, email drafts), but that is not the case with Gemini models.

What is the difference between the Gemini apps and the Gemini models?

Photo credit: Google

Google has once again proven that it lacks a sense of branding and has not made it clear from the start that Gemini is separate and distinct from the Gemini web and mobile apps (formerly Bard). The Gemini apps are simply an interface through which specific Gemini models can be accessed – think of it as a client for Google’s GenAI.

By the way, the Gemini apps and models are also completely independent of Imagen 2, Google’s text-to-image model, which is available in some of the company’s development tools and environments.

What can Gemini do?

Because the Gemini models are multimodal, they can theoretically perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Some of these features have already reached product stage (more on that later), and Google promises them all – and more – at some point in the not-too-distant future.

Of course, it’s a little difficult to take the company at its word.

Google significantly under-delivered when it originally launched Bard. And more recently, a video meant to demonstrate Gemini’s capabilities caused a stir when it turned out to be heavily edited and more or less aspirational.

Assuming that Google’s claims are more or less true, here’s what the different Gemini tiers can do once they reach their full potential:

Gemini Ultra

Google says Gemini Ultra’s multimodality means it can be used for things like physics homework, solving problems step by step on a worksheet, and pointing out possible errors in already completed answers.

Gemini Ultra, according to Google, can also be applied to tasks such as identifying scientific papers relevant to a particular problem – extracting information from those papers and “updating” a chart by generating the formulas needed to recreate it with newer data.

Gemini Ultra technically supports image creation, as already mentioned. However, this feature has not yet made its way into the product version of the model – perhaps because the mechanism is more complex than the way apps like ChatGPT generate images. Instead of passing prompts to an image generator (like DALL-E 3 in the case of ChatGPT), Gemini outputs images “natively,” without an intermediate step.

Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s web-based tool for app and platform developers. It also runs the Gemini apps – although not for free. Accessing Gemini Ultra through what Google calls Gemini Advanced requires a subscription to the Google One AI Premium plan, priced at $20 per month.

The AI Premium plan also connects Gemini to your broader Google Workspace account – think emails in Gmail, documents in Docs, spreadsheets in Sheets, and Google Meet recordings. This is useful, for example, for summarizing emails or having Gemini take notes during a video call.

Gemini Pro

According to Google, Gemini Pro represents an improvement over LaMDA in its reasoning, planning and comprehension capabilities.

An independent study by Carnegie Mellon and BerriAI researchers found that the first version of Gemini Pro was actually better at handling longer and more complex reasoning chains than OpenAI’s GPT-3.5. However, the study also found that this version of Gemini Pro, like all large language models, particularly struggled with multi-digit math problems, and users found examples of poor reasoning and obvious errors.

However, Google promised a remedy – and the first came in the form of Gemini 1.5 Pro.

Gemini 1.5 Pro is designed as a replacement and features improvements over its predecessor in several areas, but most importantly in the amount of data it can process. Gemini 1.5 Pro can process approximately 700,000 words or approximately 30,000 lines of code – 35 times the amount that Gemini 1.0 Pro can handle. And since the model is multimodal, it is not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in various languages, albeit slowly (e.g., searching for a scene in an hour-long video takes 30 seconds to a minute of processing time).

Gemini 1.5 Pro was introduced in April in public preview on Vertex AI.

An additional endpoint, Gemini Pro Vision, can process text and images – including photos and videos – and output text, along the lines of OpenAI’s GPT-4 with Vision model.


How to use Gemini Pro in Vertex AI. Photo credit: Gemini

Within Vertex AI, developers can tailor Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also connect to external third-party APIs to perform certain actions.

There are workflows in AI Studio for creating structured chat prompts with Gemini Pro. Developers have access to the Gemini Pro and Gemini Pro Vision endpoints and can adjust the model temperature to control the output’s creative range, provide examples to give tone and style instructions – and also tune the safety settings.
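As a rough illustration, the knobs described above map to a generation config sent alongside a prompt. The sketch below is a hypothetical helper, not Google's SDK; the parameter names (`temperature`, `topP`, `topK`, `maxOutputTokens`) follow Google's public API documentation, but exact spellings and valid ranges should be checked against the current reference.

```python
# Sketch of the generation settings a developer can tune when calling a
# Gemini Pro endpoint. Parameter names are taken from Google's public API
# docs; treat the exact names and ranges as assumptions.

def build_generation_config(temperature=0.4, top_p=0.95, top_k=40,
                            max_output_tokens=1024):
    """Assemble request settings that control the output's creative range."""
    if temperature < 0.0:
        raise ValueError("temperature must be non-negative")
    return {
        "temperature": temperature,        # higher values = more varied output
        "topP": top_p,                     # nucleus-sampling probability cutoff
        "topK": top_k,                     # sample only from the k likeliest tokens
        "maxOutputTokens": max_output_tokens,
    }

# A more "creative" configuration for brainstorming-style prompts:
config = build_generation_config(temperature=0.9)
```

Lower temperatures make the model's answers more deterministic; higher ones make them more varied, which suits brainstorming better than factual Q&A.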

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models and is efficient enough to run directly on (some) phones rather than sending the task to a server somewhere. So far it supports a few features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including “Summarize in Recorder” and “Smart Reply” in Gboard.

The Recorder app, which lets users record and transcribe audio at the touch of a button, includes a Gemini-powered summary of your recorded conversations, interviews, presentations and other clips. Users receive these summaries even when there is no signal or Wi-Fi connection available – and for privacy reasons, no data leaves their phone.

Gemini Nano is also included in Gboard, Google’s keyboard app. There, it powers a feature called “Smart Reply,” which suggests what you might want to say next while you’re having a conversation in a messaging app. The feature initially only works with WhatsApp, but will come to more apps over time, Google says.

And in the Google Messages app on supported devices, Nano enables Magic Compose, which can create messages in styles like “excited,” “formal,” and “lyrical.”

Is Gemini better than OpenAI’s GPT-4?

Google has repeatedly touted Gemini’s superiority in benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says that in some scenarios, Gemini 1.5 Pro performs better than Gemini Ultra at tasks such as summarizing content, brainstorming, and writing; this will probably change with the release of the next Ultra model.

Leaving aside the question of whether benchmarks really indicate a better model, Google’s results appear to be only marginally better than OpenAI’s corresponding models. And—as mentioned—some of the early impressions weren’t particularly good: users and researchers pointed out that the older version of Gemini Pro tends to misrepresent basic facts, has difficulty with translations, and provides poor coding suggestions.

How much does Gemini cost?

Gemini 1.5 Pro is free to use in the Gemini apps and currently also in AI Studio and Vertex AI.

However, once Gemini 1.5 Pro leaves preview in Vertex, input will cost $0.0025 per character, while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (approximately 140 to 250 words) and, on models like Gemini Pro Vision, per image ($0.0025).

Let’s say a 500-word article contains 2,000 characters. Summarizing that article with Gemini 1.5 Pro would cost $5, while generating an article of similar length would cost $0.10.
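The arithmetic behind those figures is a straight per-character multiplication, sketched below using the rates quoted above (treat them as the rates at the time of writing, not current pricing):

```python
# Back-of-the-envelope check of the per-character Vertex AI pricing quoted
# above: $0.0025 per input character and $0.00005 per output character.

INPUT_PRICE_PER_CHAR = 0.0025    # cost to send one character to the model
OUTPUT_PRICE_PER_CHAR = 0.00005  # cost for the model to generate one character

def input_cost(chars: int) -> float:
    """Cost of feeding `chars` characters of input, e.g. an article to summarize."""
    return chars * INPUT_PRICE_PER_CHAR

def output_cost(chars: int) -> float:
    """Cost of having the model generate `chars` characters of output."""
    return chars * OUTPUT_PRICE_PER_CHAR

# A 500-word article of roughly 2,000 characters:
summary_price = input_cost(2000)      # roughly $5.00
generation_price = output_cost(2000)  # roughly $0.10
```

The asymmetry is notable: under these rates, sending text in costs 50 times more per character than getting text out.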

Ultra pricing has yet to be announced.

Where can you try Gemini?

Gemini Pro

The easiest place to experience Gemini Pro is the Gemini apps. Pro and Ultra answer queries in different languages.

Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for now and supports certain regions, including Europe, as well as features such as chat functionality and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. The service allows developers to iterate on prompts and Gemini-based chatbots, then obtain API keys to use them in their apps – or export the code to a more feature-rich IDE.

Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered code completion and generation assistance tools, uses Gemini models. Developers can make “large-scale” changes across codebases, such as updating cross-file dependencies and reviewing large blocks of code.

Google has integrated Gemini models into its development tools for Chrome and the Firebase mobile development platform, as well as its database creation and management tools. And it has launched new security products powered by Gemini, such as Gemini in Threat Intelligence, a component of Google’s Mandiant cybersecurity platform that can analyze large swaths of potentially malicious code and lets users run natural-language searches for ongoing threats or indicators of compromise.
