Elon Musk-led artificial intelligence startup xAI Corp. unveiled its first multimodal model late Friday, adding to an AI arms race that never seems to end.

It’s called Grok-1.5 Vision, or Grok-1.5V, and it goes much further than the original Grok-1 large language model in that it can understand not only text but also visual content, including documents, photos, screenshots and diagrams.

According to the company, Grok-1.5V is more than capable of competing with existing multimodal models in various areas and specializes in what it calls “multidisciplinary thinking.” It has intelligent spatiotemporal perception capabilities, also known in the AI industry as “realistic spatial understanding,” which allow it to reason with complex text, interpret scientific images, and interact with visual content in human-like ways.

The company offered several examples of how Grok-1.5V could be used in the real world. For example, it can turn drawings into children’s stories, identify which object in a group is the largest, help drivers by checking whether there is enough space to maneuver around an obstacle, convert a table into CSV file format, or identify when a wood deck is rotting and needs to be replaced. It can even explain the context of internet memes that a user doesn’t understand.

xAI provided some benchmark results and said Grok-1.5V outperforms industry peers such as GPT-4V, Claude 3 Sonnet, Claude 3 Opus and Gemini Pro 1.5. The company said Grok-1.5V significantly outperformed its competitors on a new benchmark called RealWorldQA, which the company developed specifically to measure spatial understanding in the real world.

The multimodal version of Grok arrives less than a month after Musk’s company unveiled the standard LLM Grok-1.5, which offered better coding and math capabilities than its predecessor, Grok-1. Grok-1.5 also showed that it can handle much longer contexts than the original, meaning it can examine data from more sources to improve the accuracy of its answers.

xAI says Grok-1.5V will soon be made available to early testers, starting with subscribers to X’s Premium+ service, which offers additional benefits to users of the social media site formerly known as Twitter.

The startup has come a long way very quickly since its founding in July 2023. Musk said at the time that he was starting the company in response to the “black box” approach of AI developers like OpenAI and Google, which are very secretive about how their AI models work. Musk said the goal is to create AI that is more transparent and accountable than that of his competitors.

