Google is making significant strides in challenging OpenAI. The launch of the Gemini 2.0 series of AI models has garnered attention, particularly for its AI agent, Project Mariner. The release also introduces a remarkable new feature: a preview version of Gemini 2.0 Flash Thinking, an AI model with “reasoning” capabilities similar to OpenAI’s offerings. Its performance is impressive.
Users can try out Gemini 2.0 Flash Thinking in AI Studio by selecting it from the list of available models. Once selected, you can enter all sorts of questions, particularly those involving mathematical and logical problems. This is where the model truly shines and demonstrates the ability to backtrack and review its answers.
To demonstrate its capabilities, I propose a small experiment: Can you solve two problems that Gemini 2.0 Flash Thinking solved easily?
For the first problem, consider the following image of billiard balls numbered with different digits:
![Gemini](https://i.blogs.es/d0770c/gemini_1/450_1000.jpeg)
The challenge is to find a combination of three balls that adds up to 30. Take your time.
Do you have an answer? It may seem that there’s no possible combination. However, there’s a trick: You can flip ball number 9 upside down so it reads as a 6. With this little adjustment, you can create a combination (6 + 11 + 13) that successfully solves the problem.
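The trick is easy to confirm with a quick brute-force search. Here’s a minimal sketch in Python, assuming the standard set of odd-numbered balls (1, 3, 5, 7, 9, 11, 13) used in this classic version of the puzzle:

```python
from itertools import combinations

# Odd-numbered balls: three odd numbers can never sum to an even 30,
# which is why the puzzle looks unsolvable at first glance.
balls = [1, 3, 5, 7, 9, 11, 13]

# No straight pick of three balls works.
print([c for c in combinations(balls, 3) if sum(c) == 30])  # → []

# The trick: a 9-ball turned upside down reads as a 6.
candidates = sorted(set(balls) | {6})
solutions = [c for c in combinations(candidates, 3) if sum(c) == 30]
print(solutions)  # → [(6, 11, 13)]
```

Only the combination that uses the flipped 6 survives the search, matching the answer Gemini found.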
Logan Kilpatrick, the head of AI Studio, presented the new model and demonstrated its capabilities using the same example. If you click on the image above, you’ll see Gemini 2.0’s reasoning process and how effectively it can detect the “trick” needed to solve the problem. It’s truly impressive.
The second example is equally striking. There are numerous logical problems you can use to test these AI models. One example is the classic river crossing puzzle, which a Reddit user presented in a text adapted for a chatbot.
![Gemini](https://i.blogs.es/bdf25b/gemini_3/450_1000.jpeg)
The scenario involves a father, a son, a monkey, and some food that need to cross a river while adhering to several conditions:
- They must use a small boat to cross the river.
- The boat can carry a maximum of two passengers at a time, though it can also make a crossing with just one.
- The boat can’t cross the river on its own.
- Only the father or the son can pilot the boat, although they can both travel together if necessary.
- The food can’t be left alone with the son because he’ll eat it.
- The food can’t be left alone with the monkey because the monkey will eat it.
How does the father manage to get everyone and everything across to the other side?
![Gemini](https://i.blogs.es/5aa70a/gemini_4/450_1000.jpeg)
Once the problem is entered, Gemini analyzes the instructions, breaks them down, and then begins to “experiment.” In less than a minute, the model returns a striking solution, which involves the following steps:
- The father takes the food across the river.
- The father returns alone.
- The father takes the son to the other side.
- The father comes back with the food to prevent the son from eating it.
- The father leaves the food and takes the monkey to the other side of the river.
- The father returns alone.
- The father takes the food to the other side.
And just like that, the problem is solved!
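The seven crossings above can be checked mechanically. Here’s a minimal sketch that simulates each crossing and verifies the puzzle’s constraints on both banks (the entity names are my own shorthand, not part of the original prompt):

```python
# Each move: (who is in the boat, direction of the crossing).
MOVES = [
    ({'father', 'food'}, '->'),
    ({'father'}, '<-'),
    ({'father', 'son'}, '->'),
    ({'father', 'food'}, '<-'),
    ({'father', 'monkey'}, '->'),
    ({'father'}, '<-'),
    ({'father', 'food'}, '->'),
]

def safe(bank):
    # The food is only safe next to the son or the monkey
    # if the father is present to supervise.
    return not ('food' in bank and 'father' not in bank
                and {'son', 'monkey'} & bank)

left = {'father', 'son', 'monkey', 'food'}
right = set()
for boat, direction in MOVES:
    src, dst = (left, right) if direction == '->' else (right, left)
    assert boat <= src, "passengers must start on the boat's bank"
    assert {'father', 'son'} & boat, "only the father or son can pilot"
    assert 1 <= len(boat) <= 2, "the boat holds at most two"
    src -= boat
    dst |= boat
    assert safe(left) and safe(right), "someone got eaten!"

assert not left and len(right) == 4
print("All seven crossings are valid; everyone is across.")
```

Every assertion passes, so the sequence Gemini produced is indeed a legal solution.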
![Gemini](https://i.blogs.es/42f7b4/gemini_5/450_1000.jpeg)
Although this puzzle isn’t particularly difficult for humans, it can be quite complex for models of this type. For example, I tested it on Claude 3.5 Sonnet, and after working through it a couple of times, the model asked whether the puzzle might be impossible to solve.
Tests like this demonstrate that Google’s “reasoning” model goes a step further and is especially useful in challenging scenarios. DeepMind chief scientist Jeff Dean said in an X post that the new model is “trained to use thoughts to strengthen its reasoning.” His statement may be somewhat controversial, given that many will debate whether what chatbots do can be compared to actual “thinking.” However, the reality is that this approach goes beyond a stochastic model that merely generates text based on its training set.
Models like Gemini 2.0 Flash Thinking take longer to respond, but it’s fascinating to see how they work, analyzing logical puzzles and attempting to solve them.
![Gemini](https://i.blogs.es/17f41c/gemini_6/450_1000.jpeg)
For a third test, I asked it to count the R’s in a sentence. This isn’t strictly a logical problem, but Gemini got it wrong. Even when I asked it to double-check, it repeatedly gave the same incorrect answer. The model is impressive in some areas, yet surprisingly poor in others that may seem trivial.
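The irony is that letter counting, which trips up a tokenized language model, is trivial for ordinary string code (the sentence below is just an illustrative example, not the one I used):

```python
# Counting letters is a one-liner in plain Python.
sentence = "The quick brown fox jumps over the lazy dog near the river."
print(sentence.lower().count("r"))  # → 5
```

Language models see text as tokens rather than individual characters, which is one common explanation for why they stumble on tasks like this.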
Image | Google | Xataka using Freepik