
Deep Dive: Why Can't LLMs Count the r's in Strawberry?

by Dima Maleev
Aug 21, 2025

Hello, Dear Nerds,

Imagine this: OpenAI releases what they call a "state-of-the-art" model - GPT-X. They announce that it can beat any PhD in their own field, solve math, and write code better than any engineer. The whole internet fills with posts that "a new era of AI" is coming and we will all be replaced.

Within 24 hours, you open LinkedIn or Twitter and see millions of posts about LLMs' miserable failure to count letters in words. It might be counting the `r`'s in `strawberry`, or counting the letters in any other word.

But why? Why can't such an amazing technology handle really simple things? In today's deep dive, I would like to find the answer: why can LLMs do so many amazing things, yet fail to count letters? So at the next LLM release, you won't look like a caveman trying to hammer nails with a microscope.

Note: This deep dive will not touch on different LLM architectures; it is only meant to give a big-picture view of how models work. Links for deeper understanding are provided at the bottom of this deep dive.

Tokens

So, everyone is talking about tokens. Any LLM you check will have something like a max token window, a price per input token, and a price per output token. But what is a token, actually?

A token is a chunk of text that the model understands. And no - it is not stored as text. Computers work with numbers. As always, there are different algorithms for generating tokens. OpenAI's models use Byte-Pair Encoding, and you can find more details about it here.
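To get an intuition for Byte-Pair Encoding, here is a minimal sketch in Python. This is not OpenAI's actual tokenizer (real BPE starts from bytes and trains on huge corpora); it just shows the core idea: repeatedly fuse the most frequent adjacent pair into a new, longer token.

```python
from collections import Counter

def bpe_train(text: str, num_merges: int):
    """Toy BPE: start from single characters and repeatedly merge
    the most frequent adjacent pair into one token."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        # Fuse every occurrence of the winning pair into one token
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_train("strawberry strawberry straw", num_merges=6)
print(tokens)
```

Notice what the model ends up seeing: chunks like `straw`, not individual letters. Once the text is tokenized this way, the letter `r` is hidden inside larger units, which already hints at why letter counting is hard.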

But first, let's dig into tokenization with a simple word.

Let's take the word `nerd`. For a computer to understand it, we need to convert it to binary code - only 0s and 1s.

`nerd` = 01101110 01100101 01110010 01100100

Nothing surprising, right? We just represent every character in binary format. Each group of 8 bits forms a byte, which ( if converted from binary ) represents a number from 0 to 255.
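You can verify this binary representation yourself with a couple of lines of Python, using the built-in `bytes` type and `format`:

```python
# Encode the word to bytes, then print each byte as 8 binary digits.
word = "nerd"
bits = " ".join(f"{byte:08b}" for byte in word.encode("ascii"))
print(bits)  # 01101110 01100101 01110010 01100100
```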

So now you can replace this string with a list of numbers ( which, in this case, are just the decimal values from the ASCII table ):

01101110 01100101 01110010 01100100 = 110 101 114 100
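The same decimal values fall out directly in Python, since iterating over a `bytes` object yields integers from 0 to 255:

```python
# Each byte of the encoded word is already an integer in the 0-255 range.
word = "nerd"
codes = list(word.encode("ascii"))
print(codes)  # [110, 101, 114, 100]
```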

But this is not a token. Yet.
