If today is a Tuesday, what day was it 100 days ago?
LLM capabilities are astounding. As with all impressive features, however, it is far more interesting to tear them apart and look for issues, and a model like Claude is no different. I will confess, my prompt was not exactly the one I stated in the title. It was actually "If today is a Tuesday, what day was it 100 days ago? Reply with a single word only." One might imagine that the request for a specific output, i.e. a single word, would have no implications for the so-called calculation that the model would have to do. Interestingly, this is not the case.
Firstly, to put your mind at rest, I will begin by walking through the conversation that I had with Claude.
Me - "If today is a Tuesday, what day was it 100 days ago? Reply with a single word only."
Claude - "Wednesday"
Me - "explain"
Claude - "Today is Monday, 12 January 2026 (not Tuesday), so the question's premise doesn't apply.
However, I can explain the logic:
If today were a Tuesday, counting back 100 days:
• 100 days = 14 weeks and 2 days (since 14 × 7 = 98)
• Going back 14 complete weeks from a Tuesday brings you to a Tuesday
• Going back 2 more days from Tuesday brings you to Sunday
So if today were Tuesday, 100 days ago would have been Sunday, not Wednesday.
I apologise for the incorrect answer. Would you like me to work through any other date calculation?"
Me - "why do you think that you got this wrong originally?"
Claude - "I think I made an error in my mental arithmetic. When I calculated quickly, I must have miscounted the remainder when dividing 100 by 7.
The correct calculation is:
• 100 ÷ 7 = 14 remainder 2
• So 100 days = 14 weeks plus 2 days
Going back 14 weeks keeps you on the same day of the week, then going back 2 more days moves you backwards in the week. From Tuesday, 2 days back is Sunday.
I suspect I either:
- Miscalculated the remainder (perhaps thinking it was 5 instead of 2), or
- Counted forwards instead of backwards when applying the remainder
It's a good reminder that even straightforward arithmetic deserves careful attention, especially when I'm asked for a single-word answer without showing my working."
end of conversation
This raises two fairly obvious questions. Why does a model capable of performing otherwise complex tasks struggle with something so trivial? And why does the failure seem linked to the innocuous request to reply with just the day?
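For reference, the arithmetic Claude eventually walked through is entirely deterministic and fits in a few lines. Here is a minimal sketch in Python (the function name and day ordering are my own choices, not anything Claude produced):

```python
# Days ordered so that stepping backwards wraps around correctly.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def weekday_n_days_ago(today: str, n: int) -> str:
    """Return the day of the week n days before `today`.

    Going back n days only moves you back n % 7 weekday positions,
    since whole weeks land on the same day. Python's % operator
    returns a non-negative result, so the wrap-around is handled.
    """
    idx = DAYS.index(today)
    return DAYS[(idx - n) % 7]

print(weekday_n_days_ago("Tuesday", 100))  # -> Sunday
```

Since 100 = 14 × 7 + 2, the 14 full weeks cancel out and only the remainder of 2 matters: two days back from Tuesday is Sunday, matching Claude's corrected answer.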
It points, I think, to a significantly more interesting issue: namely, how and why prompting for a particular output seems to have affected the 'thinking' of the model. I would contend that the simplest way to satisfy my request accurately would be to carry out the calculation as normal, and then simply trim the answer down to the one-word day of the week it had computed. In truth, I simply do not know what process was used to arrive at the original answer. I suspect that if we asked Claude, it would not be able to articulate how or why either.
The natural explanation appears to me to be that the model, having seen that it is required to respond with only a single word, decides that it must also 'reason' in a truncated manner, and it is this process that introduces the error. This, of course, would be most unnatural for a human, but perhaps serves as a reminder that there remain some subtle differences between man and machine.
More logically, it would seem intuitive that encouraging the model to walk through its reasoning in real time forces it to reflect on the intermediate stages between the prompt and the response; much like a human, being forced to explicitly detail its reasoning ensures that it works through the problem systematically.
This raises the question: does asking the model to show its reasoning actually create the reasoning itself? We might expect that, regardless of whether we explicitly ask for proof of the reasoning, the model should still retain enough awareness to know that it must reason in order to answer problems accurately.
As you have seen from the conversation, Claude is very quickly able to recognise and rectify its error. Once prompted again, it works through the question flawlessly and outputs the correct answer. We return to the Chinese Room argument, left wondering whether it is syntax or semantics doing the heavy lifting.
However, whilst this is a rationalisable and indeed plausible explanation, I suspect that it is in fact not particularly accurate. Instead, I imagine that including the constraint to answer with a "single word only" altered the execution of the task: from solving a problem and then reporting the answer in a single word, to simply producing a single token, representing a day of the week, that could possibly (probably) answer the prompt. It moved from calculating the logical response step by step to, most likely, just pattern-matching. Perhaps it found similar date-arithmetic questions in its training data, judged that Wednesday sounded plausible, and therefore skipped actually computing anything.
For now it is hard to say with any certainty how or why it answered the way it did, although with some simple(ish) research I am sure we could determine this with increased confidence. As models continue to improve at a potentially alarming rate, I imagine it might not be long before quirks like this become difficult to find at all.