Bug of the week #12

The error we’re handling today is a real-life case, simplified to the following Python code:

Python

# user inputs several numbers in the form of strings (1, 2, 3 and 123)
# which are then used to construct two strings
s1 = "".join(["1", "2", "3"])
s2 = "123"

# if the numbers in the constructed strings are different,
# the program flow should stop
if s1 is not s2:
    raise RuntimeError("s1 is not s2!")

# program flow continues...

Both s1 and s2 end up holding the text "123", but when we run this code, we get:

Output

Traceback (most recent call last):
  File "/home/main.py", line 9, in <module>
    raise RuntimeError("s1 is not s2!")
RuntimeError: s1 is not s2!

What’s even more confusing, other parts of the app run exactly the same comparison on strings that are constructed differently:

Python

s1 = "123"
s2 = "123"

if s1 is not s2:
    raise RuntimeError("s1 is not s2!")

# program flow continues...

But this time no exception is raised, so the same comparison seems to give different results for strings holding the same text.

What does it mean?

The issue comes from misunderstanding the difference between identity and equality in Python. The is operator does not check whether two variables contain the same value. It checks whether both variables point to the exact same object in memory.

== asks: “Do these objects have the same content?”
is asks: “Are these literally the same object?”
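The difference is easiest to see with lists, because every list display creates a brand-new list object. This small sketch (the names a, b and c are only for illustration) shows the two operators disagreeing:

```python
# Two separate list objects that happen to hold the same content
a = [1, 2, 3]
b = [1, 2, 3]
# A second name bound to the very same object as a
c = a

print(a == b)  # True: same content
print(a is b)  # False: two distinct objects
print(a is c)  # True: c refers to the same object as a

# Because a and c are one object, a change made through c is visible through a
c.append(4)
print(a)       # [1, 2, 3, 4]
```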

For example:

Python

s1 = "123"
s2 = "123"

print(s1 == s2)   # True
print(s1 is s2)   # True in CPython, but not guaranteed by the language

The first comparison checks the values of the strings, so it reliably returns True. The second depends on Python’s internal memory optimizations. CPython, for instance, “interns” some strings (short literals made only of letters, digits and underscores, such as "123") and reuses identical constants within the same piece of code, so two names can end up pointing at one shared object. In those cases s1 is s2 happens to return True, but the language does not guarantee it.
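If you really need equal strings to share one object, interning has to be requested explicitly. A minimal sketch using the standard library’s sys.intern(), which returns the single interned copy of a string, so equal strings passed through it come back as the same object:

```python
import sys

# Built at run time, so CPython does not intern it automatically
s1 = "".join(["1", "2", "3"])
s2 = "123"

# sys.intern() returns the canonical interned copy of an equal string
i1 = sys.intern(s1)
i2 = sys.intern(s2)

print(i1 == i2)  # True
print(i1 is i2)  # True: both names now refer to the interned object
```

Even so, == remains the right tool for comparing values; interning is a memory and speed optimization, not a way to compare strings.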

In this version:

Python

s1 = "".join(["1", "2", "3"])
s2 = "123"

print(s1 == s2)   # True
print(s1 is s2)   # False in CPython (an implementation detail)

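Both strings contain exactly the same text, but s1 is built at run time by join(), so CPython creates a new string object for it rather than reusing the one for the literal "123". Whether two equal strings share an object is an implementation detail, which makes is unreliable for value comparison.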
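One way to see that two separate objects are involved is the built-in id(), which returns an integer that is unique to each object during its lifetime. For two objects that are both alive, a is b gives the same answer as id(a) == id(b). A small sketch:

```python
s1 = "".join(["1", "2", "3"])
s2 = "123"

# The identities of the two string objects (in CPython, their memory addresses)
print(id(s1), id(s2))

# While both objects are alive, "is" is equivalent to comparing their ids
print((s1 is s2) == (id(s1) == id(s2)))  # True
print(s1 == s2)                          # True: same content either way
```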

How to fix it?

Use == (or !=) whenever you want to compare values. The corrected version:

Python

s1 = "".join(["1", "2", "3"])
s2 = "123"

if s1 != s2:
    raise RuntimeError("s1 and s2 are different!")
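
To check that a value comparison no longer depends on how the strings were built, here is a small sketch that runs it against several ways of producing "123". The helper name check_same is hypothetical, introduced only for this illustration:

```python
def check_same(s1: str, s2: str) -> None:
    """Raise if the two strings hold different text (hypothetical helper)."""
    if s1 != s2:
        raise RuntimeError(f"{s1!r} and {s2!r} are different!")

# Different ways of producing the text "123"; whether any of them shares an
# object with the literal "123" is up to the implementation, and no longer matters
candidates = [
    "".join(["1", "2", "3"]),  # built by join() at run time
    "123",                     # plain literal
    str(123),                  # converted from an int
    f"{1}{2}{3}",              # f-string
    "0123"[1:],                # slice of a longer string
]

for candidate in candidates:
    check_same(candidate, "123")  # never raises: all values are equal

# A genuinely different value is still caught
try:
    check_same("124", "123")
except RuntimeError as exc:
    print(exc)  # '124' and '123' are different!
```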

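Reserve is for cases where you truly want to check object identity. The most common use is comparing with None, which is a singleton (there is only ever one None object), so the identity check is always reliable: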

Python

value = None  # e.g. an optional argument that was never supplied

if value is None:
    print("No value provided")
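
One more reason to prefer is None: == calls the object’s __eq__ method, which a class can override to return anything, while is cannot be overridden. A minimal sketch with a deliberately misbehaving, hypothetical AlwaysEqual class:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True

obj = AlwaysEqual()

print(obj == None)  # True: the custom __eq__ is consulted
print(obj is None)  # False: identity cannot be faked
```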