Below you can find the code that we will be reviewing today. Before you scroll down to section where I fix it, feel free to check how many mistakes you can find.
def CountOccurrences(input):
result = {}
for i in range(len(input)):
item = input[i]
if item in result.keys():
result[item] = result.get(item, 0) + 1
else:
result[item] = 1
keys = []
for key in result.keys():
keys.append(key)
keys.sort()
output = ''
for key in keys:
output += str(key) + ': ' + str(result[key]) + ', '
return output[:-2]
This function is meant to take a list of strings, count the occurrences of each individual string and return these counters in a form of a dictionary serialized to a string. Let’s begin with the function signature:
def CountOccurrences(input):
First line and already 2 bad practices:
- name of the function
- missing type annotations
Name of the functions in Python as suggested by PEP-8 should be snake case, not camel case, so here it should be:
def count_occurrences(input):
The type annotations are not required in Python (after all it’s a dynamically typed language), but providing them is a good practice because they direcrly inform the user what types are expected on the input and on the output:
from typing import List
def count_occurrences(input: List[str]) -> str:
If you want be even more expressive, you can create aliases for the actual types:
from typing import List
StringContainer = List[str]
SerializedDictionary = str
def count_occurrences(input: StringContainer) -> SerializedDictionary:
This may an overkill for the simple types like in this case, but it’s very useful for more complex types. You gain not only more information about what such complex type represents, but what’s most important, if you ever change the underlying types, but preserve their API, you must introduce a change only in one place (in the type alias) instead of everywhere, where the type has been used so far.
Note for beginners: remember, that type annotations don’t enforce the specified types, so you can’t prevent this way calling your function with an argument of invalid type. These are information for the developers (so that they know what types to use) and for IDEs (so that they know what lines to underline). If you want to enforce type correctness, your function must perform an explicit type validation using e.g.
isinstance
before using the received argument.
Let’s move to the the next part of the function:
for i in range(len(input)):
item = input[i]
if item in result.keys():
result[item] = result.get(item, 0) + 1
else:
result[item] = 1
First of all, there is no need here to iterate the list with an index, so to avoid overcomplicating loop, we should just iterate directly over its elements. This way we also avoid the redundant assignment item = input[i]
:
for item in input:
Secondly, there’s no need to check for the item
explicitly in result.keys()
– the following line does the same work in less code:
if item in result:
Finally, an if
statement in this case is redundant anyway because using get(item, 0)
already does the checking work for us, so the entire if
statement and incrementation may be replaced with just one line:
result[item] = result.get(item, 0) + 1
Next in line is part responsible for sorting the items, so that in the output we have the in an alphabetical order.
keys = []
for key in result.keys():
keys.append(key)
keys.sort()
All of these 4 lines can be replaced just by one:
sorted_keys: List[str] = sorted(result)
The last challenge is finding a way how to get rid of this ugly dictionary serialization:
output = ''
for key in keys:
output += str(key) + ': ' + str(result[key]) + ', '
return output[:-2]
It’s ugly because string concatenation like above is inefficient and imposes on us some additional things to do, like removal of the trailing coma. This can be improved by using join
function which (as the name suggests) will join our sorted_keys
list, separate its elements with coma and return it in a form of a string:
output = ', '.join(f'{key}: {result[key]}' for key in sorted_keys)
return output
Eventually, our refactored function looks like this:
from typing import List, Dict
StringContainer = List[str]
SerializedDictionary = str
def count_occurrences(input: StringContainer) -> SerializedDictionary:
result: Dict[str, int] = {}
for item in input:
result[item] = result.get(item, 0) + 1
sorted_keys: List[str] = sorted(result)
output = ', '.join(f'{key}: {result[key]}' for key in sorted_keys)
return output
Shorter, simpler and easier to maintain. If we now call it like this:
print(count_occurrences(['apple', 'banana', 'apple', 'orange', 'banana', 'apple']))
The output will be as follows:
apple: 3, banana: 2, orange: 1
Note for advanced: if you have already some experience with Python, you can most probably see that almost entire function above can be replaced using
Counter
object fromcollections
. I didn’t want to use it because it would hide many important points in the “bad” function and that’s just not the idea of this article, but to make the article complete, below you can find the function’s implementation usingCounter
:
from typing import List
from collections import Counter
def count_occurrences(input_list: List[str]) -> str:
counter = Counter(input_list)
return ', '.join(f'{key}: {value}' for key, value in sorted(counter.items()))