Let’s talk about naming variables.
Today I saw something like this in the code base at work. No I’m not sharing the actual code but it’s a good enough representation:
use_sku = False
products = Product.objects.filter(store=store)
customers = Customer.objects.filter(store=store)
df_products = pandas.DataFrame(products)
df_customers = pandas.DataFrame(customers)
# Imagine some_condition = some long comparison expression
use_sku = True
return df_products, df_customers, use_sku
def __init__(self, store):
self.store = store
self.df_products, self.df_customers, self.use_sku = RecommenderUtility.get_recommendations_related_data(self.store)
df_user_recommendations = self._get_user_recommendations()
df_bestseller_recommendations = self._get_bestseller_recommendations()
return df_user_recommendations, df_bestseller_recommendations
# Init class and generate recommendations
store = Store.objects.get(name='spiderman_photos') # (1)
recommender = Recommender(store) # (2)
df_user_recommendations, df_bestseller_recommendations = recommender.generate_recommendations()
So the code does something like this, in order:
- Get Store object with name “spiderman_photos” using QuerySet (Django’s built-in Object-Relational Mapper or ORM)
Recommenderclass and store in
generate_recommendationsmethod to generate recommendations for the specific store
But here is a question. Imagine you are tasked to find out why some recommendations are coming out funny and you had to examine this code in detail for the first time. Can you tell what
use_sku is for? What data type do you expect it to be, and what information do you expect it to represent?
A quick primer on “SKU” – it is an acronym for Stock Keeping Unit, which is the basic unit for keeping track of inventory in a typical shop (traditional and e-commerce). Specific to this case we are looking at an e-commerce store, which is probably the biggest use case for recommender systems nowadays.
When I first saw this in the code base and had to troubleshoot the problem without any helpful class and method docstrings to rely on, I had a few questions:
- Can an e-commerce store use anything besides
skuto keep track of inventory?
- Since a conditional statement in
True, I can assume the value of
use_skuis dependent on data in the database. But what is it based on exactly?
- Are other methods in the
Recommenderclass also similarly dependent on the
use_skuvariable (which at this point would serve more like a flag for “this store uses SKU”)?
These were the questions I had to ask myself because the answers were implicit, as opposed to explicit, in the code. It was my job to follow the trail and understand what
use_sku is really trying to represent and its importance in this class.
But this should not be the case! The onus of clarity and ease of code comprehension should lie strictly in the hands of the programmer who is writing the code, because that person knows the intent of every variable, statement, and expression the clearest. Every subsequent reader of the code will have to deduce the original programmer’s intent, unless that programmer takes care to leave signposts, like class and method docstrings and short comments.
Improvement 1: Rename the variable to make sense on first look
Here is an example of an improvement that is relatively easy to implement and can potentially save a lot of time from deduction.
# Instead of
use_sku = True
# How about something like
store_uses_sku = True
Ambiguity is enemy number one. Leave no room for different interpretations of what a variable is supposed to do if you can help it!
Improvement 2: Add docstrings to every class and method
Another improvement that can be made with almost no additional effort is writing docstrings. Docstrings are basically multiline comments that appear directly below the
class declarations in Python. Practitioners of various programming languages advocate different ways of doing this, but I really like Python docstrings – they are systematic, easy on the eyes (those “””), and they even have built-in support even though I don’t personally use it (you can call
Execute common tasks needed for Recommender system.
Query database and return data from tables relevant to creating recommendations.
Relevant tables: Product, Customer.
Return Tuple (df_products, df_customers, use_sku):
DataFrame df_products: products dataframe
DataFrame df_customers: customers dataframe
Boolean store_uses_sku: flag indicating whether the store uses SKU as primary id for products
There are definitely more ways to improve on this. What other low hanging fruits have I missed?