Let’s define the word “better”
In the title of this series, I used the term “better”, but as an engineer, I can’t begin my research with such a vague objective. The world of software engineering is so complex and varied that the word “better” can have very different meanings depending on the use case.
Let’s take C++ for example – what is C++ good at? Well, I think we all agree that it’s good at performance. There is a great talk by Jon Kalb in which he says that uncompromised performance is embedded so deeply into C++’s identity that whenever we need to justify why C++ sacrifices safety, readability or something else in favor of performance, we could just say “because this is C++”. The focus on performance is an arbitrary design decision.
Because I’m searching for a process improvement for myself, I will also define a set of general rules which I consider good and which I can use later to judge other solutions and maybe work out my own. I wanted this list to be short, so I ended up with the following set of properties which define a “better” software development process for me:
- consistency and unambiguity
- resistance to individual preferences
- ease of development and conciseness
I imagine that it may be difficult to understand straight away what hides under each of these points in terms of software development, so let’s analyse them using a few examples.
Consistency and unambiguity
Nothing drives me as crazy as inconsistency. I sometimes even prefer uglier solutions, just to stay consistent with the rest of the system. It may be seen as an individual preference, but I’ll actually try to defend it as “rational at least in most cases”. For quick and effective software development, our tools and systems must be consistent. Otherwise, if you don’t know what to expect or you’re surprised on every development iteration, you fall into a deep hole of uncertainty in which any predictions, estimates or plans seem impossible.
Consistency and unambiguity in language syntax
Let’s start with some examples. What does this code do?
#include <iostream>

void Function()
{
    int* a = new int(12);          // allocate manually
    std::cout << *a << std::endl;
    delete a;                      // deallocate manually
}
Well, it creates a pointer, allocates memory, prints the number 12 and deallocates the memory. And what does this code do?
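#include <iostream>
#include <memory>

void Function()
{
    std::shared_ptr<int> a = std::make_shared<int>(12);
    std::cout << *a << std::endl;
}   // the memory is released here, when the last owner goes out of scope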
Functionally, it does exactly the same thing. Let’s take a closer look at what characterizes each of these code snippets.
The first one is less safe because it forces the user to manage memory manually. However, it doesn’t introduce the performance overhead related to the atomic reference counter in std::shared_ptr, so it resonates well with C++’s focus on uncompromised performance. Code like this is also forbidden in many modern commercial C++ projects.
The second one is safer because it deallocates memory automatically. However, it introduces some performance overhead, which may be visible especially in multi-threaded applications. Code like this is also recommended in many modern commercial C++ projects.
Do you see where I am going with this? C++ has reached a point where we not only have 2 completely different ways of doing the same thing, but we have actually banned the one which fulfills the main assumption of the language. To make things even funnier, remember that smart pointers, which are the recommended way, provide a get function which allows you to access the raw pointer anyway.
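The point about get can be made concrete with a minimal sketch. ReadThroughRawPointer is a hypothetical helper name used only for illustration; it shows that the raw pointer is one call away even when ownership is handled by std::shared_ptr.

```cpp
#include <memory>

// Hypothetical helper: reads the value through the raw pointer exposed by get().
// The shared_ptr still owns the memory, so the raw pointer must never be
// deleted manually and must not outlive the last owner.
int ReadThroughRawPointer(const std::shared_ptr<int>& owner)
{
    int* raw = owner.get();  // non-owning raw pointer, reference count unchanged
    return *raw;
}
```

Calling ReadThroughRawPointer with a pointer created by std::make_shared<int>(12) returns 12, the same value the two snippets above print.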
To make the syntax of a programming language unambiguous, it seems like there should be an important rule:
There should be one and only one way of using each language feature.
Consistency and unambiguity in safety and error handling
I’ve seen many ways of handling errors in C++, each of them defended by die-hard engineers. Let’s take a look at the 2 most popular ones:
- error code based:
ErrorCode code = DoSomething();
if (code != ErrorCode::kSuccess) {
    std::cerr << "Error when doing something: " << code << std::endl;
}
- exception based:
try {
    const int value = DoSomething();
}
catch (const std::exception &e) {
    std::cerr << "Error when doing something: " << e.what() << std::endl;
}
Of course, both of them have their pros and cons. Looking at the first example, an exception-loving person would say that not only does such an approach prevent the function from returning a usable value (because it must return an error code), but what’s even worse, the returned error code may remain unchecked or be checked incorrectly, and the code flow goes on despite the error. Moreover, the success path is the one executed in most cases, but you still have to explicitly check the error code every time. With exceptions, on the other hand, you pay the cost only if an error actually occurs. In the eyes of such an engineer, the second example must be better because the exception terminates the flow of the code immediately when an error occurs, so there is no need for manual checking.
The error-code lover would, however, see this situation a little differently. Catching an exception is not enough, because what really matters is assuring a valid state of the system after the exception has been thrown (exception safety). Does your function have enough knowledge to handle the exception? If not and you must let the exception go “up”, will the caller handle it properly? Will any of the callers up the stack catch that exception, or will it end up terminating the whole program? With error codes, such problems don’t exist and the handling of every error is predictable and very easy to read.
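To see both arguments side by side, here is a minimal sketch of the same operation written in each style. The functions Divide and DivideOrThrow and the kDivisionByZero enumerator are made up for this illustration; only ErrorCode::kSuccess comes from the fragment above.

```cpp
#include <stdexcept>

enum class ErrorCode { kSuccess, kDivisionByZero };

// Error-code style: the return slot is taken by the status, so the result
// has to travel through an output parameter. [[nodiscard]] (C++17) asks the
// compiler to warn when the caller ignores the returned code.
[[nodiscard]] ErrorCode Divide(int a, int b, int& result) noexcept
{
    if (b == 0) {
        return ErrorCode::kDivisionByZero;
    }
    result = a / b;
    return ErrorCode::kSuccess;
}

// Exception style: the function returns the value directly, and an error
// interrupts the normal flow until some caller catches it.
int DivideOrThrow(int a, int b)
{
    if (b == 0) {
        throw std::invalid_argument("division by zero");
    }
    return a / b;
}
```

With Divide, the caller has to check the code after every call, while with DivideOrThrow the caller gets the value directly but has to know that std::invalid_argument may arrive from below.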
Which one is better? I still haven’t found an objective and final answer, but what I know is that since error handling is so inherently embedded into the software development process, it seems like a good assumption that:
There should be one error handling mechanism which is embedded into the language syntax.
Resistance to individual preferences
Many engineers build up strong individual preferences over years of software development. These habits may be so strong that things like where to put curly brackets, how to name class members or where to put & in a function’s argument list rise to the rank of religion and may consume a large amount of time during meetings, time which would otherwise be spent on solving the actual problems that the software was intended to solve. Below I present a few examples of how these habits manifest themselves.
Resistance to individual preferences in project structure
Let’s look at these 2 examples of high-level project structures:
project_1
|- include
|  |- my_library
|     |- my_library.h
|- src
|  |- my_library
|     |- my_library.cpp
|  |- main.cpp
|- CMakeLists.txt
project_2
|- my_library.hpp
|- my_library.cpp
|- main.cpp
|- CMakeLists.txt
Look at the first example and answer the question: do you agree with such a project structure? I’m sure that depending on your personal preferences and the things you’re just used to, the answer may vary:
- maybe you would rather use camel case for file names?
- maybe you would rather put my_library in its dedicated folder and create include and src folders inside of it?
- maybe you would rather keep header files together with .cpp files in one my_library folder?
None of these questions has a strict and objective answer, so let’s ask a different question: do you agree that such a structure is:
- clear
- scalable
- functional
- maintainable
If the answer is yes, then why do we even consider spending time on searching for answers to the 3 previous questions if they have basically no impact on the project development? You may say that it’s because you can come up with a different project structure which is clear, scalable, functional and maintainable as well. If this is true and all these conditions are fulfilled, I can blindly agree to follow your structure instead of the one proposed above. The point here is that the project structure should be assessed from the perspective of fulfilling certain project needs, not from the perspective of individual, subjective preferences.
Now let’s look at the project structure from the second example, in which someone decided to keep all the source files in the repository root folder, and answer the same question: do you agree with such a project structure? If you have any experience in professional software development, I bet you don’t. What would you do if a new library required a header file with the same name as some other file? Normally this wouldn’t be a problem, but you can’t have 2 files with the same name inside one folder, so now you have one. Can you imagine multiple teams working on such a repository? This structure doesn’t just seem bad, it actually is bad because it prevents efficient software development.
All of this means that there is a set of project structures which can be objectively considered good and a set which can be objectively considered bad. By “objectively” I mean that the judgment doesn’t come from the preferences of individual engineers, but from certain project assumptions and goals. It’s a set of features that makes a project structure good or bad, not its appearance related to e.g. camel case file names. If a project structure were picked once and enforced, this issue would disappear. Moreover, if a structure is precisely defined, a tool chain may utilize that fact to improve the user experience. There will never be an ideal structure for everyone, but since we don’t live in an ideal world, I’ll sum it up with this statement:
The cost of not having a predefined project structure is greater than the cost of necessity to adjust to the imperfect structure imposed by the process.
Resistance to individual preferences in dependency management
Almost every software project (except the really tiny ones) has some dependencies on third-party software components. If you ask Python developers how exactly you should integrate a library into a software project, they’ll say that it’s very simple – you just create a virtual environment, call pip install for whatever you need, call pip freeze > requirements.txt on whatever you have and you’re done. From now on you have a requirements.txt file which every other Python developer knows what to do with.
In the C++ world, however, such a question may be controversial and result in a long debate. Maybe build from source? But should we then clone from the original repository or create a mirror? Who will maintain that mirrored version? What if a library is not open source? Maybe use a package manager like Conan then? But what if a certain library is not available in the Conan repository? What if the library is available, but not in the specific version required by your project? Maybe it’s better to maintain your own artifact repository with binaries which you could download and link against?
I’ve heard all these questions many times in different projects, and it made me think that it should be the job of the tool chain to do the heavy lifting here. The tool chain should let the developer decide only which dependencies are needed, in what form (source or binary) and in what versions. I would formulate the final conclusion as:
The mechanism behind dependency management must not be the responsibility of the application developer.
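As a rough illustration of that split, the part left to the developer could be as small as a declarative list. Everything in the sketch below is hypothetical: the Form and Dependency types, the ProjectDependencies function and the library names are invented for this example and do not correspond to any existing tool.

```cpp
#include <string>
#include <vector>

// Hypothetical types, not the format of any existing tool.
enum class Form { kSource, kBinary };

struct Dependency {
    std::string name;     // which dependency
    std::string version;  // in what version
    Form form;            // in what form
};

// Everything the application developer would have to state. Fetching,
// building, caching and linking would be left to the tool chain.
std::vector<Dependency> ProjectDependencies()
{
    return {
        {"some_json_library", "3.11.2", Form::kSource},
        {"some_vendor_sdk", "1.4.0", Form::kBinary},
    };
}
```

The list answers only the three questions named above (what, in what form, in what version); where the artifacts come from is deliberately absent from it.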
Resistance to individual preferences in conventions used
Look at the following code snippets and try to answer the question: which one is better?
class SomeClass
{
public:
    SomeClass() : private_member_{0} {
        std::cout << "Constructed" << std::endl;
    }
private:
    int private_member_;
};

class SomeClass
{
public:
    SomeClass()
        : m_privateMember{0}
    {
        std::cout << "Constructed" << std::endl;
    }
private:
    int m_privateMember;
};
In both cases we have perfectly valid C++ code, so why is it that for many people choosing one of them is a matter of life and death? There have been some approaches to the topic. For example, Python mostly got rid of formatting preferences – there are no braces and indentation determines the flow of the code, so the formatting is more or less enforced by the language itself. Python has also had some success with the naming of class member variables: a member whose name begins with __ (and does not also end with it) is name-mangled by the interpreter, which makes it effectively private, while a public member must not use that prefix. It failed, however, in the case of protected member variables: only by convention should they begin with a single underscore, and this is not enforced. In C++, where there is no common convention, let alone any enforced rules in that area, too much freedom leads to wasted time with no real impact on the software. You may now ask: so you don’t care about the conventions used at all? Well, of course I have my favorite conventions, but the longer I work in the complex environment of big commercial software projects, the more I care mostly about the code being consistent and easy to understand. The actual thing which helps with that is:
It is not important which convention is used, but it is crucial to have a single convention which is consistently used across the entire code base.
Ease of development and conciseness
Easier development means less time spent on implementation, and less time spent on implementation means more money. For this reason it seems obvious that ease of development should be one of the crucial factors when designing new tools. However, it is surprisingly difficult to define when development actually is easy. I once heard that:
The programming language does not allow for easy development if you spend more time on debugging your programming language knowledge than the actual problem you’re trying to solve.
Let’s take a trivial example to have something to work with, like printing “Hello world”. At first glance, it may look like scripting languages should be the kings. In Python, that’s a one-line job:
print("Hello world")
In Bash it’s a 2-liner:
#!/bin/bash
echo "Hello world"
Comparing C++ to it looks like a sad joke. However, for some reason we don’t use scripting languages everywhere. We want the code to be compiled instead of interpreted, we want to split the binary artifacts into separate libraries, we want to decide if these libraries should be linked statically or dynamically.
Some of you may now say that I’m basically comparing apples to oranges, and ask how I can even put an interpreted scripting language like Python and a compiled language like C++ in the same sentence. Well, I can, because in terms of the interface that a language offers, it doesn’t matter. Technically, there’s no obstacle to making the C++ interface simpler and more concise (Nim somehow managed to do that), and the same goes for the tooling. It’s just a matter of design decisions and the ratio between things left to the user and things handled by the tooling.
But is Python really that easy for everyone? Is C++ hard for everyone? Unfortunately, both answers are: no. This makes me think that we just have to accept the fact that “ease” of development is always subjective and any decision in that area must be arbitrary. However, to give myself an anchor point, I’ll summarize it as:
The tool must not require from the engineer more attention than the actual problem to solve.
Being concise is also difficult to define, but it seems at least easier to agree on when something is too verbose. Most of the C++ developers that I’ve met share the view that C++ requires just too much typing even for simple things. Look, for example, at the line defining a constant shared pointer to a constant object and allocating memory for it:
const std::shared_ptr<const int> ptr = std::make_shared<const int>(12);
That’s 71 characters to achieve such a basic thing, and I didn’t even count the #include <memory> which is also mandatory in this case. Another example could be defining a simple function object which adds two floats:
std::function<float(float, float)> func = [](float a, float b) -> float { return a + b; };
This one is 90 characters long. Yes, I know that the -> float part can be omitted, but here I also didn’t count the characters in the #include <functional> directive, which is an additional 21 characters. Too long even if I get rid of the lambda return type.
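For comparison, here is a sketch of how far type deduction with auto shortens the two lines above. It is only an illustration and not a like-for-like replacement: the first line deduces exactly the same type as before, while the second stores the lambda’s own closure type instead of a std::function.

```cpp
#include <memory>
#include <type_traits>

// 49 characters instead of 71: the type is spelled once, on the right.
const auto ptr = std::make_shared<const int>(12);

// Compile-time check that auto deduced the type that was written out by hand.
static_assert(
    std::is_same<decltype(ptr), const std::shared_ptr<const int>>::value,
    "auto deduces the same type as the explicit declaration");

// 51 characters instead of 90, and no <functional> needed. Note that func
// is now a closure object, not a std::function<float(float, float)>.
auto func = [](float a, float b) { return a + b; };
```

Whether the shorter forms are concise enough is a separate question, and they also add yet another way of writing the same thing.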
So what does “better” mean?
Defining the word “better” turned out to be much more difficult than I expected, but the definition I’ve created for myself says that a software development process is better when it has the following properties:
- consistency and unambiguity
  - there should be one and only one way of using each language feature
  - there should be one error handling mechanism which is embedded into the language syntax
- resistance to individual preferences
  - the cost of not having a predefined project structure is greater than the cost of necessity to adjust to the imperfect structure imposed by the process
  - the mechanism behind dependency management must not be the responsibility of the application developer
  - it is not important which convention is used, but it is crucial to have a single convention which is consistently used across the entire code base
- ease of development and conciseness
  - the tool must not require from the engineer more attention than the actual problem to solve
After reading that, you may have the impression that I just hate the freedom that languages like C++ or Python give. You may be right, at least in part. Albert Einstein is often quoted as saying “Make things as simple as possible, but no simpler”, and if I were to sum up this article in only one sentence, I would paraphrase that quote as:
The software development process should give as much freedom as necessary, but no more.