Author: Chloé Lourseyre
Editor: Peter Fordham
As far as containers go in C++, since std::vector
is well suited for most cases (with the occasional std::map
when you need key-value association1), it’s become easy to forget that there are other types of containers.
Each container has its strength and weaknesses, so if you are one to forget what its specificities are, this article is a good start.
Disclaimer: Not all C++ containers are listed, only the most usable ones in my opinion. If you want to go further, you’ll find two very useful links in the addenda, at the bottom of the page.
How to choose a container
Criteria
First™ criteria: sequence or associative?
The first question you have to ask yourself when you need a container is: it is a sequence container or an associative container?
In sequence containers, data is organized in an ordered and sequential way, with each value following the previous one. In memory, it doesn’t have to be contiguous (and often isn’t), but in practice, you access a value by knowing its index inside the container.
Unlike the sequence containers, associative containers do not store data as a sequence, but by associating a value to a key. Instead of using an index to refer to a stored value, you use its key.
Sequence containers criteria
- Is the size of the container static? If so, an
std::array
is what you need (and it’s not really impacted by the other criteria). In any other case, you will need to answer other criteria. - Will the size of the container vary a lot? This criterion is important for memory allocation. If you make the size of a container vary a lot when it’s not designed for it, you may experience slowness and memory over-usage2.
- Is the order is important? This criterion relates to data structures where you can add and erase value only at the beginning or at the end, while the middle remains hidden (FIFO and FILO structures).
- Do you need to add/erase values at the extremities of the container (at the beginning or at the end)? Some containers do that with maximum efficiency.
- Do you need to be efficient at inserting/deleting data in the middle of the structure? Same as the previous criteria, sometimes you do a lot of additions and deletions, but in the middle of the container. There are also structures efficient for that purpose.
- Do you need to find the nth element? Depending on your container, search is not always optimal.
- Do you need to merge collections? Depending on your container, structure merge is not always optimal either (some need additional allocation and some don’t).
Associative containers criteria
- There are two kinds of associative arrays: the ones who associate a key with a value (possibly of different types), that’s
map
s, and the one for which the key is the value itself, that’sset
s.
- By default, the keys are unique. But there is a variant when keys can have multiple entries. These are the
multimap
/multiset
.
- The inner structures of these containers can be implemented in two ways. By default, they are ordered by key. But they can also be unordered and use hashable keys. These are the
unordered_
version of each.
Note: the ordered/unordered distinction is important for associative containers. Most of the time, ordered associative containers use balanced binary trees while the unordered ones use hash tables. This have an impact on performance and on the fact that, for hash tables, you don’t need to implement an order operator).
Containers’ matrix
I present you with two matrices (one for sequence containers and one for associative containers).
Each matrix represents for which criteria it is best to use3.
Sequence containers
(for the queue, insertion is at the front and deletion at the back)
Associative containers
Flowchart cheat sheet
Here is an easy-to-read and printable flowchart that will help you choose which container to use in any situation4 : Joe Gibson’s data structure selection flowchart.
But what happens in real life?
As I said in the introduction of the article, most real-life container problems can be easily resolved with std::vector
and std::map
. With that in mind, to what extent is it useful to search for the “optimal” container every single time?
The semantics behind each container is linked to how you use it. When you see deque
you’ll think “insert and erase at the front and the back”, when you see queue
you’ll think “insert at the front and delete at the back” and when you see list
you’ll think “insert and erase in the middle”.
We see each container for how we use them more than how it is efficient to use them (which is linked but not equivalent). But there are a lot of operations that are available in several containers. For instance: you can easily insert and erase values at the end of a vector
, as much as in deque
(each of them has the push_back
and pop_back
member functions).
But may want to say “But“ inserting and deleting at the end of a vector is strongly inefficient!”. In theory, yes. But in practice, it only matters if it is within a critical section of the code. Most of the time, if you are within the 80% of Pareto’s principle, using a vector instead of a deque won’t affect performance.
So let me ask you this: if using a vector works, and if it doesn’t affect performance, why would you use any other container?
Vectors are the most understandable structure because it is quite close to the plain-old arrays. Most C++ users aren’t experts, and std::vector
is the container they know how to use best. We shouldn’t make mundane code any more difficult to read and understand.
Of course, as soon as you have special needs, you should use the most appropriate container, but that doesn’t happen very often.
(I took std::vector
as an example here for sequential containers, but the same applies for std::map
and associative containers)
Optimization?
Optimization must be an opt-in behavior, not an up-front behavior.
When you first write a piece of code, you should not think about performance. You should write it in the clearest way possible.
Only then, you may think of the impact your code has on global performances. You may do benchmarks, and static/dynamic analysis to know if your code needs be optimized and what is badly optimized (execution time? memory usage? etc.).
If the code you are writing is not in the 20% of Pareto’s principle, you should not think about optimization at all. If it is, you can think about it after you have written it.
A more efficient way to look at it
I present you with a flowchart cheat sheet (in the same spirit as Joe Gibson’s flowchart) that summarize what has been said in the previous section:
The principle is simple. You have to ask yourself two questions:
- Can I do what I want to do with a
vector
/map
? - Am I in a critical section of the code?
If your answers are 1) Yes and 2) No, then you should use std::vector
or std::map
.
Wrapping up
It is more than useful to know the differences between the C++ containers. However, in your everyday life, try not to think too much about it. It is a fairly common mistake to overthink situations that can have simple solutions. But this knowledge is nor wasted: every now and then, you’ll into a specific situation that’ll require specific containers.
Pareto’s principle can also apply to this: more than 80% of the tools we know are useful in less than 20% of the situations.
Thanks for reading and see you next week!
Author: Chloé Lourseyre
Editor: Peter Fordham
Addenda
Sources and resources
The flowchart is from the following GitHub repo, where you can also find a more detailed view of each container: cpp-cheat-sheet/Data Structures and Algorithms.md at master · gibsjose/cpp-cheat-sheet · GitHub.
You will also find a neat cheat sheet, a bit more graphical and dense (along with other kinds of cheat sheet) here: C++ Cheat Sheets & Infographics | hacking C++ (hackingcpp.com).
Joe Gibson’s data structure selection flowchart
Notes
- Really, we can agree that
std::unordered_map
is best thanmap
for basic key-value association. But in real life, I see too many indiscirminate use ofmap
and almost nounordered_map
. - Plus, It’s not just the global size that matters, it is also the local maximum size, e.g. if a container will never exceed 20 entries in a critical section of the code we can pre-allocate locally to avoid unexpected reallocation, while maintaining the flexibility of using a variable size data structure.
- Of course, each container can perform most listed operation. But the matrices summarize which containers are best suited for each need.
- Unfortunatly, the presented flowchart lack any
unordered_
associative container. But you can think of it like this: “Values need to be ordered? Yes ->map
/set
; No ->unordered_map
/unordered_set
“
No mention of cache friendliness?