Use STL containers! ...unless you want to go faster
We all know to use the STL whenever possible, not to reinvent the wheel and to move forward as a community.
However, a few shortcomings of it and delays in the integration process are often reasons to roll your own containers/algorithms (mozilla::vector
, eastl::list
, etc). We're going to see a real-life example of this.
In this case we have an 2D image (matrix of values) that we need to serialize into a long 1D vector (put the rows in a line).
Each row is contiguous but they aren't necessarily contiguous to each other.
Mind you, in this real-life example the image is huge and we'll need maximum performance, we only wish to pay for what we need.
We'll use the following implementation of image
:
const size_t width = 2000000, height = 5; // Small sizes for example sake
std::vector<std::vector<double>> image (height);
image[0].resize(width);
image[1].resize(width);
image[2].resize(width);
image[3].resize(width);
image[4].resize(width);
std::iota(image[0].begin(), image[0].end(), 1); // Giving non-trivial values
std::iota(image[1].begin(), image[1].end(), 1);
std::iota(image[2].begin(), image[2].end(), 1);
std::iota(image[3].begin(), image[3].end(), 1);
std::iota(image[4].begin(), image[4].end(), 1);
We're using a contiguous std::vector
for the image but that's just to simplify the access to the rows (subscript operator []
) in this example.
So we want to create an std::vector
and fill it with the values!
But that's where we get stuck: every option incurs a downside...
std::vector<double> res1 (image.size()*width);
for (size_t i = 0; i < image.size(); ++i)
std::copy(image[i].begin(), image[i].end(), res1.begin()+i*width);
That example can nicely be parallelized, but the creation of the vector (res1(image.size()*width);
) initializes its elements, which is a costly operation when the image gets really large.
Moving on to the famous "reserve & push_back" solution:
std::vector<double> res2;
res2.reserve(width*height);
for (size_t i = 0; i < height; ++i)
for (size_t j = 0; j < width; ++j)
res2.push_back(image[i][j]);
Two downsides here: push_back
is not parallelizable and each one costs a size check/increment at each step. Same goes for the assign
solution.
We are limited by the fact that for more security, the size
member of std::vector
has no public setter, so we cannot "reserve, set size & copy data", which is essentially what we want to do.
So, here's an idea to have a fast copy:
template <typename T>
class serializedImage
{
public:
serializedImage(const std::vector<std::vector<T>>& vectors) :
data (nullptr),
width (vectors[0].size()),
height (vectors.size()),
size (vectors[0].size()*vectors.size())
{
data = new T[size]; // 'new T[size]()' would initialize
#pragma omp parallel for
for (size_t i = 0; i < height; ++i)
std::copy(vectors[i].begin(), vectors[i].end(), data+i*width);
}
~serializedImage() { delete[] data; data = nullptr; }
private:
T* data; size_t width; size_t height; size_t size;
};
For readability sake, the code has been reduced to its bare minimum, without the security checks (empty input, operators, namespace, etc).
So here we are! From a large series of big vectors we have a way to efficiently align them, with no superfluous cost and in parallel fashion. And, we respect RAII with a very simple construction!
serializedImage<double> res3 (image);
Source: Giphy
By the way, I tried to make the ctor a Fold Expression that uses a variable number of input vectors (like size = (vectors.size()+...);
) but didn't manage so far, if anyone can it'd be fun to see.
Related links: