Print unique values 6 ways

I participate in a couple of programming/C++ help forums on discord and the like. They’re fun, helping people is great and I also learn a lot. Often someone asks for a fairly simple algo like “Given a container of unsorted integers print only unique ones to the console.” There are some obvious solutions to this but in the spirit of inspiring others to think through the problem we often provide the asker with a convoluted solution in the hopes that that will help lead them to the basic ideas. Ideas like “I could store every new int I see in another container and then I could check for ones already seen with each element” or “What if I sorted the container….”
Which led me to come up with a variety of possible solutions to the problem using different things in the standard library and different techniques, I’ll discuss them briefly, show some code and link a benchmark at the end.

Obvious solution, use another vector of ints to hold values you’ve already seen and then only print new values, I’m going to use std::for_each for brevity but this is straightforward to write with a ranged based for or a raw loop. This solution is simple but it does take an extra allocation of unknown size that will reallocate several times if the original vector of ints is big so…

std::vector<int> seenVec;
std::for_each(vec.begin(), vec.end(),[&](auto& e){
    if(std::find(seenVec.begin(), seenVec.end(), e) == seenVec.end())
    {
        std::cout << e << ' ';
        seenVec.push_back(e);
    }
});

What if we do the same idea but in place? Basically we search the range behind the current element in the vector, no extra allocations, so that’s good. We use a static variable assuming this code will only be called once in the lifetime of the program, we’ll change that for the benchmark though…

std::for_each(vec.begin(), vec.end(),[&](auto& e){
        static auto end = vec.begin();
        if(std::find(vec.begin(), end, e) == end++)
        {
            std::cout << e << ' ';
        }
    });

Ok, that’s fine and good and does what we want but we’re doing an awful lot of searching through subranges of the vector or through another vector, what if we sorted? Then we can just output whenever the element changes…

std::sort(vec.begin(), vec.end());
for(const auto& e : vec)
{
    static auto compare = std::next(vec.begin(), 1);
    if(e != *(compare++))
    {
        std::cout << e << ' ';
    }
}

Nice, that works but it sorts the output which may not be what we want, do we have other standard library tools that can help us? What about a set? They only accept unique values after all, but they do sort. However there is an overload for std::set::insert that returns a std::pair of an iterator and a bool that indicates if the insertion succeeded, let’s try that.
[Note: there is also std::unordered_set but this was more fun :)]

std::set<int> seen;
for(const auto& e : vec)
{
    auto[vecIt, notSeen] = seen.insert(e);
    if(notSeen)
    {            
        std::cout << *vecIt << ' ';
    }   
}

that’s cool, and we can use a regular set constructor if we want it to be sorted like this

std::set<int> seenTwo {vec.begin(), vec.end()};
for(const auto& e: seenTwo)
    std::cout << e << ' ';

This also has the benefit of brevity, though I’m loathe to leave out the {} even on a single statement loop but hey, it’s slideware! (blogware?)
The set solutions are cool but they do require an additional allocation.

However there is one other standard library facility that also does not require any additional allocations, the aptly named std::unique . When this algo is applied to a sorted container it moves all the duplicates to the end and returns an iterator to the end of a range of sorted unique items. Just what we need! The only downfall is that is does mutate the original container so you may have to copy it first which is another allocation….

std::sort(vec.begin(), vec.end());
auto end = std::unique(vec.begin(), vec.end());
vec.erase(end, vec.end());
// you can erase the duplicate items at the end or use the
// end iterator to constrain the range when you print
for(const auto& e: vec)
    std::cout << e << ' ';

Alright, to the benchmarks, I’ll post a link at the bottom. I used a vector of 10000 randomly generated ints in the range 1-100 to make sure there was a lot of repetition and a stringstream to mimic standard out.
Predictably the for_each solutions fared the worst, surprisingly the one with extra allocation was faster than the one searching in place.
The set solutions were faster than the for_each solutions with a wider distribution but fared much worse when I decreased the range of random ints to create more duplicates.
sort and iterate came out on top or tied with the std::unique solution depending on the iteration of the benchmark.

So in closing the set solution that doesn’t sort is useful if you don’t want the output to be sorted or if you don’t want to mutate the original vector and sort and iterate and std::unique are the fastest.

https://quick-bench.com/q/jLgMY6qkOaAdjS-kVoRJ4eF4ehw