Update keys of map or set with node extract

Question | Oct 8, 2020 | hkumar

c++ map set splice pointer move-semantics c++17

ir-md-block-image ir-md-block-image-w-70 map or set node extraction

Immutable keys

Associative containers, e.g., map and set, are node-based containers that store their elements in dynamically allocated nodes linked together based on keys. The keys of associative containers in STL are immutable (const) because changing a key can alter a container's structure:

std::map<int, std::string> m = {{3,"Red"},{7,"Green"},{9,"Blue"}};
auto mit = m.find(3);
if(mit != m.end())
 mit->first = 2; //!ERROR. key is const


std::set<int> s = {1,5,4};
auto sit = s.find(5);
if(sit != s.end())
 *sit = 2; //!ERROR. key is const

Therefore, there is no way to directly modify a node's key in an associative container. The standard workaround for updating a key is to erase and reinsert the element, which could be very expensive because of the deallocation and reallocation of objects.

However, the C++17 standard introduced the extraction and insertion of nodes, making it possible to unlink a container's node, update its contents, and reinsert it in the same or another compatible container. We talk about this more in the next section.

Note that this discussion is relevant to all associative containers of STL, ordered (e.g., map), unordered (e.g., unordered_set), and non-unique (e.g., multimap).

Node extraction and insertion

C++17 added the extract() member functions to all associative containers that unlink a node and return a node handle for a given key or position (iterator):

//source: cppreference
node_type extract(const_iterator position);
node_type extract(const key_type& x);

A node handle is an instance of a container-specific opaque type — node_type — that encapsulates a pointer to the extracted node and a copy of the container's allocator. If no element exists for a given key, extract returns an empty node handle. An example is in order:

std::map<int, std::string> m = {{1, "Foo"}};

std::cout << m.size() << std::endl; //1

//extract and check if node handle is not empty before use
if(auto nh = m.extract(1); !nh.empty()) {
 std::cout << nh.key()
           << "=>"
           << nh.mapped() << std::endl; //1=>Foo
}//node and the contained element are destroyed here

//map is empty now  
std::cout << m.size() << std::endl; //0

A node handle is move-only, and, it appropriately destroys the node on destruction. Also, note that an extracted node can outlive the container it is separated from through its node handle. Here is a contrived example of a node factory:

//A map<int, std::string> node factory
auto generateNode(int k, const std::string& v) {
 return (std::map<int, std::string> {{k, v}}).extract(k);
}

void useNode() {
 auto nh = generateNode(2, "Bar");
 std::cout << nh.key()
           << "=>"
           << nh.mapped() << std::endl; //2=>Bar
}

We can insert an extracted node back to the same container or possibly a different container through the insert() member function:

 //source: cppreference
insert_return_type insert(node_type&& nh);

On successful insertion, the node handle is moved and becomes empty. The insertion fails if the key already exists and the node handle is left intact in that case. As shown above, the return type of insert — insert_return_type — conveys if the insertion was successful or not and the position of the newly inserted or the existing element.

A node handle extracted from a container can be inserted into a different container also as long as the two containers have the same element types and allocator types. This process is generally known as splicing. The two containers can have different comparators. Few examples of splicing:

//Ascending order
std::map<int, std::string> m1 = {{1, "One"}, {2, "Two"}, {3, "Three"}};
//Descending order
std::map<int, std::string, std::greater<>> m2 = {{4, "Four"}, {5, "Five"}};

//Print both maps
for(auto [k, v] : m1)
 std::cout << v << " "; //One Two Three

for(auto [k, v] : m2)
 std::cout << v << " "; //Five Four

//extract from m1 and insert to m2
m2.insert(m1.extract(3));

//get another node from the above node factory
m2.insert(generateNode(6, "Six"));

//Now print m2
for(auto [k, v] : m2)
 std::cout << v << " "; //Six Five Four Three

Note that insert does nothing if an empty node handle is passed to it, therefore, it is perfectly fine to directly pass a temporary returned by extract to insert.

Another interesting point about splicing is that a node separated from a unique container can be inserted into its multi-counterpart. E.g., a set's node can be inserted into a multiset, as follows:

std::set<int> singles = {1,2,3};
std::multiset<int> multies = {3,4,4};

multies.insert(singles.extract(3));

std::for_each(multies.begin(), multies.end(), [](auto& v){
 std::cout << v << " ";
}); //3 3 4 4

Both extraction and insertion of nodes involve no copy or move of the elements. Only the container's internal pointers are adjusted. However, both of these operations can result in a rebalance of the container.

Update keys

Note that a node handle provides non-const access to the key via the key() member function. A map node handle also gives access to the mapped value via the mapped() function.

Therefore, we can change the key of an extracted node and insert it back into the container, as follows:

std::map<int, std::string> m = {{3,"Red"},{7,"Green"},{9,"Blue"}};

for(auto [k,v] : m)
 std::cout << v << " "; //Red Green Blue

auto nh = m.extract(3);
if(!nh.empty()) {
 nh.key() = 8;
 m.insert(std::move(nh)); //watch std::move, because nh is lvalue
}

//Order is changed now    
for(auto [k,v] : m)
 std::cout << v << " "; //Green Red Blue

The node extraction and insertion do not cause any copy/move and only relink the pointers. Therefore, updating the keys via extraction and insertion of nodes could have significantly better performance than erasing and inserting the elements. However, the real performance earnings depend on the elements' size and the frequency of the updates.

A convincing example

Consider a hypothetical bidding or auction system that receives a large number of bids every second. An auction takes place hourly, and the highest bid wins.

The bids are stored in a map — bidsByPrice — keyed by the bid price in descending order. Therefore, the highest bid is first:

struct Bid {
 int id;
 double price;
 //other stuff...
};

std::map<double, std::shared_ptr<Bid>, std::greater<>> bidsByPrice;

//returns shared_ptr<Bid>
auto getWinningBid() {
 return !bidsByPrice.empty() ?
            bidsByPrice.begin()->second :
            nullptr;
}

This is a simple auction system, so bids with duplicate prices are not allowed. But users can correct or change their bid prices anytime before the next auction happens. To make the bid price correction more efficient, we update the keys by node extraction/insertion.

Also, to find the bids quickly, we store bids in another map — bidsById — keyed by their identifiers. We are using shared_ptr<Bid> so that it easier to model the shared ownership of a Bid object in the two maps:

std::map<int, std::shared_ptr<Bid>> bidsById;

So, a bid correction is done as:

//Changes the key in the bidsByPrice map.
//called by processCorrection
void correctBidPrice(double oldPrice, double newPrice) {
 auto nh = bidsByPrice.extract(oldPrice);
 nh.key() = newPrice;
 bidsByPrice.insert(std::move(nh));
}

//finds bid by id and processes correction
void processCorrection(int bidId, double newPrice) {
 //Find a bid by id
 auto it = bidsById.find(bidId);
 if(it == bidsById.end())
    return; //bid not found

 //Found bid. Do correction
 double oldPrice = it->second->price;
 it->second->price = newPrice;

 correctBidPrice(oldPrice, newPrice);
}

However, there is a problem in the correctBidPrice function above. Identify the problem and select the correct solution from the choices below (check Explanations for details).

Splicing with merge

Node extraction was born out of the need to splice all elements of an associative container into another efficiently. We can splice a container by extracting and inserting all elements in a loop, but that would be inefficient because of potentially multiple rebalancing operations.

Therefore, the associative containers also have merge() member function that can splice each element of a source to a destination container using node extraction. No elements are lost in merge. If a key already exists in the destination container, that element is not extracted from the source container:

std::set<int> s1 = {1,2,3};
std::set<int> s2 = {3,4,5};

s2.merge(s1);

//3 could not spliced
for(auto v : s1)
    std::cout << v << " "; //3

for(auto v : s2)
    std::cout << v << " "; //1 2 3 4 5