Friday, 29 July 2011

Removing selected vector elements whilst iterating through it

Have you ever tried to iterate through vector and remove some elements? For example, you need to remove all odd numbers from a collection stored in STL vector. You might have been tempted to write something like this:



When execution tries to move iterator to the next position after one element has been erased Debugger brings up Debug Assertion window:

dbgassertfailed1

Who does _Has_container() belong to and why did this assertion fail?

In a very simplified way, we can say that iterator contains two pointers: one that points to container it iterates through (this pointer is named as _Mycont in the STL code) and another one (_Myptr) which points to the current position (element) in the container.

When some element is removed from the vector, contiguity of its storage must be retained so all subsequent elements are reallocated in memory (shifted towards the empty position) invalidating any iterator pointing to them.

If you check what C++ standard says about this, you will be surpised: ISO/IEC 14882:2003 states (in paragraph 23.2.4.3) that vector::erase "Invalidates all the iterators and references after the point of the erase." But hey, the only iterator we were using is the one that points at the location of erased element, not AFTER it! So our program should be working, isn't it? The problem is that current C++ standard incorrectly describes effect of vector::erase. Draft of the new standard (N3225) is more accurate about this matter and says that vector::erase "invalidates iterators and references at or after the point of the erase."

Ok, so now we know that our iterator has been invalidated. A look at the _Mycont value shows 0 which means that this iterator is not associated with any container.

How can we resolve this situation? We need to look at vector::erase not as an enemy but as a good friend! C++ standard says: "The iterator returned from a.erase(q) points to the element immediately following q prior to the element being erased. If no such element exists, a.end() is returned." So, why don't we just assign our iterator to this value and keep iterating happily?
Following code achieves what we want:



Output:

Initial vector elements:
v[0] = 1
v[1] = 2
v[2] = 3
v[3] = 4
v[4] = 5
v[5] = 6
v[6] = 7
v[7] = 8
v[8] = 9
v[9] = 10

Vector elements after removing odd values:
v[0] = 2
v[1] = 4
v[2] = 6
v[3] = 8
v[4] = 10

I found here and here similar discussion about this topic.

You need to be aware of the fact that after erasing the last element in the vector iterator points to the vector end (a position past the last element), in which case you cannot dereference it, so make sure you check its value if you are going to use it before jumping back to the for loop condition check:



It is always a good idea to reuse ready-made routines which are generic and tested. Dig into 〈algorithm〉 header when solving STL-related problems. In our case, remove_if draws our attention for its name. Basically, this function iterates through original set and shifts towards beginning all elements that don't meet condition by copying them over all elements that meet the condition. Return value is iterator that points to the next element behind this set of 'survived' elements. Range passed to this function still has the same size but all 'survivors' are grouped at its beginning. Elements behind this group become irrelevant (unspecified, to say it accurately) and can be deleted. That is so-called Erase-Remove idiom:



Why does iterator get invalidated after erasing an element?

Erasing an element leaves a gap in the memory and because it is guaranteed that vector keeps all its elements in contiguous memory (like an array) it will shift remaining elements in order to fill that gap. Address of the element that has been deleted is stored inside iterator and that address is now not a valid one. That's why is iterator invalidated completely. Luckily, vector::erase() returns a new valid iterator that points to the next element after the deleted one (or to the vector end) and we can use it safely.
Post a Comment