Tag Archives: c++11

Suffix Tree: Suffix Array and LCP Array, C++

I played a bit more with my C++ Suffix Tree experiment, gradually exploring how Suffix Trees are equivalent to a Suffix Array and its associated LCP Array, allowing us to do anything with Suffix Arrays (and LCP arrays – Longest Common Prefix) that we can do with Suffix Trees, but with less memory (but only by a constant factor) and, I think, more cache-friendly use of memory. This code is still far from suitable for any real use.

Converting between Suffix Trees and Suffix Arrays

I first moved my previous code for Ukkonen’s algorithm from a branch into master, so we can now construct a SuffixTree from a string, in O(n) linear time. Subsequent insert()s still use the naive O(n^2) code. Traditionally, you’d concatenate all your strings with unique separator characters, but I’m trying to avoid that. And, really, I’m just trying to explore the use of the relevant algorithms – it’s a bonus if I can wrap it up into something useful.

I then added a get_suffix_array_and_lcp_array() method to get the Suffix Array and associated LCP Array, in O(n) linear time. This works by doing a lexicographically-ordered depth first search.

Finally, I added a SuffixTree constructor that takes such a Suffix Array and LCP Array to build the suffix tree, again in O(n) linear time. This iterates over the suffixes in the Suffix Array adding a node for each, walking back up the tree to split at the point specified by the LCP array. I kept the path in a stack for that walk up. This constructor let me test that I could use a Suffix Array (and LCP Array) obtained from the SuffixTree to recreate the SuffixTree correctly.

This was based on the ideas I learned from Dan Gusfield’s Fundamentals of Stringology lectures, specifically part 3. Incidentally, his later discussion of the Burrows Wheeler Transform led me to Colt McAnlis’ wonderful Compressor Head series, which far more people should watch.

Towards a usable SuffixArray API.

I’ve already created a SuffixArray class to do much of what the SuffixTree class can do, such as finding matches for a substring. It currently does an O(log(n)) binary search over the suffix array. But this should be possible in O(m) time, where m is the length of the substring, hopefully with far less string comparisons, just as it is possible with a Suffix Array. This can apparently be done with the help of the LCP Array. Hopefully I’ll get to that.

I also want to implement Kärkkäinen & Sanders DC3 algorithm to construct the Suffix Array directly from the string in O(n) linear time. And, if I understand Dan Gusfield’s lecture properly, we can use the Burrows-Wheeler transform to construct the associated LCP Array also in linear time. Hopefully I’ll get to that too.

gtkmm 4 started

We (the gtkmm developers) have started work on an ABI-breaking gtkmm-4.0, as well as an ABI-breaking glibmm, target GTK+ 4, and letting us clean up some cruft that has gathered over the years. These install in parallel with the existing gtkmm-3.0 and glibmm-2.4 APIs/ABIs.

A couple of days ago I released first versions of glibmm (2.51.1) and gtkmm (3.89.1), as well as accompanying pangomm and atkmm releases.

This also lets us use my rewrite of libsigc++ for libsigc++-3.0, bringing us more fully into the world of “modern C++”. We might use this opportunity to make other fundamental changes, so now is the time to make suggestions.

We did a parallel-installing ABI-breaking glibmm even though glib isn’t doing that. That’s because, now that GTK+ is forcing us to do it for gtkmm, this seems like as good a time as any to do it for glibmm too. It’s generally harder to maintain C++ ABIs than C ABIs, largely because C++ is just more complicated and its types are more exactly specified. For instance we’d never have been able to switch to libsigc++-3.0 without breaking ABI.

 

Suffix Tree, Ukkonen, C++

I’ve been playing with suffix trees recently, in particular trying to get my head around Ukkonen’s algorithm for suffix tree construction. I’ve got the hang of it now after trying out some code and checking it against this excellent video walk through of Ukkonen’s algorithm by Tushar Roy.

Tries, Radix Trees, Suffix Trees

A suffix tree stores every suffix of a string, or of several strings, and makes it easy to find prefixes of those suffixes, in linear O(m) time, where m is the length of the substring. So it lets you find substrings in O(m) time. So it can be used for full text search, and it’s often used for DNA comparison.

A suffix tree is actually a radix tree (also known as a radix trie), with suffixes inserted instead of just the whole string. A radix tree  is itself a compacted trie (also known as a prefix tree).

A trie (pronounced try) is a tree with characters on the edges, letting you discover the whole string as you traverse the edges, staring with the edge from the root for the first character of the string. For instance, this lets you offer text completion if the user gives you the first few correct characters of a word.

A radix tree compacts the Trie by letting each edge have several characters, splitting the edge only when necessary to distinguish different strings. So, while a Trie containing “abc” would have a chain of root->a->b->c, a Radix Tree would have just root->abc. Adding “abd” would change that to root->ab->c, with another for root->ab->d.

A suffix tree, or radix tree, can save space by storing just the start and end indices of its strings instead of the strings themselves. Then each edge has constant size, and the space for the tree is linear O(m), where m is the length of one inserted string. Or maybe you could say its linear O(nk), for n strings of length k, where k is m considered effectively constant.

For the string “banana” a suffix tree would look like this, with both leaf nodes and intermediate end markers marked with *.

root
    banana*
    a*
     na*
       na*
    na*
      na*

We can add a second string in the same suffix tree, associating values with the leaf at the end of each path, letting us know which items have a particular substring. For instance, if I add “banana” (value 0) and “bandana” (value 1) to a suffix tree, I can ask about “ana” and discover that it identifies both items (value 0 and value 1), via the a->n->a path. The tree looks like this:

ban
   ana (0)
   dana (1)
a (0, 1)
 n
  a (0, 1)
   na (0)
  dana (1)
n
 a (0, 1)
  na (0)
 dana (1)
dana (1)

Suffix Tree Construction

The hard part is constructing the suffix tree. When we insert one string of length m, we actually insert m suffixes. If each suffix insertion takes O(m) time, then the whole construction time is at least O(m^2).

Ukkonen’s algorithm reduces this to O(m) by iterating through the string just once from start to end and using the ideas of an implicit “global end”, the “active point” and “suffix links” to jump to wherever the rest of a smaller suffix should be inserted.

Each “phase” deals with one character from the string, in sequence, performing “extensions” within that “phase” to add suffixes that start with that character. This is useful because of the repeating sub-structure of suffix trees – each sub-tree can appear again as part of a smaller suffix.

Global End

Whenever the algorithm adds a node, that’s going to be a leaf node, meaning the edge to the leaf node will have the actual end of the inserted string. So the algorithm assigns a global end to the edge. We increment that end index each time we process the next character, so these edges get longer automatically.

Active point

The active point identifies a point on an edge in the tree by specifying a node, an edge from that node (by specifying a first character), and a length along that edge. Within every “extension”, the algorithm starts looking for the next character at the active point. Sometimes it will find the next character only because an edge got longer automatically when we incremented our global end.

Whenever it finds the next character from that active point, maybe even after stepping across an intermediate node,  it sets that newly-found position as the active point. This is called Rule 3.

Whenever it doesn’t find the next character from that active point, it adds a node, splitting the edge if necessary. This is called Rule 2. Whenever it splits an edge and creates an internal node, it changes the active point to the same character in a smaller edge, and looks again for the same next character. This lets it do the same split in the smaller suffix. It keeps doing this until all the smaller suffixes have been split in the same way.

This example, with the string “banana”, shows the use of the global end and of Rule 3 and Rule 2. It doesn’t show the use of suffix links yet, because it always splits edges from the root. Note that Ukkonen’s algorithm typically appends a special out-of-alphabet “$” character to the string so it can insert leaf nodes.

b: Look from root. Add to root.

b

a: Look from root. Add to root.

b
ba (because we incremented the global end).

n: Look from root. Add to root.

ban (because we incremented the global end).
an (because we incremented the global end).
n

a: Look from root. Find it (“ana”). Set it as the active point.

bana (because we incremented the global end).
ana (because we incremented the global end).
na

n: Look from previously-found “a”, in “_a_na”. Find it. Set it as the active point.

banan (because we incremented the global end).
anan (because we incremented the global end).
nan

a: Look from previously-found “n” in “a_n_an”. Find it. Set it as the active point.

banana (because we incremented the global end).
anana (because we incremented the global end).
nana

$: Look from previously-found “a” in “an_a_na”. Not found, so add an internal node and add the $. Change the active point to same character in a smaller (suffix) edge: “a” in “n_ana”.

banana$ (because we incremented the global end).
ana
   na$ (because we incremented the global end).
   $
nana$

$: Look from “a” in “n_a_na$”. Not found, so add an internal node and the $.
Change the active point to same character in a smaller (suffix) edge: “a” in “_a_na$”.

banana$
ana
   na$
   $
na
  na$
  $

$: Look from “a” in “_a_na$”. Not found, so add an internal node and the $. Change the active point to root.

banana$
a
 na
   na$
   $
 $
na
  na$
  $

$: Look from root.  Add to root.

 banana$
 a
  na
    na$
    $
  $
 na
   na$
   $
 $

Suffix Links

In Rule 2, when the algorithm splits an edge, adding an internal node, if sets a suffix link for that internal node, pointing to root. If it has previously set a suffix link for any other internal node, while looking for the same character (in the same “phase”), it updates the link for that previously-created internal node to point to the new node.

If the active node is not root after adding the internal node, it also changes the active node via the active node’s suffix link, taking us to a previously-created internal node. As before, this lets us jump to the smaller suffix of the same substring, to make the same split there, but suffix links let us do it even when we are not on an edge from root.

We can see this when using “banbadbans” as an example. We reach this point:

a: Look from “n” in “ba->_n_badban$”. Not found, so add an internal node and add the $. Follow the suffix link from the “n” to the “n” in “a->_n_badban$”.

ba
  n
   badban$
   $
  dban$
a
 nbadban$
 dban$
nbadban$
dban$

a: Look from “n” in “a->_n_badban$”. Not found, so add an internal node and add the $. Follow the suffix link from the “n” to the “n” in “_n_badban$”.

ba
  n
   badban$
   $
  dban$
a
 n
  badban$
  $
 dban$
nbadban$
dban$

a: Look from “n” in “_n_badban$”. Not found, so add an internal node and add the $. Follow the suffix link from the “n” to root.

ba
  n
   badban$
   $
  dban$
a
 n
  badban$
  $
 dban$
n
 badban$
 $
dban$

a: Look from root. Not found. Add to root.

ba
  n
   badban$
   $
  dban$
a
 n
  badban$
  $
 dban$
n
 badban$
 $
dban$
$

Example Code

Here is some example C++ code. It’s also in github, with far more comments.

const auto key_start = key.start_;
const auto key_end = str_end(key);

ActivePoint active;
active.node = &root_;
active.edge_valid = false; // Instead of -1.
active.length = 0;

std::size_t remaining = 0;
auto end_ptr = std::make_shared<KeyIterator>(key_start);
KeyIterator& end = *end_ptr;

// The "phases"
for (auto i = key_start; i != key_end; ++i) {
  ++remaining;
  ++end;

  Node* prev_created_internal_node = nullptr;

  // The "extensions".
  while(remaining) {
    const auto edge_match = (active.edge_valid && active.length) ?
      find_partial_edge(active, i) :
      find_partial_edge(active.node, i);
    const auto edge = edge_match.edge_;
    const auto part_len_used = edge_match.edge_part_used_;

    if (!edge_match.char_found_) {
      KeyInternal prefix(i, end_ptr);

      // Rule 2 extension: There is no match:
      if (part_len_used == 0) {
        active.node->append_node(prefix, value);
      } else {
        auto extra_node = edge->split(part_len_used);
        extra_node->append_node(prefix, value);
        extra_node->suffix_link_ = &root_;

        if (prev_created_internal_node) {
          prev_created_internal_node->suffix_link_ = extra_node;
        }
        prev_created_internal_node = extra_node;

        if (active.node != &root_) {
          active.node = active.node->suffix_link_;
        } else {
          --active.length;
          ++active.edge;
        }
      }

      --remaining;
      continue;
    }

    // Rule 3 extension:
    active.node = edge_match.parent_node_;
    active.edge = edge->part_.start_;
    active.edge_valid = true;
    active.length = part_len_used;

    break;
  }
}

Generic Suffix Tree in C++

However, the basic Ukkonen’s algorithm just stores one string. Before trying Ukkonen’s algorithm, I wrote a templated C++ suffix tree that can contain multiple strings (see the test code), with a simple O(m^2), or possibly O(m^3) construction time.

I created a version that uses Ukkonen’s algorithm, reducing construction to O(m) time, but it doesn’t support multiple inserts. In this implementation, I’ve avoided the need for extra “$” leaf nodes. I’ve also stored the suffix links in a hash table, to avoid wasting space per node even after construction, though that replaces a true constant-time read/write with an amortized constant time hash table insert/lookup.

I’m trying to adapt the Ukkonen-based suffix tree to support multiple inserts. Usually, to create such a “generalized” suffix tree with Ukkonen’s algorithm, you concatenate the strings together, separated by unique out-of-alphabet characters (“$”, “#”, etc), insert the whole thing, then prune the paths you don’t need, but I’d like to avoid that.

I’m not suggesting that anyone use this code for serious work. For now, it’s littered with asserts() and debug output and it’s in need of optimization because it’s generalized for any standard C++ container, such as std::string. For instance, it doesn’t seem wise to store std::string::iterators, instead of pointers, for the start/end ranges on edges. I’m also bothered by the use of std::shared_pointer<>s to share the “global end”, which still take space per edge even after construction.

Alternative Algorithms

Update: I’ve just watched Erik Demaine’s 2012 MIT “Strings” lecture about suffix trees (I needed to watch his previous “Static Strings” lecture first). He doesn’t mention Ukkonen’s algorithm, but does mention a “DC3 (Difference Cover 3)” algorithm by Kärkkäinen-Sanders-Burkhardt (at least the lecture’s student notes call it that) for building suffix arrays in linear time. He shows that a suffix tree can then be constructed from a  suffix array by using the LCP (Longest Common Prefix) of the suffix array to build a cartesian tree, again in linear time. So the whole suffix array construction would be linear time. Suffix arrays themselves anyway offer better data locality than suffix trees and this DC3 algorithm apparently allows parallelization. I hope I find time to explore all that sometime in code.

He also mentions the use of a “suffix tray” to store the edges in each node, to get  O(P  + lg(Σ))) query time, much the same as the O(P) time you’d get with an array, while reducing the overall space complexity to O(T) instead of O(TΣ ). Here P is the size of the query string, T is the size of the text being searched, and Σ is the size of the alphabet. He also mentioned instead using a combination of hash tables and van Emde Boas trees to get O(P + lglg(Σ)) query time. Those parts of the lecture felt quite rushed to me, so I’d like to investigate that sometime too if I find time.

Boost Graph Library and C++ >17

I’ve recently been looking at the Boost Graph Library (BGL), by reading through the excellent BGL book and playing with the BGL examples, which are mostly from the book.  Although it  was written 15 years ago, in 2001, it doesn’t feel dated because it uses techniques being actively discussed now for C++17 or later. And there’s a clear lineage from then to now.

For instance, I’ve been watching the charming Programming Conversations videos by Alexander Stepanov, who brought generic programming to C++ by designing the STL. At some early point in the videos, he mentioned the idea of concepts, which he expected to be in C++17, and showed how he mimicked the simpler concepts syntax (without the checking) with some #defines. Unfortunately, that proposal, by Andrew Sutton, has recently been rejected for C++17, though it seems likely to succeed after C++17. Andrew Sutton demonstrates the proposed syntax wonderfully in this C++Now 2015 Keynote video. He has implemented C++ concepts in gcc 6, and I’ve played with it very superficially for libsigc++.

I’ve written and maintained lots of C++ template code and I’ve never liked how each template makes only implicit requirements of its types. I would feel more comfortable if I could express those requirements in a concept and have the compiler tell me if my template’s types don’t “model” the concept. Eventually compiler errors will then mention problems at that higher level instead of spewing details about exactly how compilation failed. And eventually, C++ might allow checking of semantics as well as just syntax. I can even imagine tools that would analyze template code and suggest that it should require certain concepts for its types, a little like how the latest compilers can suggest that I use the override keyword on virtual method overloads. This means more checking at compile time, and that makes me happy. However, I understand why it would need multiple compilers to implement it before it would be accepted into C++17.

Anyway, I started reading that BGL book and immediately noticed the foreword by the same Alexandar Stepanov, which mentions generic programming ideas such as concepts. The BGL uses concepts, though with minimal checking, and the book uses these to show the structure of the API. Furthermore, as I tried to get some simple changes into the BGL, I noticed that the same Andrew Sutton had been a maintainer of the BGL.

I began playing with the BGL by converting its example code to modern C++, replacing as many of those verbose traits typedefs and awkward tie() calls, with the auto keyword and range-based for loops. The result looks far clearer to me, letting us see how the API could be improved further. For instance, the BGL’s use of generic free-standing functions can seem a little unconstrained to people used to knowing exactly what method they can call on what object, particularly as the BGL puts everything in the boost namespace instead of boost::graph). But Bjarne Stroustrup’s Unified Call proposal (apparently rejected for C++17 too) would improve that. For instance, num_vertices(graph), could be written as graph.num_vertices() and Concepts would let the compiler know if that should be allowed.

So, though the BGL source code seems to have had very little attention over the last 15 years, and now looks almost abandoned, it’s clearly been an inspiration for the most current trends in C++, such as Concepts and Unified Calling. All the work on C++11 and C++14 has drained the swamp so much that these old ideas are now more obviously necessary.

libsigc++ 2.99.5: Even less code

I managed to rip out another chunk of slightly-incomprehensible code from libsigc++, and those improvements are in libsigc++-3.0’s 2.99.5 version.

Now there’s no common functor_base base class and I removed all those interdependent result_type typedefs that were scattered throughout the code. When I had unravelled the whole result_type chain, I could replace it with just a handful of decltype(auto) uses, realising that that’s what the whole mess was trying to achieve. I love decltype(auto).

libsigc++: std::function-style syntax.

Yesterday I released version 2.99.2 of libsigc++-3.0. This changes the declaration syntax for sigc::slot and sigc::signal to use the same style as std::function, which was added in C++11. We don’t want to be arbitrarily and unnecessarily different.

We’ve also simplified sigc::mem_fun() to always take a reference, instead of either a reference or a pointer.

So, where you might do this for libsigc++-2.0:

sigc::slot<void, int, A> slot = sigc::mem_fun(this, &Thing::on_something);
signal<void, int, A> m_signal;

You would now do this for libsigc++-3.0.

sigc::slot<void(int, A)> slot = sigc::mem_fun(*this, &Thing::on_something);
signal<void(int, A)> m_signal;

Actually, as of version 2.9.1 of libsigc++-2.0, you can use both syntaxes with libsigc++-2.0, allowing you some time to adapt to the new API before one day moving to libsigc++-3.0.

Playing with C++ Variadic Templates and Dynamic Programming

Dynamic Programming C++ base classes

Over the last few months I’ve been exploring various dynamic programming (DP) algorithms, trying to get a feel for how to choose suitable sub-problems and the dimensions of those sub-problems, and when a top-down or bottom-up strategy is best. In principle, dynamic programming lets you break a problem down into suitable sub-problems, as long as they obey the “principle of optimality“, meaning you can use some broad state (result) of the sub-problem calculation, without needing the actual specifics of each solution.

I’ve also been playing lots with variadic templates in modern C++.

My little murrayc-dp-algorithms project is the result. It has base classes for both bottom-up DP and top-down DP and several examples that use them. The end result is only slightly useful, and is currently far from optimal for performance. But I still think it’s an interesting exercise that shows how these various DP algorithms really relate to each other. It should also explain how I came up with those examples in my earlier blog post.

Top-Down DP

Let’s start with top-down dynamic-programming, because it’s generally simpler. Calculating Fibonacci numbers is the classic example. This is clearly inefficient because every recursive call recalculates almost the same lesser Fibonacci numbers, twice. This is exponential, with time complexity of roughly O(2^n).

int f(int i) {
  if (n == 0)
    return 0;
  else if (n == 1)
    return 1;

  return f(i - 1) + f(i -2);
}

One simple way to avoid the wasteful recalculation is to cache the results and reuse them. That’s generally called Memoization. For Fibonacci number calculation, this is still not the best way, but it’s a useful example.

DpTopDownBase provides a framework for dynamic programming with memoization. You provide a calc_subproblem() implementation, which should use get_subproblem(). That get_subproblem() will return a previously-calculated result if it exists, or recursively call calc_subproblem().

1-Dimensional Top-Down DP

For instance, the top-down DpFibonacci class is declared like so, deriving from DpTopDownBase:

class DpFibonacci
  : public murraycdp::DpTopDownBase< unsigned long, // subproblem type unsigned int // i > {
public:
  explicit DpFibonacci(unsigned int n)
  : n_(n) {}

This specifies that its calc_subproblem() (and the base class’ get_subproblem())will take an unsigned int and return an unsigned long:

unsigned long
calc_subproblem(
  type_base::type_level level,
  unsigned int i) const override {
  // Base cases:
  if (i == 0) {
    return 0;
  } else if (i == 1) {
    return 1;
  }

  return get_subproblem(level, i - 1)
    + get_subproblem(level, i - 2);
}

The level parameter is just for optional debug output, to show how deep we are in the recursive call stack.

It also implements get_goal_cell() so that the base class knows which value of n to use to get the result f(n). This just returns the n provided to DpFibonacci’s constructor.

We can then instantiate the class and call calc():

DpFibonacci dp(n);
auto result = dp.calc();

The base DpTopDownBase::calc() method calls get_goal_cell() and calls calc_subproblem() for that cell. Because our calc_subproblem() implementation uses get_subproblem(), it will automatically avoid recalculating already-calculated sub-problems.

That DpTopDownBase::calc() method is a fairly simple example of the usefulness of C++ variadic template parameters to call one variadic method with the result of another variadic method, using tuples and std::experimental::apply(). And it’s type-safe at compile time all the way.

2-Dimensional Top-Down DP

But DpTopDownBase isn’t just good for a 1-dimensional cache, based on one parameter. The wonder of C++ variadic templates lets us specify multiple parameters for calc_subproblem and get_goal_cell(). Those overridden methods take just the right parameter types and just the right number of parameters.

And this is where top-down DP can be useful, because it might not be necessary to calculate all possible sub-problems for all possible values of these parameters.

For instance, in the Knapsack Problem, we have a knapsack with a certain maximum weight capacity, and we have several items of various weights and values. We want to find the set of items to put in the knapsack that maximises the value. We give each item an index number.

In the classic Knapsack DP algorithm, we then have 2 dimensions (or parameters) for each sub-problem: A count of items and a maximum weight. So if we have 5 items and total capacity of 100Kg, we can calculate the max value, f(5, 100Kg) by using other sub-problems such as f(4, 80Kg) along the way. It’s not likely that we’ll need to calculate all 5 * 100 possible subproblems, though it is likely that we’ll reuse some sub-problem calculations.

For instance, the top-down DpKnapsack class derives from the base DpTopDownBase like so:

class DpKnapsack
: public murraycdp::DpTopDownBase<
    SubSolution,
    SubSolution::type_vec_items::size_type,
    Item::type_weight
  > {

Again it overrides calc_subproblem() and overrides get_goal_cell(). This time it takes 2 parameters instead of just i:

SubSolution
calc_subproblem(
  type_level level,
  type_size items_count,
  type_weight weight_capacity) const override {

And it can call get_subproblem() with those 2 parameters, letting the base class worry about how to cache the sub-problems and when to recalculate them. (This currently uses a rather inefficient generic hashing function for std::tuple<>s. )

Notice that we return a custom SubSolution class for the sub-problem result instead of just a number. This lets us store both the maximum value for that sub-problem and the items used in that sub-problem solution. That helps us to reconstruct the complete set of items for the sub-solutions along the way. However, that is inefficient because we won’t need all sub-solution item sets . It should instead be possible to write some generic code to  reconstruct the solution by working backwards from the result of the goal. I’ll try that out sometime.

Again we can just instantiate the class and call calc():

//Items with their values and weights:
DpKnapsack::type_vec_items items{
  {13, 15}, {12, 14}, {2, 13}, {10, 12},
  {1, 11}, {7, 10}, {6, 9}, {5, 8},
  {11, 7}, {14, 6}, {9, 5}, {4, 4}, {3, 3},
  {22, 2}, {8, 1}};
DpKnapsack::type_value weight_capacity = 45;

DpKnapsack dp(items, weight_capacity);
const auto result = dp.calc();
std::cout << "solution: value: " << result.value << std::endl
  << "with items: ";
print_vec(result.solution);
std::cout << std::endl;

The top-down DpContextFreeGrammarParser , top-down DpMakeChange , top-down DpEditDistance , top-down DpParenthesization do much the same thing. The top-down DpTsp uses 3 dimensions and overrides calc() to choose the minimum result over all the sub-problems at the top level instead of just letting the default base calc() use the result of one goal sub-problem as the overall result.

I think it’s nice to separate the real heart of each DP algorithm from the code that does the boring caching and checking.

Bottom-Up DP

In bottom-up DP, we calculate every single possible sub-problem result, starting from our base case. Even if we won’t need every single possible result, this is often more efficient because we sometimes know when we can discard whole sets of sub-problems. For instance, looking again at the classic Fibonacci Numbers example, we only really need the last 2 sub-problem results:

int f(int n) {
  if (n == 0)
    return 0;
  else if (n == 1)
    return 1;

  int result = 0;
  int prev1 = 1; // f(1) = 0;
  int prev2 = 0; // f(0) = 0;
  for (int i = 1; i < n; ++i) {
    result = (prev1 + prev2);
    prev2 = prev1;
    prev1 = result;
  }

  return result;
}

So we don’t need to cache all i results along the way. For a multi-dimensional DP algorithm, that can save us a lot of space.

1-Dimensional Bottom-Up DP

The bottom-up DpFibonacci class derives from DpBottomUpBase, specifying:

  • the number of sub-problems to keep along the way: 2 to keep f(n-1) and f(n-2).
  • the type of each sub-problem’s (and the final) calculation result: long for the result of f(5), for instance.
  • the type of the input: int for 5, for instance, in f(5).
class DpFibonacci
: public murraycdp::DpBottomUpBase<
    2, // count of sub-problems to keep
    unsigned long, // sub problem type
    unsigned int // i
  > {

DpFibonacci again overrides calc_subproblem() and get_goal_cell(), with exactly the same implementation as the top-down version.

The base class’ calc() method then just iterates from 0 to n, calling the virtual calc_subproblem() method and then returning the result for the goal cell, based on calling the virtual get_goal_cell() method.

DpBottomUpBase keeps just the right number of previously-calculated sub-problems, while reusing the discarded sub-problem storage, by using my (funky) circular_vector.

DpTripleStep is very similar, though it needs to keep the 4 previous sub-problems instead of 2, and of course it does a different calculation per sub-problem.

2-Dimensional Bottom-Up DP

The bottom-up DpKnapsack class is again much like the top-down DpKnapsack implementation, but it again specifies that it only needs the previous 2 sets of sub-problems.

class DpKnapsack
  : public murraycdp::DpBottomUpBase<
    2, // count of sub-problems to keep.
    SubSolution,
    SubSolution::type_vec_items::size_type,
    Item::type_weight
  > {

Where the bottom-up DpFibonacci passed one count to the base class’ constructor, the bottom-up DpKnapsack passes two, one for each calc_subproblem() parameter:

DpKnapsack(
  const type_vec_items& items,
  type_weight weight_capacity)
  : DpBottomUpBase(
      items.size() + 1,
      weight_capacity + 1),
    items_(items),
    weight_capacity_(weight_capacity) {}

The base DpBottomUpBase class uses these constructor parameters as ends of ranges, assuming that each range starts at 0. It will then iterate over the first range, iterating over the second range each time, like a nested for loop:

for (int items_count = 0; items_count < MAX_ITEMS; ++items_count) {
  for (int weight_capacity = 0; weight_capacity < MAX_WEIGHT_CAPACITY; ++ weight_capacity) {
    ...
}

There is no way in C++ to nest an arbitrary number of for loops. But we can use variadic templates to declare an arbitrarily-nested vector. And thanks to my funky variadic templates for vectors of vector (of vectors of vectors, etc), the base class can generically call calc_subproblem() as if it had nested for loops, regardless of how much nesting there should be. My for_vector_of_vectors() takes the (nested) vector, and a series of start and end parameters for each level and eventually uses std::experimental::apply() to call the functor (a lambda, in this case) for each combination. To get that series of start and end parameters, it uses my funky tuple_interlace() to alternate starts (0) with ends.

The bottom-up DpEditDistance and bottom-up DpSubstringMatching do much the same thing.

Problems with DpBottomUpBase

The bottom-up base class has some serious limitations:

It only lets us discard older sub-problems in the first dimension, such as the Knapsack sub-problems for items less than n-2. But you might understand enough about your algorithm to know other sets of sub-problems that can be discarded at various times, based on less simple criteria. This would make the best use of space while avoiding recalculation. Maybe some recalculation would be OK as long as you could minimize it.

Also, it is limited to a simple iteration, using a range from start to end, using a simple operator++ to move to the next iteration. But sometimes you want to do something a little more complicated than an iterating for loop. For instance, my top-down DpTsp class finds a solution for the Travelling Salesman Problem by manipulating sub-sets in its calc_subproblem(). I could use gosper’s hack to create a suitable n-choose-k subset iterator, but it still needs to iterate over a further set of subsets missing each item from the parent subset, and I suppose that would be doable with another custom iterator. But the start and end iterators at each level would then have to be instantiated based on the values of the parent level, and cannot just be supplied before the whole calc() starts. Maybe I could do something with lambda functions to let these iterators be instantiated along the way.

C++ Core Guidelines: Ownership and Parameters

I’ve been catching up on the wonderful CppCon talks from 2014 and 2015. It looks like we are getting towards a consensus about how to write modern C++, though I guess we are still early in the process,  and still coming up with as many questions as answers.

When this all settles down, it might need Scott Meyers to come out of retirement from C++ to combine and update his Effective C++ books. Maybe he’d do it if a really modern C++ emerged from it all.

Talks, Guidelines, Smartpointers and Ownership

Some of the most important talks focus on when pointers, or smartpointers, have “ownership”. Three talks in particular progress towards a rather utopian vision in which (shared) object ownership is clearer in various parts of your code and in which there are no object lifetime bugs:

  • CppCon 2014: Herb Sutter “Back to the Basics! Essentials of Modern C++ Style”  (slides)
    Herb discussed some best practices, particularly around auto and smartpointers.
  • CppCon 2015: Bjarne Stroustrup: “Writing Good C++14” (slides)
    Bjarne discussed how we can agree on guidelines that static analysis tools can use, and how following the rules can let static analysis tools find more bugs. For example, by using types to indicate whether a (smart)pointer is owning. As in Herb’s 2014 talk, he suggested that we should get() the raw pointer out of a shared_ptr<> before passing it to a function that doesn’t really want to share ownership. He also introduced the GSL, which has owned<> for when you can’t use shared_ptr<> or unique_ptr<>.
  • CppCon 2015: Herb Sutter: “Better C++14 by Default” (slides)
    The next day, Herb discussed many more ways that static analysis tools could avoid several classes of bugs, and how we can break the rules down into sets called profiles. He suggested some annotations, via the C++ generalized attributes syntax, already in C++11, which could let the tools find even more problems, but showed that we can make good assumptions by default. He showed a Microsoft static analysis tool doing this already.

I like the idea of tools telling me what to do. C++ is more complicated than ever, though also better than ever, and there’s more scope than ever for people to disagree about how to use it. At some point I like to move on and get stuff done. I’m sceptical that we can ever really make our code as safe as Bjarne and Herb suggest, but I’m convinced that we could catch several new categories of problems before runtime, and I want to try.

C++ Core Guidelines, Smartpointers, and Ownership

The C++ Core Guidelines are currently a bit of a muddle. There are many rules that seem to overlap too much, often with insubstantial examples. Too often there are admonitions about what not to do rather than examples of how to do things properly. Trying to comply with one rule will sometimes violate another rule, with no clues about resolving the conflict. The github issues and commits show that conflicts are being clarified, but the whole thing seems so big and spread so thinly that I guess it will be a long time until it feels consistent and cohesive.

The advice often seems to assume that you are writing application code rather than reusable library code, though large real world systems are rarely just one or the other. Therefore there is little consideration of ABI stability. For instance, it suggests “F.7: For general use, take T* or T& arguments rather than smart pointers” and “I.11: Never transfer ownership by a raw pointer (T*)” (and more overlapping advice) but what happens when you need to change the implementation and need to share ownership of the passed instance? You can’t just change a T* to a std::shared_ptr<T> without making existing applications crash. The document suggests GSL’s owned<> in this case, though I guess it will still affect shared library symbol names for methods, and it won’t actually do any reference counting. So in a library where you know that instances get passed around a lot, and there’s a reasonable chance that ownership will be shared, it doesn’t seem unreasonable to default to std::shared_ptr<> even if that obscures the actual ownership in some parts of the code.

Furthermore, when looking at other peoples’ code, how do we distinguish the foo(const T*) method that probably doesn’t keep a copy of the pointer, from all the foo(const T*) methods that might do anything at all because their authors never heard of the guidelines?

Raw pointers as non-owners

As a start, I’m playing with this idea of not overusing smartpointers and letting the lack of a smartpointer indicate non-involvement in the ownership. For instance:

std::shared_ptr<Something> something = get_something();

// This takes a shared_ptr:
do_something_and_keep_it(something); 

// This doesn't care about ownership:
do_something_and_dont_keep_it(something.get());

I’m ready to try this out in an application such as Glom, but fearful of doing it in a library API, such as gtkmm.

Bear in mind that, In the past,

  • I have been one of the maintainers that made glibmm and gtkmm use Glib::RefPtr<>, an intrusive reference-counting smartpointer, for all our wrapper classes of reference-counted glib/GTK+ objects (not widgets). For instance, see the doxygen graph of glibmm/giomm classes deriving from Glib::ObjectBase, and there are many in gtkmm too (not the widgets).
  • I converted most of my Glom codebase to use all of its data structure classes by shared_ptr (a custom shared_ptr<> before C++11) rather than trying to decide when they can be deleted.

These have worked out very well, partly because “only use this via this smartpointer” is a nice simple rule to follow. I’ve resisted letting anyone easily get the underlying pointer out of a RefPtr, to preserve this simplicity. Now that there’s this new (at least to me) consensus, I guess we’ll actually add RefPtr::get() and RefPtr::operator*() soon.

But I still hesitate, because I don’t expect the typical developer to be careful enough until there are tools to stop misuse and I don’t expect the typical developer to use the tools even when they exist. So the price of a better experience for experts may be more ways for non-experts to make mistakes. The gtkmm API has tried to be a pure and modern C++ while also being pragmatic and obvious, but it will be hard to keep the balance in this case.

Trying out raw pointers as non-owners in Glom

I tried this out in an “ownership” branch of Glom. I first looked at some static (standalone) functions that took a std::shared_ptr<> because those were unlikely to store any state between calls. Here is the fairly small patch that changed those parameters.

I then looked at some other functions that took a shared pointer to const (std::shared_ptr<const Something>), suggesting that the implementation just wanted to look at it and then forget it. Changing one function’s parameter forces you to look at the calling functions, where you might find that the shared_ptr<> is itself a parameter that should be changed to a non-owning raw pointer. I followed this logic up the call stack for a while and ended up with a larger patch. I forget which function I tried to change first.

I like that this reduces the amount of code where I need to worry about circular references, which I would break with a std::weak_ptr<>. But I still worry enough to wish there was some way to automatically identify where these might happen.

I’m still a little uncomfortable with this mix of shared_ptr<> and raw pointers, even in purely application code, so I’m not ready to put this in git master just yet. But I think I’ll come around to the idea.

std::unique_ptr parameters to take ownership

The C++ Core Guidelines, suggest “R.32: Take a unique_ptr<widget> parameter to express that a function assumes ownership of a widget” though they currently tell you to avoid std::move() (update: fixed now) which you’d need when calling such a method with a named variable. They don’t mean  gtkmm Widget, but I do.

Watch Herb Sutter’s 2014 talk for the rationale for taking std::unique_ptr<>, but not all types, by value. It’s also mentioned in the C++ Core Guidelines “F.15: Prefer simple and conventional ways of passing information” section.

For instance:

void
SomeClass::take_something(std::unique_ptr<Something> something) {
...
}

...
auto something =
  std::make_unique<Something>("something", 2, 'a');
something->set_extra_foos(foo1, foo2);
bar.take_something(std::move(something));

// Or without the intermediate variable:
bar.take_something(
  std::make_unique<Something>("something", 2, 'a'));

// Or again without the intermediate variable:
bar.take_something(
  create_something("something", 2, 'a'));

Trying out std::unique_ptr<> parameters to take ownership in gtkmm

gtkmm has Gtk::manage(), which roughly means “let the container widget worry about deleting this child widget later”. For instance, you can new a Gtk::Button and call Gtk::manage() on that button before you add() it to a Gtk::Grid. It works well and it’s very simple. Incidentally, all it really does is enable the default behaviour for GtkWidgets – the only clever stuff about Gtk::manage() is how we change the default for gtkmm.

Passing a std::unique_ptr<> to add() could have much the same meaning, and would have the advantage of using standard C++ API and a known convention. I have a patch in bugzilla that does this and there’s been some discussion on gtkmm-list.

However, the result is sometimes hard to love. You have to move the add() to later in your code, to avoid a use after std::move(). And, for some of our API, you really do need to use the pointer right after you’ve add()ed it to a container. So you need to take a temporary “observing” (non-owning) raw pointer before the add().

Currently, the worst part of the gtkmm API for this, which has never felt
quite right, though we can blame GTK+, is this:

auto cell = std::make_unique<Gtk::CellRendererText>();
cell->property_style() = Pango::STYLE_ITALIC

auto cell_nonowning = cell.get(); // yuck
auto column = std::make_unique<Gtk::TreeViewColumn>("something", std::move(cell));
column->add_attribute(cell_nonowning->property_text(), columns.title);
column->add_attribute(cell_nonowning->property_style_set(), columns.italic);

treeview.append_column(std::move(column));

I guess we’ll add the add(std::unique_ptr<>) overloads to allow this, but not deprecate Gtk::manage() until we feel more comfortable with how they make us rearrange our code.

C++ tuple utils

I’ve recently been been writing lots of modern C++ code with variadic templates. For instance , I’ve been trying to make libsigc++ use variadic templates instead of being a mess of generated code.

I often find myself needing utility functions and type traits to manipulate tuples, but the C++ standard library still only offers std::tuple_cat(). Writing these is awkward and that often stops me from experimenting quickly.

So I’m gradually gathering this code together in a little murrayc-tuple-utils library. I’d gladly change the name if this gets any use. Really, I’m surprised that nothing like this seems to exist already, apart from as part of larger projects such as boost::hana and boost::fusion. But boost is a really awkward dependency and those are larger projects with much grander goals.

So far murrayc-tuple-utils has:

  • tuple_cdr(): Removes the first element.
  • tuple_start<N>(): Takes the first N elements.
  • tuple_end<N>(): Takes the last N elements.
  • tuple_subset<pos, len>: Takes len elements, starting at pos.
  • tuple_interlace<T1, T2>: Takes elements from each tuple, interlacing (or zipping) them together.

For each of them, there are also type traits, such as tuple_type_cdr<>::type, though these are not so necessary now that C++14 has decltype(auto) for return types.

These are just enough code to make things work enough for me when I’m in a rush. I’m sure they can be improved, and maybe this is how to get those patches and pull requests.

The project has a complete autotools build structure, with “make check” tests, Doxygen documentation building, a pkg-config .pc file, etc, so you can try it out, improve it, and add to it, without having to mess around with that stuff.

Modern C++: Variadic template parameters and tuples

Variadic Templates

C++11 lets us define variadic templates, taking any amount of parameters, of any type, instead of just a specific number of parameters. Then, when we use the template, we can not only specify the types, we can specify an arbitrary amount of different types. For instance,

template <class... T_values>
class Base {
public
  virtual void something(T_values... values) = 0;
};

class Derived1 : public Base<int, short, double> {
public:
  void something(int a, short b, double c) override;
};

class Derived2 : public Base<std::string, char> {
public:
  void something(std::string a, char b) override;
};

This is useful in extremely generic code such as libsigc++. We are gradually using this in libsigc++ and in glibmm, along with template aliases, to replace code that was previously generated by perl and python scripts to produce multiple versions of each C++ template, with various numbers of parameters. In the meantime, I’ve been playing with variadic templates in a separate experimental project and this is what I’ve learned along the way.

Parameter Packs

The “class… T_values” there is called a template parameter pack. If you are just templating a function then it’s called a function parameter pack:

class Thing {
public:
  template <class... T_values>
  void something(T_values... values) = 0;
};

Expanding a Parameter Pack

The “T… values” in that method signature is us expanding (or unpacking) the parameter pack in a function parameter list. You can use the … in various ways in the function parameter list. For instance, to receive the types as const references:

template <class... T_values>
class Thing {
public:
  void something(const T_values&... values) = 0;
};

In your template, you can then call another method with that parameter pack, by expanding (or unpacking) it into the function argument list. For instance, with “values…”:

template <class... T_values>
class Thing {
public:
  void something(T_values... values) {
    something_else(values...);
  };

You can write more complex patterns to change how the parameter pack is expanded. For instance:

template <class... T_values>
class Thing {
public:
  void something(T_values... values) {
    something_else((values + 1)...);
  };

  void other_thing(T_values... values) {
    something_else(const_cast<T>(values)...);
  };

Storing a Parameter Pack in a std::tuple

However, if you want to keep the parameter values around and use them at some later time, you’ll need to store them in a std::tuple. I think this is why std::tuple exists. For instance:

template <class... T_values>
class Thing {
public:
  void something(T_values... values) {
    tuple = std::tuple<T_values...>(values...);
  }

private:
  std::tuple<T_values..> tuple_;
};

Expanding a std::tuple

Then you have another problem. You probably want to call some method with those values. But now you have a tuple instead of a parameter pack. Trick with std::index_sequence<> and a helper method lets you call a normal method (that takes normal parameters), passing a tuple that holds those parameter values:

template <class... T_values>
class Thing {
public:
  void something(T_values... values) {
    tuple_ = std::tuple<T_values...>(values...);
  }

  void do_something_with_values() {
     call_yadda_with_tuple(tuple_,
      std::index_sequence_for<T_value...>())
  }

  void yadda(T... values);

private:
  //The helper method.
  template<std::size_t... Is>
  void call_yadda_with_tuple(const std::tuple<T_values...>& tuple,
    std::index_sequence<Is...>) {
    yadda(std::get<Is>(tuple)...);
  }

  std::tuple<T_values...> tuple_;
};

Unfortunately, this does clutter up your code. I haven’t yet managed to write a simple generic call_function_with_tuple(f, tuple) helper method. I hope it is possible.

Parameter packs are part of the C++ language. std::tuple<> and std::index_sequence<> are part of the C++ standard library, apparently added specifically for use with parameter packs. I can see the sense in keeping the language as simple as possible, as long as you can provide what you need via library code in that language. But the end result is not very attractive in this case. Luckily, hopefully, this isn’t something you’ll need to use much anyway.

At this point, any reader who already doesn’t like C++’s complexity will like it even less. Showing them a related thousand-line g++ compilation error should turn them away for good (clang++’s compilation errors are much clearer). But if you really like compile-time type-safety, and if you really like to avoid copy/pasted code, you might like that this is at all possible.

It would be understandable for coding guidelines to discourage the use of variadic templates except in special cases, until people are more familiar with them.

Manipulating Tuples

Of course, you might need to call methods with just some of the parameters from the parameter pack, or some combination of parameter packs. Once you have the parameters in a std::tuple, you can manipulate that tuple with some more template cleverness. For instance:

  • std::tuple_cat() concatenates two tuples into one.
  • But I recently needed a tuple_cdr() to remove the first item from the tuple, leaving me the rest.
  • A tuple_car() would just do std::get(tuple) to get the first time, so nobody bothers to write one.
  • I even needed a tuple_interlace() to interlace two tuples together, turning a, b and c, d into a, c, b, d.

It all starts to feel very lispy. Luckily there are lots of people on StackOverflow who enjoy discussing how to implement these. I feel there should be more functions like std::tuple_cat() in the standard C++ library, or maybe in some open source library.

Once you have your new tuple, you’ll probably need to use std::make_index_sequence<> instead of std::index_sequence_for<>, to call your call_*_with_tuple() helper method, like so:

void do_something_more() {
  const auto combined_tuple = std::tuple_cat(tuple1_, tuple2_);
  constexpr auto tuple_size =
      std::tuple_size<decltype(combined_tuple)>::value;
  call_yadda_with_tuple(tuple,
    std::make_index_sequence<tuple_size>());}