Tag Archives: C++

Using GitHub Actions to check builds

November 14, 2019Generalangular, bigoquiz, build, C++, git, GitHub, go, libsigc++murrayc

GitHub Actions

A couple of weeks ago I spent some time adding GitHub Actions to some of my projects. I was lucky enough to get access to the beta, but it has now become available for everyone.

GitHub Actions lets you specify builds, tests, etc, to run on certain triggers, such as after every commit push. I’ve now used it in several projects, each using different build environments. It has worked very well, and has really improved the developer experience, and therefore improved the code, in a few of my projects. It has encouraged me to use pull requests even for changes to niche personal projects so I can benefit fully.

Actually, I’m not sure that Actions is the best name for the system. It really lets you describe Jobs, and each Job might use a few Actions. But maybe that’s just a matter of perspective.

Now that it’s public, the pricing suggests that most open source projects will have enough time for free, for which I’m grateful. Nevertheless, history shows that it wouldn’t be wise to tie your projects too closely to one proprietary system. For instance, that’s another reason not to duplicate your build rules in a GitHub Actions workflow file. You should have a simple build system that you can just call. I generally do that via a simple Makefile, as I mention below.

Here are the GitHub projects that I’ve added GitHub Actions to:

libsigc++

We now have GitHub Actions to build and run tests for libsigc++ with several versions of 3 C++ compilers: 3 versions of gcc, 5 versions of clang, and even 1 version of msvc (thanks to Stuart Dootson). We will also add an action to check our C++ code formatting via clang-format.Â If you’d like every libsigc++ commit to work with some other compiler, please do try to make that happen via a pull request.

This will give us valuable confidence that our changes will work everywhere, and give us valuable clues when they don’t. It’s great to know that people can get this feedback automatically when they submit a PR, without waiting for a human.

I might be abusing GitHub’s generosity with so many builds jobs, but luckily there are not that many commits to this project. I wonder if GitHub hope to fund some of the GitHub Actions costs for open source projects via GitHub Sponsors.

angular-bigoquiz -client (The bigoquiz.com frontend)

I added GitHub Actions to build and run tests for this Angular project. I put the npm/ng commands behind a Makefile to keep the CI rules as simple as possible. The only thing worse than having undocumented or overly-complicated build steps, is repeating those incantations in multiple places.

I like how it uses just one Workflow definition, but runs it with several Node versions, thanks to the strategy.matix specifier, and the with.node-version specifier made available by the actions/setup-node action.

See the Workflow definitions.

go-bigoquiz -server (The bigoquiz.com backend)

I added GitHub Actions to build and run tests for this Go microservice. It only builds for go 1.12, but I think I could parameterise that like the node version for angular-bigoquiz-client.

Again, I put the go commands behind a Makefile to keep the CI rules as simple as possible.

I did some major refactoring of this code, adding tests along the way, and having CI made that fairly painless.

See the Workflow definitions.

Minor Imperfections

I have lots of experience using Atlassian’s competing Bitbucket Pipelines system. GitHub Actions feels very similar, though I still miss a few things:

Can’t trigger builds for individual commits

I often create PRs with many small commits while refactoring old code. 10 commits is not so unusual, and 30 has happened. Good test coverage makes this possible, and inevitably that test coverage shows a problem in one of the commits. When that happens in a BitBucket PR, with BitBucket Pipelines, I like doing a manual binary search, starting a new CI run for individual commits until I see where the problem started. (Actually, I wish that it would automatically do that binary search, like a git bisect.)

Unfortunately, I can’t see any way to start a job for an arbitrary commit, so I can only ever see the result for the latest commit in a PR.

No caching

Doing full builds of projects on every commit is simple but wasteful. Ideally, we could do reliable incremental builds, though that’s hard to get right. But some caching could save time and heat.

For instance, Bitbucket Pipelines lets me cache previous Docker build steps and even gives me a ccache cache to reuse C++ compilation units. Well, the ccache cache is not a real cache – it’s just a dump of files that’s re-generated every couple of weeks, and is stale in the meantime, but it’s better than nothing.

Providing CI build time to companies is obviously meant to be a profitable business. Unfortunately, it makes the incentives a bit confused. For instance, I don’t believe that Bitbucket are eager to help their customers spend less money on build time, so they don’t offer very capable caching. But competing services, such as GitHub Actions, if they get this right, could make them care about losing customers.

Brain, refactored

December 7, 2018Generalandroid, angular, C++, go, ios, java, libsigc++, scalamurrayc

For a long time I was too busy to learn anything much new, while running a company and caring for two small children. But more recently, I had the time to catch up on some technical knowledge and I remembered how much I love pure learning.

This post seems to be a big list of what I’ve learned over the last couple of years. Actually, I meant to write it a year ago, so now it’s also a list of what I need to revise.

C++

I caught up with the C++ renaissance by reading Effective Modern C++. and watching heaps of CppCon, C++Now, and Meeting C++ conference videos, such as CppCon 2014: Herb Sutter’s Back to the Basics: Essentials of Modern C++ Style, (the one with almost always auto).

I exercised my growing knowledge of modern C++ by first using some modern C++ in gtkmm and Glom, and then doing a massive rewrite of libsigc++. I also explored some modern C++ generic programming techniques with tuples and dynamic programming.

Along the way I experimented, in Glom’s code, with some of the ideas from Sean Parent’s talks, such as the “no naked for loops” idea from his “C++ Seasoning” talk from Going Native 2013.

I read the old Boost Graph Library (BGL) book, to tie C++ and graph theory together in my head. I spent lots of time, off and on, trying to clean the BGL code up, and modernize it, but it’s a struggle and I’ve only managed to get some simple stuff accepted so far.

The BGL book is ancient but its ideas about C++ concepts are only just about to become mainstream, hopefully in C++20. I also read Alexander Stepanov’s wonderful From Mathematics to Generic Programming book, which convinced me even more.

I tried to get a simple explicit operator bool check into clang-tidy itself, but that didn’t get far and I chose not to be a pain by being pushy about such an unimportant contribution.

I started to learn about some compiler theory by writing some actual code.

I listened to the complete backlog of CppCast podcast episodes and I now listen to every episode as soon as its published.

Java and Android

I revised and updated my Java skills by reading Effective Java Programming and Programming Android. I regularly listened to the Fragmented and Android Backstage podcasts, but not so much recently. I gained some real-world Android programming experience by creating the Android app for Galaxy Zoo after first trying to create an Android app for my Glom database system. The Galaxy Zoo app was quite popular, but sadly, it’s not working right now – I need to find time to adapt it to their changed backend API.

Java and web

I spent some time with Java as a way to work on backend (distributed) systems, reading the JAX-RS book, the,Spring Microservices in action book, and the Spring in Action book.

I gained some more Java/GWT/AppEngine (and Resty-GWT) experience by creating the Big-O Quiz website. I used bigoquiz.com to take organized notes about the computer science that I learned subsequently. It asks me questions to remind me. But see below about me rewriting bigoquiz.com with Go and Angular.

iOS / Objective-C

I learned iOS (iPhone/iPad) programming by watching all 18 lectures in Stanford’s “Developing iOS 7 Apps for iPhone and iPad” course and then creating the iOS app for Galaxy Zoo. I used Objective-C, but now I have a reason to learn Swift.

Like the Android app, this is currently broken, because the backend server now has a very different API.

Go

I learned Go via the The Go Programming Language book, taking extensive notes from the perspective of a C++ developer. To gain initial experience, I rewrote bigoquiz.com‘s backend with Go, instead of Java, and found the resulting code refreshingly simple.

I now use Go regularly at work for backend microservices and I’m quite content with it. For larger projects I’d still prefer C++, but not all projects need to be large.

Angular

I finally gave up on GWT, long after everyone else realised that it’s barely maintained. So I took the opportunity to rewrite the bigoquiz.com frontend in Angular, using Typescript. Learning it was easy with the excellent Angular tutorial.

I’ve since tried out React too.

Scala

I read the huge Programming in Scala book, because I use Scala on a large project at work. I’m fond of it, but I think its simplest ideas should just be in Java, and the cleverest stuff tends to make code hard to read.

Algorithms and Data Structures

I grew to love computer science by doing Coursera’s/Stanford’s “Algorithms: Design and Analysis” course, part 1 and part 2 by Tim Roughgarden. I then read Cracking The Coding Interview, The Algorithm Design Manual, Algorithms, by Robert Sedgewick, and the first half of the CLRS Introduction to Algorithms book. I worked through the first few lectures of Tim Roughgarden’s “A second course in Algorithms” to cover max-flow and min-cost matching algorithms.

I went through the material 2 or 3 times, from various perspectives, eventually making sure to actually implement each data structure and algorithm.

I also read Jon Bentley’s Programming Pearls book, for some practical advice.

I watched, and re-watched, far too many of Tushar Roy’s algorithm walkthrough videos.

Linux

I already have years of Linux development experience, but I knew I was missing out on some knowledge and habits. So I solidified my Linux knowledge by reading Linux System Programming. (And now, later, I think I should read it again, though I wish there was a newer edition.)

I’m still looking for a really good book about bash scripting and general command line cleverness.

Distributed Systems

I didn’t have practical experience with at-scale server-based systems but I became acquainted with the ideas by reading Google’s Site Reliability Engineering book.

I discovered it later, but Tim Berglund’s 4-hour Distributed Systems in One Lesson series of videos is an excellent and engaging overview of the main themes. If it’s no longer available on YouTube but I think it’s so good that you should buy it.

Mikito Takada’s free Distributed Systems For Fun and Profit book is also an excellent introduction and overview.

I also listened to many old episodes of the Software Engineering Radio podcast and the Software Engineering Daily podcast, which often cover similar topics.

Performance

Even as a regular C and C++ coder, I realized how little I thought about the computer architecture that my code is running on. Various talks by Andrei Alexandrescu and Chandler Carruth really opened my eyes to these performance considerations that seem more relevant in at-scale server code than they were in typical desktop applications or even in embedded code.

For instance, Andrei Alexandrescu’s Code Dive 2015 “Writing Fast Code: part 1” and part 2 talks, as well as his “Optimization Tips – Mo’ Hustle Mo’ Problems” talk at CppCon 2014 and his “Writing Quick Code in C++, Quickly” talk from Going Native 2013.

For instance, Chandler Carruth’s “Understanding Compiler Optimization” talk from Meeting C++ 2015, his “Tuning C++: Benchmarks, and CPUs and Compilers! Oh My!” talk from CppCon 2015, his “Efficiency with Algorithms, Performance with Data Structures” talk from CppCon 2014, and his “Care and Feeding of C++’s Dragons” talk from Going Native 2013. Also his more recent “High Performance Code 201: Hybid Data Structures” from CppCon 2016.

Mike Acton’s famous “Data-Oriented Design and C++” talk, from CppCon 2014, really gets to the point about this too, from more of a C perspective.

Coding Habits

I finally took the time to learn Vim enough to use it as my daily editor, after reading Practical Vim. I also learned to use screen and Tux but I haven’t incorporated either into my daily habits.

Ironically I never found my Linux environment outside of the terminal to be as keyboard friendly as the old pre-X MacOS. So staying in the terminal helped me to keep my hands off the mouse again. I practiced with the Klavaro typing tutor to get some of my good habits back.

I worked through the first 25 tasks from Project Euler and most of the tasks from HackerRank. I spent two intensive months working through every task on InterviewBit (around 260 at the time).

I highly recommend InterviewBit – they have a vast set of coding problems, nicely organised, with helpful test cases and tips for when you get stuck, so you can actually improve. HackerRank is useful too, but it feels less directed and the need to parse stdin input for every task gets annoying.

Along the way, I took a note of each C++ typo or generic algorithm mistake that I made, and started counting how many times I made each mistake, so I could focus on breaking my worst habits.

Management

I read some books about project and people management too, revising agile processes with the Scrum and XP from the Trenches and Kanban and Scrum: Making the most of both free ebooks.

The Manager’s Path maps out the various roles that are popular at software companies today, with suggestions about regular actions that each might perform, but it doesn’t seem useful beyond establishing that common language, and I hope it becomes outdated.

I recently read Measure What Matters, and Radical Focus, about OKRs, and I’m very enthusiastic so far. I like the idea of openness and regularity in businesses, to allow focus and alignment, almost like well-functioning decentralized open source projects that I’ve known.

I also got interested in product management, but I haven’t taken it far yet. The Cracking the Product Manager Interview book was rather superficial. I hope to dig deeper into this topic in future.

Others

I revised my Design Patterns knowledge by reading Head First Design Patterns and I used it to build a Design Patterns quiz.

I learned the Lua programming language and the Moai game-development platform via the Programming in Lua book and the “Developing Mobile Games with Moai SDK” book. This was for an idea I’d had for a kids’ programming app that didn’t seem quite worth the effort in the end.

I learned a bit about Hadoop by reading the “definitive guide” book, but I never got around to using it.

C++17 in libsigc++ : invoke, apply, and constexpr if

April 15, 2018GeneralC++, C++17, gtkmm, libsigc++, murrayc-tuple-utilsmurrayc

I’ve made a fairly minor 2.99.11 release of the unstable libsigc++, just to make use of C++17. Here are some brief descriptions of the C++17 features used in libsigc++ so far:

std::invoke()

std::invoke() invokes a callable object, such as a plain function pointer, pointer to a member method, or a functor, such as a sigc::slot, using one generic syntax. We provide the arguments, and possibly an object intance, in the std::invoke() call. For instance:

std::invoke(function, 1, 2);
std::invoke(method, object, 1, 2);
std::invoke(functor, 1, 2);

By using std::invoke in libsigc++ (and here), I hope that this makes the implementation more generic though it’s not immediately obvious how.

std::apply()

std::apply()Â is a little like std::invoke(), but lets us invoke a callable object with the arguments in a tuple. For instance:

std::apply(function, std::make_tuple(1, 2));
std::apply(method, std::make_tuple(object, 1, 2)); // I think.
std::apply(functor, std::make_tuple(1, 2));

This is useful for libsigc++ template code that uses tuples to concatenate or rearrange sets of variadic parameters. It lets us replace several slightly less generic helper functions that just called aÂ functor with a tuple unpacked as variadic arguments via a providedÂ std::index_sequence. Receiving variadic parameters into a tuple is normal, so this complements that well, letting us use some of those parameters in calls to other functions.

constexpr if

I also used constexpr if to avoid the need for a helper template. Before C++17, templated code could sometimes need to use another helper template to perform different operations depending on the type, sometimes where some operations won’t even compile with the other type. The different operations would then be isolated in different template specializations of the helper template, and the main template could just have one generic call to that helper template.

But constexpr if lets you put it all in the main template. For instance, it let us remove the little with_type<> template, which only existed so we could call a functor for some types, but not for others.

I also updated the tuple-utils from my murrayc-tuple-utils project, including severalÂ uses of constexpr (I also used the new C++17 nested namespace syntax when doing that).

libsigc++ has moved to GitHub

March 14, 2018GeneralC++, git, GitHub, libsigc++murrayc

We have moved libsigc++ from git.gnome.org to github.com. This should make working on it even nicer.

I’ve also moved itsÂ websiteÂ to GitHub Pages rather than Sourceforge, now using the simple Jekyll templating system. But I’ve made only minimal changes, so I’d be glad if someone submitted a pull request to make the website better and prettier.

We probably should have done this sooner, because libsigc++ is in no way GNOME-specific, but GNOME’s git has been very good to us over the years and it’s been hard to trust anything else after Sourceforge fell apart. However, with GNOME’s gradual move to gitlab.gnome.orgÂ (which gtkmm will happily use) we would have to change something anyway. I am a happy (paying) GitHub user and it feels like a natural home.

I’m still rather proud of my modern C++ rewrite of libsigc++Â but I’m sure it could benefit from more use than just by gtkmm 4, which is itself not used much yet. Maybe this move to GitHub will help with that.

A C++ developer looks at Go (the programming language), Part 3: Concurrency

July 4, 2017GeneralC++, concurrency, golang, javamurrayc

This is the third of my posts about looking at Go from the perspective of a C++ developer. I think that will be enough for now. Here are all 3 parts:

Here I mention Go’s built-in concurrency features, letting you write code that makes the best use of one, or more, processors. I am not an expert in concurrency, and I think few people are. I’ve had some really helpful friendly feedback about the previous parts from Go developers, so I hope that continues. I will gladly correct any mistakes that are pointed out to me so the text can become more useful.

However, the mentions of C++ coroutines here might also be particularly useless. Clang and gcc don’t support coroutines in their stable releases yet and it’s hard to find definitive explanations. What I can find is contradictory and far from asÂ simple as Go, probably because the C++ coroutines specification has been in flux for a while. I would very much welcome corrections. Interest in C++ coroutines seems to be heating up now so I guess this will settle down soonish. I think it will then be time for some books to be updated. I hope that the end result is at least as expressive as Go but I am not confident of that so far.

As before, I strongly suggest that you read “The Go Programming Language”, (referred to her as “the book”), from which I’ve learned this stuff. At least at first, the structure of this text is aimed at people familiar with C++ who are interested in Go, but the comparison does break down later.

Goroutines and std::task

Like C++ (since C++11), Go has support for simple concurrency features such as lockable mutexes (which we’ll see later), but it doesn’t let you create raw threads. Instead if offers goroutines.

Goroutines are, at their simplest, functions that can run concurrently with other functions. Two goroutines might even run on different CPUs, meaning that they run truly simultaneously. You can start one just by using the “go” keyword before a normal function call. The function will then execute, without the caller waiting for it to complete.

go doSomethingThatTakesALongTime()
doSomethingElseAtTheSameTime.

While the goroutine does its work, maybe using traditional cooperative multi-threading on the same CPU, maybe on another CPU, the implicit main goroutine can also do work. Update: But careful. As mentioned later, this program won’t wait for the first goroutine to finish if the main goroutine finishes first.

So far, at least, this is a bit like starting a std::task in C++ with std::async() (since C++11):

auto future = std::async(doSomethingThatTakesALongTime)
doSomethingElseAtTheSameTime()
future.wait();

(Update: As James pointed out in a comment, if you don’t keep the returned future, its destructor will run, which will cause a wait, making the call not async.)

Hybrid Threading (m:n Threading)

A std::task in C++ is not guaranteed to start an actual new OS (kernel) thread, unless you pass std::launch::async as an extra argument. It will sometimes, but that’s up to the runtime and the implementation. I think the runtime, in some implementations, might even be able to move tasks to real threads and between threads, but I’m not sure. Scott Meyer’s “Effective Modern C++” has some good discussion of tasks and C++ concurrency in general.

Goroutines are not ambiguous about this. Goroutines don’t map directly to kernel (OS) threads. They use m:n threading, in which user space (the Go runtime, in this case) schedules its own m “threads”, mapping them onto a pool of n kernel threads. This seems to be what is referred to as “Hybrid Threading” in Robert Love’s “Linux System Programming” book – a mixture of 1:1 Threading (just using kernel threads) and N:1 Threading (or User-Level Threading, apparently sometimes called “green threads” or “lightweight threads”.). Presumably there is some very clever scheduling code in Go’s runtime and in the C++ coroutines implementation.

Because Goroutines don’t map directly to real threads, they don’t need the full stack size given to a kernel thread, letting us comfortably use far more goroutines than we would use threads. However, you only get the full benefit of this when the work is happening purely in the Go runtime. The book explains that “Goroutines that are blocked in I/O or other system calls or are calling non-Go functions, do need an OS-thread”. So I guess you still wouldn’t want, for instance, a goroutine per network connection, if you are serving many network clients.

Channels: Getting One Result

In C++, std::async() returns a std::future. This is a way to later wait for the result from the task, or just get the result if it is already ready, having made good use of the original thread’s time in the meantime.

int doSomethingThatTakesALongTime() {
  ...
  return 5;
}
 
auto f = std::async(doSomethingThatTakesALongTime)
doSomethingElseLengthyAtTheSameTime()
auto answer = f.get()

The go keyword doesn’t return a future. (Update: And doesn’t let you use the return value.) ~~You could use the function’s result, but you’d just immediately start waiting for it, defeating the purpose (I wonder why Go allows this at all.)~~

Instead, to provide a result, a Go program would use a Go channel. Go channels are queues of your chosen elements. Channels are a built-in type, so they are generic – they can contain various types. They can be created like so, using the make() built-in function as for other built-in types:

c := make(chan string)

The Go language has special syntax for sending to, or receiving from, channels via the <- and -> operators, though I don’t see what would have been so bad about having send() and receive() methods.

Go channels are “concurrency safe”, so you can pass one to a goroutine, and both goroutines can happily manipulate it. For instance, to get a response to a long-running function, but only after we’ve tried to make good use of the waiting time:

func doSomethingThatTakesALongTime(c chan int) int {
  ...
  c <- 5
}
 
c := make(chan int)
go doSomethingThatTakesALongTime(c)
doOtherStuffInTheMeantime()
answer := <- c

That attempt to receive from the channel (with the <- operator) will block until the channel has something to provide, a bit like how std::future::get() blocks until the result is ready.

It would be fairly easy to create a future type that uses a channel, then you could write an async method that uses the go keyword and returns the future. But I think it would be nice if Go made this simple case simpler by doing this for us. Maybe it does.

I’m also a little uncomfortable with the calling goroutine (the main goroutine here) having to create the channel, repeating the type of that channel (it’s also in the called goroutines function signature). I think it would be nice if there was some syntax that gave us a new instance of the channel (Ready for receiving. See about unidirectional channels later.) that the called goroutine expected to get, while also allowing us to use an existing channel if necessary. That would be hard to get right, of course.

If the channel is empty and has been closed then it will then provide a zero value immediately. Or you can get the extra result to check if the channel is still open or still has values:

answer, ok := <- c
if (!ok) {
  // The channel is empty and has been closed.
}

Goroutines and C++ Coroutines

Goroutines feel a little like C++ coroutines (not yet in C++ as of C++17). I don’t know if their scheduling and stack management is quite as clever. I suspect not, because I keep reading stuff about C++ coroutinesÂ being “stackless”, which I think means that they map directly to kernel threads, while Goroutines are apparently “stackful”.

Goroutines allow cooperative scheduling, via blocking channel sends and receives (see select) and sleeps. So do C++ coroutines, via co_yield, co_return, co_await, futures and generators.

So I’ll mention C++ coroutines later as I show other features of Goroutines and channels. I’d love to know the actual differences, if someone can explain it to me clearly enough that I understand. I really hope the end result is as easy to use as Goroutines.

C++ Coroutines: Getting One Result

For now, let’s look at how a C++ coroutine would let us return a single value, much like we did with std::async() (with a std::future) and with Go’s go keyword (with a channel). We would use a std::future return type for the function’s implementation, use co_await to launch the function, and co_return to return the result into the std::future.

This is based on various other blog posts and slides that I’ve seen. I don’t know if this is really correct because I don’t have a compiler that actually supports this yet.

std::future<int> doSomethingThatTakesALongTime() {
  ...
  co_return 5;
}

int main() {
  int answer = co_await doSomethingThatTakesALongTime();
  doOtherStuffInTheMeantime()
  
  // Attempting to get the value blocks until the result is ready.
  // (I'm not sure about this part. Do we need to explicitly wait
  // for the future to have a value?)
  std::cout << "answer: " << answer << '\n';
  return 0;
}

Here the co_await keyword is a little like the go keyword in Go.

Channels: Multiple Results

But channels are a queue of values, not just single values, so you could receive a series of results. Go’s range-based for loop has built-in support for this, as mentioned in section 8.4.2 (Pipelines) in the book. For instance:

func sendMessagesToOtherSolarSystemsAndGetResponses(c chan string) {
  for i := 0; i < 10; i++ {
     ...
     c <- response
  }

  c.Close()
}
 
c := make(chan string)
go sendMessagesToOtherSolarSystemsAndGetResponses(c)
doOtherStuffInTheMeantime()
for response := range c {
  fmt.Println(response)
  ...
}

fmt.Println("Not waiting any longer for more responses.")

Notice that, unlike a regular range-based for loop in Go, this gives you the value, not the index (which would be meaningless for a channel). In a regular range-based for loop, you need to get the index and value (maybe using _ for the index) to get the value.

The range-based for loop blocks at each operation, waiting for a new value from the channel, and it will stop automatically when the channel is empty and closed.

As mentioned in section 8.5 (Looping in Parallel), if we return from the main goroutine early, for instance in response to an error, we must make sure to “drain” the channel if we don’t want the other goroutines to be blocked when they try to send, even when using a buffered channel (see below). You could launch a separate goroutine just to do this.

C++ Coroutines: Multiple Results

With C++ coroutines, to return multiple results, we can use co_yield instead of co_return, using a generator instead of a std::future.

Again, this is based on various other blog posts and slides that I’ve seen. I don’t know if this is really correct because I don’t have a compiler that actually supports this yet. In particular, I don’t know if there will really be a std::generator in the standard library.

std::generator<std::string> sendMessagesToOtherSolarSystemsAndGetResponses() {
  while (still_waiting) {
    ...
    // "return" one result via the generator.
    co_yield response;
  }

  // TODO: Do we need a co_return somewhere too?
}

int main() {
  auto messages = co_await sendMessagesToOtherSolarSystemsAndGetResponses();

  // The range-based for loop waits for the next message each time.
  // TODO: When does it stop?
  for (auto response : messages) {
    std::cout << response << '\n';
  }

  std::cout << "Not waiting any longer for more responses.\n";
  return 0;
}

So far that makes a generator look like a bit like a Go channel. But Go channels seem to be more versatile, as we will see.

Unidirectional channels

By default channels allow goroutines to both send or receive messages. This is rarely what you want inside a particular goroutine, so you can declare a channel type that is only for sending or only for receiving. You would typically create a normal channel and then the goroutine’s function would declare that it takes a unidirectional channel. A send-only channel is of type chan<-, and a receive only signal is of type <-chan. I do think that syntax is hard to love.

For instance, the function from our previous example really only needs a channel that it can send messages on, not receive them too, so we should change the type of the channel:

func sendMessagesToOtherSolarSystemsAndGetResponses(c chan<- string) {
  for i := 0; i < 10; i++ {
    ...
    c <- response
  }

  c.Close()
}

The compiler can then help by preventing a message from being sent on the channel. When we pass the regular channel to the goroutine it will cast implicitly to the unidirectional channel. But we may not cast the unidirectional channel back to a regular channel.

I don’t know if there is some equivalent for this in the world of C++ coroutines and generators. Maybe it is generally unwise to compare C++ generators and Go channels.

Buffered Channels

So far we have created unbuffered channels. When we sent a message on the channel, the sending goroutine blocked until the message was received by the receiving goroutine. And when we tried to receive a message the receiving goroutine blocked until the sending goroutine had put a message in the channel. Therefore, unbuffered channels are also called synchronized channels.

But in our example, it would be better if the sending goroutine could keep adding messages to the channel without waiting for them to be received. We can use a buffered channel instead, specifying the size of the buffer:

c := make(chan string, 100) go sendMessagesToOtherSolarSystemsAndGetResponses(c)

The sending goroutine will then only block when the channel is full, and the receiving goroutine will only block when the channel is empty.

Using an Unbuffered Channel to Signal “done”

Unbuffered channels can be useful to signal that a goroutine has finished. This is mentioned in section 8.4.1 (Unbuffered Channels) of the book.

For instance, you will often want the main goroutine to wait for other goroutines to finish instead of stopping the whole program. By using an extra “done” channel, the second goroutine can signal to the main goroutine that is has finished, by sending an item on the “done” channel, and the main goroutine can block on that by attempting to receive from the “done” channel. For instance, here the goroutine does not return any responses so we need another way to signal that we have finished:

func sendMessagesToOtherSolarSystems(done chan struct{}) {
  for i := 0; i < 10; i++ {
    ...
  }

  done -< struct{}{}
}

done := make(chan struct{})
go sendMessagesToOtherSolarSystems(done)
doOtherStuffInTheMeantime()
<- done // This blocks

However, when the goroutine is already returning results via a channel, I believe it is simper to just wait for the channel to be closed, as we did in the previous example by using a range for over the channel.

The book recommends use of struct{} as the channel element, probably because it has no value. A bool seems more more concise, but you’d need a comment saying that the value (true or false) was meaningless. I guess you could also just do a range for over the channel, waiting for it to close, like when we used a channel to return responses, but without ever sending anything.

I feel that this “done” synchronization technique could have a more explicit syntax. I think that would make the code clearer and would mean that various code that does the same thing would have less uninteresting differences. Sync.WaitGroup (see below) seems to be what I want, but this raw “done” channel technique seems more popular for simple cases.

If we were using C++ coroutines, I’d guess we would do the same thing by waiting for the result from a std::future.

Waiting for Multiple Goroutines to Signal “done”

A variation of the “done” channel technique is useful when waiting for multiple goroutines to finish. If we know how many goroutines we have started, we can use a channel to wait for that many “done” messages. This is mentioned in section 8.5 (Looping in Parallel) of the book, though it doesn’t call it a “done” channel there. For instance:

func sendMessageToSolarSystem(coordinates Coordinates, done chan struct{}) {
  ...

  done -< struct{}{}
}

done := make(chan struct{}
for coord range : solarSystems {
  go sendMessageToSolarSystem(coord, done)
}
doOtherStuffInTheMeantime()

// Wait for as many done messages as goroutines that we started.
// We could have counted them instead, and then counted down here.
for range : solarSystems {
  <- done
}

As section 8.5 (Looping in Parallel) of the book says, if you wanted to return something useful from each goroutine, even just an error, you would use that instead of a struct{}.

Again, I don’t know how we would do this with C++ coroutines.

sync.WaitGroup

As an alternative to the “done” channel technique when waiting for multiple goroutines to finish, Go provides the sync.WaitGroup concurrency-safeÂ counter. It has Add() and Done() methods and a Wait() method that blocks until the count falls back to zero. This is mentioned at the end of section 8.5 (Looping in Parallel) in the book.

func sendMessageToSolarSystem(coordinates Coordinates, wg sync.WaitGroup) {
  ...

  wg.Done() // Doing it via defer would be safer.
}

var wg sync.WaitGroup
for coord range : solarSystems {
  wg.Add(1)
  go sendMessageToSolarSystem(coord, done)
}
doOtherStuffInTheMeantime()

wg.Wait()

Notice that there is no Add() method overload that uses a default of 1, because Go has no function overloads and no default function parameter values.

I don’t know how we’d do the same with C++ coroutines. Even clever and correct use of a std::condition_variable would not be helpful if C++ coroutines do not map directly to real kernel threads (see above).

Using a Channel as a Counting Semaphore

Section 8.8 (Example: Concurrent Directory Traversal) of the book suggests using a buffered channel of a specific size to allocate and release tokens, preventing too many goroutines from doing work at the same time, causing them to block until a token is available. In the example, this limits the use of a constrained resource, preventing the goroutines from opening too many files. For instance:

func sendMessageToSolarSystem(coordinates Coordinates, sema chan struct{}) {
  sema <- struct{}{} // Acquire token.
  defer func() { <- sema }() // Release it later.
  ...
}

// Don't try to send more than 10 messages at a time.
sema := make(chan struct{}, 10)
for coord range : solarSystems {
  go sendMessageToSolarSystem(coord, done)
}
doOtherStuffInTheMeantime()

// This ignores the need to wait for completion.

I think I would prefer some syntax or API that let me explicitly specify a pool that allows only a specific number of coroutines to be active (or maybe even instantiated). Obviously that would only be useful in simple cases and it’s generally unpleasant to hard-code numbers anyway.

Select: Multiple results, from multiple channels, without blocking

Go has a select/case construct built in to the language just for channels. This is mentioned in section 8.7 (Multiplexing with Select) of the book.

It’s a bit like switch/case. It’s also a bit like the POSIX select() function in C, used to respond to activity on file descriptors, such as network connections. It can wait for activity on one of the specified channels. Or it can repeatedly poll if you supply a default case and put it in a loop.

c1 := make(chan string)
go sendMessagesToExoplanetsViaRadio(c1)
c2 := make(chan string)
go sendMessagesToExoplanetsByBlockingTheSun(c2)
 
for {
  select {
    case response := <- c1:
      fmt.Println("Response to or radio message: %s", response)

    case response := <- c2:
      fmt.Println("Response to or sun message: %s", response)

    default:
      if (too_late) {
        break
      }
      doOtherStuffInTheMeantime()
      // or sleep.
  }
}

fmt.Println("Not waiting any longer for more responses.")

I wonder how C++ could support a similar construct with coroutines and generators.

Canceling a Goroutine by Closing a Channel and Polling

The main goroutine might no longer need the result from a goroutine that it has started. Maybe another goroutine found the answer first, or maybe it was in response to a user request that has been canceled, or a connection that has been disconnected.

(Update: See the comment below, about Contexts, added in Go 1.7.)

Section 8.9 (Cancellation) of the book suggests signaling this by closing a channel, and suggests that the called goroutines should poll for this channel closure via select with a default case (see above). The book calls this channel “done” but I think that confuses this technique with technique that sends a done message over the done channel.

For instance:


// Poll to see if the channel has been closed.
func canceled(canceler chan<- string) bool {
  select {
    case <-canceler:
      return true
    default:
      return false
  }
}
  
func sendMessagesToExoplanetsViaRadio(c chan<- string, canceler chan struct{}) {
   ...

   if canceled(canceller) {
      // Obviously not neat code:
     c <- "Not sent"
   }

   ...
   c <- response
}
  
c1 := make(chan string)
canceler1 := make(chan struct{})
go sendMessagesToExoplanetsViaRadio(c1, canceler1)
c2 := make(chan string)
canceler2 := make(chan struct{})
go sendMessagesToExoplanetsByBlockingTheSun(c2, canceler2)
 
for {
  select {
    case response := <- c1:
      // Cancel the other attempt because we don't need it.
      canceler2.Close();

    case response := <- c2:
      // Cancel the other attempt because we don't need it.
      canceler1.Close();
  }
}

Again, I would like more explicit syntax or API for goroutine cancellation. For me, being able to implement things with channels doesn’t mean that they are the best API for expressing those things.

Communicating Instead of Sharing

Go programs tend to send messages via channels to change (indirectly) shared data and to let goroutines see the latest state of that data. “Do not communicate by sharing memory; instead, share memory by communicating.” is a popular phrase in the Go world.

This allows the use of data to be restricted to a single goroutine, often called a “monitor goroutine”. The locking (or just general lack of races if the Go channel is implemented with lock-free code) is then restricted to the channels, implicitly, with no explicit locking of the data being necessary. There are some examples of this in section 9.1 (Race Conditions) in the book.

It occurs to me that, unlike locking via a mutex, this does not guarantee that all goroutines see the latest state of any shared data. But it does guarantee an order, which is maybe all that usually matters.

Because channels are concurrency-safe, you can even send them as elements of a channel. Section 8.10 (Example: Chat Server) does this so it can use “client” channels as identifications of client goroutines (as well as sending strings over the client channels), keeping track of them via entering and leaving channels and a map of the client channels.

Concurrency-safe containers

We’ve already seen that Go channels are “concurrency-safe”, so you can share them between goroutines without additional locking. ~~So are slices and maps~~.

Update: No, slices and maps are not concurrency safe. I must have misunderstood something that I read somewhere. This makes the rest of this sub-section irrelevant, though hopefully correct.

Interestingly, the Java standard library moved away from making container types thread safe, because (as I understand it) the developer would generally use them as part of a larger custom data structure that would need its own locking (synchronization) anyway. Java now provides some separate containers explicitly for use with concurrency.

Likewise, in the C++ standard library, types only tend to have locking (for instance std::shared_ptr’s “partial internal synchronization“) when it wouldn’t be possible for the application code to do the locking itself (external synchronization). Presumably, Go programs then tend to have more locking that necessary, and presumably this is Go again choosing convenience and safety over raw performance.

However, I would really like to see a concurrency-safe queue in C++, similar to a Go channel. A lock-free concurrent queue seems like exactly the kind of difficult-to-implement, but easy to use, and broadly useful, thing that belongs in the standard library.

No thread-local storage

Because Goroutines don’t map directly to real threads, Go doesn’t try to provide “thread-local storage”. C++ has this, via its thread_local keyword, but I believe this is not useful, or wise, with C++ coroutines either.

Mutexes

Goroutines are a suitable way to co-operatively schedule work, and communication over channels can avoid the need for explicit locking, but you might still need to explicitly prevent concurrent access of shared data. Mutexes do this in both Go and C++, but feel free to skip to the summary if you only cared about Goroutines and channels.

Go’s sync.Mutex is like std::mutex

We lock/unlock Go’s Mutex, like so, though you can also do it manually instead of using defer.

m sync.Mutex

func something() {
  m.Lock()
  defer m.Unlock()

  // do something to the data.
}

which corresponds to this in C++ with std::mutex:

std::mutex m;

void something() {
  std::lock_guard<std::mutex> lock(our_mutex);

  // do something to the data.h
}

I’ve always wished there was some shorter syntax for that lock_guard in C++.

Note that you can use std::unique_lock, instead of std::lock_guard, if yo need to manually lock and unlock.

Go’s sync.RWMutex is like std::shared_mutex with std::shared_lock

A Read/Write mutex (or “shared exclusive lock”) allows multiple threads to read from shared data, while ensuring that they each see the latest state of that shared data, but only allows one thread at a time to write to the shared data.

We lock/unlock Go’s Mutex, like so, though you can also do it manually instead of using defer.

m sync.RWMutex

func somethingThatReads() {
  m.RLock()
  defer m.RUnlock()

  // get something from the data.
}

func somethingThatWrites() {
  m.Lock()
  defer m.Unlock()

  // do something to the data
}

which corresponds to this in C++ with std::shared_mutex:

std::shared_mutex m;

void somethingThatReads() {
  std::shared_lock<std::mutex> lock(our_mutex);

  // get something from the data
}

void somethingThatWrites() {
  std::lock_guard<std::mutex> lock(our_mutex);

  // do something to the data
}

I do find the Go names more obvious here. They seem to be aimed more at how people will actually use these locks in most situations.

No recursive/re-entrant mutex

Go mutexes, like C++’s s std::mutex and std::shared_mutex, are not re-entrant, so if a thread tries to lock a mutex that it has already locked, there will be a deadlock. C++ does provide a re-entrant mutex via std::recursive_mutex, but actually using it is generally considered rather disgusting and a sign that the code should be restructured to avoid it.

As the Go book wisely says. “The purpose of a mutex is to ensure that certain invariants of the shared variables are maintained at critical points during program execution. One of the invariants is â€œno goroutine is accessing the shared variables,â€ but there may be additional invariants specific to the data structures that the mutex guards. When a goroutine acquires a mutex lock, it may assume that the invariants hold. While it holds the lock, it may update the shared variables so that the invariants are temporarily violated. However, when it releases the lock, it must guarantee that order has been restored and the invariants hold once again. Although a re-entrant mutex would ensure that no other goroutines are accessing the shared variables, it cannot protect the additional invariants of those variables.”

Summary

Go’s concurrency features are another example of its choosing safety and simplicity over potential raw performance, in an area where standard C++ does not yet offer such simplicity even as an option.

Both support standard mutex locking, in much the same way. But otherwise their strengths go in opposite directions:

C++ lets you write lock-free concurrent code so the compiler, with the help of some hardware features, can arrange for the compiled code to never attempt simultaneous changing of your shared data, while ensuring that different threads see changes when necessary. But that is notoriously difficult to get right. Nevertheless it needs to be available so we can benefit from the work of people who know how to use it.

On the other hand, Go lets you write concurrency code that is easier to get right than when just using standard locks There is probably some small performance cost, but it’s likely to actually work and you’ll probably understand how it works. After all, when concurrency code is wrong (C++ makes it easier to be wrong), performance can suffer as well as safety.

I do have some hope that things will get better for C++ when coroutines arrive and are widely understood, but it looks like they alone might not be enough to match Go’s goroutines ad channels.

A C++ developer looks at Go (the programming language), Part 2: Modularity and Object Orientation

June 28, 2017GeneralC++, golang, javamurrayc

This is the second part of a series. Here are all 3 parts:

In part 1, I looked at the simpler features of Go, such as its general syntax, its basic type system, and its for loop. Here I mention its support for packages and object orientation. As before, I strongly suggest that you read the book to learn about this stuff properly. Again, I welcome any friendly corrections and clarifications.

Overall, I find Go’s syntax for object orientation a bit messy, inconsistent and frequently too implicit, and for most uses I prefer the obviousness of C++’s inheritance hierarchy. But after writing the explanations down, I think I understand it.

I’m purposefully trying not to mention the build system, distribution, or configuration, for now.

Packages

Go code is organized in packages, which are much like Java package names, and a bit like C++ namespaces. The package name is declared at the start of each file:

package foo

And files in other packages should import them like so:

package bar

import (
  "foo"
  "moo"
)

func somefunc() {
  foo.Yadda()
  var a moo.Thing
  ...
}

The package name should match the file’s directory name. This is how the import statements find the packages’ files. You can have multiple files in the same directory that are all part of the same package.

Your “main” package, with your main function, is an exception to this rule. Its directory doesn’t need to be named “main” because you won’t be importing the main package from anywhere.

~~Go doesn’t seem to allow nested packages, unlike C++ (foo::thing::Yadda) and Java (foo.thing.Yadda), though people seem to work around this by creating separate libraries, which seems awkward.~~

I don’t know how people should specify the API of a Go package without providing the implementation. C and C++ have header files for this purpose, separating declaration and implementation.

Structs

You can declare structs in go much as in C. For instance:

type Thing struct {
  // Member fields.
  // Notice the lack of the var keyword.
  a int
  B int // See below about symbol visibility
}

var foo Thing
foo.B = 3

var bar Thing = Thing{3}

var goo *Thing = new(Thing)
goo.B = 5

As usual, I have used the var form to demonstrate the actual type, but you would probably want to use the short := form.

Notice that we can create it as a value or as a pointer (using the built-in new() function), though in Go, unlike C or C++, this does not determine whether its actual memory will be on the stack or the heap. The compiler decides, generally based on whether the memory needs to outlive the function call.

Previously we’ve seen the built-in make() function used to instantiate slices and maps (and we’ll see it in part 3 with channels). make() is only for those built-in types. For our own types, we can use the new() function. I find the distinction a bit messy, but I generally dislike the whole distinction between built-in types and types that can be implemented using the language itself. I like how the C++ standard library is implemented in C++, with very little special support from the language itself when something is added to the library.

Go types often have “constructor” functions (not methods) which you should call to properly instantiate the type, but I don’t think there is any way to enforce correct initialization like a default constructor in C++ or Java. For instance:


type Thing struct {
  a int
  name string
  ...
}

func NewThing() *Thing {
  // 100 is a suitable default value for a in this type:
  f := Thing{100, nil}
  return &f
}

// Notice that different "constructors" must have different names,
// because go doesn't have function or method overloading.
func NewThingWithName(name string) *Thing {
  f := Thing{100, name}
  return &f
}

Embedding Structs

You can anonymously “embed” one struct within an other, like so:

type Person struct {
   Name string
}

type Employee struct {
  Person
  Position string
}

var a Employee
a.Name = "bob"
a.Position = "builder"

This feels a bit like inheritance in C++ and Java~~, but this is just containment. It doesn’t give us any real “is a” meaning and doesn’t give us real polymorphism~~. For instance, you can do this:

var e = new(Employee)

// Compilation error.
var p *Person = e

// This works instead.
// So if we thought of this as a cast (we probably shouldn't),
// this would mean that we have to explicitly cast to the base class.
var p *Person = e.Person

// This works.
e.methodOnPerson()

// And this works.
// Name is a field in the contained Person struct.
e.Name = 2

// These work too, but the extra qualification is unnecessary.
e.Person.methodOnPerson()

Interfaces, which we’ll see later, do give us some sense of an “is a” meaning.

Methods

Unlike C, but like C++ and Java classes, structs in Go can have methods – functions associated with the struct. But the syntax is a little different than in C++ or Java. Methods are declared outside of the struct declaration, and the association is made by specifying a “receiver” before the function name. For instance, this declares (and implements) a DoSomething method for the Thing struct:

func (t Thing) DoSomething() {
  ...
}

Notice that you have to specify a name for the receiver – there is no built-in “self” or “this” instance name. This feels like an unnecessary invitation to inconsistency.

You can use a pointer type instead, and you’ll have to if you want to change anything about the struct instance:

func (t *Thing) ChangeSomething() {
  t.a = 4
}

Because you should also want to keep your code consistent, you’d therefore want to use a pointer type for all method receivers. So I don’t know why the language lets it ever be a struct value type.

Unlike C++ or Java, this lets you check the instance for nil (Go’s null or nullptr), making it acceptable to call your method on a null instance. This reminds me of how Objective-C happily lets you call a method (“send a message to” in Objective-C terminology) on a nil instance, with no crash, even returning a nil or zero return value. I find that undisciplined in Objective-C, and it bothers me that Go allows this sometimes, but not consistently.

Unlike C++ or Java, you can even associate methods with non struct (non class) types. For instance:

type Meters int
type Feet int

func (Meters) convertToFeet() (Feet) {
  ...
}

Meters m = 10
f := p.convertToFeet()

No equality or comparison operator overloading

In C++, you can overload operator =, !=, <, >, etc, so you can use instances of your type with the regular operators, making your code look tidy:

MyType a = getSomething();
MyType b = getSomethingElse();
if (a == b) {
  ...
}

You can’t do that in Go (or Java, though it has the awkward Comparable interface and equals() method). Only some built-in types are comparable in Go – the numeric types, string, pointers, or channels, or structs or arrays made up of these types. This is an issue when dealing with interfaces, which we’ll see later.

Symbol Visibility: Uppercase or lowercase first letter

Symbols (types, functions, variables) that start with an uppercase letter are available from outside the package. Struct methods and member variables that start with an uppercase letter are available from outside the struct. Otherwise they are private to the package or struct.

For instance:

type Thing int // This type will be available outside of the package.
var Thingleton Thing// This variable will be available outside of the package.

type thing int // Not available outside of the package.
var thing1 thing // Not available outside of the package.
var thing2 Thing // Not available outside of the package.

// Available outside of the package.
func DoThing() {
  ...
}

// Not available outside of the package.
func doThing() {
  ...
}

type Stuff struct {
  Thing1 Thing // Available outside of the package.
  thing2 Thing // "private" to the struct.
}

// Available outside of the struct.
func (s Stuff) Foo() {
  ...
}

// Not available outside of the struct.
func (s Stuff) bar() {
  ...
}

// Not available outside of the package.
type localstuff struct {
...
}

I find this a bit strange. I prefer the explicit public and private keywords in C++ and Java.

Interfaces

Interfaces have methods

If two Go types satisfy an interface then they both have the methods of that interface. This is similar to Java interfaces. A Go interface is also a bit like a completely abstract class in C++ (having only pure virtual methods), but it’s also a lot like a C++ concept (not yet in C++, as of C++17). For instance:

type Shape interface {
  // The interface's methods.
  // Note the lack of the func keyword.
  SetPosition(x int, y int)
  GetPosition() (x int, y int)
  DrawOnSurface(s Surface)
}

type Rectangle struct {
  ...
}

// Methods to satisfy the Shape interface.
func (r *Rectangle) SetPosition(x int, y int) {
  ...
}

func (r *Rectangle) GetPosition() (x int, y int) {
  ...
}
func (r *Rectangle) DrawOnSurface(s Surface) {
   ...
}

// Other methods:
func (r *Rectangle) setCornerType(c CornerType) {
   ...
}
func (r *Rectangle) cornerType() (CornerType) {
   ...
}

type Circle struct {
  ...
}

// Methods to satisfy the Shape interface.
func (c *Circle) SetPosition(x int, y int) {
  ...
}

func (c *Circle) GetPosition() (x int, y int) {
  ...
}

func (c *Circle) DrawOnSurface(s Surface) {
  ...
}

// Other methods:
...

You can then use the interface type instead of the specific “concrete” type:

var someCircle *Circle = new(Circle)
var s Shape = someCircle
s.DrawOnSurface(someSurface)

Notice that we use a Shape, not a *Shape (pointer to Shape), even though we are casting from a *Circle (pointer to circle). “Interface values” seem to be implicitly pointer-like, which seems unnecessarily confusing. I guess it would feel more consistent if pointers to interfaces just had the same behaviour as these “interface values”, even if the language had to disallow interface types that weren’t pointers.

Types satisfy interfaces implicitly

However, there is no explicit declaration that a type should implement an interface.

In this way Go interfaces are like C++ concepts, though C++ concepts are instead a purely compile-time feature for use with generic (template) code. Your class can conform to a C++ concept without you declaring that it does. And therefore, like Go interfaces, you can, if you must, use an existing type without changing it.

The compiler still checks that types are compatible, but presumably by checking the types’ list of methods rather than checking a class hierarchy or list of implemented interfaces. For instance:

var a *Circle = new(Circle)
var b Shape = a // OK. The compiler can check that Circle has Shape's methods.

Like C++ with dynamic_cast, Go can also check at runtime. For instance, you can check if one interface value refers to an instance that also satisfies another interface:

// Sometimes the Shape (our interface type) is also a Drawable
// (another interface type), sometimes not.
var a Shape = Something.GetShape()

// Notice that we want to cast to a Drawable, not a *Drawable,
// because Drawable is an interface.
var b = a.(Drawable) // Panic (crash) if this fails.

var b, ok = a.(Drawable) // No panic.
if ok {
  b.DrawOnSurface(someSurface)
}

Or we can check that an interface value refers to a particular concrete type. For instance:

// Get Shape() returns an interface value.
// Shape is our interface.
var a Shape = Something.GetShape()

// Notice that we want to cast to a *Thing, not a Thing,
// because Thing is a concrete type, not an interface.
var b = a.(*Thing) // Panic (crash) if this fails.

var b, ok = a.(*Thing) // No panic.
if ok {
  b.DoSomething()
}

Runtime dispatch

Interface methods are also like C++ virtual methods (or any Java method), and interface variables are also like instances of polymorphic base classes. To actually call the interface’s method via an interface variable, the program needs to examine its actual type at runtime and call that type’s specific method. Maybe, as with C++, the compiler can sometimes optimize away that indirection.

This is obviously not as efficient as directly calling a method, identified at compile time, of a templated type in a C++ template. But it is obviously much simpler.

Comparing interfaces

Interface values can be compared sometimes, but this seems like a risky business. Interface values are:

Not equal if their types are different.
Not equal if their types are the same and only one is nil.
Equal if their types are the same, and the types are comparable (see above), and their values are equal.

But if the types are the same, yet those types are not comparable, Go will cause a “panic” at runtime.

Wishing for an implements keyword

In C++ you can, if you wish, explicitly declare that a class should conform to the concept, or you can explicitly derive from a base class, and in Java you must use the “implements” keyword. Not having this with Go would take some getting used to. I’d want these declarations to document my architecture, explicitly showing what’s expected of my”concrete” classes in terms of their general purpose instead of just expressing their that by how some other code happens to use them. Not having this feels fragile.

The book suggests putting this awkward code somewhere to check that a type really implements an interface. Note the use of _ to mean that we don’t need to keep a named variable for the result.

var _ MyInterface = (*MyType)(nil)

The compiler should complain that the conversion is impossible if the type does not satisfy the interface. I think it would be wise this as the very minimum of testing, particularly if your package is providing types that are not really used in the package itself. For me, this is a poor substitute for an obvious compile-time check, using a specific language construct, on the type itself.

Interface embedding

Embedding an interface in an interface

Go has no notion of inheritance hierarchies, but you can “embed” one interface in another, to indicate that a class that satisfies one interface also satisfies the other. For instance:

type Positionable interface {
  SetPosition(x int, y int)
  GetPosition() (x int, y int)
}

type Drawable interface {
  drawOnSurface(s Surface) }
}

type Shape interface {
  Positionable
  Drawable
}

To satisfy the Shape interface, any type must also satisfy the Drawable and Positionable interfaces. Therefore, any type that satisfies the Shape interface can be used with a method associated with the Drawable or Positionable interfaces. So it’s a bit like a Java interface extending another interface.

Embedding an interface-satisfying struct in a struct

We saw earlier how you can embed one struct in another anonymously. If the contained struct implements an interface, then the containing struct then also implements that interface, with no need for manually-implemented forwarding methods. For instance:

type Drawable interface {
 drawOnSurface(s Surface)
}

type Painter struct {
  ...
}

// Make Painter satisfy the Drawable interface.
func (p *Painter) drawOnSurface(s Surface) {
  ...
}

type Circle struct {
 // Make Circle satisfy the Drawable interface via Painter.
 Painter
 ...
}

func main() {
  ...
  var c *Circle = new(Circle)
 
  // This is OK.
  // Circle satisfies Drawable, via Painter
  c.drawOnSurface(someSurface)

  // This is also OK.
  // Circle can be used as an interface value of type Drawable, via Painter.
  var d Drawable = c
  d.drawOnSurface(someSurface)
}

This again feels a bit like inheritance.

I actually quite like how the (interfaces of the) anonymously contained structs affect the interface of the parent struct, even with Go’s curious interface system, though I wish the syntax was more obvious about what is happening. It might be nice to have something similar in C++. Encapsulation instead of inheritance (and the Decorator pattern) is a perfectly valid technique, and C++ generally tries to let you do things in multiple ways without having an opinion about what’s best, though that can itself be a source of complexity. But in C++ (and Java), you currently have to hand-code lots of forwarding methods to achieve this and you still need to inherit from something to tell the type system that you support the encapsulated type’s interface.

A C++ developer looks at Go (the programming language), Part 1: Simple Features

June 26, 2017GeneralC++, golang, javamurrayc

This is the first part of a series. Here are all 3 parts:

I’m reading “The Go Programming Language” by Brian Kernighan and Alan Donovan. It is a perfect programming language introduction, clearly written and perfectly structured, with nicely chosen examples. It contains no hand-waving – it’s aware of other languages and briefly acknowledges the choices made in the language design without lengthy discussion.

As an enthusiastic C++ developer, and a Java developer, I’m not a big fan of the overall language. It seems like an incremental improvement on C, and I’d rather use it than C, but I still yearn for the expressiveness of C++. I also suspect that Go cannot achieve the raw performance of C or C++ due to its safety features, though that maybe depends on compiler optimization. But it’s perfectly valid to knowingly choose safety over performance, particularly if you get more safety and more performance than with Java.

I would choose Go over C++ for a simple proof of concept program using concurrency and networking. Goroutines and channels, which I’ll mention in a later post, are convenient abstractions, and Go has standard API for HTTP requests. Concurrency is hard, and it’s particularly easy to choose safety over performance when writing network code.

Here are some of my superficial observations about the simpler features, which mostly seem like straightforward improvements on C. In part 2 I’ll mention the higher-level features and I’ll hopefully do a part 3 about concurrency. I strongly recommend that you read the book to understand these issues properly.

I welcome friendly corrections and clarifications. There are surely several mistakes here, hopefully none major.

No semicolons at the end of lines

Let’s start with the most superficial thing. Unlike C, C++, or Java, Go doesn’t need semicolons at the end of lines of code. So this is normal:

a = b
c = d

This is nicer for people learning their first programming language. It can take a while for those semicolons to become a natural habit.

No () parentheses with if and for

Here’s another superficial difference. Unlike C or Java, Go doesn’t put its conditions inside parentheses with if and for. That’s another small change that feels arbitrary and makes C coders feel less comfortable.

For instance, in Go we might write this:

for i := 0; i < 100; i++ {
  ...
}

if a == 2 {
  ...
}

Which in C would look like this:

for (int i = 0; i < 100; i++) {
  ...
}

if (a == 2) {
  ...
}

Type inference

Go has type inference, from literal values or from function return values, so you don’t need to restate types that the compiler should know about. This is a bit like C++’s auto keyword (since C++11). For instance:

var a = 1 // An int.
var b = 1.0 // A float64.
var c = getThing()

There’s also a := syntax that avoids the need for var, though I don’t see the need for both in the language:

a := 1 // An int.
b := 1.0 // A float64
d := getThing()

I love type inference via auto in modern C++, and find it really painful to use any language that doesn’t have this. Java feels increasingly verbose in comparison, but maybe Java will get there. I don’t see why C can’t have this. After all, they eventually allowed variables to be declared not just at the start of functions, so change is possible.

Types after names

Go has types after the variable/parameter/function names, which feels rather arbitrary, though I guess there are reasons, and personally I can adapt. So, in C you’d have

Foo foo = 2;

but in Go you’d have

var foo Foo = 2

Keeping a more C-like syntax would have eased C developers into the language. These are often not people who embrace even small changes in the language.

No implicit conversions

Go doesn’t have implicit conversions between types, such as int and uint, or floats and int. This also applies to comparison via == and !=.

So, these won’t compile:

var a int = -2
var b uint = a
var c int = b

var d float64 = 1.345
var e int = c

C compiler warnings can catch some of these, but a) People generally don’t turn on all these warnings, and they don’t turn on warnings as errors, and b) the warnings are not this strict.

Notice that Go has the type after the variable (or parameter, or function) name, not before.

Notice that, unlike Java, Go still has unsigned integers. Unlike C++’s standard library, Go uses signed integers for sizes and lengths. Hopefully C++ will get do that too one day.

No implicit conversions because of underlying types

Go doesn’t even allow implicit conversions between types that, in C, would just be typedefs. So, this won’t compile

type Meters int
type Feet int
var a Meters = 100
var b Feet = a

I think I’d like to see this as a warning in C and C++ compilers when using typedef.

However, you are allowed to implicitly assign a literal (untyped) value, which looks like the underlying type, to a typed variable, but you can’t assign from an actual typed variable of the underlying type:

type Meters int
var a Meters = 100 // No problem.

var i int = 100
var b Meters = i // Will not compile.

No enums

Go has no enums. You should instead use const values with the iota keyword. So, while C++ code might have this:

enum class Continent {
  NORTH_AMERICA,
  SOUTH_AMERICA,
  EUROPE,
  AFRICA,
  ...
};

Continent c = Continent::EUROPE;
Continent d = 2; // Will not compile

in Go, you’d have this:

type continent int

const (
  CONTINENT_NORTH_AMERICA continent = iota
  CONTINENT_SOUTH_AMERICA // Also a continent, with the next value via iota.
  CONTINENT_EUROPE // Also a continent, with the next value via iota.
  CONTINENT_AFRICA // Also a continent, with the next value via iota.
)

var c continent = CONTINENT_EUROPE
var d continent = 2 // But this works too.

Notice how, compared to C++ enums, particularly C++11 scoped enums, each value’s name must have an explicit prefix, and the compiler won’t stop you from assigning a literal number to a variable of the enum type. Also, the Go compiler doesn’t treat these as a group of associated values, so it can’t warn you, for instance, if you forget to mention one in a switch/case block.

Switch/Case: No fallthrough by default

In C and C++, you almost always need a break statement at the end of each case block. Otherwise, the code in the following case block will run too. This can be useful, particularly when you want the same code to run in response to multiple values, but it’s not the common case. In Go, you have to add an explicit fallthrough keyword to get this behaviour, so the code is more concise in the general case.

Switch/Case: Not just basic types

In Go, unlike in C and C++, you can switch on any comparable value, not just values known at compile time, such as ints,Â enums, or other constexpr values. So you can switch on strings, for instance:

switch str {
  case "foo":
  doFoo()
case "bar":
  doBar()
}

This is convenient and I guess that it is still compiled to efficient machine code when it uses compile-time values. C++ seems to have resisted this convenience because it couldn’t always be as efficient as a standard switch/case, but I think that unnecessarily ties the switch/case syntax to its original meaning in C when people expected to be more aware of the mapping from C code to machine code.

Pointers, but no ->, and no pointer arithmetic

Go has normal types and pointer types, and uses * and & as in C and C++. For instance:

var a thing = getThing();
var p *thing = &a;
var b thing = *p; // Copy a by value, via the p pointer

As in C++, the new keyword returns a pointer to a new instance:

var a *thing = new(thing)
var a thing = new(thing) // Compilation error

This is like C++, but unlike Java, in which any non-fundamental types (not ints or booleans, for instance) are effectively used via a reference (it just looks like a value), which can confuse people at first by allowing inadvertent sharing.

Unlike C++, you can call a method on a value or a pointer using the same dot operator:

var a *Thing = new(Thing) // You wouldn't normally specify the type.
var b Thing = *a
a.foo();
b.foo();

I like this. After all, the compiler knows whether the type is a pointer or a value, so why should it bother me with complaints about a . where there should be a -> or vice-versa? However, along with type inference, this can slightly obscure whether your code is dealing with a pointer (maybe sharing the value with other code) or a value. I’d like to see this in C++, though it would be awkward with smart pointers.

You cannot do pointer arithmetic in Go. For instance, if you have an array, you can’t step through that array by repeatedly adding 1 to a pointer value and dereferencing it. You have to access the array elements by index, which I think involves bounds checking. This avoids some mistakes that can happen in C and C++ code, leading to security vulnerabilities when your code accesses unexpected parts of your application’s memory.

Go functions can take parameters by value or by pointer. This is like C++, but unlike Java, which always takes non-fundamental types by (non const) reference, though it can look to beginner programmers as if they are being copied by value. I’d rather have both options with the code showing clearly what is happening via the function signature, as in C++ or Go.

Like Java, Go has no notion of const pointers or const references. So if your function takes a parameter as a pointer, for efficiency, your compiler can’t stop you from changing the value that it points to. In Java, this is often done by creating an immutable type, and many Java types, such as String, are immutable, so you can’t change them even if you want to. But I prefer language support for constness as in C++, for pointer/reference parameters and for values initialized at runtime. Which leads us to const in Go.

References, sometimes

Go does seem to have references (roughly, pointers that look like values), but only for the built-in slice, map, and channel types.Â (See below about slices and maps.) So, for instance, this function can change its input slide parameter, and that change will be visible to the caller, even though the parameter is not declared as a pointer:

func doThing(someSlice []int) {
  someSlice[2] = 3;
}

In C++, this would be more obviously a reference:

void doThing(Thing& someSlice) {
  someSlice[2] = 3;
}

I’m not sure if this is a fundamental feature of the language or just something about how those types are implemented. It seems confusing for just some types to act differently, and I find the explanation a bit hand-wavy. Convenience is nice, but so is consistency.

const

Go’s const keyword is not like const in C (rarely useful) or C++, where it indicates that a variable’s value should not be changed after initialization. It is more like C++’s constexpr keyword (since C++11), which defines values at compile time. So it’s a bit like a replacement for macros via #define in C, but with type safety. For instance:

const pi = 3.14

Notice that we don’t specify a type for the const value, so the value can be used with various types depending on the syntax of the value, a bit like a C macro #define. But we can restrict it by specifying a type:

const pi float64 = 3.14

Unlike constexpr in C++, there is no concept of constexpr functions or types that can be evaluated at compile time, so you can’t do this:

const pi = calculate_pi()

and you can’t do this

type Point struct {
  X int
  Y int
}

const point = Point{1, 2}

though you can do this with a simple type whose underlying type can be const:

type Yards int
const length Yards = 100

Only for loops

All loops in Go are for loops – there are no while or do-while loops. This simplifies the language in one way compared, for instance, to C, C++, or Java, though there are now multiple forms of for loop.

For instance:

for i := 0; i < 100; i++ {
  ...
}

or, like a while loop in C:

for keepGoing {
  ...
}

And for loops have a range-based syntax for containers such as string, slices or maps, which I’ll mention later:

for i, c := range things {
  ...
}

C++ has range-based for loops too, since C++11, but I like that Go can ~~(optionally)~~ give you the index as well as the value. (It gives you the index, or the index and the value, letting you ignore the index with the _ variable name.)

A native (Unicode) String type

Go has a built-in string type, and built in comparison operators such as ==, !=, and < (as does Java). Like Java, Strings are immutable, so you can’t change them after you’ve created them, though you can create new Strings by concatenating other Strings with the built in operator +. For instance:

str1 := "foo"
str2 := str1 + "bar"

Go source code is always UTF-8 encoded and string literals may contain non-ASCII utf-8 code points. Go calls Unicode code points “runes”.

Although the built-in len() function returns the number of bytes, and the built in operator [] for strings operates on bytes, there is a utf8 package for dealing with strings as runes (Unicode code points). For instance:

str := "foo"
l := utf8.RuneCountInString(str)

And the range-based for loop deals in runes, not bytes:

str := "foo"
for _, r := range str {
  fmt.Println("rune: %q", r)
}

C++ still has no standard equivalent.

Slices

Go’s slices are a bit like dynamically-allocated arrays in C, though they are really views of an underlying array, and two slices can be views into different parts of the same underlying array. They feel a bit like std::string_view from C++17, or GSL::span, but they can be resized easily, like std::vector in C++17 or ArrayList in Java.

We can declare a span like so, and append to it:

a := []int{5, 4, 3, 2, 1} // A slice
a = append(a, 0)

Arrays (whose size cannot change, unlike slices) have a very similar syntax:

a := [...]int{5, 4, 3, 2, 1} // An array.
b := [5]int{5, 4, 3, 2, 1} // Another array.

You must be careful to pass arrays to functions by pointer, or they will be (deep) copied by value.

Slices are not (deep) comparable, or copyable, unlike std::array or std::vector in C++, which feels rather inconvenient.

Slices don’t grow beyond their capacity (which can be more than their current length) when you append values. To do that you must manually create a new slice and copy the old slice’s elements into it. You can keep a pointer to an element in a slice (really to the element in the underlying array). So, as with maps (below), the lack of resizing is probably to remove any possibility of an pointer becoming invalid.

The built in append() function may allocate a bigger underlying array if it would need more than the existing capacity (which can be more than the current length). So you should always assign the result of append() like so:

a = append(a, 123)

I don’t think you can keep a pointer to an element in a slice. If you could, the garbage collection system would need to keep the previous underlying array around until you had stopped using that pointer.

Unlike C or C++ arrays, and unlike operator [] with std::vector, attempting to access an invalid index of a slice will result in a panic (effectively a crash) rather than just undefined behaviour. I prefer this, though I imagine that the bounds checking has some small performance cost.

Maps

Go has a built-in map type. This is roughly equivalent to C++’s std::map (balanced binary trees), or std::unordered_map (hash tables). Go maps are apparently hash tables but I don’t know if they are separate-chaining hash tables (like std::unordered_map) or open-addressing hash tables (like nothing in standard C++ yet, unfortunately).

Obviously, keys in hash tables have to be hashable and comparable. The book mentions comparability, but so few things are comparable that they would all be easily hashable too. Only basic types (int, float64, string, etc, but not slices) or structs made up only of basic types are comparable, so that’s all you can use as a key. You can get around this by using a basic type (such as an int or string) that is (or can be made into) a hash of your value. I prefer C++’s need for a std::hash<> specialization, though I wish it was easier to write one.

Unlike C++’, you can’t keep a pointer to an element in a map, so changing one part of the value means copying the whole value back into the map, presumably with another lookup. Go apparently does this to completely avoid the problem of invalid pointers when the map has to grow. C++ instead lets you take the risk, specifying when your pointer could become invalid.

Go maps are clearly a big advantage over C, where you otherwise have to use some third-party data structure or write your own, typically with very little type safety.

They look like this:

m := make(map[int]string)
m[3] = "three"
m[4] = "four"

Multiple return values

Functions in Go can have multiple return types, which I find more obvious then output parameters. For instance:

func getThings() (int, Foo) {
  return 2, getFoo()
}

a, b := getThings()

This is a bit like returning tuples in modern C++, particularly with structured bindings in C++17:

std::tuple<int, Foo> get_things() {
  return make_tuple(2, get_foo());
}

auto [i, f] = get_things();

Garbage Collection

Like Java, Go has automatic memory management, so you can trust that instances will not be released until you have finished using them, and you don’t need to explicitly release them. So you can happily do this, without worrying about releasing the instance later:

func getThing() *Thing {
  a := new(Thing)
  ...
  return a
}

b := getThing()
b.foo()

And you can even do this, not caring, and not easily even knowing, whether the instance was created on the stack or the heap:

func getThing() *Thing {
  var a Thing
  ...
  return &a
}

b := getThing()
b.foo()

I don’t know how Go avoids circular references or unwanted “leak” references, as Java or C++ would with weak references.

I wonder how, or if, Go avoids Java’s problem with intermittent slowdowns due to garbage collection. Go seems to be aimed at system-level code, so I guess it must do better somehow.

However, also like Java, and probably like all garbage collection, this is only useful for managing memory, not resources in general. The programmer is usually happy to have memory released some time after the code has finished using it, not necessarily immediately. But other resources, such as file descriptors and database connections, need to be released immediately. Some things, such as mutex locks, often need to be released at the end of an obvious scope. Destructors make this possible. For instance, in C++:

void Something::do_something() {
  do_something_harmless();

  {
    std::lock_guard<std::mutex> lock(our_mutex);
    change_some_shared_state();
  }
  
  do_something_else_harmless();
}

Go can’t do this, so it has defer() instead, letting you specify something to happen whenever a function ends. It’s a annoying that defer is associated with functions, not to scopes in general.

func something() {
  doSomethingHarmless()

  ourMutex.Lock()
  defer ourMutex.Unlock()
  changeSomeSharedState()

  // The mutex has not been released yet when this remaining code runs,
  // so you'd want to restrict the use of the resource (a mutex here) to
  // another small function, and just call it in this function.
  doSomethingElseHarmless()
}

This feels like an awkward hack, like Java’s try-with-resources.

I would prefer to see a language that somehow gives me all of scoped resource management (with destructors), reference-counting (like std::shared_ptr<>) and garbage collection, in a concise syntax, so I can have predictable, obvious, but reliable, resource releasing when necessary, and garbage collection when I don’t care.

Of course, I’m not pretending that memory management is easy in C++. When it’s difficult it can be very difficult. So I do understand the choice of garbage collection. I just expect a system level language to offer more.

Things I don’t like in Go

As well as the minor syntactic annoyances mentioned above, and the lack of simple generic resource (not just memory) management, I have a couple of other frustrations with the language.

(I’m not loving the support for object orientation either, but I’ll mention that in a later article when I’ve studied it more.)

No generics

Go’s focus on type safety, particularly for numeric types, makes the lack of generics surprising. I can remember how frustrating it was to use Java before generics, and this feels almost that awkward. Without generics I soon find myself having to choose between lack of type safety or repeatedly reimplementing code for each type, feeling like I’m fighting the language.

I understand that generics are difficult to implement, and they’d have to make a choice about how far to take them (probably further than Java, but not as far as C++), and I understand that Go would then be much more than a better C. But I think generics are inevitable once, like Go, you pursue static type safety.

Somehow go’s slice and map containers are generic, probably because they are built-in types.

Lack of standard containers

Go has no queue or stack in its standard library. In C++, I use std::queue and std::stack regularly. I think these would need generics. People can use go’s slice (a dynamically-allocated array) to achieve the same things, and you can wrap that up in your own type, but your type, can only contain specific types, so you’ll be reimplementing this for every type. Or your container can hold interface{} types (apparently a bit like a Java Object or a C++ void*), giving up (static) type safety.

C++: std::string_view not so useful when calling C functions

June 20, 2017GeneralC++, C++17, gtkmmmurrayc

string_view does not own string data

C++17 adds std::string_view, which is a thin view of a character array, holding just a pointer and a length. This makes it easy to provide just one method that can efficiently take either a const char*, or a std::string, without unnecessary copying of the underlying array. For instance:

void use_string(std::string_view str);

You can then call that function like so:

use_string("abc");

std::string str("abc");
use_string(str);

This involves no deep copying of the character array until the function’s implementation needs to do that. Most obviously, it involves no copying when you are just passing a string literal to the function. For instance it doesn’t create a temporary std::string just to call the function, as would be necessary if the function took std::string.

string_view knows nothing of null-termination

However, though the string literal (“abc”) is null-terminated, and the std::string is almost-certainlyÂ null-terminated (but implementation defined), our use_string() function cannot know for sure that the underlying array is null terminated. It could have been called liked so:

const char* str = "abc"; // null-terminated
use_string(std::string_view(str, 2));Â  // not 3.

or even like so:

const char str[] = {'a', 'b', 'c'}; //not null-terminated
use_string(std::string_view(str, 3));

or as a part of a much larger string that we are parsing.

Unlike std::string, there is no std::string_view::c_str() which will give you a null-terminated character array. There is std::string_view::data() but, like std::string::data(), that doesn’t guarantee the the character array will be null-terminated. (update: since C++11, std::string::data() is guaranteed to be null-terminated, but std::string_view::data() in C++17 is not.)

So if you call a typical C function, such as gtk_label_set_text(), you have to construct a temporary std::string, like so:

void use_string(std::string_view str) {
  gtk_label_set_text(label, std::string(str).c_str());
}

But that creates a copy of the array inside the std::string, even if that wasn’t really necessary. std::string_view has no way to know if the original array is null-terminated, so it can’t copy only when necessary.

This is understandable, and certainly useful for pure C++ code bases, or when using C APIs that deal with lengths instead of just null termination. I do like that it’s in the standard library now. But it’s a little disappointing in my real world of integrating with typical C APIs, for instance when implementing gtkmm.

Implementation affecting the interface

Of course, any C function that is expected to take a large string would have a version that takes a length. For instance, gtk_text_buffer_set_text(), so we can (and gtkmm will) use std::string_view as the parameter for any C++ function that uses that C function. But it’s a shame that we can’t have a uniform API that uses the same type for all string parameters. I don’t like when the implementation details dictate the types used in our API.

There is probably no significant performance issue for small strings, even when using the temporary std::string() technique, but it wouldn’t be nice to make the typical cases worse, even just theoretically.

In gtkmm, we could create our own string_view type, which is aware of null termination, but we are trying to be as standard as possible so our API is as obvious as possible.

C++ Implementations of “Engineering the Compiler” Pseudocode

May 26, 2017Generalalgorithms, bigoquiz, book, C++, C++17, compilersmurrayc

Implementing is understanding

I’m gradually reading through the Engineering a Compiler book. I’m enjoying it but after a while I started wondering if I really understood the algorithms and I wanted to make sure of that before going further. So I implemented the algorithms in C++ (C++17, because it’s nice), testing them against the example inputs and grammars from the book. For instance, here is my code to construct, and use, the action and goto tables which define a DFA for a bottom-up table-driven LR(1) parser, along with some test code to use it with a simple expression grammar.

So far I’ve done this for Chapter 2 (Scanners) and Chapter 3 (Parsers) and I’ll try to keep going. It is a useful exercise for me, and maybe it’s useful to someone else reading the book.Â Please note that the code will probably be meaningless to you if you haven’t read the book or something similar. On the other hand, the code will probably seem childlike if you’ve studied compilers properly.

Trying to getÂ the code to work often showed me that I had had only the illusion of fully understanding. Much of the pseudocode in the book is not very clear, even when you adjust your mind to its vaguely mathematical syntax, and to know what it really means you have to closely read the text descriptions, sometimes finding clues several pages back. For instance it’s not always clear what is meant by a particular symbol, and I found at least one conditional check that appeared in the description but not in the code. I would much rather see descriptions inline in the code.

Pseudocode is vague

I’m not a fan of pseudocode either, though I understand the difficulty in choosing a real programming language for examples. It means putting some readers off. But there could at least be some real code in an appendix. The book has some detailed walkthroughs that show that the authors must have implemented the algorithms somehow, so it’s a shame that I couldn’t just look at that code. I wonder what real programming language might be the most compact for manipulating sets of states when dealing with finite state automata states and grammar symbols.

For the parsers chapter this wasn’t helped by the, presumably traditional, ambiguity of the “FIRST” sets terminology. FIRST(symbol), FIRST(symbols), FIRST(production), and FIRST+(production) are all things.

My code isn’t meant to be efficient. For instance, my LR(1) parser table-building code has some awfully inefficient use of std::set as a key in a std::map. The code is already lengthier than I’d like, so I would prefer not to make it more efficient at the cost of readability. I’ve also probably overdone it with the half-constexpr and vaguely-concepty-generic Grammar classes. But I am tempted by the idea of making these fully constexpr with some kind of constexpr map, and then one day having the compiler build the generated (DFA) tables at compile time, so I wanted to explore that just a little.

But the book is good

Despite my complaints about the presentation of the algorithms, so far I’d still recommend the book. I feel like it’s getting the ideas into my head , it’s not really that bad, and as far as I know there’s nothing better. Of course, given that I haven’t read the alternatives, my recommendation shouldn’t mean much.

Also, these days I always write up my notes as a bigoquiz.com quiz, so here are my quizzable compiler notes so far.

Boost Graph Library: modernization

May 17, 2017Generalbgl, boost, C++, c++11, C++14, C++17murrayc

Modern C++ (C++11 and later) can greatly simplify generic templated C++ code. I’ve recently been playing around with modernizing the Boost Graph Library code. The experience is similar to how I modernized the libsigc++ code, though I have not gone nearly into that much depth yet. The BGL is currently a big messy jumble of code that isn’t getting much love, and modernizing it could start to let its accumulated wisdom shine through, while also freeing it of other boost dependencies.

Please note that this is just an experiment in my own fork that even I am not pursuing particularly seriously. These changes are not likely to ever be accepted into the BGL because it would mean requiring modern C++ compilers. Personally, I think that’s the only way to move forward. I also think that Boost’s monolithic release process, and lack of a real versioning strategy, holds its libraries back from evolving. At the least, I think only a small set of generally-useful libraries should be released together, and I think that set should drop anything that’s now been moved into the standard library. BGL seems to have been stagnant for the last decade, so there doesn’t seem to be much to lose.

I’ve modernized the example code (and also tried to remove the “using namespace boost” lines), and done some work to modernize the boost graph code itself.

At the least, liberal use of auto can let you delete lots of ugly type declarations that really exist just to make things compile rather than to provide useful clues to the reader. For instance, auto makes the example code less cluttered with magic incantations. Range-based for loops simplify more code – for instance, in the examples.The simpler code is then easier to understand, letting you see the useful work underneath the boiler plate.

I’ve jumped right into C++17 and used structured bindings (and in the examples) because they are particularly helpful with the BGL API, which has many methods that return the begin and end of a range inside a std::pair. This lets us avoid some more type declarations. However, in the examples, I used a make_range_pair() utility function in many places instead, so I could use a simple range-based for. I guess that the range library would provide a type to use instead, maybe as part of the standard library one day.

I also replaced most uses of the boost type traits (some from boost::mpl) with the std:: equivalents. For instance, std::is_same instead of boost::is_same. It should now be possible to remove the boost::mpl dependency completely with a little work.

I’ve also tried putting all of BGL in the boost::graph namespace, instead of just in the boost namespace, but the API currently expects application code to use “using namespace boost”, to make its generic API work, and this makes that even more obvious.

As a next step, I guess that the boost graph implementation code could be simplified much more by use of decltype(auto). As I was modernizing libsigc++, I sometimes found templates that were used only by specializations that were used only by typedefs that were no longer needed, because they could be replaced by decltype(auto) return types. You have to pull at the threads one by one.