Using GitHub Actions to check builds

GitHub Actions

A couple of weeks ago I spent some time adding GitHub Actions to some of my projects. I was lucky enough to get access to the beta, but it has now become available for everyone.

GitHub Actions lets you specify builds, tests, etc., to run on certain triggers, such as after every commit push. I’ve now used it in several projects, each using different build environments. It has worked very well, and has really improved the developer experience, and therefore improved the code, in a few of my projects. It has encouraged me to use pull requests even for changes to niche personal projects so I can benefit fully.

Actually, I’m not sure that Actions is the best name for the system. It really lets you describe Jobs, and each Job might use a few Actions. But maybe that’s just a matter of perspective.

Now that it’s public, the pricing suggests that most open source projects will have enough build time for free, for which I’m grateful. Nevertheless, history shows that it wouldn’t be wise to tie your projects too closely to one proprietary system. That’s one more reason not to duplicate your build rules in a GitHub Actions workflow file: you should have a simple build system that you can just call from CI. I generally do that via a simple Makefile, as I mention below.

Here are the GitHub projects that I’ve added GitHub Actions to:

libsigc++

We now have GitHub Actions to build and run tests for libsigc++ with several versions of 3 C++ compilers: 3 versions of gcc, 5 versions of clang, and even 1 version of msvc (thanks to Stuart Dootson). We will also add an action to check our C++ code formatting via clang-format. If you’d like every libsigc++ commit to work with some other compiler, please do try to make that happen via a pull request.

This will give us valuable confidence that our changes will work everywhere, and give us valuable clues when they don’t. It’s great to know that people can get this feedback automatically when they submit a PR, without waiting for a human.

I might be abusing GitHub’s generosity with so many build jobs, but luckily there are not that many commits to this project. I wonder if GitHub hope to fund some of the GitHub Actions costs for open source projects via GitHub Sponsors.

angular-bigoquiz-client (The bigoquiz.com frontend)

I added GitHub Actions to build and run tests for this Angular project. I put the npm/ng commands behind a Makefile to keep the CI rules as simple as possible. The only thing worse than having undocumented or overly-complicated build steps is repeating those incantations in multiple places.

I like how it uses just one Workflow definition, but runs it with several Node versions, thanks to the strategy.matrix specifier, and the with.node-version specifier made available by the actions/setup-node action.
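
For instance, a minimal workflow using such a matrix might look something like this. This is a sketch, not my actual file: the Node versions and Makefile targets here are just illustrative placeholders.

name: CI
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [8.x, 10.x, 12.x]
    steps:
      - uses: actions/checkout@v1
      - uses: actions/setup-node@v1
        with:
          node-version: ${{ matrix.node-version }}
      # Keep the real build rules in the Makefile, not here:
      - run: make build
      - run: make check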

See the Workflow definitions.

go-bigoquiz-server (The bigoquiz.com backend)

I added GitHub Actions to build and run tests for this Go microservice. It only builds for go 1.12, but I think I could parameterise that like the node version for angular-bigoquiz-client.

Again, I put the go commands behind a Makefile to keep the CI rules as simple as possible.

I did some major refactoring of this code, adding tests along the way, and having CI made that fairly painless.

See the Workflow definitions.

Minor Imperfections

I have lots of experience using Atlassian’s competing Bitbucket Pipelines system. GitHub Actions feels very similar, though I still miss a few things:

Can’t trigger builds for individual commits

I often create PRs with many small commits while refactoring old code. 10 commits is not so unusual, and 30 has happened. Good test coverage makes this possible, and inevitably that test coverage shows a problem in one of the commits. When that happens in a Bitbucket PR, with Bitbucket Pipelines, I like doing a manual binary search, starting a new CI run for individual commits until I see where the problem started. (Actually, I wish that it would automatically do that binary search, like a git bisect.)

Unfortunately, I can’t see any way to start a job for an arbitrary commit, so I can only ever see the result for the latest commit in a PR.

No caching

Doing full builds of projects on every commit is simple but wasteful. Ideally, we could do reliable incremental builds, though that’s hard to get right. But some caching could save time and heat.

For instance, Bitbucket Pipelines lets me cache previous Docker build steps and even gives me a ccache cache to reuse C++ compilation units. Well, the ccache cache is not a real cache – it’s just a dump of files that’s re-generated every couple of weeks, and is stale in the meantime, but it’s better than nothing.

Providing CI build time to companies is obviously meant to be a profitable business. Unfortunately, that muddles the incentives a little. For instance, I don’t believe that Bitbucket are eager to help their customers spend less money on build time, so they don’t offer very capable caching. But if competing services, such as GitHub Actions, get this right, the risk of losing customers could make them care.

Brain, refactored

For a long time I was too busy to learn anything much new, while running a company and caring for two small children. But more recently, I had the time to catch up on some technical knowledge and I remembered how much I love pure learning.

This post seems to be a big list of what I’ve learned over the last couple of years. Actually, I meant to write it a year ago, so now it’s also a list of what I need to revise.

C++

I caught up with the C++ renaissance by reading Effective Modern C++ and watching heaps of CppCon, C++Now, and Meeting C++ conference videos, such as CppCon 2014: Herb Sutter’s Back to the Basics: Essentials of Modern C++ Style (the one with almost always auto).

I exercised my growing knowledge of modern C++ by first using some modern C++ in gtkmm and Glom, and then doing a massive rewrite of libsigc++. I also explored some modern C++ generic programming techniques with tuples and dynamic programming.

Along the way I experimented, in Glom’s code, with some of the ideas from Sean Parent’s talks, such as the “no naked for loops” idea from his “C++ Seasoning” talk from Going Native 2013.

I read the old Boost Graph Library (BGL) book, to tie C++ and graph theory together in my head. I spent lots of time, off and on, trying to clean the BGL code up, and modernize it, but it’s a struggle and I’ve only managed to get some simple stuff accepted so far.

The BGL book is ancient but its ideas about C++ concepts are only just about to become mainstream, hopefully in C++20. I also read Alexander Stepanov’s wonderful From Mathematics to Generic Programming book, which convinced me even more.

I tried to get a simple explicit operator bool check into clang-tidy itself, but that didn’t get far and I chose not to be a pain by being pushy about such an unimportant contribution.

I started to learn about some compiler theory by writing some actual code.

I listened to the complete backlog of CppCast podcast episodes and I now listen to every episode as soon as it’s published.

Java and Android

I revised and updated my Java skills by reading Effective Java Programming and Programming Android. I regularly listened to the Fragmented and Android Backstage podcasts, but not so much recently. I gained some real-world Android programming experience by creating the Android app for Galaxy Zoo after first trying to create an Android app for my Glom database system. The Galaxy Zoo app was quite popular, but sadly, it’s not working right now – I need to find time to adapt it to their changed backend API.

Java and web

I spent some time with Java as a way to work on backend (distributed) systems, reading the JAX-RS book, the Spring Microservices in Action book, and the Spring in Action book.

I gained some more Java/GWT/AppEngine (and Resty-GWT) experience by creating the Big-O Quiz website. I used bigoquiz.com to take organized notes about the computer science that I learned subsequently. It asks me questions to remind me. But see below about me rewriting bigoquiz.com with Go and Angular.

iOS / Objective-C

I learned iOS (iPhone/iPad) programming by watching all 18 lectures in Stanford’s “Developing iOS 7 Apps for iPhone and iPad” course and then creating the iOS app for Galaxy Zoo. I used Objective-C, but now I have a reason to learn Swift.

Like the Android app, this is currently broken, because the backend server now has a very different API.

Go

I learned Go via The Go Programming Language book, taking extensive notes from the perspective of a C++ developer. To gain initial experience, I rewrote bigoquiz.com’s backend with Go, instead of Java, and found the resulting code refreshingly simple.

I now use Go regularly at work for backend microservices and I’m quite content with it. For larger projects I’d still prefer C++, but not all projects need to be large.

Angular

I finally gave up on GWT, long after everyone else realised that it’s barely maintained. So I took the opportunity to rewrite the bigoquiz.com frontend in Angular, using TypeScript. Learning it was easy with the excellent Angular tutorial.

I’ve since tried out React too.

Scala

I read the huge Programming in Scala book, because I use Scala on a large project at work. I’m fond of it, but I think its simplest ideas should just be in Java, and the cleverest stuff tends to make code hard to read.

Algorithms and Data Structures

I grew to love computer science by doing Coursera’s/Stanford’s “Algorithms: Design and Analysis” course, part 1 and part 2, by Tim Roughgarden. I then read Cracking the Coding Interview, The Algorithm Design Manual, Algorithms by Robert Sedgewick, and the first half of the CLRS Introduction to Algorithms book. I worked through the first few lectures of Tim Roughgarden’s “A second course in Algorithms” to cover max-flow and min-cost matching algorithms.

I went through the material 2 or 3 times, from various perspectives, eventually making sure to actually implement each data structure and algorithm.

I also read Jon Bentley’s Programming Pearls book, for some practical advice.

I watched, and re-watched, far too many of Tushar Roy’s algorithm walkthrough videos.

Linux

I already have years of Linux development experience, but I knew I was missing out on some knowledge and habits. So I solidified my Linux knowledge by reading Linux System Programming. (And now, later, I think I should read it again, though I wish there was a newer edition.)

I’m still looking for a really good book about bash scripting and general command line cleverness.

Distributed Systems

I didn’t have practical experience with at-scale server-based systems but I became acquainted with the ideas by reading Google’s Site Reliability Engineering book.

I discovered it later, but Tim Berglund’s 4-hour Distributed Systems in One Lesson series of videos is an excellent and engaging overview of the main themes. It’s no longer available on YouTube, but I think it’s so good that you should buy it.

Mikito Takada’s free Distributed Systems For Fun and Profit book is also an excellent introduction and overview.

I also listened to many old episodes of the Software Engineering Radio podcast and the Software Engineering Daily podcast, which often cover similar topics.

Performance

Even as a regular C and C++ coder, I realized how little I thought about the computer architecture that my code is running on. Various talks by Andrei Alexandrescu and Chandler Carruth really opened my eyes to these performance considerations that seem more relevant in at-scale server code than they were in typical desktop applications or even in embedded code.

For instance, Andrei Alexandrescu’s Code Dive 2015 “Writing Fast Code: part 1” and part 2 talks, as well as his “Optimization Tips – Mo’ Hustle Mo’ Problems” talk at CppCon 2014 and his “Writing Quick Code in C++, Quickly” talk from Going Native 2013.

For instance, Chandler Carruth’s “Understanding Compiler Optimization” talk from Meeting C++ 2015, his “Tuning C++: Benchmarks, and CPUs and Compilers! Oh My!” talk from CppCon 2015, his “Efficiency with Algorithms, Performance with Data Structures” talk from CppCon 2014, and his “Care and Feeding of C++’s Dragons” talk from Going Native 2013. Also his more recent “High Performance Code 201: Hybrid Data Structures” from CppCon 2016.

Mike Acton’s famous “Data-Oriented Design and C++” talk, from CppCon 2014, really gets to the point about this too, from more of a C perspective.

Coding Habits

I finally took the time to learn Vim enough to use it as my daily editor, after reading Practical Vim. I also learned to use screen and tmux but I haven’t incorporated either into my daily habits.

Ironically I never found my Linux environment outside of the terminal to be as keyboard friendly as the old pre-X MacOS. So staying in the terminal helped me to keep my hands off the mouse again. I practiced with the Klavaro typing tutor to get some of my good habits back.

I worked through the first 25 tasks from Project Euler and most of the tasks from HackerRank. I spent two intensive months working through every task on InterviewBit (around 260 at the time).

I highly recommend InterviewBit – they have a vast set of coding problems, nicely organised, with helpful test cases and tips for when you get stuck, so you can actually improve. HackerRank is useful too, but it feels less directed and the need to parse stdin input for every task gets annoying.

Along the way, I took a note of each C++ typo or generic algorithm mistake that I made, and started counting how many times I made each mistake, so I could focus on breaking my worst habits.

Management

I read some books about project and people management too, revising agile processes with the Scrum and XP from the Trenches and Kanban and Scrum: Making the most of both free ebooks.

The Manager’s Path maps out the various roles that are popular at software companies today, with suggestions about regular actions that each might perform, but it doesn’t seem useful beyond establishing that common language, and I hope it becomes outdated.

I recently read Measure What Matters, and Radical Focus, about OKRs, and I’m very enthusiastic so far. I like the idea of openness and regularity in businesses, to allow focus and alignment, almost like well-functioning decentralized open source projects that I’ve known.

I also got interested in product management, but I haven’t taken it far yet. The Cracking the Product Manager Interview book was rather superficial. I hope to dig deeper into this topic in future.

Others

I revised my Design Patterns knowledge by reading Head First Design Patterns and I used it to build a Design Patterns quiz.

I learned the Lua programming language and the Moai game-development platform via the Programming in Lua book and the “Developing Mobile Games with Moai SDK” book. This was for an idea I’d had for a kids’ programming app that didn’t seem quite worth the effort in the end.

I learned a bit about Hadoop by reading the “definitive guide” book, but I never got around to using it.

C++17 in libsigc++ : invoke, apply, and constexpr if

I’ve made a fairly minor 2.99.11 release of the unstable libsigc++, just to make use of C++17. Here are some brief descriptions of the C++17 features used in libsigc++ so far:

std::invoke()

std::invoke() invokes a callable object, such as a plain function pointer, pointer to a member method, or a functor, such as a sigc::slot, using one generic syntax. We provide the arguments, and possibly an object instance, in the std::invoke() call. For instance:

  • std::invoke(function, 1, 2);
  • std::invoke(method, object, 1, 2);
  • std::invoke(functor, 1, 2);

By using std::invoke() in libsigc++ (and here), I hope to make the implementation more generic, though it’s not immediately obvious how.
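
For instance, here is a small self-contained example of those three forms. The function, class, and lambda are made up purely for illustration:

#include <functional>
#include <iostream>

int sum(int a, int b) { return a + b; }

class Thing {
public:
  int sum(int a, int b) const { return a + b; }
};

int main() {
  const auto functor = [](int a, int b) { return a + b; };
  Thing thing;

  std::cout << std::invoke(&sum, 1, 2) << '\n'; // A plain function.
  std::cout << std::invoke(&Thing::sum, thing, 1, 2) << '\n'; // A member method, with an object instance.
  std::cout << std::invoke(functor, 1, 2) << '\n'; // A functor (here, a lambda).
}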

std::apply()

std::apply() is a little like std::invoke(), but lets us invoke a callable object with the arguments in a tuple. For instance:

  • std::apply(function, std::make_tuple(1, 2));
  • std::apply(method, std::make_tuple(object, 1, 2)); // I think.
  • std::apply(functor, std::make_tuple(1, 2));

This is useful for libsigc++ template code that uses tuples to concatenate or rearrange sets of variadic parameters. It lets us replace several slightly less generic helper functions that just called a functor with a tuple unpacked as variadic arguments via a provided std::index_sequence. Receiving variadic parameters into a tuple is normal, so this complements that well, letting us use some of those parameters in calls to other functions.
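
For instance, here is a contrived sketch of that pattern (the function names are hypothetical, and this is not the actual libsigc++ code):

#include <iostream>
#include <tuple>
#include <utility>

int sum(int a, int b) { return a + b; }

// Receive variadic parameters into a tuple, then use them in a later call:
template <typename... Args>
int call_later(Args&&... args) {
  auto stored = std::make_tuple(std::forward<Args>(args)...);
  // ... the tuple could be concatenated or rearranged here,
  // for instance with std::tuple_cat() ...
  return std::apply(&sum, stored);
}

int main() {
  std::cout << call_later(1, 2) << '\n'; // Prints 3.
}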

constexpr if

I also used constexpr if to avoid the need for a helper template. Before C++17, templated code sometimes needed a helper template to perform different operations depending on the type, often where some operations wouldn’t even compile for the other types. The different operations would then be isolated in different template specializations of the helper template, and the main template could just have one generic call to that helper template.

But constexpr if lets you put it all in the main template. For instance, it let us remove the little with_type<> template, which only existed so we could call a functor for some types, but not for others.
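
For instance, here is a contrived sketch of the general idea (this is not the actual libsigc++ code):

#include <iostream>
#include <string>
#include <type_traits>

// Before C++17, these two branches would need to be separate
// specializations of a helper template, because each branch only
// compiles for some types.
template <typename T>
void clear_value(T& value) {
  if constexpr (std::is_arithmetic_v<T>) {
    value = 0;
  } else {
    value.clear(); // Not even compiled when T is arithmetic.
  }
}

int main() {
  int i = 5;
  std::string s = "hello";
  clear_value(i);
  clear_value(s);
  std::cout << i << ", '" << s << "'\n"; // Prints: 0, ''
}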

I also updated the tuple-utils from my murrayc-tuple-utils project, including several uses of constexpr (I also used the new C++17 nested namespace syntax when doing that).

libsigc++ has moved to GitHub

We have moved libsigc++ from git.gnome.org to github.com. This should make working on it even nicer.

I’ve also moved its website to GitHub Pages rather than Sourceforge, now using the simple Jekyll templating system. But I’ve made only minimal changes, so I’d be glad if someone submitted a pull request to make the website better and prettier.

We probably should have done this sooner, because libsigc++ is in no way GNOME-specific, but GNOME’s git has been very good to us over the years and it’s been hard to trust anything else after Sourceforge fell apart. However, with GNOME’s gradual move to gitlab.gnome.org (which gtkmm will happily use) we would have to change something anyway. I am a happy (paying) GitHub user and it feels like a natural home.

I’m still rather proud of my modern C++ rewrite of libsigc++ but I’m sure it could benefit from more use than just by gtkmm 4, which is itself not used much yet. Maybe this move to GitHub will help with that.

Google App Engine: Using subdomains

Separating Frontend and Backend App Engine Services into Separate Subdomains

I’m in the middle of re-implementing my bigoquiz.com website into separate frontend (client) and backend (server) projects, which I’m deploying to separate App Engine “services”. Previously both the front end HTML and JavaScript, and the backend Java, were all in one (GWT) project, deployed together as one default App Engine service. I chose to separate the two services into their own subdomains. So the frontend is currently served from beta.bigoquiz.com and its JavaScript code uses the REST API provided by the backend, served from betaapi.bigoquiz.com. This will later be bigoquiz.com and api.bigoquiz.com.

This seemed wise at the time and I thought it was necessary but in hindsight it was an unnecessary detour. I would have been better off just mapping different paths to the services. I didn’t see how to do this at first with Google AppEngine, but it’s very easy. You just provide a dispatch.yaml file (also mentioned below) to map URL prefixes to the different services. For instance, /api/* would map to the backend service, and everything else would map to the frontend service. Larger systems might do this with a service gateway such as Netflix’s Zuul.

Nevertheless, this post mentions some of the issues you’ll have if you choose to split your frontend and backend into separate subdomains. It mentions App Engine, Go, and Angular (TypeScript), but the ideas are generally useful.

App Engine Service Names

Each service that you deploy to App Engine has an app.yaml file. This can have a line to name the service, which would otherwise just be called “default”. For instance, the first line of my frontend’s app.yaml file currently looks like this:

service: beta

The first line of my backend’s app.yaml file currently looks like this:

service: betaapi

When I deploy these services, via “gcloud app deploy” on the command line, I can then see them listed in the Google Cloud console, in the App Engine section, in the Services section.

Mapping Subdomains to App Engine Services

In the Google Cloud console, in the App Engine section, in the settings section, in the “custom domains” tab, you should add each subdomain. You’ll need to verify your ownership of each subdomain by adding DNS entries for Google to check.

App Engine very recently added the “Managed Security” feature, which automatically creates and manages (LetsEncrypt) SSL certificates, letting you serve content via HTTPS. Using this currently makes using subdomains more complicated, because it doesn’t yet support wildcard SSL certificates. That’s likely to be possible soon when LetsEncrypt starts providing wildcard SSL certificates, so this section might become outdated.

Without Managed Security

If you aren’t using Managed Security yet, mapping subdomains to services is quite simple. Google’s documentation suggests that you just add a wildcard CNAME entry to the DNS for your domain, like so:

Record: *
 Type: CNAME
 Value: ghs.googlehosted.com

All subdomains will then be served by Google. App Engine will try to map a subdomain to a service of the same name. So foo.example.com will map to a service named foo.

With Managed Security

However, if you are using Managed Security, you’ll need to tell App Engine about each individual subdomain so each subdomain can have its own SSL certificate. Unfortunately, you’ll then need to add the same 8 A and AAAA records to your DNS for your subdomain too.

Although App Engine does automatic subdomain-to-service mapping for wildcard domains, this doesn’t happen with specifically-listed subdomains. You’ll need to specify the mapping manually using a dispatch.yaml file, like so:

dispatch:
  - url: "beta.bigoquiz.com/*"
    service: beta
  - url: "betaapi.bigoquiz.com/*"
    service: betaapi

You can then deploy the dispatch.yaml file from the command line like so:

$ gcloud app deploy dispatch.yaml

I wish there was some way to split these dispatch rules up into separate files, so I could associate them with the separate codebases for the separate services. For now, I’ve put the dispatch.yaml file for all the services in the repository for my frontend code and I manually deploy it.

CORS (Cross Origin Resource Sharing)

By default, modern browsers will not allow JavaScript that came from www.example.com (or example.com) to make HTTP requests to another domain or subdomain, such as api.example.com. This same-origin policy prevents malicious pages from accessing data on other sites, possibly via your authentication token.

If you try to access a subdomain from JavaScript served from a different subdomain, you’ll see error messages about this in the browser console, such as:

No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'beta.example.com' is therefore not allowed access.

For instance, JavaScript served from foo.example.com cannot normally access content at bar.example.com.

However, you can allow calls to a subdomain by using the CORS system. The browser will attempt the call, but will only provide the response back to the JavaScript if the response has the appropriate Access-Control-Allow-* headers. The browser may first attempt a separate CORS “pre-flight” request before actually issuing the request, depending on the request details.

If you configure your server to reply to a CORS request with appropriate Access-Control-Allow-Origin and Access-Control-Allow-Methods HTTP headers, you can tell the browser to allow the JavaScript to make the HTTP requests and receive the responses.
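
Concretely, a successful CORS response includes headers something like these (the values here are illustrative):

Access-Control-Allow-Origin: https://beta.example.com
Access-Control-Allow-Methods: GET, POST, OPTIONS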

For instance, in Go, I’ve used rs/cors to respond to CORS requests, like so, passing in the original julienschmidt/httprouter that I’m using.

c := cors.New(cors.Options{
	AllowedOrigins: []string{"https://example.com"},
	AllowedMethods: []string{"GET", "POST", "OPTIONS"},
})

handler := c.Handler(router)
http.Handle("/", handler)

I also did this in my original Java code by adding a ContainerResponseFilter, annotated with @Provider.

Cookies: CORS

Even when the server responds to CORS requests with those Access-Control-Allow-* headers, by default the browser will not allow JavaScript to send cookies when it sends HTTP requests to other domains (or subdomains). But you can allow this by adding an Access-Control-Allow-Credentials header to the server’s CORS response. For instance, I set the AllowCredentials option in Go on the server side, like so:

c := cors.New(cors.Options{
	...
	AllowCredentials: true,
})

You might need to specify this on the client-side too, because the underlying XMLHttpRequest defaults to not sending cookies with cross-site requests. For instance, I specify withCredentials in Angular (TypeScript) calls to http.get(), like so:

this.http.get(url, {withCredentials: true})

Note Angular’s awful documentation for the withCredentials option, though Mozilla’s documentation for the XMLHttpRequest withCredentials option is clearer.

Cookies: Setting the Domain

To use a cookie across subdomains, for instance to send a cookie to a domain other than the one that provided the cookie, you may need to set the cookie’s domain, which makes the cookie available to all subdomains in the domain. Otherwise, the cookie will be available only to the specific subdomain that set it.

I didn’t need to do this because I had just one service on one subdomain. This subdomain sets the cookie in responses, the cookie is then stored by the browser, and the browser provides the cookie in subsequent requests to the same subdomain.
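
For instance, in Go, you could set the cookie’s Domain field when creating it, maybe like so (a sketch; the handler, cookie name, and value are hypothetical):

func handleLogin(w http.ResponseWriter, r *http.Request) {
	http.SetCookie(w, &http.Cookie{
		Name:   "session",
		Value:  "some-token",
		Domain: "example.com", // Available to foo.example.com, bar.example.com, etc.
		Path:   "/",
	})
}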

OAuth2 Callback

If your subdomain implements endpoints for oauth2 login and callback, you’ll need to tell App Engine about the subdomain. In the Google Cloud console, in the “APIs & Services” section, go to the Credentials section. Here you should enter the subdomain for your web page under “Authorized JavaScript origins”, and enter the subdomain for your oauth2 login and callback subdomain under “Authorized redirect URIs”.

The subdomain will then be listed appropriately in the configuration file that you download via the “Download JSON” link, which you can parse in your code, so that the oauth2 request specifies your callback URL. For instance, I parse the downloaded config .json file in Go using google.ConfigFromJSON() from the golang.org/x/oauth2/google package, like so:

func GenerateGoogleOAuthConfig(r *http.Request) *oauth2.Config {
	c := appengine.NewContext(r)

	b, err := ioutil.ReadFile(configCredentialsFilename)
	if err != nil {
		log.Errorf(c, "Unable to read client secret file (%s): %v", configCredentialsFilename, err)
		return nil
	}

	config, err := google.ConfigFromJSON(b, credentialsScopeProfile, credentialsScopeEmail)
	if err != nil {
		log.Errorf(c, "Unable to parse client secret file (%) to config: %v", configCredentialsFilename, err)
		return nil
	}

	return config
}

GWT: Changing from GWT-RPC to JSON with RestyGWT

GWT but with REST and JSON

My bigoquiz.com website uses my gwt-bigoquiz project, based on Java and GWT. One advantage of GWT is the serialization between the server and client. Using GWT-RPC, you can use the same Java classes on the server and the (compiled to JavaScript) browser client. (Well, at least when it works – when it doesn’t work it can be very hard to find out why). For instance, your Java service can have a getThing() method which your client code can call asynchronously, getting a Thing. It avoids duplication and avoids manually writing code to parse an intermediate format and recreate objects.

However, GWT-RPC ties you into using GWT on both the server and client. I would rather have a standard REST API that serves, and receives, JSON. I could then experiment with alternative implementations independently on the server side or the client side.

RestyGWT, together with Jersey, makes this quite easy. I have now ported gwt-bigoquiz from GWT-RPC to RestyGWT and Jersey. Here I describe how you might do that for your project.

You might also like to watch David Chandler’s talk about RestyGWT and Jersey at GWT.Create 2015 (slides, GitHub), which covers the same stuff in a little more detail, using slightly older versions.

Build

We need to add the following dependencies to our pom.xml file, so we can use RestyGWT, Jersey, Jackson and the JAX-RS annotations.

I’ve added comments to explain why we need each dependency, based on my best guesses. Please let me know about any errors you find. pom.xml files too often just contain cargo-culted blocks of XML pasted from blog entries or StackOverflow answers, far removed from whoever first wrote them. Still, for later versions of these dependencies, or different environments, you might need a different combination of versions or different workarounds.

<!-- For JAX-RS annotations, such as @GET, @POST, @Path, @PathParam, @QueryParam, etc -->
<dependency>
	<groupId>javax.ws.rs</groupId>
	<artifactId>javax.ws.rs-api</artifactId>
	<version>2.1-m09</version>
	<!-- Note: 2.0 gives us a ClassNotFoundException about RxInvokerProvider when we try to GET from the URL. -->
</dependency>

<!-- For client-side code to query a REST/JSON server API. This uses methods and classes annotated with JAX-RS annotations, doing serialization/deserialization with Jackson. -->
<dependency>
	<groupId>org.fusesource.restygwt</groupId>
	<artifactId>restygwt</artifactId>
	<version>2.2.0</version>
</dependency>

<!-- For JSON serialization/deserialization based on the JAX-RS annotations. Used by RestyGWT on the client side. Also adds some annotations such as @JsonInclude and @JsonIgnore, -->
<dependency>
	<groupId>com.fasterxml.jackson.jaxrs</groupId>
	<artifactId>jackson-jaxrs-json-provider</artifactId>
	<version>2.8.9</version>
</dependency>

<!-- To serve REST/JSON queries of REST resources.
	 A <servlet> tag in the web.xml file indicates that Jersey should use certain classes as REST resources/servlets.
	 These REST resources/servlets use Java methods and classes annotated with JAX-RS and Jackson annotations.

	 jersey-container-servlet needs the Java Servlet API version 3, supported by the AppEngine Java 8 runtime (currently beta).
	 Alternatively, jersey-container-servlet-core needs the Java Servlet API version 2, supported by the AppEngine Java 7 runtime. -->
<dependency>
	<groupId>org.glassfish.jersey.containers</groupId>
	<artifactId>jersey-container-servlet</artifactId>
	<version>2.26-b07</version>
</dependency>

<!-- To let Jersey use Jackson. -->
<dependency>
	<groupId>org.glassfish.jersey.media</groupId>
	<artifactId>jersey-media-json-jackson</artifactId>
	<version>2.26-b07</version>
</dependency>

<!-- Workaround this error: java.lang.IllegalStateException: InjectionManagerFactory not found. See https://stackoverflow.com/a/44546979/1123654 -->
<dependency>
	<groupId>org.glassfish.jersey.inject</groupId>
	<artifactId>jersey-hk2</artifactId>
	<version>2.26-b07</version>
</dependency>

Configuration

We must edit some XML files so we can use RestyGWT and Jersey.

.gwt.xml Module file

Our .gwt.xml Module definition must mention RestyGWT inside the <module> tag, so our client code can use RestyGWT.

<inherits name="org.fusesource.restygwt.RestyGWT"/>

web.xml file

We must add <servlet> and <servlet-mapping> tags in the web.xml file to state that Jersey should use our classes to serve REST resources as servlets.

<servlet>
    <servlet-name>restServlet</servlet-name>
    <servlet-class>org.glassfish.jersey.servlet.ServletContainer</servlet-class>
    <init-param>
        <param-name>jersey.config.server.provider.packages</param-name>
        <param-value>com.murrayc.myproject.server.api</param-value>
    </init-param>
</servlet>
<servlet-mapping>
    <servlet-name>restServlet</servlet-name>
    <url-pattern>/api/*</url-pattern>
</servlet-mapping>

Server-Side Code Changes

The REST Resource

The classes for our REST resources must be in the package specified in the web.xml file, where Jersey can find them. These correspond roughly to the *ServiceImpl classes, derived from GWT’s RemoteServiceServlet, that we used with GWT-RPC.

These classes don’t need to derive from any special base class or implement any special interface. They do need to be annotated with @Path to specify the first part of the name of the resource. For instance:

@Path("thing")
public class ThingResource {
    ...
}

We then annotate methods of the class with @GET or @POST and @Produces.

Without a @Path annotation on the method, the method matches the base path of the class. For instance, this might return all available Things, when a client GETs from the thing/ URL.

@GET
@Produces("application/json")
public Collection<Thing> get() {
    ...
}

We can use additional @Path annotations on the method. For instance, this might return a Thing with a matching Thing ID, when a client GETs from the thing/123 URL.

@GET
@Path("/{id}")
@Produces("application/json")
public Thing getById(@PathParam("id") String id) {
    ...
}

Note that the Java method name does not appear in the URL.

The ServletContext

In our GWT-RPC service classes, we could get the ServletContext via getServletConfig().getServletContext(). With our REST classes, we instead use the JAX-RS @Context annotation on a member field, like so:

@Context
ServletContext context;

Jersey apparently then assigns the context at runtime.

Client-Side Code Changes

With GWT-RPC, we had a client interface like this:

public interface ThingServiceAsync {
    void getThings(AsyncCallback<List<Thing>> async);
    ...
}

which we called like so:

AsyncCallback<List<Thing>> callback = new AsyncCallback<List<Thing>>() {
    @Override
    public void onFailure(Throwable caught) {
        ...
    }

    @Override
    public void onSuccess(List<Thing> result) {
        getView().setThings(result);
    }
};

ThingServiceAsync.Util.getInstance().getThings(callback);

With RestyGWT, we’ll instead have a client interface like this:

@Path("/api/thing")
public interface ThingClient extends RestService {
    @GET
    public void getThings(MethodCallback<List<Thing>> callback);
    
    @GET
    @Path("/{id}")
    public void getThing(@PathParam("id") String id, MethodCallback<Thing> callback);
}

And we’ll use it like so:

MethodCallback<List<Thing>> callback = new MethodCallback<List<Thing>>() {
    @Override
    public void onFailure(Method method, Throwable caught) {
        ...
    }

    @Override
    public void onSuccess(Method method, List<Thing> result) {
        getView().setThingList(result);
    }
};

Defaults.setServiceRoot(GWT.getHostPageBaseURL());
ThingClient client = GWT.create(ThingClient.class);
client.getThings(callback);

So to change the GWT-RPC client code to RestyGWT client code, you only need to:

  • Create the client interface.
  • Use it instead of the GWT-RPC Async client class.
  • Change AsyncCallback to MethodCallback.
  • Add the Method parameter to onFailure() and onSuccess().

Parameters

A method of a GWT-RPC service just takes Java parameters. But for a REST service, you need to decide whether these are path parameters, or query parameters. For instance, if the URL is /things/123?color=blue then there is one path parameter (123), and one query parameter (color, with value blue). You would indicate this in your service class like so, using the JAX-RS attributes:

@GET
@Path("/{thing-id}")
@Produces("application/json")
public Thing getById(@PathParam("thing-id") String thingId, @QueryParam("color-id") String colorId) {
    ...
}

and in the client interface like so:

@Path("/api/thing")
public interface ThingClient extends RestService {
    @GET
    @Path("/{thing-id}")
    public void getThing(@PathParam("thing-id") String thingId, @QueryParam("color-id") String colorId, MethodCallback<Thing> callback);
}

Ignoring some Getters

Not all get methods should result in items in the JSON. You can avoid this with the Jackson @JsonIgnore annotation. For instance:

public class Thing {
    ...

    @JsonIgnore
    public String getIrrelevantStuff() {
       ...
    }
}

Ignoring null and empty objects

To keep your JSON small, you can use Jackson’s @JsonInclude annotation to stop Jersey from writing name/value pairs with null or empty values. Of course this can obscure the JSON format by hiding its possible contents. For instance:

@JsonInclude(JsonInclude.Include.NON_EMPTY)
public class Thing {
    ...
}

You might choose to use NON_NULL rather than NON_EMPTY.

Adding Setters

If our REST GET method returns an instance of Thing, Jersey will serve a JSON string based on any public get*() methods that take no parameters, and the public fields. So, this class,

public class Thing {
    public String id;
    public Stuff getStuff() {
       ...
    }
}

would produce JSON like this, recursing into contained class instances:

{
  "id": "123",
  "stuff": {
     "foo": true,
     "bar": 789
  }
}

However, if there are no corresponding setter functions, such as setStuff(), RestyGWT (or Jackson, which it uses) will silently ignore these parts of the JSON when deserializing the JSON into a client-side Java instance. If there is a way to detect this at runtime, I’d like to know about it.

This makes it particularly important to test the serialization and deserialization. For instance:

@Test
public void ThingJsonTest() throws IOException {
    // The Jackson ObjectMapper:
    ObjectMapper objectMapper = new ObjectMapper();

    // Get the JSON for an object:
    Thing objToWrite = new Thing();
    objToWrite.setFoo(2);
    // etc.
    
    final String json = objectMapper.writeValueAsString(objToWrite);
    assertNotNull(json);
    assertFalse(json.isEmpty());

    // Get an object from the JSON: 
    Thing obj = objectMapper.readValue(json, Thing.class);
    assertNotNull(obj);
    assertEquals(2, obj.getFoo());
    // etc.
}

Removing the GWT-RPC code

Once everything is working, we can remove the unused GWT-RPC classes and configuration:

  • The ThingService interface, which extends RemoteService.
  • The ThingServiceAsync interface.
  • The ThingServiceImpl class.
  • The <servlet> and <servlet-mapping> tags from web.xml

A C++ developer looks at Go (the programming language), Part 3: Concurrency

This is the third of my posts about looking at Go from the perspective of a C++ developer. I think that will be enough for now. Here are all 3 parts:

Here I mention Go’s built-in concurrency features, letting you write code that makes the best use of one, or more, processors. I am not an expert in concurrency, and I think few people are. I’ve had some really helpful friendly feedback about the previous parts from Go developers, so I hope that continues. I will gladly correct any mistakes that are pointed out to me so the text can become more useful.

However, the mentions of C++ coroutines here might also be particularly useless. Clang and gcc don’t support coroutines in their stable releases yet and it’s hard to find definitive explanations. What I can find is contradictory and far from as simple as Go, probably because the C++ coroutines specification has been in flux for a while. I would very much welcome corrections. Interest in C++ coroutines seems to be heating up now so I guess this will settle down soonish. I think it will then be time for some books to be updated. I hope that the end result is at least as expressive as Go but I am not confident of that so far.

As before, I strongly suggest that you read “The Go Programming Language” (referred to here as “the book”), from which I’ve learned this stuff. At least at first, the structure of this text is aimed at people familiar with C++ who are interested in Go, but the comparison does break down later.

Goroutines and std::task

Like C++ (since C++11), Go has support for simple concurrency features such as lockable mutexes (which we’ll see later), but it doesn’t let you create raw threads. Instead it offers goroutines.

Goroutines are, at their simplest, functions that can run concurrently with other functions. Two goroutines might even run on different CPUs, meaning that they run truly simultaneously. You can start one just by using the “go” keyword before a normal function call. The function will then execute, without the caller waiting for it to complete.

go doSomethingThatTakesALongTime()
doSomethingElseAtTheSameTime()

While the goroutine does its work, maybe using traditional cooperative multi-threading on the same CPU, maybe on another CPU, the implicit main goroutine can also do work. Update: But be careful. As mentioned later, this program won’t wait for the first goroutine to finish if the main goroutine finishes first.

So far, at least, this is a bit like starting a std::task in C++ with std::async() (since C++11):

auto future = std::async(doSomethingThatTakesALongTime);
doSomethingElseAtTheSameTime();
future.wait();

(Update: As James pointed out in a comment, if you don’t keep the returned future, its destructor will run, which will cause a wait, making the call not async.)

Hybrid Threading (m:n Threading)

A std::task in C++ is not guaranteed to start an actual new OS (kernel) thread, unless you pass std::launch::async as an extra argument. It will sometimes, but that’s up to the runtime and the implementation. I think the runtime, in some implementations, might even be able to move tasks to real threads and between threads, but I’m not sure. Scott Meyers’ “Effective Modern C++” has some good discussion of tasks and C++ concurrency in general.

Goroutines are not ambiguous about this. Goroutines don’t map directly to kernel (OS) threads. They use m:n threading, in which user space (the Go runtime, in this case) schedules its own m “threads”, mapping them onto a pool of n kernel threads. This seems to be what is referred to as “Hybrid Threading” in Robert Love’s “Linux System Programming” book – a mixture of 1:1 Threading (just using kernel threads) and N:1 Threading (or User-Level Threading, apparently sometimes called “green threads” or “lightweight threads”). Presumably there is some very clever scheduling code in Go’s runtime and in the C++ coroutines implementation.

Because Goroutines don’t map directly to real threads, they don’t need the full stack size given to a kernel thread, letting us comfortably use far more goroutines than we would use threads. However, you only get the full benefit of this when the work is happening purely in the Go runtime. The book explains that “Goroutines that are blocked in I/O or other system calls or are calling non-Go functions, do need an OS-thread”. So I guess you still wouldn’t want, for instance, a goroutine per network connection, if you are serving many network clients.

Channels: Getting One Result

In C++, std::async() returns a std::future. This is a way to later wait for the result from the task, or just get the result if it is already ready, having made good use of the original thread’s time in the meantime.

int doSomethingThatTakesALongTime() {
  ...
  return 5;
}
 
auto f = std::async(doSomethingThatTakesALongTime);
doSomethingElseLengthyAtTheSameTime();
auto answer = f.get();

The go keyword doesn’t return a future. (Update: And it discards the function’s return value, so you can’t use it.) You could call the function without go and use its result, but then you’d just be waiting for it immediately, defeating the purpose. (I wonder why Go lets a goroutine’s function have a return value at all.)

Instead, to provide a result, a Go program would use a Go channel. Go channels are queues of your chosen elements. Channels are a built-in type, so they are generic – they can contain various types. They can be created like so, using the make() built-in function as for other built-in types:

c := make(chan string)

The Go language has special syntax for sending to, or receiving from, channels via the <- operator (used on either side of the channel), though I don’t see what would have been so bad about having send() and receive() methods.

Go channels are “concurrency safe”, so you can pass one to a goroutine, and both goroutines can happily manipulate it. For instance, to get a response to a long-running function, but only after we’ve tried to make good use of the waiting time:

func doSomethingThatTakesALongTime(c chan int) {
  ...
  c <- 5
}
 
c := make(chan int)
go doSomethingThatTakesALongTime(c)
doOtherStuffInTheMeantime()
answer := <- c

That attempt to receive from the channel (with the <- operator) will block until the channel has something to provide, a bit like how std::future::get() blocks until the result is ready.

It would be fairly easy to create a future type that uses a channel, and then you could write an async function that uses the go keyword and returns the future. But I think it would be nice if Go made this simple case simpler by doing this for us. Maybe it does.
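
For instance, a minimal sketch of such a future type might look like this. These names are made up, and without generics you would need one such type per element type:

type intFuture struct {
  c chan int
}

func asyncInt(f func() int) intFuture {
  fut := intFuture{c: make(chan int, 1)} // Buffered, so the goroutine never blocks.
  go func() { fut.c <- f() }()
  return fut
}

// Get blocks until the result is ready, a bit like std::future::get().
func (f intFuture) Get() int {
  return <-f.c
}

fut := asyncInt(computeAnswer) // computeAnswer is some hypothetical func() int.
doOtherStuffInTheMeantime()
answer := fut.Get()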

I’m also a little uncomfortable with the calling goroutine (the main goroutine here) having to create the channel, repeating the type of that channel (it’s also in the called goroutine’s function signature). I think it would be nice if there was some syntax that gave us a new instance of the channel that the called goroutine expected to get (ready for receiving; see unidirectional channels later), while also allowing us to use an existing channel if necessary. That would be hard to get right, of course.

If the channel is empty and has been closed then it will provide a zero value immediately. Or you can get the extra result to check if the channel is still open or still has values:

answer, ok := <- c
if !ok {
  // The channel is empty and has been closed.
}

Goroutines and C++ Coroutines

Goroutines feel a little like C++ coroutines (not yet in C++ as of C++17). I don’t know if their scheduling and stack management is quite as clever. I suspect not, because C++ coroutines are meant to be “stackless”, which I understand to mean that each one’s state lives in a single compiler-generated frame rather than on its own call stack, while Goroutines are apparently “stackful”, each with its own small, growable stack.

Goroutines allow cooperative scheduling, via blocking channel sends and receives (see select) and sleeps. So do C++ coroutines, via co_yield, co_return, co_await, futures and generators.

So I’ll mention C++ coroutines later as I show other features of Goroutines and channels. I’d love to know the actual differences, if someone can explain it to me clearly enough that I understand. I really hope the end result is as easy to use as Goroutines.

C++ Coroutines: Getting One Result

For now, let’s look at how a C++ coroutine would let us return a single value, much like we did with std::async() (with a std::future) and with Go’s go keyword (with a channel). We would use a std::future return type for the function’s implementation, use co_await to launch the function, and co_return to return the result into the std::future.

This is based on various other blog posts and slides that I’ve seen. I don’t know if this is really correct because I don’t have a compiler that actually supports this yet.

std::future<int> doSomethingThatTakesALongTime() {
  ...
  co_return 5;
}

int main() {
  int answer = co_await doSomethingThatTakesALongTime();
  doOtherStuffInTheMeantime();
  
  // Attempting to get the value blocks until the result is ready.
  // (I'm not sure about this part. Do we need to explicitly wait
  // for the future to have a value?)
  std::cout << "answer: " << answer << '\n';
  return 0;
}

Here the co_await keyword is a little like the go keyword in Go.

Channels: Multiple Results

But channels are a queue of values, not just single values, so you could receive a series of results. Go’s range-based for loop has built-in support for this, as mentioned in section 8.4.2 (Pipelines) in the book. For instance:

func sendMessagesToOtherSolarSystemsAndGetResponses(c chan string) {
  for i := 0; i < 10; i++ {
     ...
     c <- response
  }

  close(c)
}
 
c := make(chan string)
go sendMessagesToOtherSolarSystemsAndGetResponses(c)
doOtherStuffInTheMeantime()
for response := range c {
  fmt.Println(response)
  ...
}

fmt.Println("Not waiting any longer for more responses.")

Notice that, unlike a regular range-based for loop in Go, this gives you the value, not the index (which would be meaningless for a channel). In a regular range-based for loop, the index comes first, so you need to take both the index and the value (maybe using _ for the index) to get the value.
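
For instance, ranging over a slice:

responses := []string{"hello", "howdy"}
for _, response := range responses {
  fmt.Println(response)
}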

The range-based for loop blocks at each operation, waiting for a new value from the channel, and it will stop automatically when the channel is empty and closed.

As mentioned in section 8.5 (Looping in Parallel), if we return from the main goroutine early, for instance in response to an error, we must make sure to “drain” the channel if we don’t want the other goroutines to be blocked when they try to send, even when using a buffered channel (see below). You could launch a separate goroutine just to do this.

C++ Coroutines: Multiple Results

With C++ coroutines, to return multiple results, we can use co_yield instead of co_return, using a generator instead of a std::future.

Again, this is based on various other blog posts and slides that I’ve seen. I don’t know if this is really correct because I don’t have a compiler that actually supports this yet. In particular, I don’t know if there will really be a std::generator in the standard library.

std::generator<std::string> sendMessagesToOtherSolarSystemsAndGetResponses() {
  while (still_waiting) {
    ...
    // "return" one result via the generator.
    co_yield response;
  }

  // TODO: Do we need a co_return somewhere too?
}

int main() {
  auto messages = co_await sendMessagesToOtherSolarSystemsAndGetResponses();

  // The range-based for loop waits for the next message each time.
  // TODO: When does it stop?
  for (auto response : messages) {
    std::cout << response << '\n';
  }

  std::cout << "Not waiting any longer for more responses.\n";
  return 0;
}

So far that makes a generator look a bit like a Go channel. But Go channels seem to be more versatile, as we will see.

Unidirectional channels

By default channels allow goroutines to both send and receive messages. This is rarely what you want inside a particular goroutine, so you can declare a channel type that is only for sending or only for receiving. You would typically create a normal channel and then the goroutine’s function would declare that it takes a unidirectional channel. A send-only channel is of type chan<-, and a receive-only channel is of type <-chan. I do think that syntax is hard to love.

For instance, the function from our previous example really only needs a channel that it can send messages on, not receive them too, so we should change the type of the channel:

func sendMessagesToOtherSolarSystemsAndGetResponses(c chan<- string) {
  for i := 0; i < 10; i++ {
    ...
    c <- response
  }

  close(c)
}

The compiler can then help by preventing the goroutine from receiving messages on the channel, which it should only send on. When we pass the regular channel to the goroutine it will convert implicitly to the unidirectional channel. But we may not convert the unidirectional channel back to a regular channel.

I don’t know if there is some equivalent for this in the world of C++ coroutines and generators. Maybe it is generally unwise to compare C++ generators and Go channels.

Buffered Channels

So far we have created unbuffered channels. When we sent a message on the channel, the sending goroutine blocked until the message was received by the receiving goroutine. And when we tried to receive a message the receiving goroutine blocked until the sending goroutine had put a message in the channel. Therefore, unbuffered channels are also called synchronous channels.

But in our example, it would be better if the sending goroutine could keep adding messages to the channel without waiting for them to be received. We can use a buffered channel instead, specifying the size of the buffer:

c := make(chan string, 100)
go sendMessagesToOtherSolarSystemsAndGetResponses(c)

The sending goroutine will then only block when the channel is full, and the receiving goroutine will only block when the channel is empty.

Using an Unbuffered Channel to Signal “done”

Unbuffered channels can be useful to signal that a goroutine has finished. This is mentioned in section 8.4.1 (Unbuffered Channels) of the book.

For instance, you will often want the main goroutine to wait for other goroutines to finish instead of stopping the whole program. By using an extra “done” channel, the second goroutine can signal to the main goroutine that it has finished, by sending an item on the “done” channel, and the main goroutine can block on that by attempting to receive from the “done” channel. For instance, here the goroutine does not return any responses so we need another way to signal that we have finished:

func sendMessagesToOtherSolarSystems(done chan struct{}) {
  for i := 0; i < 10; i++ {
    ...
  }

  done <- struct{}{}
}

done := make(chan struct{})
go sendMessagesToOtherSolarSystems(done)
doOtherStuffInTheMeantime()
<- done // This blocks

However, when the goroutine is already returning results via a channel, I believe it is simpler to just wait for the channel to be closed, as we did in the previous example by using a range for over the channel.

The book recommends use of struct{} as the channel element, probably because it has no value. A bool seems more concise, but you’d need a comment saying that the value (true or false) was meaningless. I guess you could also just do a range for over the channel, waiting for it to close, like when we used a channel to return responses, but without ever sending anything.

I feel that this “done” synchronization technique could have a more explicit syntax. I think that would make the code clearer and would mean that various code that does the same thing would have fewer uninteresting differences. sync.WaitGroup (see below) seems to be what I want, but this raw “done” channel technique seems more popular for simple cases.

If we were using C++ coroutines, I’d guess we would do the same thing by waiting for the result from a std::future.

Waiting for Multiple Goroutines to Signal “done”

A variation of the “done” channel technique is useful when waiting for multiple goroutines to finish. If we know how many goroutines we have started, we can use a channel to wait for that many “done” messages. This is mentioned in section 8.5 (Looping in Parallel) of the book, though it doesn’t call it a “done” channel there. For instance:

func sendMessageToSolarSystem(coordinates Coordinates, done chan struct{}) {
  ...

  done <- struct{}{}
}

done := make(chan struct{})
for _, coord := range solarSystems {
  go sendMessageToSolarSystem(coord, done)
}
doOtherStuffInTheMeantime()

// Wait for as many done messages as goroutines that we started.
// We could have counted them instead, and then counted down here.
for range solarSystems {
  <- done
}

As section 8.5 (Looping in Parallel) of the book says, if you wanted to return something useful from each goroutine, even just an error, you would use that instead of a struct{}.

Again, I don’t know how we would do this with C++ coroutines.

sync.WaitGroup

As an alternative to the “done” channel technique when waiting for multiple goroutines to finish, Go provides the sync.WaitGroup concurrency-safe counter. It has Add() and Done() methods and a Wait() method that blocks until the count falls back to zero. This is mentioned at the end of section 8.5 (Looping in Parallel) in the book.

func sendMessageToSolarSystem(coordinates Coordinates, wg *sync.WaitGroup) {
  ...

  wg.Done() // Doing it via defer would be safer.
}

var wg sync.WaitGroup
for _, coord := range solarSystems {
  wg.Add(1)
  go sendMessageToSolarSystem(coord, &wg)
}
doOtherStuffInTheMeantime()

wg.Wait()

Notice that there is no Add() method overload that uses a default of 1, because Go has no function overloads and no default function parameter values.

I don’t know how we’d do the same with C++ coroutines. Even clever and correct use of a std::condition_variable would not be helpful if C++ coroutines do not map directly to real kernel threads (see above).

Using a Channel as a Counting Semaphore

Section 8.8 (Example: Concurrent Directory Traversal) of the book suggests using a buffered channel of a specific size to allocate and release tokens, preventing too many goroutines from doing work at the same time, causing them to block until a token is available. In the example, this limits the use of a constrained resource, preventing the goroutines from opening too many files. For instance:

func sendMessageToSolarSystem(coordinates Coordinates, sema chan struct{}) {
  sema <- struct{}{} // Acquire token.
  defer func() { <- sema }() // Release it later.
  ...
}

// Don't try to send more than 10 messages at a time.
sema := make(chan struct{}, 10)
for _, coord := range solarSystems {
  go sendMessageToSolarSystem(coord, sema)
}
doOtherStuffInTheMeantime()

// This ignores the need to wait for completion.
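
If we did need to wait for completion, we could combine the token channel with the sync.WaitGroup mentioned above. A minimal sketch, reusing the hypothetical sendMessageToSolarSystem example:

func sendMessageToSolarSystem(coordinates Coordinates, sema chan struct{}, wg *sync.WaitGroup) {
  defer wg.Done()

  sema <- struct{}{} // Acquire token.
  defer func() { <- sema }() // Release it later.
  ...
}

sema := make(chan struct{}, 10)
var wg sync.WaitGroup
for _, coord := range solarSystems {
  wg.Add(1)
  go sendMessageToSolarSystem(coord, sema, &wg)
}
doOtherStuffInTheMeantime()

wg.Wait() // Now we really have waited for completion.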

I think I would prefer some syntax or API that let me explicitly specify a pool that allows only a specific number of coroutines to be active (or maybe even instantiated). Obviously that would only be useful in simple cases and it’s generally unpleasant to hard-code numbers anyway.

Select: Multiple results, from multiple channels, without blocking

Go has a select/case construct built into the language just for channels. This is mentioned in section 8.7 (Multiplexing with Select) of the book.

It’s a bit like switch/case. It’s also a bit like the POSIX select() function in C, used to respond to activity on file descriptors, such as network connections. It can wait for activity on one of the specified channels. Or it can repeatedly poll if you supply a default case and put it in a loop.

c1 := make(chan string)
go sendMessagesToExoplanetsViaRadio(c1)
c2 := make(chan string)
go sendMessagesToExoplanetsByBlockingTheSun(c2)

loop:
for {
  select {
    case response := <- c1:
      fmt.Printf("Response to our radio message: %s\n", response)

    case response := <- c2:
      fmt.Printf("Response to our sun message: %s\n", response)

    default:
      if tooLate {
        break loop // A plain break would only break out of the select.
      }
      doOtherStuffInTheMeantime()
      // or sleep.
  }
}

fmt.Println("Not waiting any longer for more responses.")

I wonder how C++ could support a similar construct with coroutines and generators.

Canceling a Goroutine by Closing a Channel and Polling

The main goroutine might no longer need the result from a goroutine that it has started. Maybe another goroutine found the answer first, or maybe it was in response to a user request that has been canceled, or a connection that has been disconnected.

(Update: See the comment below, about Contexts, added in Go 1.7.)

Section 8.9 (Cancellation) of the book suggests signaling this by closing a channel, and suggests that the called goroutines should poll for this channel closure via select with a default case (see above). The book calls this channel “done” but I think that confuses this technique with the technique that sends a done message over the done channel.

For instance:


// Poll to see if the channel has been closed.
func canceled(canceler <-chan struct{}) bool {
  select {
    case <-canceler:
      return true
    default:
      return false
  }
}

func sendMessagesToExoplanetsViaRadio(c chan<- string, canceler chan struct{}) {
  ...

  if canceled(canceler) {
    // Obviously not neat code:
    c <- "Not sent"
  }

  ...
  c <- response
}
  
c1 := make(chan string)
canceler1 := make(chan struct{})
go sendMessagesToExoplanetsViaRadio(c1, canceler1)
c2 := make(chan string)
canceler2 := make(chan struct{})
go sendMessagesToExoplanetsByBlockingTheSun(c2, canceler2)

// Wait for the first response, whichever it is.
select {
  case response := <- c1:
    // Cancel the other attempt because we don't need it.
    close(canceler2)
    ...

  case response := <- c2:
    // Cancel the other attempt because we don't need it.
    close(canceler1)
    ...
}

Again, I would like more explicit syntax or API for goroutine cancellation. For me, being able to implement things with channels doesn’t mean that they are the best API for expressing those things.

Communicating Instead of Sharing

Go programs tend to send messages via channels to change (indirectly) shared data and to let goroutines see the latest state of that data. “Do not communicate by sharing memory; instead, share memory by communicating.” is a popular phrase in the Go world.

This allows the use of data to be restricted to a single goroutine, often called a “monitor goroutine”. The locking (or just general lack of races if the Go channel is implemented with lock-free code) is then restricted to the channels, implicitly, with no explicit locking of the data being necessary. There are some examples of this in section 9.1 (Race Conditions) in the book.
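
For instance, a minimal sketch along the lines of the book’s bank-account example, where only the monitor goroutine ever touches the balance:

var deposits = make(chan int) // Send amounts to deposit.
var balances = make(chan int) // Receive the current balance.

// The monitor goroutine: balance is confined to this goroutine,
// so no explicit locking is needed.
func teller() {
  var balance int
  for {
    select {
      case amount := <-deposits:
        balance += amount
      case balances <- balance:
    }
  }
}

go teller()
deposits <- 200
fmt.Println(<-balances) // 200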

It occurs to me that, unlike locking via a mutex, this does not guarantee that all goroutines see the latest state of any shared data. But it does guarantee an order, which is maybe all that usually matters.

Because channels are concurrency-safe, you can even send channels as elements of another channel. Section 8.10 (Example: Chat Server) does this so it can use “client” channels to identify client goroutines (as well as sending strings over the client channels), keeping track of them via entering and leaving channels and a map of the client channels.
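
A rough sketch of just the bookkeeping part of that idea (the full example is in the book):

type client chan<- string // Each client is identified by its own message channel.

entering := make(chan client)
leaving := make(chan client)
clients := make(map[client]bool) // All connected clients.

for {
  select {
    case c := <-entering:
      clients[c] = true
    case c := <-leaving:
      delete(clients, c)
      close(c)
  }
}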

Concurrency-safe containers

We’ve already seen that Go channels are “concurrency-safe”, so you can share them between goroutines without additional locking. So are slices and maps.

Update: No, slices and maps are not concurrency safe. I must have misunderstood something that I read somewhere. This makes the rest of this sub-section irrelevant, though hopefully correct.

Interestingly, the Java standard library moved away from making container types thread safe, because (as I understand it) the developer would generally use them as part of a larger custom data structure that would need its own locking (synchronization) anyway. Java now provides some separate containers explicitly for use with concurrency.

Likewise, in the C++ standard library, types only tend to have locking (for instance std::shared_ptr’s “partial internal synchronization“) when it wouldn’t be possible for the application code to do the locking itself (external synchronization). Presumably, Go programs then tend to have more locking than necessary, and presumably this is Go again choosing convenience and safety over raw performance.

However, I would really like to see a concurrency-safe queue in C++, similar to a Go channel. A lock-free concurrent queue seems like exactly the kind of difficult-to-implement, but easy to use, and broadly useful, thing that belongs in the standard library.

No thread-local storage

Because Goroutines don’t map directly to real threads, Go doesn’t try to provide “thread-local storage”. C++ has this, via its thread_local keyword, but I believe this is not useful, or wise, with C++ coroutines either.

Mutexes

Goroutines are a suitable way to co-operatively schedule work, and communication over channels can avoid the need for explicit locking, but you might still need to explicitly prevent concurrent access to shared data. Mutexes do this in both Go and C++, but feel free to skip to the summary if you only care about goroutines and channels.

Go’s sync.Mutex is like std::mutex

We lock/unlock Go’s Mutex like so, though you can also do it manually instead of using defer.

var m sync.Mutex

func something() {
  m.Lock()
  defer m.Unlock()

  // do something to the data.
}

which corresponds to this in C++ with std::mutex:

std::mutex m;

void something() {
  std::lock_guard<std::mutex> lock(m);

  // do something to the data.
}

I’ve always wished there was some shorter syntax for that lock_guard in C++.

Note that you can use std::unique_lock, instead of std::lock_guard, if you need to manually lock and unlock.

Go’s sync.RWMutex is like std::shared_mutex with std::shared_lock

A Read/Write mutex (or “shared exclusive lock”) allows multiple threads to read from shared data, while ensuring that they each see the latest state of that shared data, but only allows one thread at a time to write to the shared data.

We lock/unlock Go’s RWMutex like so, though you can also do it manually instead of using defer.

var m sync.RWMutex

func somethingThatReads() {
  m.RLock()
  defer m.RUnlock()

  // get something from the data.
}

func somethingThatWrites() {
  m.Lock()
  defer m.Unlock()

  // do something to the data
}

which corresponds to this in C++ with std::shared_mutex:

std::shared_mutex m;

void somethingThatReads() {
  std::shared_lock<std::shared_mutex> lock(m);

  // get something from the data
}

void somethingThatWrites() {
  std::lock_guard<std::shared_mutex> lock(m);

  // do something to the data
}

I do find the Go names more obvious here. They seem to be aimed more at how people will actually use these locks in most situations.

No recursive/re-entrant mutex

Go mutexes, like C++’s std::mutex and std::shared_mutex, are not re-entrant, so if a thread tries to lock a mutex that it has already locked, there will be a deadlock. C++ does provide a re-entrant mutex via std::recursive_mutex, but actually using it is generally considered rather disgusting and a sign that the code should be restructured to avoid it.

As the Go book wisely says: “The purpose of a mutex is to ensure that certain invariants of the shared variables are maintained at critical points during program execution. One of the invariants is “no goroutine is accessing the shared variables,” but there may be additional invariants specific to the data structures that the mutex guards. When a goroutine acquires a mutex lock, it may assume that the invariants hold. While it holds the lock, it may update the shared variables so that the invariants are temporarily violated. However, when it releases the lock, it must guarantee that order has been restored and the invariants hold once again. Although a re-entrant mutex would ensure that no other goroutines are accessing the shared variables, it cannot protect the additional invariants of those variables.”

Summary

Go’s concurrency features are another example of its choosing safety and simplicity over potential raw performance, in an area where standard C++ does not yet offer such simplicity even as an option.

Both support standard mutex locking, in much the same way. But otherwise their strengths go in opposite directions:

C++ lets you write lock-free concurrent code so the compiler, with the help of some hardware features, can arrange for the compiled code to never attempt simultaneous changing of your shared data, while ensuring that different threads see changes when necessary. But that is notoriously difficult to get right. Nevertheless it needs to be available so we can benefit from the work of people who know how to use it.

On the other hand, Go lets you write concurrency code that is easier to get right than when just using standard locks. There is probably some small performance cost, but it’s likely to actually work and you’ll probably understand how it works. After all, when concurrency code is wrong (C++ makes it easier to be wrong), performance can suffer as well as safety.

I do have some hope that things will get better for C++ when coroutines arrive and are widely understood, but it looks like they alone might not be enough to match Go’s goroutines and channels.

Kids Coding: Simple Java

Moving Beyond Blocks

I recently got my son (9) started on “real” programming with text files and the command line. Here are some notes about that. It’s still early in this experiment so I can’t claim that it’s a great idea, but it’s kind of working so far. There are so many web sites and products to get kids started in programming, but very little to move them past the basics into something more expansive. They risk getting obsessed but not then staying obsessed.

I learned programming because that is what you did with computers when I was a child. I kept learning programming because I could make the computer do new and interesting things. The problem now is that computers do amazing things without any programming and it requires massive experience and effort to achieve equally impressive results. So we must create a simpler environment which lets the child feel good about creating and understanding something more straightforward. It should be a path to real programming, not another island in a chain.

My son still hadn’t done much real text-based programming other than some playing with processing.org inside the IDE (which I also recommend) and some Python in CodeCombat. He had previously played a bit with Scratch and Lego Wedo, Hour of Code, Open Roberta (really good if you have the Lego Mindstorms education kit) and Tynker. He’s also done some Lego Mindstorms, but I think that’s an awful programming experience, at least with the non-education kit.

I’ve chosen Java, reluctantly, because I am familiar with it, it’s forgiving and convenient enough, and its language constructs (for loops, functions, etc) are enough like other C-like languages that the experience should be generally transferable. It’s far from ideal, but processing.org helps a little.

I love C++, but it’s clearly not suitable as a first text-based programming language. Neither is C, though I can imagine us trying that sometime. I think Python and Go are worth considering, but Python’s loose type system frustrates me personally and Go doesn’t seem quite mainstream enough.

However, if there is a simple language that you enjoy, whatever it is, I suggest that you use it with your kids. You must have enthusiasm to share and you must help them move past frustrations. If solving problems in the language doesn’t feel interesting to you then it probably won’t seem interesting to them. The overall aim here is to share your own love of programming by rediscovering how you learned programming yourself.

Using the Command Line

First I introduced the Linux command line, presenting it as where the real power is, where you can see and control what’s happening under everything else. (Yes, I know. Nevermind.) I kept it simple, just talking about directories and editing text files. We only covered:

  • Opening the terminal.
  • Seeing where you are with pwd.
  • Seeing what is there with ls.
  • Moving into a directory with cd.
  • Moving back with “cd ..”.
  • Opening a text file with gedit, from the terminal.
  • Using & so we can continue using the terminal after opening the file in gedit.
  • Running our program from the terminal.

Hiding the Boilerplate

I wanted to mimic the experience of programming in simpler times, or of simple scripting today. At least at first, I wanted us to just write commands into a text file and just run the resulting program, without worrying about main functions or having a parent Java class, without separate compilation and execution steps, and just using a println() function instead of System.out.println(). This let us avoid discussion of functions or classes until later.

I put this in my .bash_aliases file:

processing-run() {
    $HOME/processing/processing-java --sketch="$1" --run
}

so we can just run a program like this:

$ processing-run add

which runs the program in add/add.pde.

The .pde file just contains Java code, but not in any function and not part of any class (processing.org’s “static mode”). However, as soon as you want to implement your own function, you’ll have to put the rest of your code in a void setup() method (processing.org’s “active mode”), still not part of any class.

Using processing.org for this is a bit awkward, so I’m open to more suitable Java-like environments.

Doing Something Useful But Simple

I wanted to gradually explore simple programming techniques such as variables and loops, so I needed something that would just manipulate inputs and provide useful outputs.

I’ve chosen to gradually develop little programs based on the arithmetic techniques kids often learn in school. For instance, they learn an algorithm to add large numbers together using a carry – they just don’t call it an algorithm. They also learn long multiplication and long division, which add layers of complexity. As a bonus, this can demystify the magic that school requires them to repeat.

For instance, this commit history shows how we gradually wrote and improved a little program to add numbers (as strings). It’s missing the really basic code we started with. I used it to introduce new programming ideas along the way, such as:

  • Using variables.
  • Writing strings with quote marks.
  • Looping over the characters in a string (and backwards), particularly with the awkward for loop syntax.
  • Text output (standard out)
  • Discovering the length of a string instead of hard-coding it, to make the code more general.
  • ASCII Character codes. (Ignoring full Unicode for now.)
  • Getting a number (0, 1, 2, …) corresponding to a character (‘0’, ‘1’, ‘2’, …). Talking about types.
  • Using variables and manipulating them. For instance, adding the two numbers for two digits from two “numbers in strings”.
  • Variables in the loop and variables outside the loop. Some variables’ values are forgotten at the end of each time the loop is run, and some values are kept and used the next time.
  • Conditionals with if.
  • Putting code into functions and calling it.
  • Making functions take parameters, to make them more generally useful.
  • Calling functions from functions.
  • Passing an array instead of several parameters of the same type, to make the function more general.

You soon end up with a bunch of code that is big enough to seem overwhelming to a child. So every now and then I printed the code out so we could discuss it together. We talked about ideas such as variables, and setting variables, about functions, function parameters, and how to call functions, marking places in the code with colored pencils.

Version Control

Programming is an iterative, experimental, activity. So I think version control should be introduced early as long as you can keep it simple.

We wrote the code at the same time on our own computers while sat next to each other. I checked the changes into GitHub myself because they don’t allow users younger than 13. Seeing the changes, and their descriptions, should help kids remember the learning experience and remind them where they are on the path they’ve taken. But it’s a real shame that kids can’t really own their coding history on GitHub, even privately.

A C++ developer looks at Go (the programming language), Part 2: Modularity and Object Orientation

This is the second part of a series. Here are all 3 parts:

In part 1, I looked at the simpler features of Go, such as its general syntax, its basic type system, and its for loop. Here I mention its support for packages and object orientation. As before, I strongly suggest that you read the book to learn about this stuff properly. Again, I welcome any friendly corrections and clarifications.

Overall, I find Go’s syntax for object orientation a bit messy, inconsistent and frequently too implicit, and for most uses I prefer the obviousness of C++’s inheritance hierarchy. But after writing the explanations down, I think I understand it.

I’m purposefully trying not to mention the build system, distribution, or configuration, for now.

Packages

Go code is organized in packages, which are much like Java package names, and a bit like C++ namespaces. The package name is declared at the start of each file:

package foo

And files in other packages should import them like so:

package bar

import (
  "foo"
  "moo"
)

func somefunc() {
  foo.Yadda()
  var a moo.Thing
  ...
}

The package name should match the file’s directory name. This is how the import statements find the packages’ files. You can have multiple files in the same directory that are all part of the same package.

Your “main” package, with your main function, is an exception to this rule. Its directory doesn’t need to be named “main” because you won’t be importing the main package from anywhere.

Go doesn’t seem to allow nested packages, unlike C++ (foo::thing::Yadda) and Java (foo.thing.Yadda), though people seem to work around this by creating separate libraries, which seems awkward.

I don’t know how people should specify the API of a Go package without providing the implementation. C and C++ have header files for this purpose, separating declaration and implementation.

Structs

You can declare structs in Go much as in C. For instance:

type Thing struct {
  // Member fields.
  // Notice the lack of the var keyword.
  a int
  B int // See below about symbol visibility
}

var foo Thing
foo.B = 3

var bar Thing = Thing{1, 2} // With this syntax, all fields must be specified.

var goo *Thing = new(Thing)
goo.B = 5

As usual, I have used the var form to demonstrate the actual type, but you would probably want to use the short := form.

Notice that we can create it as a value or as a pointer (using the built-in new() function), though in Go, unlike C or C++, this does not determine whether its actual memory will be on the stack or the heap. The compiler decides, generally based on whether the memory needs to outlive the function call.

Previously we’ve seen the built-in make() function used to instantiate slices and maps (and we’ll see it in part 3 with channels). make() is only for those built-in types. For our own types, we can use the new() function. I find the distinction a bit messy, but I generally dislike the whole distinction between built-in types and types that can be implemented using the language itself. I like how the C++ standard library is implemented in C++, with very little special support from the language itself when something is added to the library.

Go types often have “constructor” functions (not methods) which you should call to properly instantiate the type, but I don’t think there is any way to enforce correct initialization like a default constructor in C++ or Java. For instance:


type Thing struct {
  a int
  name string
  ...
}

func NewThing() *Thing {
  // 100 is a suitable default value for a in this type:
  f := Thing{100, ""}
  return &f
}

// Notice that different "constructors" must have different names,
// because Go doesn't have function or method overloading.
func NewThingWithName(name string) *Thing {
  f := Thing{100, name}
  return &f
}

Embedding Structs

You can anonymously “embed” one struct within another, like so:

type Person struct {
   Name string
}

type Employee struct {
  Person
  Position string
}

var a Employee
a.Name = "bob"
a.Position = "builder"

This feels a bit like inheritance in C++ and Java, but this is just containment. It doesn’t give us any real “is a” meaning and doesn’t give us real polymorphism. For instance, you can do this:

var e = new(Employee)

// Compilation error.
var p *Person = e

// This works instead.
// So if we thought of this as a cast (we probably shouldn't),
// this would mean that we have to explicitly cast to the base class.
var p *Person = &e.Person

// This works.
e.methodOnPerson()

// And this works.
// Name is a field in the contained Person struct.
e.Name = "bob"

// This works too, but the extra qualification is unnecessary.
e.Person.methodOnPerson()

Interfaces, which we’ll see later, do give us some sense of an “is a” meaning.

Methods

Unlike C, but like C++ and Java classes, structs in Go can have methods – functions associated with the struct. But the syntax is a little different than in C++ or Java. Methods are declared outside of the struct declaration, and the association is made by specifying a “receiver” before the function name. For instance, this declares (and implements) a DoSomething method for the Thing struct:

func (t Thing) DoSomething() {
  ...
}

Notice that you have to specify a name for the receiver – there is no built-in “self” or “this” instance name. This feels like an unnecessary invitation to inconsistency.

You can use a pointer type instead, and you’ll have to if you want to change anything about the struct instance:

func (t *Thing) ChangeSomething() {
  t.a = 4
}

Because you presumably also want to keep your code consistent, you’d therefore want to use a pointer type for all method receivers. So I don’t know why the language ever lets the receiver be a struct value type.

Unlike C++ or Java, this lets you check the instance for nil (Go’s null or nullptr), making it acceptable to call your method on a null instance. This reminds me of how Objective-C happily lets you call a method (“send a message to” in Objective-C terminology) on a nil instance, with no crash, even returning a nil or zero return value. I find that undisciplined in Objective-C, and it bothers me that Go allows this sometimes, but not consistently.
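
For instance, a minimal sketch with a hypothetical Describe() method that tolerates a nil receiver:

func (t *Thing) Describe() string {
  if t == nil {
    return "no thing"
  }

  return "a thing"
}

var t *Thing // nil.
fmt.Println(t.Describe()) // No crash: prints "no thing".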

Unlike C++ or Java, you can even associate methods with non-struct (non-class) types. For instance:

type Meters int
type Feet int

func (Meters) convertToFeet() (Feet) {
  ...
}

var m Meters = 10
f := m.convertToFeet()

No equality or comparison operator overloading

In C++, you can overload operator ==, !=, <, >, etc, so you can use instances of your type with the regular operators, making your code look tidy:

MyType a = getSomething();
MyType b = getSomethingElse();
if (a == b) {
  ...
}

You can’t do that in Go (or Java, though it has the awkward Comparable interface and equals() method). Only some built-in types are comparable in Go – the numeric types, string, pointers, or channels, or structs or arrays made up of these types. This is an issue when dealing with interfaces, which we’ll see later.
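
For instance, a struct made up only of comparable fields can be compared directly with ==:

type Point struct {
  X int
  Y int
}

a := Point{1, 2}
b := Point{1, 2}
if a == b { // Compares field by field.
  ...
}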

Symbol Visibility: Uppercase or lowercase first letter

Symbols (types, functions, variables) that start with an uppercase letter are available from outside the package. Struct methods and member variables that start with an uppercase letter are available from outside the struct. Otherwise they are private to the package or struct.

For instance:

type Thing int // This type will be available outside of the package.
var Thingleton Thing // This variable will be available outside of the package.

type thing int // Not available outside of the package.
var thing1 thing // Not available outside of the package.
var thing2 Thing // Not available outside of the package.

// Available outside of the package.
func DoThing() {
  ...
}

// Not available outside of the package.
func doThing() {
  ...
}

type Stuff struct {
  Thing1 Thing // Available outside of the package.
  thing2 Thing // "private" to the struct.
}

// Available outside of the struct.
func (s Stuff) Foo() {
  ...
}

// Not available outside of the struct.
func (s Stuff) bar() {
  ...
}

// Not available outside of the package.
type localstuff struct {
...
}

I find this a bit strange. I prefer the explicit public and private keywords in C++ and Java.

Interfaces

Interfaces have methods

If two Go types satisfy an interface then they both have the methods of that interface. This is similar to Java interfaces. A Go interface is also a bit like a completely abstract class in C++ (having only pure virtual methods), but it’s also a lot like a C++ concept (not yet in C++, as of C++17). For instance:

type Shape interface {
  // The interface's methods.
  // Note the lack of the func keyword.
  SetPosition(x int, y int)
  GetPosition() (x int, y int)
  DrawOnSurface(s Surface)
}

type Rectangle struct {
  ...
}

// Methods to satisfy the Shape interface.
func (r *Rectangle) SetPosition(x int, y int) {
  ...
}

func (r *Rectangle) GetPosition() (x int, y int) {
  ...
}
func (r *Rectangle) DrawOnSurface(s Surface) {
   ...
}

// Other methods:
func (r *Rectangle) setCornerType(c CornerType) {
   ...
}
func (r *Rectangle) cornerType() (CornerType) {
   ...
}

type Circle struct {
  ...
}

// Methods to satisfy the Shape interface.
func (c *Circle) SetPosition(x int, y int) {
  ...
}

func (c *Circle) GetPosition() (x int, y int) {
  ...
}

func (c *Circle) DrawOnSurface(s Surface) {
  ...
}

// Other methods:
...

You can then use the interface type instead of the specific “concrete” type:

var someCircle *Circle = new(Circle)
var s Shape = someCircle
s.DrawOnSurface(someSurface)

Notice that we use a Shape, not a *Shape (pointer to Shape), even though we are casting from a *Circle (pointer to circle). “Interface values” seem to be implicitly pointer-like, which seems unnecessarily confusing. I guess it would feel more consistent if pointers to interfaces just had the same behaviour as these “interface values”, even if the language had to disallow interface types that weren’t pointers.

Types satisfy interfaces implicitly

However, there is no explicit declaration that a type should implement an interface.

In this way Go interfaces are like C++ concepts, though C++ concepts are instead a purely compile-time feature for use with generic (template) code. Your class can conform to a C++ concept without you declaring that it does. And therefore, like Go interfaces, you can, if you must, use an existing type without changing it.

The compiler still checks that types are compatible, but presumably by checking the types’ list of methods rather than checking a class hierarchy or list of implemented interfaces. For instance:

var a *Circle = new(Circle)
var b Shape = a // OK. The compiler can check that Circle has Shape's methods.

Like C++ with dynamic_cast, Go can also check at runtime. For instance, you can check if one interface value refers to an instance that also satisfies another interface:

// Sometimes the Shape (our interface type) is also a Drawable
// (another interface type), sometimes not.
var a Shape = Something.GetShape()

// Notice that we want to cast to a Drawable, not a *Drawable,
// because Drawable is an interface.
var b = a.(Drawable) // Panic (crash) if this fails.

var b, ok = a.(Drawable) // No panic.
if ok {
  b.DrawOnSurface(someSurface)
}

Or we can check that an interface value refers to a particular concrete type. For instance:

// GetShape() returns an interface value.
// Shape is our interface.
var a Shape = Something.GetShape()

// Notice that we want to cast to a *Thing, not a Thing,
// because Thing is a concrete type, not an interface.
var b = a.(*Thing) // Panic (crash) if this fails.

var b, ok = a.(*Thing) // No panic.
if ok {
  b.DoSomething()
}

Runtime dispatch

Interface methods are also like C++ virtual methods (or any Java method), and interface variables are also like instances of polymorphic base classes. To actually call the interface’s method via an interface variable, the program needs to examine its actual type at runtime and call that type’s specific method. Maybe, as with C++, the compiler can sometimes optimize away that indirection.

This is obviously not as efficient as directly calling a method, identified at compile time, of a templated type in a C++ template. But it is obviously much simpler.

Comparing interfaces

Interface values can be compared sometimes, but this seems like a risky business. Interface values are:

  • Not equal if their types are different.
  • Not equal if their types are the same and only one is nil.
  • Equal if their types are the same, and the types are comparable (see above), and their values are equal.

But if the types are the same, yet those types are not comparable, Go will cause a “panic” at runtime.
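
For instance, something like this compiles, but panics at runtime, because the dynamic type ([]int) is not comparable:

var a interface{} = []int{1, 2}
var b interface{} = []int{1, 2}
if a == b { // Panics at runtime: []int is not comparable.
  ...
}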

Wishing for an implements keyword

In C++ you can, if you wish, explicitly declare that a class should conform to a concept, or you can explicitly derive from a base class, and in Java you must use the “implements” keyword. Not having this with Go would take some getting used to. I’d want these declarations to document my architecture, explicitly showing what’s expected of my “concrete” classes in terms of their general purpose, instead of just expressing that by how some other code happens to use them. Not having this feels fragile.

The book suggests putting this awkward code somewhere to check that a type really implements an interface. Note the use of _ to mean that we don’t need to keep a named variable for the result.

var _ MyInterface = (*MyType)(nil)

The compiler should complain that the conversion is impossible if the type does not satisfy the interface. I think it would be wise to do this as the very minimum of testing, particularly if your package is providing types that are not really used in the package itself. For me, this is a poor substitute for an obvious compile-time check, using a specific language construct, on the type itself.

Interface embedding

Embedding an interface in an interface

Go has no notion of inheritance hierarchies, but you can “embed” one interface in another, to indicate that a class that satisfies one interface also satisfies the other. For instance:

type Positionable interface {
  SetPosition(x int, y int)
  GetPosition() (x int, y int)
}

type Drawable interface {
  drawOnSurface(s Surface)
}

type Shape interface {
  Positionable
  Drawable
}

To satisfy the Shape interface, any type must also satisfy the Drawable and Positionable interfaces. Therefore, any type that satisfies the Shape interface can be used with a method associated with the Drawable or Positionable interfaces. So it’s a bit like a Java interface extending another interface.

Embedding an interface-satisfying struct in a struct

We saw earlier how you can embed one struct in another anonymously. If the contained struct implements an interface, then the containing struct then also implements that interface, with no need for manually-implemented forwarding methods. For instance:

type Drawable interface {
  drawOnSurface(s Surface)
}

type Painter struct {
  ...
}

// Make Painter satisfy the Drawable interface.
func (p *Painter) drawOnSurface(s Surface) {
  ...
}

type Circle struct {
  // Make Circle satisfy the Drawable interface via Painter.
  Painter
  ...
}

func main() {
  ...
  var c *Circle = new(Circle)
 
  // This is OK.
  // Circle satisfies Drawable, via Painter
  c.drawOnSurface(someSurface)

  // This is also OK.
  // Circle can be used as an interface value of type Drawable, via Painter.
  var d Drawable = c
  d.drawOnSurface(someSurface)
}

This again feels a bit like inheritance.

I actually quite like how the (interfaces of the) anonymously contained structs affect the interface of the parent struct, even with Go’s curious interface system, though I wish the syntax was more obvious about what is happening. It might be nice to have something similar in C++. Encapsulation instead of inheritance (and the Decorator pattern) is a perfectly valid technique, and C++ generally tries to let you do things in multiple ways without having an opinion about what’s best, though that can itself be a source of complexity. But in C++ (and Java), you currently have to hand-code lots of forwarding methods to achieve this and you still need to inherit from something to tell the type system that you support the encapsulated type’s interface.

 

A C++ developer looks at Go (the programming language), Part 1: Simple Features

This is the first part of a series. Here are all 3 parts:

I’m reading “The Go Programming Language” by Brian Kernighan and Alan Donovan. It is a perfect programming language introduction, clearly written and perfectly structured, with nicely chosen examples. It contains no hand-waving – it’s aware of other languages and briefly acknowledges the choices made in the language design without lengthy discussion.

As an enthusiastic C++ developer, and a Java developer, I’m not a big fan of the overall language. It seems like an incremental improvement on C, and I’d rather use it than C, but I still yearn for the expressiveness of C++. I also suspect that Go cannot achieve the raw performance of C or C++ due to its safety features, though that maybe depends on compiler optimization. But it’s perfectly valid to knowingly choose safety over performance, particularly if you get more safety and more performance than with Java.

I would choose Go over C++ for a simple proof of concept program using concurrency and networking. Goroutines and channels, which I’ll mention in a later post, are convenient abstractions, and Go has a standard API for HTTP requests. Concurrency is hard, and it’s particularly easy to choose safety over performance when writing network code.

Here are some of my superficial observations about the simpler features, which mostly seem like straightforward improvements on C. In part 2 I’ll mention the higher-level features and I’ll hopefully do a part 3 about concurrency. I strongly recommend that you read the book to understand these issues properly.

I welcome friendly corrections and clarifications. There are surely several mistakes here, hopefully none major.

No semicolons at the end of lines

Let’s start with the most superficial thing. Unlike C, C++, or Java, Go doesn’t need semicolons at the end of lines of code. So this is normal:

a = b
c = d

This is nicer for people learning their first programming language. It can take a while for those semicolons to become a natural habit.

No () parentheses with if and for

Here’s another superficial difference. Unlike C or Java, Go doesn’t put its conditions inside parentheses with if and for. That’s another small change that feels arbitrary and makes C coders feel less comfortable.

For instance, in Go we might write this:

for i := 0; i < 100; i++ {
  ...
}

if a == 2 {
  ...
}

Which in C would look like this:

for (int i = 0; i < 100; i++) {
  ...
}

if (a == 2) {
  ...
}

Type inference

Go has type inference, from literal values or from function return values, so you don’t need to restate types that the compiler should know about. This is a bit like C++’s auto keyword (since C++11). For instance:

var a = 1 // An int.
var b = 1.0 // A float64.
var c = getThing()

There’s also a := syntax that avoids the need for var, though I don’t see the need for both in the language:

a := 1 // An int.
b := 1.0 // A float64
d := getThing()

I love type inference via auto in modern C++, and find it really painful to use any language that doesn’t have this. Java feels increasingly verbose in comparison, but maybe Java will get there. I don’t see why C can’t have this. After all, they eventually allowed variables to be declared not just at the start of functions, so change is possible.

Types after names

Go has types after the variable/parameter/function names, which feels rather arbitrary, though I guess there are reasons, and personally I can adapt. So, in C you’d have

Foo foo = 2;

but in Go you’d have

var foo Foo = 2

Keeping a more C-like syntax would have eased C developers into the language. These are often not people who embrace even small changes in the language.

No implicit conversions

Go doesn’t have implicit conversions between types, such as int and uint, or floats and int. This also applies to comparison via == and !=.

So, these won’t compile:

var a int = -2
var b uint = a
var c int = b

var d float64 = 1.345
var e int = d

C compiler warnings can catch some of these, but a) People generally don’t turn on all these warnings, and they don’t turn on warnings as errors, and b) the warnings are not this strict.

Notice that Go has the type after the variable (or parameter, or function) name, not before.

Notice that, unlike Java, Go still has unsigned integers. Unlike C++’s standard library, Go uses signed integers for sizes and lengths. Hopefully C++ will do that too one day.

No implicit conversions because of underlying types

Go doesn’t even allow implicit conversions between types that, in C, would just be typedefs. So, this won’t compile:

type Meters int
type Feet int
var a Meters = 100
var b Feet = a

I think I’d like to see this as a warning in C and C++ compilers when using typedef.

However, you are allowed to implicitly assign a literal (untyped) value, which looks like the underlying type, to a typed variable, but you can’t assign from an actual typed variable of the underlying type:

type Meters int
var a Meters = 100 // No problem.

var i int = 100
var b Meters = i // Will not compile.

No enums

Go has no enums. You should instead use const values with the iota keyword. So, while C++ code might have this:

enum class Continent {
  NORTH_AMERICA,
  SOUTH_AMERICA,
  EUROPE,
  AFRICA,
  ...
};

Continent c = Continent::EUROPE;
Continent d = 2; // Will not compile

in Go, you’d have this:

type continent int

const (
  CONTINENT_NORTH_AMERICA continent = iota
  CONTINENT_SOUTH_AMERICA // Also a continent, with the next value via iota.
  CONTINENT_EUROPE // Also a continent, with the next value via iota.
  CONTINENT_AFRICA // Also a continent, with the next value via iota.
)

var c continent = CONTINENT_EUROPE
var d continent = 2 // But this works too.

Notice how, compared to C++ enums, particularly C++11 scoped enums, each value’s name must have an explicit prefix, and the compiler won’t stop you from assigning a literal number to a variable of the enum type. Also, the Go compiler doesn’t treat these as a group of associated values, so it can’t warn you, for instance, if you forget to mention one in a switch/case block.

Switch/Case: No fallthrough by default

In C and C++, you almost always need a break statement at the end of each case block. Otherwise, the code in the following case block will run too. This can be useful, particularly when you want the same code to run in response to multiple values, but it’s not the common case. In Go, you have to add an explicit fallthrough keyword to get this behaviour, so the code is more concise in the general case.
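
For instance, with some hypothetical functions:

switch n {
case 0:
  doZeroThing()
  fallthrough // Explicitly run the next case's code too.
case 1:
  doSmallThing()
case 2:
  doOtherThing() // No fallthrough, so execution stops here.
}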

Switch/Case: Not just basic types

In Go, unlike in C and C++, you can switch on any comparable value, not just values known at compile time, such as ints, enums, or other constexpr values. So you can switch on strings, for instance:

switch str {
case "foo":
  doFoo()
case "bar":
  doBar()
}

This is convenient and I guess that it is still compiled to efficient machine code when it uses compile-time values. C++ seems to have resisted this convenience because it couldn’t always be as efficient as a standard switch/case, but I think that unnecessarily ties the switch/case syntax to its original meaning in C when people expected to be more aware of the mapping from C code to machine code.

Pointers, but no ->, and no pointer arithmetic

Go has normal types and pointer types, and uses * and & as in C and C++. For instance:

var a thing = getThing()
var p *thing = &a
var b thing = *p // Copy a by value, via the p pointer

Like C++’s new keyword, Go’s built-in new() function returns a pointer to a new instance:

var a *thing = new(thing)
var a thing = new(thing) // Compilation error

This is like C++, but unlike Java, in which any non-fundamental types (not ints or booleans, for instance) are effectively used via a reference (it just looks like a value), which can confuse people at first by allowing inadvertent sharing.

Unlike C++, you can call a method on a value or a pointer using the same dot operator:

var a *Thing = new(Thing) // You wouldn't normally specify the type.
var b Thing = *a
a.foo()
b.foo()

I like this. After all, the compiler knows whether the type is a pointer or a value, so why should it bother me with complaints about a . where there should be a -> or vice-versa? However, along with type inference, this can slightly obscure whether your code is dealing with a pointer (maybe sharing the value with other code) or a value. I’d like to see this in C++, though it would be awkward with smart pointers.

You cannot do pointer arithmetic in Go. For instance, if you have an array, you can’t step through that array by repeatedly adding 1 to a pointer value and dereferencing it. You have to access the array elements by index, which I think involves bounds checking. This avoids some mistakes that can happen in C and C++ code, leading to security vulnerabilities when your code accesses unexpected parts of your application’s memory.
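
So, instead of stepping through an array via a pointer as in C, you would index it, for instance (with a hypothetical doSomethingWith() function):

a := []int{10, 20, 30}
for i := 0; i < len(a); i++ {
  doSomethingWith(a[i]) // Each access is bounds-checked.
}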

Go functions can take parameters by value or by pointer. This is like C++, but unlike Java, which always takes non-fundamental types by (non const) reference, though it can look to beginner programmers as if they are being copied by value. I’d rather have both options with the code showing clearly what is happening via the function signature, as in C++ or Go.

Like Java, Go has no notion of const pointers or const references. So if your function takes a parameter as a pointer, for efficiency, your compiler can’t stop you from changing the value that it points to. In Java, this is often done by creating an immutable type, and many Java types, such as String, are immutable, so you can’t change them even if you want to. But I prefer language support for constness as in C++, for pointer/reference parameters and for values initialized at runtime. Which leads us to const in Go.

References, sometimes

Go does seem to have references (roughly, pointers that look like values), but only for the built-in slice, map, and channel types. (See below about slices and maps.) So, for instance, this function can change its input slice parameter, and that change will be visible to the caller, even though the parameter is not declared as a pointer:

func doThing(someSlice []int) {
  someSlice[2] = 3
}

In C++, this would be more obviously a reference:

void doThing(Thing& someSlice) {
  someSlice[2] = 3;
}

I’m not sure if this is a fundamental feature of the language or just something about how those types are implemented. It seems confusing for just some types to act differently, and I find the explanation a bit hand-wavy. Convenience is nice, but so is consistency.

const

Go’s const keyword is not like const in C (rarely useful) or C++, where it indicates that a variable’s value should not be changed after initialization. It is more like C++’s constexpr keyword (since C++11), which defines values at compile time. So it’s a bit like a replacement for macros via #define in C, but with type safety. For instance:

const pi = 3.14

Notice that we don’t specify a type for the const value, so the value can be used with various types depending on the syntax of the value, a bit like a C macro #define. But we can restrict it by specifying a type:

const pi float64 = 3.14

Unlike constexpr in C++, there is no concept of constexpr functions or types that can be evaluated at compile time, so you can’t do this:

const pi = calculate_pi()

and you can’t do this

type Point struct {
  X int
  Y int
}

const point = Point{1, 2}

though you can do this with a simple type whose underlying type can be const:

type Yards int
const length Yards = 100

Only for loops

All loops in Go are for loops – there are no while or do-while loops. This simplifies the language in one way compared, for instance, to C, C++, or Java, though there are now multiple forms of for loop.

For instance:

for i := 0; i < 100; i++ {
  ...
}

or, like a while loop in C:

for keepGoing {
  ...
}

And for loops have a range-based syntax for containers such as string, slices or maps, which I’ll mention later:

for i, c := range things {
  ...
}

C++ has range-based for loops too, since C++11, but I like that Go can (optionally) give you the index as well as the value. (It gives you the index, or the index and the value, letting you ignore the index with the _ variable name.)
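
For instance, ignoring the index:

for _, v := range things {
  ... // We only want the value, v.
}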

A native (Unicode) String type

Go has a built-in string type (as does Java), and built-in comparison operators such as ==, !=, and <. Like Java, strings are immutable, so you can’t change them after you’ve created them, though you can create new strings by concatenating other strings with the built-in operator +. For instance:

str1 := "foo"
str2 := str1 + "bar"

Go source code is always UTF-8 encoded and string literals may contain non-ASCII Unicode code points. Go calls Unicode code points “runes”.

Although the built-in len() function returns the number of bytes, and the built in operator [] for strings operates on bytes, there is a utf8 package for dealing with strings as runes (Unicode code points). For instance:

str := "foo"
l := utf8.RuneCountInString(str)

And the range-based for loop deals in runes, not bytes:

str := "foo"
for _, r := range str {
  fmt.Println("rune: %q", r)
}

C++ still has no standard equivalent.

Slices

Go’s slices are a bit like dynamically-allocated arrays in C, though they are really views of an underlying array, and two slices can be views into different parts of the same underlying array. They feel a bit like std::string_view from C++17, or GSL::span, but they can be resized easily, like std::vector in C++17 or ArrayList in Java.

We can declare a slice like so, and append to it:

a := []int{5, 4, 3, 2, 1} // A slice
a = append(a, 0)

Arrays (whose size cannot change, unlike slices) have a very similar syntax:

a := [...]int{5, 4, 3, 2, 1} // An array.
b := [5]int{5, 4, 3, 2, 1} // Another array.

You must be careful to pass arrays to functions by pointer, or they will be (deep) copied by value.

Slices are not (deep) comparable, or copyable, unlike std::array or std::vector in C++, which feels rather inconvenient.

A slice cannot grow beyond the capacity of its underlying array (and the capacity can be more than the slice’s current length). To grow past that, a new, bigger underlying array must be allocated and the old slice’s elements copied into it.

The built in append() function may allocate a bigger underlying array if it would need more than the existing capacity (which can be more than the current length). So you should always assign the result of append() like so:

a = append(a, 123)

You can keep a pointer to an element in a slice (really to the element in the underlying array): if the slice has since grown, the garbage collection system just keeps the previous underlying array around until you have stopped using that pointer. Maps (below) avoid this question differently, by not letting you take a pointer to an element at all.

Unlike C or C++ arrays, and unlike operator [] with std::vector, attempting to access an invalid index of a slice will result in a panic (effectively a crash) rather than just undefined behaviour. I prefer this, though I imagine that the bounds checking has some small performance cost.

Maps

Go has a built-in map type. This is roughly equivalent to C++’s std::map (balanced binary trees), or std::unordered_map (hash tables). Go maps are apparently hash tables but I don’t know if they are separate-chaining hash tables (like std::unordered_map) or open-addressing hash tables (like nothing in standard C++ yet, unfortunately).

Obviously, keys in hash tables have to be hashable and comparable. The book mentions comparability, but so few things are comparable that they would all be easily hashable too. Only basic types (int, float64, string, etc, but not slices) or structs made up only of basic types are comparable, so that’s all you can use as a key. You can get around this by using a basic type (such as an int or string) that is (or can be made into) a hash of your value. I prefer C++’s need for a std::hash<> specialization, though I wish it was easier to write one.

Unlike C++, you can’t keep a pointer to an element in a map, so changing one part of the value means copying the whole value back into the map, presumably with another lookup. Go apparently does this to completely avoid the problem of invalid pointers when the map has to grow. C++ instead lets you take the risk, specifying when your pointer could become invalid.
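
For instance, with a small struct as the value type, you copy out, change, and copy back:

type size struct {
  w int
  h int
}

m := make(map[string]size)
m["a"] = size{1, 2}

// m["a"].w = 10 // Will not compile: map elements are not addressable.

s := m["a"]
s.w = 10
m["a"] = s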

Go maps are clearly a big advantage over C, where you otherwise have to use some third-party data structure or write your own, typically with very little type safety.

They look like this:

m := make(map[int]string)
m[3] = "three"
m[4] = "four"

Multiple return values

Functions in Go can return multiple values, which I find more obvious than output parameters. For instance:

func getThings() (int, Foo) {
  return 2, getFoo()
}

a, b := getThings()

This is a bit like returning tuples in modern C++, particularly with structured bindings in C++17:

std::tuple<int, Foo> get_things() {
  return std::make_tuple(2, get_foo());
}

auto [i, f] = get_things();

Garbage Collection

Like Java, Go has automatic memory management, so you can trust that instances will not be released until you have finished using them, and you don’t need to explicitly release them. So you can happily do this, without worrying about releasing the instance later:

func getThing() *Thing {
  a := new(Thing)
  ...
  return a
}

b := getThing()
b.foo()

And you can even do this, not caring, and not easily even knowing, whether the instance was created on the stack or the heap:

func getThing() *Thing {
  var a Thing
  ...
  return &a
}

b := getThing()
b.foo()

I don’t know how Go avoids circular references or unwanted “leak” references, as Java or C++ would with weak references.

I wonder how, or if, Go avoids Java’s problem with intermittent slowdowns due to garbage collection. Go seems to be aimed at system-level code, so I guess it must do better somehow.

However, also like Java, and probably like all garbage collection, this is only useful for managing memory, not resources in general. The programmer is usually happy to have memory released some time after the code has finished using it, not necessarily immediately. But other resources, such as file descriptors and database connections, need to be released immediately. Some things, such as mutex locks, often need to be released at the end of an obvious scope. Destructors make this possible. For instance, in C++:

void Something::do_something() {
  do_something_harmless();

  {
    std::lock_guard<std::mutex> lock(our_mutex);
    change_some_shared_state();
  }
  
  do_something_else_harmless();
}

Go can’t do this, so it has defer instead, letting you specify something to happen whenever a function ends. It’s annoying that defer is associated with functions, not with scopes in general.

func something() {
  doSomethingHarmless()

  ourMutex.Lock()
  defer ourMutex.Unlock()
  changeSomeSharedState()

  // The mutex has not been released yet when this remaining code runs,
  // so you'd want to restrict the use of the resource (a mutex here) to
  // another small function, and just call it in this function.
  doSomethingElseHarmless()
}

This feels like an awkward hack, like Java’s try-with-resources.

I would prefer to see a language that somehow gives me all of scoped resource management (with destructors), reference-counting (like std::shared_ptr<>) and garbage collection, in a concise syntax, so I can have predictable, obvious, but reliable, resource releasing when necessary, and garbage collection when I don’t care.

Of course, I’m not pretending that memory management is easy in C++. When it’s difficult it can be very difficult. So I do understand the choice of garbage collection. I just expect a system level language to offer more.

Things I don’t like in Go

As well as the minor syntactic annoyances mentioned above, and the lack of simple generic resource (not just memory) management, I have a couple of other frustrations with the language.

(I’m not loving the support for object orientation either, but I’ll mention that in a later article when I’ve studied it more.)

No generics

Go’s focus on type safety, particularly for numeric types, makes the lack of generics surprising. I can remember how frustrating it was to use Java before generics, and this feels almost that awkward. Without generics I soon find myself having to choose between lack of type safety or repeatedly reimplementing code for each type, feeling like I’m fighting the language.

I understand that generics are difficult to implement, and they’d have to make a choice about how far to take them (probably further than Java, but not as far as C++), and I understand that Go would then be much more than a better C. But I think generics are inevitable once, like Go, you pursue static type safety.

Somehow Go’s slice and map containers are generic, probably because they are built-in types.

Lack of standard containers

Go has no queue or stack in its standard library. In C++, I use std::queue and std::stack regularly. I think these would need generics. People can use Go’s slice (a dynamically-allocated array) to achieve the same things, and you can wrap that up in your own type, but your type can only contain specific types, so you’ll be reimplementing it for every type. Or your container can hold interface{} types (apparently a bit like a Java Object or a C++ void*), giving up (static) type safety.
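
For instance, a minimal sketch of a stack of ints built on a slice; without generics, you would repeat this for every element type:

type IntStack []int

func (s *IntStack) Push(v int) {
  *s = append(*s, v)
}

func (s *IntStack) Pop() (int, bool) {
  old := *s
  if len(old) == 0 {
    return 0, false // Nothing to pop.
  }

  v := old[len(old)-1]
  *s = old[:len(old)-1]
  return v, true
}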