Category Archives: General

Google App Engine: Using subdomains

Separating Frontend and Backend App Engine Services into Separate Subdomains

I’m in the middle of re-implementing my bigoquiz.com website into separate frontend (client) and backend (server) projects, which I’m deploying to separate App Engine “services”. Previously both the front end HTML and JavaScript, and the backend Java, were all in one (GWT) project, deployed together as one default App Engine service. I chose to separate the two services into their own subdomains. So the frontend is currently served from beta.bigoquiz.com and its JavaScript code uses the REST API provided by the backend, served from betaapi.bigoquiz.com. This will later be bigoquiz.com and api.bigoquiz.com.

This seemed wise at the time and I thought it was necessary but in hindsight it was an unnecessary detour. I would have been better off just mapping different paths to the services. I didn’t see how to do this at first with Google AppEngine, but it’s very easy. You just provide a dispatch.yaml file (also mentioned below) to map URL prefixes to the different services. For instance, /api/* would map to the backend service, and everything else would map to the frontend service. Larger systems might do this with a service gateway such as Netflix’s Zuul.

Nevertheless, this post mentions some of the issues you’ll have if you choose to split your frontend and backend into separate subdomains. It mentions App Engine, Go, and Angular (TypeScript), but the ideas are generally useful.

App Engine Service Names

Each service that you deploy to App Engine has an app.yaml file. This can have a line to name the service, which would otherwise just be called “default”. For instance, the first line of my frontend’s app.yaml file currently looks like this:

service: beta

The first line of my backend’s app.yaml file currently looks like this:

service: betaapi

When I deploy these services, via “gcloud app deploy” on the command line, I can then see them listed in the Google Cloud console, in the App Engine section, in the Services section.

Mapping Subdomains to App Engine Services

In the Google Cloud console, in the App Engine section, in the settings section, in the “custom domains” tab, you should add each subdomain. You’ll need to verify your ownership of each subdomain by adding DNS entries for Google to check.

App Engine very recently added the “Managed Security” feature, which automatically creates and manages (LetsEncrypt) SSL certificates, letting you serve content via HTTPS. Using this currently makes using subdomains more complicated, because it doesn’t yet support wildard SSL certificates. That’s likely to be possible soon when LetsEncrypt starts providing wildcards SSL certificates, so this section might become outdated.

Without Managed Security

If you aren’t using Managed Security yet, mapping subdomains to services is quite simple. Google’s documentation suggests that you just add a wildcard CNAME entry to the DNS for your domain, like so:

Record: *
 Type: CNAME
 Value: ghs.googlehosted.com

All subdomains will then be served by google. App Engine will try to map a subdomain to a service of the same name. So foo.example.com will map to a service named foo.

With Managed Security

However, if you are using Managed Security, you’ll need to tell App Engine about each individual subdomain so each subdomain can have its own SSL certificate. Unfortunately, you’ll then need to add the same 8 A and AAAA records to your DNS for your subdomain too.

Although App Engine does automatic subdomain-to-service mapping for wildcard domains, this doesn’t happen with specifically-listed subdomains. You’ll need to specify the mapping manually using a dispatch.yaml file, like so:

dispatch:
  - url: "beta.bigoquiz.com/*"
    service: beta
  - url: "betaapi.bigoquiz.com/*"
    service: betaapi

You can then deploy the dispatch.yaml file from the command line like so:

$ gcloud app deploy dispatch.yaml

I wish there was some way to split these dispatch rules up into separate files, so I could associate them with the separate codebases for the separate services. For now, I’ve put the dispatch.yaml file for all the services in the repository for my frontend code and I manually deploy it.

CORS (Cross Origin Resource Sharing)

By default, modern browsers will not allow Javascript that came from www.example.com (or example.com) to make HTTP requests to another domain or subdomain, such as api.example.com. This same-origin policy prevents malicious pages from accessing data on other sites, possibly via your authentication token.

If you try to access a subdomain from JavaScript served from a different subdomain, you’ll see error messages about this in the browser console, such as:

No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'beta.example.com' is therefore not allowed access.

For instance, JavaScript served from foo.example.com cannot normally access content at bar.example.com.

However, you can allow calls to a subdomain by using the CORS system. The browser will attempt the call, but will only provide the response back to the JavaScript if the response has the appropriate Allowed* headers. The browser may first attempt a separate CORS “pre-flight” request before actually issuing the request, depending on the request details.

If you configure your server to reply to a CORS requests with appropriate AllowedOrigins and AllowedMethods HTTP headers, you can tell the browser to allow the JavaScript to make the HTTP requests and receive the responses.

For instance, in Go, I’ve used rs/cors to respond to CORS requests, like so, passing in the original julienschmidt/httprouter that I’m using.

c := cors.New(cors.Options{
	AllowedOrigins: []string{"example.com},
	AllowedMethods: []string{"GET", "POST", "OPTIONS"},	
})

handler := c.Handler(router)
http.Handle("/", handler)

I also did this in my original Java code by adding a ContainerResponseFilter, annotated with @Provider.

Cookies: CORS

Even when the server responds to CORS requests with AllowedOrigins and AllowedMethods headers, by default the browser will not allow Javascript to send cookies  when it sends HTTP requests to other domains (or subdomains). But you can allow this by adding an AllowCredentials header to the server’s CORS response. For instance, I added the AllowCredentials header in Go on the server side, like so:

c := cors.New(cors.Options{
  	...
	AllowCredentials: true,})

You might need to specify this on the client-side too, because the underlying XMLHttpRequest defaults to not sending cookies with cross-site requests. For instance, I specify withCredentials in Angular (Typescript) calls to http.get(), like so:

this.http.get(url, {withCredentials: true})

Note Angular’s awful documentation for the withCredentials option, though Mozilla’s documentation for the XMLHttpRequest withCredentials option is clearer.

Cookies: Setting the Domain

To use a cookie across subdomains, for instance to send a cookie to a domain other than the one that provided the cookie, you may need to set the cookie’s domain, which makes the cookie available to all subdomains in the domain. Otherwise, the cookie will be available only to the specific subdomain that set it.

I didn’t need to do this because I had just one service on one subdomain. This subdomain sets the cookie in responses, the cookie is then stored by the browser, and the browser provides the cookie in subsequent requests to the same subdomain.

OAuth2 Callback

If your subdomain implements endpoints for oauth2 login and callback, you’ll need to tell App Engine about the subdomain. In the Google Cloud console, in the “APIs & Services” section, go to the Credentials section. Here you should enter the subdomain for your web page under “Authorized JavaScript origins”, and enter the subdomain for your oauth2 login and callback subdomain under “Authorized redirect URIs”.

The subdomain will then be listed appropriately in the configuration file that you download via the “Download JSON” link, which you can parse in your code, so that the oauth2 request specifies your callback URL. For instance, I parse the downloaded config .json file in Go using google.ConfigFromJSON() from the golang.org/x/oauth2/google package, like so:

func GenerateGoogleOAuthConfig(r *http.Request) *oauth2.Config {
	c := appengine.NewContext(r)

	b, err := ioutil.ReadFile(configCredentialsFilename)
	if err != nil {
		log.Errorf(c, "Unable to read client secret file (%s): %v", configCredentialsFilename, err)
		return nil
	}

	config, err := google.ConfigFromJSON(b, credentialsScopeProfile, credentialsScopeEmail)
	if err != nil {
		log.Errorf(c, "Unable to parse client secret file (%) to config: %v", configCredentialsFilename, err)
		return nil
	}

	return config
}

GWT: Changing from GWT-RPC to JSON with RestyGWT

GWT but with REST and JSON

My bigoquiz.com website uses my gwt-bigoquiz project, based on Java and GWT. One advantage of GWT is the serialization between the server and client. Using GWT-RPC, you can use the same Java classes on the server and the (compiled to JavaScript) browser client. (Well, at least when it works – when it doesn’t work it can be very hard to find out why). For instance, your Java service can have a getThing() method which your client code can call asynchronously, getting a Thing. It avoids duplication and avoids manually writing code to parse an intermediate format and recreate objects.

However, GWT-RPC ties you into using GWT on both the server and client. I would rather have a standard REST API that serves, and receives, JSON. I could then experiment with alternative implementations independently on the server side or the client side.

RestyGWT, together with Jersey, makes this quite easy. I have now ported gwt-bigoquiz from GWT-RPC to RestyGWT and Jersey. Here I describe how you might do that for your project.

You might also like to watch David Chandler’s talk about RestyGWT and Jersey at GWT.Create 2015 (slides, GitHub), which covers the same stuff in a little more detail, using slightly older versions.

Build

We need to add the following dependencies to our pom.xml file, so we can use RestyGWT, Jersey, Jackson and the JAX-RS annotations.

I’ve added comments to explain why we need each dependency, based on my best guesses. Please let me know about any errors you find. pom.xml files too often just contain cargo-culted blocks of XML pasted from blog entries or StackOverflow answers, far removed from whoever first wrote them. Still, for later versions of these dependencies, or different environments, you might need a different combination of versions or different workarounds.

<!-- For JAX-RS annotations, such as @GET, @POST, @Path, @PathParam, @QueryParam, etc -->
<dependency>
	<groupId>javax.ws.rs</groupId>
	<artifactId>javax.ws.rs-api</artifactId>
	<version>2.1-m09</version>
	<!-- Note: 2.0 gives us a ClassNotFoundException about RxInvokerProvider when we try to GET from the URL. -->
</dependency>

<!-- For client-side code to query a REST/JSON server API. This uses methods and classes annotated with JAX-RS annotations, doing serialization/deserialization with Jackson. -->
<dependency>
	<groupId>org.fusesource.restygwt</groupId>
	<artifactId>restygwt</artifactId>
	<version>2.2.0</version>
</dependency>

<!-- For JSON serialization/deserialization based on the JAX-RS annotations. Used by RestyGWT on the client side. Also adds some annotations such as @JsonInclude and @JsonIgnore, -->
<dependency>
	<groupId>com.fasterxml.jackson.jaxrs</groupId>
	<artifactId>jackson-jaxrs-json-provider</artifactId>
	<version>2.8.9</version>
</dependency>

<!-- To serve REST/JSON queries of REST resources.
	 A <servlet> tag in the web.xml file indicates that Jersey should use certain classes as REST resources/servelets.
	 These REST resources/servlets use Java methods and classes annotated with JAX-RS and Jackson annotations.

	 jersey-container-servlet needs the Java Servlet API version 3, supported by the AppEngine Java 8 runtime (currently beta).
	 Alternatively, jersey-container-servlet-core needs the Java Servlet API version 2, supported by the AppEngine Java 7 runtime. -->
<dependency>
	<groupId>org.glassfish.jersey.containers</groupId>
	<artifactId>jersey-container-servlet</artifactId>
	<version>2.26-b07</version>
</dependency>

<!-- To let Jersey use Jackson. -->
<dependency>
	<groupId>org.glassfish.jersey.media</groupId>
	<artifactId>jersey-media-json-jackson</artifactId>
	<version>2.26-b07</version>
</dependency>

<!-- Workaround this error: java.lang.IllegalStateException: InjectionManagerFactory not found. See https://stackoverflow.com/a/44546979/1123654 -->
<dependency>
	<groupId>org.glassfish.jersey.inject</groupId>
	<artifactId>jersey-hk2</artifactId>
	<version>2.26-b07</version>
</dependency>

Configuration

We must edit some XML files so we can use RestyGWT and Jersey.

.gwt.xml Module file

Our .gwt.xml Module definition must mention RestyGWT inside the tag, so our client code can use RestyGWT.

<inherits name="org.fusesource.restygwt.RestyGWT"/>

web.xml file

We must add <servlet> and <servlet-mapping> tags in the web.xml file to state that Jersey should use our classes to serve REST resources as servlets.

<servlet>
    <servlet-name>restServlet</servlet-name>
    <servlet-class>org.glassfish.jersey.servlet.ServletContainer</servlet-class>
    <init-param>
        <param-name>jersey.config.server.provider.packages</param-name>
        <param-value>com.murrayc.myproject.server.api</param-value>
    </init-param>
</servlet>
<servlet-mapping>
    <servlet-name>restServlet</servlet-name>
    <url-pattern>/api/*</url-pattern>
</servlet-mapping>

Server-Side Code Changes

The REST Resource

The classes for our REST resources must be in the package specified in the web.xml file., where Jersey can find them. These correspond roughly to the *ServiceImpl classes, derived from GWT’s RemoteServiceServlet, that we used with GWT-RPC.

These classes don’t need to derive from any special base class or implement any special interface. They do need to be annotated with @Path to specify the first part of the name of the resource. For instance:

@Path("thing")
public class ThingResource {
    ...
}

We then annotate methods of the class with @GET or @POST and @Produces.

Without a @Path annotation on the method, the method matches the base path of the class. For instance, this might return all available Things, when a client GETs from the thing/ URL.

@GET
@Produces("application/json")
public Collection<Thing> get() {
    ...
}

We can use additional @Path annotations on the method. For instance, this might return a Thing with a matching Thing ID, when a client GETs from the thing/123 URL, for instance.

@GET
@Path("/{id}")
@Produces("application/json")
public Thing getById(@PathParam("id") String id) {
    ...
}

Note that the Java method name does not appear in the URL.

The ServletContext

In our GWT-RPC service classes, we could get the ServletContext via getServletConfig().getServletContext(). With our REST classes, we instead use the JAX-RS @Context attribute on a member field, like so:

@Context
ServletContext context;

Jersey apparently then assigns the context at runtime.

Client-Side Code Changes

With GWT-RPC, we had a client interface like this:

public interface ThingServiceAsync {
    void getThings(AsyncCallback<List<Thing>> async);
    ...
}

which we called like so:

AsyncCallback<List<Thing>> callback = new AsyncCallback<List<Thing>>() {
    @Override
    public void onFailure(Throwable caught) {
        ...
    }

    @Override
    public void onSuccess(List<Thing> result) {
        getView().setThings(result);
    }
};

ThingServiceAsync.Util.getInstance().getThings(callback);

With RestyGWT, we’ll instead have a client interface like this:

@Path("/api/thing")
public interface ThingClient extends RestService {
    @GET
    public void getThings(MethodCallback<List<Thing>> callback);
    
    @GET
    @Path("/{id}")
    public void getThing(@PathParam("id") String id, MethodCallback<Thing> callback);
}

And we’ll use it like so:

MethodCallback<List<Thing>> callback = new MethodCallback<List<Thing>>() {
    @Override
    public void onFailure(Method method, Throwable caught) {
        ...
    }

    @Override
    public void onSuccess(Method method, List<Thing> result) {
        getView().setThingList(result);
    }
};

Defaults.setServiceRoot(GWT.getHostPageBaseURL());
ThingClient client = GWT.create(ThingClient.class);
client.getThings(callback);

So to change the GWT-RPC client code to RestyGWT client code, you only need to:

  • Create the client interface.
  • Use it instead of the GWT-RPC Async client class.
  • Change AsyncCallback to MethodCallback.
  • Add the Method parameter to onFailure() and onSuccess().

Parameters

A method of a GWT-RPC service just takes Java parameters. But for a REST service, you need to decide whether these are path parameters, or query parameters. For instance, if the URL is /things/123?color=blue then there is one path parameter (123), and one query parameter (color, with value blue). You would indicate this in your service class like so, using the JAX-RS attributes:

@GET
@Path("/{thing-id}")
@Produces("application/json")
public Thing getById(@PathParam("thing-id") String thingId, @QueryParam("color-id") String colorId) {
    ...
}

and in the client interface like so:

@Path("/api/thing")
public interface ThingClient extends RestService {
    @GET
    @Path("/{thing-id}")
    public void getThing(@PathParam("thing-id") String thingId, @QueryParam("color-id") String colorId, MethodCallback<Thing> callback);
}

Ignoring some Getters

Not all get methods should result in items in the JSON. You can avoid this with the Jackson @JsonIgnore annotation. For instance:

public class Thing {
    ...

    @JsonIgnore
    public getIrrelevantStuff() {
       ...
    }
}

Ignoring null and empty objects

To keep your JSON small, you can use Jackson’s @JsonInclude attribute to stop Jersey from writing name/value pairs with null or empty values. Of course this can obscure the JSON format by hiding its possible contents. For instance:

@JsonInclude(JsonInclude.Include.NON_EMPTY)
public class Thing {
    ...
}

You might choose to use NON_NULL rather than NON_EMPTY.

Adding Setters

If our REST GET method returns an instance of Thing, Jersey will serve a JSON string based on any public get*() methods that take no parameters, and the public fields. So, this class,

public class Thing {
    public String id;
    public getStuff() {
       ...
    }
}

would produce JSON like this, recursing into contained class instances:

{
  "id": "123",
  "stuff" {
     "foo": true,
     "bar": 789
  }
}

However, if there are no corresponding setter functions, such as setStuff(), RestyGWT (or Jackson, which it uses) will silently ignore these parts of the JSON when deserializing the JSON into a client-side Java instance. If there is a way to detect this at runtime, I’d like to know about it.

This makes it particularly important to test the serialization and deserialization. For instance:

@Test
public void ThingJsonTest() throws IOException {
    // The Jackson ObjectMapper:
    ObjectMapper objectMapper = new ObjectMapper();

    // Get the JSON for an object:
    Thing objToWrite = new Thing();
    objToWrite.setFoo(2);
    // etc.
    
    final String json = objectMapper.writeValueAsString(objToWrite);
    assertNotNull(json);
    assertFalse(json.isEmpty());

    // Get an object from the JSON: 
    Thing obj = objectMapper.readValue(json, Thing.class);
    assertNotNull(obj);
    assertEquals(2, obj.getFoo());
    // etc.
}

Removing the GWT-RPC code

Once everything is working, we can remove the unused GWT-RPC classes and configuration:

  • The ThingService interface, which extends RemoteService.
  • The ThingServiceAsync interface.
  • The ThingServiceImpl class.
  • The <servlet> and <servlet-mapping> tags from web.xml

A C++ developer looks at Go (the programming language), Part 3: Concurrency

This is the third of my posts about looking at Go from the perspective of a C++ developer. I think that will be enough for now. Here are all 3 parts:

Here I mention Go’s built-in concurrency features, letting you write code that makes the best use of one, or more, processors. I am not an expert in concurrency, and I think few people are. I’ve had some really helpful friendly feedback about the previous parts from Go developers, so I hope that continues. I will gladly correct any mistakes that are pointed out to me so the text can become more useful.

However, the mentions of C++ coroutines here might also be particularly useless. Clang and gcc don’t support coroutines in their stable releases yet and it’s hard to find definitive explanations. What I can find is contradictory and far from as  simple as Go, probably because the C++ coroutines specification has been in flux for a while. I would very much welcome corrections. Interest in C++ coroutines seems to be heating up now so I guess this will settle down soonish. I think it will then be time for some books to be updated. I hope that the end result is at least as expressive as Go but I am not confident of that so far.

As before, I strongly suggest that you read “The Go Programming Language”, (referred to her as “the book”), from which I’ve learned this stuff. At least at first, the structure of this text is aimed at people familiar with C++ who are interested in Go, but the comparison does break down later.

Goroutines and std::task

Like C++ (since C++11), Go has support for simple concurrency features such as lockable mutexes (which we’ll see later), but it doesn’t let you create raw threads. Instead if offers goroutines.

Goroutines are, at their simplest, functions that can run concurrently with other functions. Two goroutines might even run on different CPUs, meaning that they run truly simultaneously. You can start one just by using the “go” keyword before a normal function call. The function will then execute, without the caller waiting for it to complete.

go doSomethingThatTakesALongTime()
doSomethingElseAtTheSameTime.

While the goroutine does its work, maybe using traditional cooperative multi-threading on the same CPU, maybe on another CPU, the implicit main goroutine can also do work. Update: But careful. As mentioned later, this program won’t wait for the first goroutine to finish if the main goroutine finishes first.

So far, at least, this is a bit like starting a std::task in C++ with std::async() (since C++11):

auto future = std::async(doSomethingThatTakesALongTime)
doSomethingElseAtTheSameTime()
future.wait();

(Update: As James pointed out in a comment, if you don’t keep the returned future, its destructor will run, which will cause a wait, making the call not async.)

Hybrid Threading (m:n Threading)

A std::task in C++ is not guaranteed to start an actual new OS (kernel) thread, unless you pass std::launch::async as an extra argument. It will sometimes, but that’s up to the runtime and the implementation. I think the runtime, in some implementations, might even be able to move tasks to real threads and between threads, but I’m not sure. Scott Meyer’s “Effective Modern C++” has some good discussion of tasks and C++ concurrency in general.

Goroutines are not ambiguous about this. Goroutines don’t map directly to kernel (OS) threads. They use m:n threading, in which user space (the Go runtime, in this case) schedules its own m “threads”, mapping them onto a pool of n kernel threads. This seems to be what is referred to as “Hybrid Threading” in Robert Love’s “Linux System Programming” book – a mixture of 1:1 Threading (just using kernel threads) and N:1 Threading (or User-Level Threading, apparently sometimes called “green threads” or “lightweight threads”.). Presumably there is some very clever scheduling code in Go’s runtime and in the C++ coroutines implementation.

Because Goroutines don’t map directly to real threads, they don’t need the full stack size given to a kernel thread, letting us comfortably use far more goroutines than we would use threads. However, you only get the full benefit of this when the work is happening purely in the Go runtime. The book explains that “Goroutines that are blocked in I/O or other system calls or are calling non-Go functions, do need an OS-thread”. So I guess you still wouldn’t want, for instance, a goroutine per network connection, if you are serving many network clients.

Channels: Getting One Result

In C++, std::async() returns a std::future. This is a way to later wait for the result from the task, or just get the result if it is already ready, having made good use of the original thread’s time in the meantime.

int doSomethingThatTakesALongTime() {
  ...
  return 5;
}
 
auto f = std::async(doSomethingThatTakesALongTime)
doSomethingElseLengthyAtTheSameTime()
auto answer = f.get()

The go keyword doesn’t return a future. (Update: And doesn’t let you use the return value.) You could use the function’s result, but you’d just immediately start waiting for it, defeating the purpose (I wonder why Go allows this at all.)

Instead, to provide a result, a Go program would use a Go channel. Go channels are queues of your chosen elements. Channels are a built-in type, so they are generic – they can contain various types. They can be created like so, using the make() built-in function as for other built-in types:

c := make(chan string)

The Go language has special syntax for sending to, or receiving from, channels via the <- and -> operators, though I don’t see what would have been so bad about having send() and receive() methods.

Go channels are “concurrency safe”, so you can pass one to a goroutine, and both goroutines can happily manipulate it. For instance, to get a response to a long-running function, but only after we’ve tried to make good use of the waiting time:

func doSomethingThatTakesALongTime(c chan int) int {
  ...
  c <- 5
}
 
c := make(chan int)
go doSomethingThatTakesALongTime(c)
doOtherStuffInTheMeantime()
answer := <- c

That attempt to receive from the channel (with the <- operator) will block until the channel has something to provide, a bit like how std::future::get() blocks until the result is ready.

It would be fairly easy to create a future type that uses a channel, then you could write an async method that uses the go keyword and returns the future. But I think it would be nice if Go made this simple case simpler by doing this for us. Maybe it does.

I’m also a little uncomfortable with the calling goroutine (the main goroutine here) having to create the channel, repeating the type of that channel (it’s also in the called goroutines function signature). I think it would be nice if there was some syntax that gave us a new instance of the channel (Ready for receiving. See about unidirectional channels later.) that the called goroutine expected to get, while also allowing us to use an existing channel if necessary. That would be hard to get right, of course.

If the channel is empty and has been closed then it will then provide a zero value immediately. Or you can get the extra result to check if the channel is still open or still has values:

answer, ok := <- c
if (!ok) {
  // The channel is empty and has been closed.
}

Goroutines and C++ Coroutines

Goroutines feel a little like C++ coroutines (not yet in C++ as of C++17). I don’t know if their scheduling and stack management is quite as clever. I suspect not, because I keep reading stuff about C++ coroutines  being “stackless”, which I think means that they map directly to kernel threads, while Goroutines are apparently “stackful”.

Goroutines allow cooperative scheduling, via blocking channel sends and receives (see select) and sleeps. So do C++ coroutines, via co_yield, co_return, co_await, futures and generators.

So I’ll mention C++ coroutines later as I show other features of Goroutines and channels. I’d love to know the actual differences, if someone can explain it to me clearly enough that I understand. I really hope the end result is as easy to use as Goroutines.

C++ Coroutines: Getting One Result

For now, let’s look at how a C++ coroutine would let us return a single value, much like we did with std::async() (with a std::future) and with Go’s go keyword (with a channel). We would use a std::future return type for the function’s implementation, use co_await to launch the function, and co_return to return the result into the std::future.

This is based on various other blog posts and slides that I’ve seen. I don’t know if this is really correct because I don’t have a compiler that actually supports this yet.

std::future<int> doSomethingThatTakesALongTime() {
  ...
  co_return 5;
}

int main() {
  int answer = co_await doSomethingThatTakesALongTime();
  doOtherStuffInTheMeantime()
  
  // Attempting to get the value blocks until the result is ready.
  // (I'm not sure about this part. Do we need to explicitly wait
  // for the future to have a value?)
  std::cout << "answer: " << answer << '\n';
  return 0;
}

Here the co_await keyword is a little like the go keyword in Go.

Channels: Multiple Results

But channels are a queue of values, not just single values, so you could receive a series of results. Go’s range-based for loop has built-in support for this, as mentioned in section 8.4.2 (Pipelines) in the book. For instance:

func sendMessagesToOtherSolarSystemsAndGetResponses(c chan string) {
  for i := 0; i < 10; i++ {
     ...
     c <- response
  }

  c.Close()
}
 
c := make(chan string)
go sendMessagesToOtherSolarSystemsAndGetResponses(c)
doOtherStuffInTheMeantime()
for response := range c {
  fmt.Println(response)
  ...
}

fmt.Println("Not waiting any longer for more responses.")

Notice that, unlike a regular range-based for loop in Go, this gives you the value, not the index (which would be meaningless for a channel). In a regular range-based for loop, you need to get the index and value (maybe using _ for the index) to get the value.

The range-based for loop blocks at each operation, waiting for a new value from the channel, and it will stop automatically when the channel is empty and closed.

As mentioned in section 8.5 (Looping in Parallel), if we return from the main goroutine early, for instance in response to an error, we must make sure to “drain” the channel if we don’t want the other goroutines to be blocked when they try to send, even when using a buffered channel (see below). You could launch a separate goroutine just to do this.

C++ Coroutines: Multiple Results

With C++ coroutines, to return multiple results, we can use co_yield instead of co_return, using a generator instead of a std::future.

Again, this is based on various other blog posts and slides that I’ve seen. I don’t know if this is really correct because I don’t have a compiler that actually supports this yet. In particular, I don’t know if there will really be a std::generator in the standard library.

std::generator<std::string> sendMessagesToOtherSolarSystemsAndGetResponses() {
  while (still_waiting) {
    ...
    // "return" one result via the generator.
    co_yield response;
  }

  // TODO: Do we need a co_return somewhere too?
}

int main() {
  auto messages = co_await sendMessagesToOtherSolarSystemsAndGetResponses();

  // The range-based for loop waits for the next message each time.
  // TODO: When does it stop?
  for (auto response : messages) {
    std::cout << response << '\n';
  }

  std::cout << "Not waiting any longer for more responses.\n";
  return 0;
}

So far that makes a generator look like a bit like a Go channel. But Go channels seem to be more versatile, as we will see.

Unidirectional channels

By default channels allow goroutines to both send or receive messages. This is rarely what you want inside a particular goroutine, so you can declare a channel type that is only for sending or only for receiving. You would typically create a normal channel and then the goroutine’s function would declare that it takes a unidirectional channel. A send-only channel is of type chan<-, and a receive only signal is of type <-chan. I do think that syntax is hard to love.

For instance, the function from our previous example really only needs a channel that it can send messages on, not receive them too, so we should change the type of the channel:

func sendMessagesToOtherSolarSystemsAndGetResponses(c chan<- string) {
  for i := 0; i < 10; i++ {
    ...
    c <- response
  }

  c.Close()
}

The compiler can then help by preventing a message from being sent on the channel. When we pass the regular channel to the goroutine it will cast implicitly to the unidirectional channel. But we may not cast the unidirectional channel back to a regular channel.

I don’t know if there is some equivalent for this in the world of C++ coroutines and generators. Maybe it is generally unwise to compare C++ generators and Go channels.

Buffered Channels

So far we have created unbuffered channels. When we sent a message on the channel, the sending goroutine blocked until the message was received by the receiving goroutine. And when we tried to receive a message the receiving goroutine blocked until the sending goroutine had put a message in the channel. Therefore, unbuffered channels are also called synchronized channels.

But in our example, it would be better if the sending goroutine could keep adding messages to the channel without waiting for them to be received. We can use a buffered channel instead, specifying the size of the buffer:

c := make(chan string, 100) go sendMessagesToOtherSolarSystemsAndGetResponses(c)

The sending goroutine will then only block when the channel is full, and the receiving goroutine will only block when the channel is empty.

Using an Unbuffered Channel to Signal “done”

Unbuffered channels can be useful to signal that a goroutine has finished. This is mentioned in section 8.4.1 (Unbuffered Channels) of the book.

For instance, you will often want the main goroutine to wait for other goroutines to finish instead of stopping the whole program. By using an extra “done” channel, the second goroutine can signal to the main goroutine that is has finished, by sending an item on the “done” channel, and the main goroutine can block on that by attempting to receive from the “done” channel. For instance, here the goroutine does not return any responses so we need another way to signal that we have finished:

func sendMessagesToOtherSolarSystems(done chan struct{}) {
  for i := 0; i < 10; i++ {
    ...
  }

  done -< struct{}{}
}

done := make(chan struct{})
go sendMessagesToOtherSolarSystems(done)
doOtherStuffInTheMeantime()
<- done // This blocks

However, when the goroutine is already returning results via a channel, I believe it is simper to just wait for the channel to be closed, as we did in the previous example by using a range for over the channel.

The book recommends use of struct{} as the channel element, probably because it has no value. A bool seems more more concise, but you’d need a comment saying that the value (true or false) was meaningless. I guess you could also just do a range for over the channel, waiting for it to close, like when we used a channel to return responses, but without ever sending anything.

I feel that this “done” synchronization technique could have a more explicit syntax. I think that would make the code clearer and would mean that various code that does the same thing would have less uninteresting differences. Sync.WaitGroup (see below) seems to be what I want, but this raw “done” channel technique seems more popular for simple cases.

If we were using C++ coroutines, I’d guess we would do the same thing by waiting for the result from a std::future.

Waiting for Multiple Goroutines to Signal “done”

A variation of the “done” channel technique is useful when waiting for multiple goroutines to finish. If we know how many goroutines we have started, we can use a channel to wait for that many “done” messages. This is mentioned in section 8.5 (Looping in Parallel) of the book, though it doesn’t call it a “done” channel there. For instance:

func sendMessageToSolarSystem(coordinates Coordinates, done chan struct{}) {
  ...

  done -< struct{}{}
}

done := make(chan struct{}
for coord range : solarSystems {
  go sendMessageToSolarSystem(coord, done)
}
doOtherStuffInTheMeantime()

// Wait for as many done messages as goroutines that we started.
// We could have counted them instead, and then counted down here.
for range : solarSystems {
  <- done
}

As section 8.5 (Looping in Parallel) of the book says, if you wanted to return something useful from each goroutine, even just an error, you would use that instead of a struct{}.

Again, I don’t know how we would do this with C++ coroutines.

sync.WaitGroup

As an alternative to the “done” channel technique when waiting for multiple goroutines to finish, Go provides the sync.WaitGroup concurrency-safe  counter. It has Add() and Done() methods and a Wait() method that blocks until the count falls back to zero. This is mentioned at the end of section 8.5 (Looping in Parallel) in the book.

func sendMessageToSolarSystem(coordinates Coordinates, wg sync.WaitGroup) {
  ...

  wg.Done() // Doing it via defer would be safer.
}

var wg sync.WaitGroup
for coord range : solarSystems {
  wg.Add(1)
  go sendMessageToSolarSystem(coord, done)
}
doOtherStuffInTheMeantime()

wg.Wait()

Notice that there is no Add() method overload that uses a default of 1, because Go has no function overloads and no default function parameter values.

I don’t know how we’d do the same with C++ coroutines. Even clever and correct use of a std::condition_variable would not be helpful if C++ coroutines do not map directly to real kernel threads (see above).

Using a Channel as a Counting Semaphore

Section 8.8 (Example: Concurrent Directory Traversal) of the book suggests using a buffered channel of a specific size to allocate and release tokens, preventing too many goroutines from doing work at the same time, causing them to block until a token is available. In the example, this limits the use of a constrained resource, preventing the goroutines from opening too many files. For instance:

func sendMessageToSolarSystem(coordinates Coordinates, sema chan struct{}) {
  sema <- struct{}{} // Acquire token.
  defer func() { <- sema }() // Release it later.
  ...
}

// Don't try to send more than 10 messages at a time.
sema := make(chan struct{}, 10)
for coord range : solarSystems {
  go sendMessageToSolarSystem(coord, done)
}
doOtherStuffInTheMeantime()

// This ignores the need to wait for completion.

I think I would prefer some syntax or API that let me explicitly specify a pool that allows only a specific number of coroutines to be active (or maybe even instantiated). Obviously that would only be useful in simple cases and it’s generally unpleasant to hard-code numbers anyway.

Select: Multiple results, from multiple channels, without blocking

Go has a select/case construct built in to the language just for channels. This is mentioned in section 8.7 (Multiplexing with Select) of the book.

It’s a bit like switch/case. It’s also a bit like the POSIX select() function in C, used to respond to activity on file descriptors, such as network connections. It can wait for activity on one of the specified channels. Or it can repeatedly poll if you supply a default case and put it in a loop.

c1 := make(chan string)
go sendMessagesToExoplanetsViaRadio(c1)
c2 := make(chan string)
go sendMessagesToExoplanetsByBlockingTheSun(c2)
 
for {
  select {
    case response := <- c1:
      fmt.Println("Response to or radio message: %s", response)

    case response := <- c2:
      fmt.Println("Response to or sun message: %s", response)

    default:
      if (too_late) {
        break
      }
      doOtherStuffInTheMeantime()
      // or sleep.
  }
}

fmt.Println("Not waiting any longer for more responses.")

I wonder how C++ could support a similar construct with coroutines and generators.

Canceling a Goroutine by Closing a Channel and Polling

The main goroutine might no longer need the result from a goroutine that it has started. Maybe another goroutine found the answer first, or maybe it was in response to a user request that has been canceled, or a connection that has been disconnected.

(Update: See the comment below, about Contexts, added in Go 1.7.)

Section 8.9 (Cancellation) of the book suggests signaling this by closing a channel, and suggests that the called goroutines should poll for this channel closure via select with a default case (see above). The book calls this channel “done” but I think that confuses this technique with technique that sends a done message over the done channel.

For instance:


// Poll to see if the channel has been closed.
func canceled(canceler chan<- string) bool {
  select {
    case <-canceler:
      return true
    default:
      return false
  }
}
  
func sendMessagesToExoplanetsViaRadio(c chan<- string, canceler chan struct{}) {
   ...

   if canceled(canceller) {
      // Obviously not neat code:
     c <- "Not sent"
   }

   ...
   c <- response
}
  
c1 := make(chan string)
canceler1 := make(chan struct{})
go sendMessagesToExoplanetsViaRadio(c1, canceler1)
c2 := make(chan string)
canceler2 := make(chan struct{})
go sendMessagesToExoplanetsByBlockingTheSun(c2, canceler2)
 
for {
  select {
    case response := <- c1:
      // Cancel the other attempt because we don't need it.
      canceler2.Close();

    case response := <- c2:
      // Cancel the other attempt because we don't need it.
      canceler1.Close();
  }
}

Again, I would like more explicit syntax or API for goroutine cancellation. For me, being able to implement things with channels doesn’t mean that they are the best API for expressing those things.

Communicating Instead of Sharing

Go programs tend to send messages via channels to change (indirectly) shared data and to let goroutines see the latest state of that data. “Do not communicate by sharing memory; instead, share memory by communicating.” is a popular phrase in the Go world.

This allows the use of data to be restricted to a single goroutine, often called a “monitor goroutine”. The locking (or just general lack of races if the Go channel is implemented with lock-free code) is then restricted to the channels, implicitly, with no explicit locking of the data being necessary. There are some examples of this in section 9.1 (Race Conditions) in the book.

It occurs to me that, unlike locking via a mutex, this does not guarantee that all goroutines see the latest state of any shared data. But it does guarantee an order, which is maybe all that usually matters.

Because channels are concurrency-safe, you can even send them as elements of a channel. Section 8.10 (Example: Chat Server) does this so it can use “client” channels as identifications of client goroutines (as well as sending strings over the client channels), keeping track of them via entering and leaving channels and a map of the client channels.

Concurrency-safe containers

We’ve already seen that Go channels are “concurrency-safe”, so you can share them between goroutines without additional locking. So are slices and maps.

Update: No, slices and maps are not concurrency safe. I must have misunderstood something that I read somewhere. This makes the rest of this sub-section irrelevant, though hopefully correct.

Interestingly, the Java standard library moved away from making container types thread safe, because (as I understand it) the developer would generally use them as part of a larger custom data structure that would need its own locking (synchronization) anyway. Java now provides some separate containers explicitly for use with concurrency.

Likewise, in the C++ standard library, types only tend to have locking (for instance std::shared_ptr’s “partial internal synchronization“) when it wouldn’t be possible for the application code to do the locking itself (external synchronization). Presumably, Go programs then tend to have more locking that necessary, and presumably this is Go again choosing convenience and safety over raw performance.

However, I would really like to see a concurrency-safe queue in C++, similar to a Go channel. A lock-free concurrent queue seems like exactly the kind of difficult-to-implement, but easy to use, and broadly useful, thing that belongs in the standard library.

No thread-local storage

Because Goroutines don’t map directly to real threads, Go doesn’t try to provide “thread-local storage”. C++ has this, via its thread_local keyword, but I believe this is not useful, or wise, with C++ coroutines either.

Mutexes

Goroutines are a suitable way to co-operatively schedule work, and communication over channels can avoid the need for explicit locking, but you might still need to explicitly prevent concurrent access of shared data. Mutexes do this in both Go and C++, but feel free to skip to the summary if you only cared about Goroutines and channels.

Go’s sync.Mutex is like std::mutex

We lock/unlock Go’s Mutex, like so, though you can also do it manually instead of using defer.

m sync.Mutex

func something() {
  m.Lock()
  defer m.Unlock()

  // do something to the data.
}

which corresponds to this in C++ with std::mutex:

std::mutex m;

void something() {
  std::lock_guard<std::mutex> lock(our_mutex);

  // do something to the data.h
}

I’ve always wished there was some shorter syntax for that lock_guard in C++.

Note that you can use std::unique_lock, instead of std::lock_guard, if yo need to manually lock and unlock.

Go’s sync.RWMutex is like std::shared_mutex with std::shared_lock

A Read/Write mutex (or “shared exclusive lock”) allows multiple threads to read from shared data, while ensuring that they each see the latest state of that shared data, but only allows one thread at a time to write to the shared data.

We lock/unlock Go’s Mutex, like so, though you can also do it manually instead of using defer.

m sync.RWMutex

func somethingThatReads() {
  m.RLock()
  defer m.RUnlock()

  // get something from the data.
}

func somethingThatWrites() {
  m.Lock()
  defer m.Unlock()

  // do something to the data
}

which corresponds to this in C++ with std::shared_mutex:

std::shared_mutex m;

void somethingThatReads() {
  std::shared_lock<std::mutex> lock(our_mutex);

  // get something from the data
}

void somethingThatWrites() {
  std::lock_guard<std::mutex> lock(our_mutex);

  // do something to the data
}

I do find the Go names more obvious here. They seem to be aimed more at how people will actually use these locks in most situations.

No recursive/re-entrant mutex

Go mutexes, like C++’s s std::mutex and std::shared_mutex, are not re-entrant, so if a thread tries to lock a mutex that it has already locked, there will be a deadlock. C++ does provide a re-entrant mutex via std::recursive_mutex, but actually using it is generally considered rather disgusting and a sign that the code should be restructured to avoid it.

As the Go book wisely says. “The purpose of a mutex is to ensure that certain invariants of the shared variables are maintained at critical points during program execution. One of the invariants is “no goroutine is accessing the shared variables,” but there may be additional invariants specific to the data structures that the mutex guards. When a goroutine acquires a mutex lock, it may assume that the invariants hold. While it holds the lock, it may update the shared variables so that the invariants are temporarily violated. However, when it releases the lock, it must guarantee that order has been restored and the invariants hold once again. Although a re-entrant mutex would ensure that no other goroutines are accessing the shared variables, it cannot protect the additional invariants of those variables.”

Summary

Go’s concurrency features are another example of its choosing safety and simplicity over potential raw performance, in an area where standard C++ does not yet offer such simplicity even as an option.

Both support standard mutex locking, in much the same way. But otherwise their strengths go in opposite directions:

C++ lets you write lock-free concurrent code so the compiler, with the help of some hardware features, can arrange for the compiled code to never attempt simultaneous changing of your shared data, while ensuring that different threads see changes when necessary. But that is notoriously difficult to get right. Nevertheless it needs to be available so we can benefit from the work of people who know how to use it.

On the other hand, Go lets you write concurrency code that is easier to get right than when just using standard locks There is probably some small performance cost, but it’s likely to actually work and you’ll probably understand how it works. After all, when concurrency code is wrong (C++ makes it easier to be wrong), performance can suffer as well as safety.

I do have some hope that things will get better for C++ when coroutines arrive and are widely understood, but it looks like they alone might not be enough to match Go’s goroutines ad channels.

Kids Coding: Simple Java

Moving Beyond Blocks

I recently got my son (9) started on “real” programming with text files and the command line. Here are some notes about that. It’s still early in this experiment so I can’t claim that it’s a great idea, but its kind of working so far. There are so many web sites and product to get kids started in programming, but very little to move them past the basics into something more expansive. They risk getting obsessed but not then staying obsessed.

I learned programming because that is what you did with computers when I was a child. I kept learning programming because I could make the computer do new and interesting things. The problem now is that computers do amazing things without any programming and it requires massive experience and effort to achieve equally impressive results. So we must create a simpler environment which lets the child feel good about creating and understanding something more straightforward. It should be a path to real programming, not another island in a chain.

My son still hadn’t done much real text-based programming other than some playing with processing.org inside the IDE (which I also recommend) and some Python in CodeCombat. He had previously played a bit with Scratch and Lego Wedo, Hour of Code, Open Roberta (really good if you have the Lego Mindstorms education kit) and Tynker. He’s also done some Lego Mindstorms, but I think that’s an awful programming experience, at least with the non-education kit.

I’ve chosen Java, reluctantly, because I am familiar with it, it’s forgiving and convenient enough, and its language constructs (for loops, functions, etc) are enough like other C-like languages that the experience should be generally transferable. It’s far from ideal, but processing.org helps a little.

I love C++, but it’s clearly not suitable as a first text-based programming language. Neither is C, though I can imagine us trying that sometime. I think Python and Go are worth considering, but Python’s loose type system frustrates me personally and Go doesn’t seem quite mainstream enough.

However, if there is a simple language that you enjoy, whatever it is, I suggest that you use it with your kids. You must have enthusiasm to share and you must help them move past frustrations. If solving problems in the language doesn’t feel interesting to you then it probably won’t seem interesting to them. The overall aim here is to share your own love of programming by rediscovering how you learned programming yourself.

Using the Command Line

First I introduced the Linux command line, presenting it as where the real power is, where you can see and control what’s happening under everything else. (Yes, I know. Nevermind.) I kept is simple, just talking about directories and editing text files. We only covered:

  • Opening the terminal.
  • Seeing where you are with pwd.
  • Seeing what is there with ls.
  • Moving into a directory with cd.
  • Moving back with “cd ..”.
  • Opening a text file with gedit, from the terminal.
  • Using & so we can continue using the terminal after opening the file in gedit.
  • Running our program from the terminal.

Hiding the Boiler Plate

I wanted to mimic the experience of programming in simpler times, or of simple scripting today. At least at first, I wanted us to just write commands into a text file and just run the resulting program, without worrying about main functions or having a parent Java class, without  separate compilation and execution steps, and just using a Println() function instead of System.out.Println(). This let us avoid discussion of functions or classes until later.

I put this in my .bash_aliases file:

processing-run() {
    $HOME/processing/processing-java --sketch="$1" --run
}

so we can just run a program like this:

$ processing-run add

which runs the program in add/add.pde.

The .pde file just contains Java code, but not in any function and not part of any class (processing.org’s “static mode”). However, as soon as you want to implement your own function, you’ll have to put the rest of your code in a void setup() method (processing.org’s “active mode”), still not part of any class.

Using processing.org for this is a bit awkward, so I’m open to more suitable Java-like environments.

Doing Something Useful But Simple

I wanted to gradually explore simple programming techniques such as variables and loops, so I needed something that would just manipulate inputs and provide useful outputs.

I’ve chosen to gradually develop little programs based on the arithmetic techniques kids often learn in school. For instance, they learn an algorithm to add large numbers together using a carry – they just don’t call it an algorithm. They also learn long multiplication and long division, which add layers of complexity. As a bonus, this can demystify the magic that school requires them to repeat.

For instance, this commit history shows how we gradually wrote and improved a little program to add numbers (as strings). It’s missing the really basic code we started with. I used it to introducing new programming ideas along the way, such as:

  • Using variables.
  • Writing strings with quote marks.
  • Looping over the characters in a string (and backwards), particularly with the awkward for loop syntax.
  • Text output (standard out)
  • Discovering the length of a string instead of hard-coding it, to make the code more general.
  • ASCII Character codes. (Ignoring full Unicode for now.)
  • Getting a number (0, 1, 2, …) corresponding to a character (‘0’, ‘1’, ‘2’, …). Talking about types.
  • Using variables and manipulating them. For instance, adding the two numbers for two digits from two “numbers in strings”.
  • Variables in the loop and variables outside the loop. Some variables’ values are forgotten at the end of each time the loop is run, and some values are kept and used the next time.
  • Conditionals with if.
  • Putting code into functions and calling it.
  • Making functions take parameters, to make them more generally useful.
  • Calling functions from functions.
  • Passing an array instead of several parameters of the same type, to make the function more general.

You soon end up with a bunch of code that is big enough to seem overwhelming to a child. So every now and then I printed the code out so we could discuss it together. We talked about ideas such as variables, and setting variables, about functions, function parameters, and how to call functions, marking places in the code code with colored pencils.

Version Control

Programming is an iterative, experimental, activity. So I think version control should be introduced early as long as you can keep it simple.

We wrote the code at the same time on our own computers while sat next to each other. I checked in the changes to GitHub because they don’t allow users younger than 13. Seeing the changes, and their descriptions, should help kids remember the learning experience and remind them where they are on the path they’ve takhen. But it’s a real shame that kids can’t really own their coding history on GitHub, even privately.

A C++ developer looks at Go (the programming language), Part 2: Modularity and Object Orientation

This is the second part of a series. Here are all 3 parts:

In part 1, I looked at the simpler features of Go, such as its general syntax, its basic type system, and its for loop. Here I mention its support for packages and object orientation. As before, I strongly suggest that you read the book to learn about this stuff properly. Again, I welcome any friendly corrections and clarifications.

Overall, I find Go’s syntax for object orientation a bit messy, inconsistent and frequently too implicit, and for most uses I prefer the obviousness of C++’s inheritance hierarchy. But after writing the explanations down, I think I understand it.

I’m purposefully trying not to mention the build system, distribution, or configuration, for now.

Packages

Go code is organized in packages, which are much like Java package names, and a bit like C++ namespaces. The package name is declared at the start of each file:

package foo

And files in other packages should import them like so:

package bar

import (
  "foo"
  "moo"
)

func somefunc() {
  foo.Yadda()
  var a moo.Thing
  ...
}

The package name should match the file’s directory name. This is how the import statements find the packages’ files. You can have multiple files in the same directory that are all part of the same package.

Your “main” package, with your main function, is an exception to this rule. Its directory doesn’t need to be named “main” because you won’t be importing the main package from anywhere.

Go doesn’t seem to allow nested packages, unlike C++ (foo::thing::Yadda) and Java (foo.thing.Yadda), though people seem to work around this by creating separate libraries, which seems awkward.

I don’t know how people should specify the API of a Go package without providing the implementation. C and C++ have header files for this purpose, separating declaration and implementation.

Structs

You can declare structs in go much as in C. For instance:

type Thing struct {
  // Member fields.
  // Notice the lack of the var keyword.
  a int
  B int // See below about symbol visibility
}

var foo Thing
foo.B = 3

var bar Thing = Thing{3}

var goo *Thing = new(Thing)
goo.B = 5

As usual, I have used the var form to demonstrate the actual type, but you would probably want to use the short := form.

Notice that we can create it as a value or as a pointer (using the built-in new() function), though in Go, unlike C or C++, this does not determine whether its actual memory will be on the stack or the heap. The compiler decides, generally based on whether the memory needs to outlive the function call.

Previously we’ve seen the built-in make() function used to instantiate slices and maps (and we’ll see it in part 3 with channels). make() is only for those built-in types. For our own types, we can use the new() function. I find the distinction a bit messy, but I generally dislike the whole distinction between built-in types and types that can be implemented using the language itself. I like how the C++ standard library is implemented in C++, with very little special support from the language itself when something is added to the library.

Go types often have “constructor” functions (not methods) which you should call to properly instantiate the type, but I don’t think there is any way to enforce correct initialization like a default constructor in C++ or Java. For instance:


type Thing struct {
  a int
  name string
  ...
}

func NewThing() *Thing {
  // 100 is a suitable default value for a in this type:
  f := Thing{100, nil}
  return &f
}

// Notice that different "constructors" must have different names,
// because go doesn't have function or method overloading.
func NewThingWithName(name string) *Thing {
  f := Thing{100, name}
  return &f
}

Embedding Structs

You can anonymously “embed” one struct within an other, like so:

type Person struct {
   Name string
}

type Employee struct {
  Person
  Position string
}

var a Employee
a.Name = "bob"
a.Position = "builder"

This feels a bit like inheritance in C++ and Java, but this is just containment. It doesn’t give us any real “is a” meaning and doesn’t give us real polymorphism. For instance, you can do this:

var e = new(Employee)

// Compilation error.
var p *Person = e

// This works instead.
// So if we thought of this as a cast (we probably shouldn't),
// this would mean that we have to explicitly cast to the base class.
var p *Person = e.Person

// This works.
e.methodOnPerson()

// And this works.
// Name is a field in the contained Person struct.
e.Name = 2

// These work too, but the extra qualification is unnecessary.
e.Person.methodOnPerson()

Interfaces, which we’ll see later, do give us some sense of an “is a” meaning.

Methods

Unlike C, but like C++ and Java classes, structs in Go can have methods – functions associated with the struct. But the syntax is a little different than in C++ or Java. Methods are declared outside of the struct declaration, and the association is made by specifying a “receiver” before the function name. For instance, this declares (and implements) a DoSomething method for the Thing struct:

func (t Thing) DoSomething() {
  ...
}

Notice that you have to specify a name for the receiver – there is no built-in “self” or “this” instance name. This feels like an unnecessary invitation to inconsistency.

You can use a pointer type instead, and you’ll have to if you want to change anything about the struct instance:

func (t *Thing) ChangeSomething() {
  t.a = 4
}

Because you should also want to keep your code consistent, you’d therefore want to use a pointer type for all method receivers. So I don’t know why the language lets it ever be a struct value type.

Unlike C++ or Java, this lets you check the instance for nil (Go’s null or nullptr), making it acceptable to call your method on a null instance. This reminds me of how Objective-C happily lets you call a method (“send a message to” in Objective-C terminology) on a nil instance, with no crash, even returning a nil or zero return value. I find that undisciplined in Objective-C, and it bothers me that Go allows this sometimes, but not consistently.

Unlike C++ or Java, you can even associate methods with non struct (non class) types. For instance:

type Meters int
type Feet int

func (Meters) convertToFeet() (Feet) {
  ...
}

Meters m = 10
f := p.convertToFeet()

No equality or comparison operator overloading

In C++, you can overload operator =, !=, <, >, etc, so you can use instances of your type with the regular operators, making your code look tidy:

MyType a = getSomething();
MyType b = getSomethingElse();
if (a == b) {
  ...
}

You can’t do that in Go (or Java, though it has the awkward Comparable interface and equals() method). Only some built-in types are comparable in Go – the numeric types, string, pointers, or channels, or structs or arrays made up of these types. This is an issue when dealing with interfaces, which we’ll see later.

Symbol Visibility: Uppercase or lowercase first letter

Symbols (types, functions, variables) that start with an uppercase letter are available from outside the package. Struct methods and member variables that start with an uppercase letter are available from outside the struct. Otherwise they are private to the package or struct.

For instance:

type Thing int // This type will be available outside of the package.
var Thingleton Thing// This variable will be available outside of the package.

type thing int // Not available outside of the package.
var thing1 thing // Not available outside of the package.
var thing2 Thing // Not available outside of the package.

// Available outside of the package.
func DoThing() {
  ...
}

// Not available outside of the package.
func doThing() {
  ...
}

type Stuff struct {
  Thing1 Thing // Available outside of the package.
  thing2 Thing // "private" to the struct.
}

// Available outside of the struct.
func (s Stuff) Foo() {
  ...
}

// Not available outside of the struct.
func (s Stuff) bar() {
  ...
}

// Not available outside of the package.
type localstuff struct {
...
}

I find this a bit strange. I prefer the explicit public and private keywords in C++ and Java.

Interfaces

Interfaces have methods

If two Go types satisfy an interface then they both have the methods of that interface. This is similar to Java interfaces. A Go interface is also a bit like a completely abstract class in C++ (having only pure virtual methods), but it’s also a lot like a C++ concept (not yet in C++, as of C++17). For instance:

type Shape interface {
  // The interface's methods.
  // Note the lack of the func keyword.
  SetPosition(x int, y int)
  GetPosition() (x int, y int)
  DrawOnSurface(s Surface)
}

type Rectangle struct {
  ...
}

// Methods to satisfy the Shape interface.
func (r *Rectangle) SetPosition(x int, y int) {
  ...
}

func (r *Rectangle) GetPosition() (x int, y int) {
  ...
}
func (r *Rectangle) DrawOnSurface(s Surface) {
   ...
}

// Other methods:
func (r *Rectangle) setCornerType(c CornerType) {
   ...
}
func (r *Rectangle) cornerType() (CornerType) {
   ...
}

type Circle struct {
  ...
}

// Methods to satisfy the Shape interface.
func (c *Circle) SetPosition(x int, y int) {
  ...
}

func (c *Circle) GetPosition() (x int, y int) {
  ...
}

func (c *Circle) DrawOnSurface(s Surface) {
  ...
}

// Other methods:
...

You can then use the interface type instead of the specific “concrete” type:

var someCircle *Circle = new(Circle)
var s Shape = someCircle
s.DrawOnSurface(someSurface)

Notice that we use a Shape, not a *Shape (pointer to Shape), even though we are casting from a *Circle (pointer to circle). “Interface values” seem to be implicitly pointer-like, which seems unnecessarily confusing. I guess it would feel more consistent if pointers to interfaces just had the same behaviour as these “interface values”, even if the language had to disallow interface types that weren’t pointers.

Types satisfy interfaces implicitly

However, there is no explicit declaration that a type should implement an interface.

In this way Go interfaces are like C++ concepts, though C++ concepts are instead a purely compile-time feature for use with generic (template) code. Your class can conform to a C++ concept without you declaring that it does. And therefore, like Go interfaces, you can, if you must, use an existing type without changing it.

The compiler still checks that types are compatible, but presumably by checking the types’ list of methods rather than checking a class hierarchy or list of implemented interfaces. For instance:

var a *Circle = new(Circle)
var b Shape = a // OK. The compiler can check that Circle has Shape's methods.

Like C++ with dynamic_cast, Go can also check at runtime. For instance, you can check if one interface value refers to an instance that also satisfies another interface:

// Sometimes the Shape (our interface type) is also a Drawable
// (another interface type), sometimes not.
var a Shape = Something.GetShape()

// Notice that we want to cast to a Drawable, not a *Drawable,
// because Drawable is an interface.
var b = a.(Drawable) // Panic (crash) if this fails.

var b, ok = a.(Drawable) // No panic.
if ok {
  b.DrawOnSurface(someSurface)
}

Or we can check that an interface value refers to a particular concrete type. For instance:

// Get Shape() returns an interface value.
// Shape is our interface.
var a Shape = Something.GetShape()

// Notice that we want to cast to a *Thing, not a Thing,
// because Thing is a concrete type, not an interface.
var b = a.(*Thing) // Panic (crash) if this fails.

var b, ok = a.(*Thing) // No panic.
if ok {
  b.DoSomething()
}

Runtime dispatch

Interface methods are also like C++ virtual methods (or any Java method), and interface variables are also like instances of polymorphic base classes. To actually call the interface’s method via an interface variable, the program needs to examine its actual type at runtime and call that type’s specific method. Maybe, as with C++, the compiler can sometimes optimize away that indirection.

This is obviously not as efficient as directly calling a method, identified at compile time, of a templated type in a C++ template. But it is obviously much simpler.

Comparing interfaces

Interface values can be compared sometimes, but this seems like a risky business. Interface values are:

  • Not equal if their types are different.
  • Not equal if their types are the same and only one is nil.
  • Equal if their types are the same, and the types are comparable (see above), and their values are equal.

But if the types are the same, yet those types are not comparable, Go will cause a “panic” at runtime.

Wishing for an implements keyword

In C++ you can, if you wish, explicitly declare that a class should conform to the concept, or you can explicitly derive from a base class, and in Java you must use the “implements” keyword. Not having this with Go would take some getting used to. I’d want these declarations to document my architecture, explicitly showing what’s expected of my”concrete” classes in terms of their general purpose instead of just expressing their that by how some other code happens to use them. Not having this feels fragile.

The book suggests putting this awkward code somewhere to check that a type really implements an interface. Note the use of _ to mean that we don’t need to keep a named variable for the result.

var _ MyInterface = (*MyType)(nil)

The compiler should complain that the conversion is impossible if the type does not satisfy the interface. I think it would be wise this as the very minimum of testing, particularly if your package is providing types that are not really used in the package itself. For me, this is a poor substitute for an obvious compile-time check, using a specific language construct, on the type itself.

Interface embedding

Embedding an interface in an interface

Go has no notion of inheritance hierarchies, but you can “embed” one interface in another, to indicate that a class that satisfies one interface also satisfies the other. For instance:

type Positionable interface {
  SetPosition(x int, y int)
  GetPosition() (x int, y int)
}

type Drawable interface {
  drawOnSurface(s Surface) }
}

type Shape interface {
  Positionable
  Drawable
}

To satisfy the Shape interface, any type must also satisfy the Drawable and Positionable interfaces. Therefore, any type that satisfies the Shape interface can be used with a method associated with the Drawable or Positionable interfaces. So it’s a bit like a Java interface extending another interface.

Embedding an interface-satisfying struct in a struct

We saw earlier how you can embed one struct in another anonymously. If the contained struct implements an interface, then the containing struct then also implements that interface, with no need for manually-implemented forwarding methods. For instance:

type Drawable interface {
 drawOnSurface(s Surface)
}

type Painter struct {
  ...
}

// Make Painter satisfy the Drawable interface.
func (p *Painter) drawOnSurface(s Surface) {
  ...
}

type Circle struct {
 // Make Circle satisfy the Drawable interface via Painter.
 Painter
 ...
}

func main() {
  ...
  var c *Circle = new(Circle)
 
  // This is OK.
  // Circle satisfies Drawable, via Painter
  c.drawOnSurface(someSurface)

  // This is also OK.
  // Circle can be used as an interface value of type Drawable, via Painter.
  var d Drawable = c
  d.drawOnSurface(someSurface)
}

This again feels a bit like inheritance.

I actually quite like how the (interfaces of the) anonymously contained structs affect the interface of the parent struct, even with Go’s curious interface system, though I wish the syntax was more obvious about what is happening. It might be nice to have something similar in C++. Encapsulation instead of inheritance (and the Decorator pattern) is a perfectly valid technique, and C++ generally tries to let you do things in multiple ways without having an opinion about what’s best, though that can itself be a source of complexity. But in C++ (and Java), you currently have to hand-code lots of forwarding methods to achieve this and you still need to inherit from something to tell the type system that you support the encapsulated type’s interface.

 

A C++ developer looks at Go (the programming language), Part 1: Simple Features

This is the first part of a series. Here are all 3 parts:

I’m reading “The Go Programming Language” by Brian Kernighan and Alan Donovan. It is a perfect programming language introduction, clearly written and perfectly structured, with nicely chosen examples. It contains no hand-waving – it’s aware of other languages and briefly acknowledges the choices made in the language design without lengthy discussion.

As an enthusiastic C++ developer, and a Java developer, I’m not a big fan of the overall language. It seems like an incremental improvement on C, and I’d rather use it than C, but I still yearn for the expressiveness of C++. I also suspect that Go cannot achieve the raw performance of C or C++ due to its safety features, though that maybe depends on compiler optimization. But it’s perfectly valid to knowingly choose safety over performance, particularly if you get more safety and more performance than with Java.

I would choose Go over C++ for a simple proof of concept program using concurrency and networking. Goroutines and channels, which I’ll mention in a later post, are convenient abstractions, and Go has standard API for HTTP requests. Concurrency is hard, and it’s particularly easy to choose safety over performance when writing network code.

Here are some of my superficial observations about the simpler features, which mostly seem like straightforward improvements on C. In part 2 I’ll mention the higher-level features and I’ll hopefully do a part 3 about concurrency. I strongly recommend that you read the book to understand these issues properly.

I welcome friendly corrections and clarifications. There are surely several mistakes here, hopefully none major.

No semicolons at the end of lines

Let’s start with the most superficial thing. Unlike C, C++, or Java, Go doesn’t need semicolons at the end of lines of code. So this is normal:

a = b
c = d

This is nicer for people learning their first programming language. It can take a while for those semicolons to become a natural habit.

No () parentheses with if and for

Here’s another superficial difference. Unlike C or Java, Go doesn’t put its conditions inside parentheses with if and for. That’s another small change that feels arbitrary and makes C coders feel less comfortable.

For instance, in Go we might write this:

for i := 0; i < 100; i++ {
  ...
}

if a == 2 {
  ...
}

Which in C would look like this:

for (int i = 0; i < 100; i++) {
  ...
}

if (a == 2) {
  ...
}

Type inference

Go has type inference, from literal values or from function return values, so you don’t need to restate types that the compiler should know about. This is a bit like C++’s auto keyword (since C++11). For instance:

var a = 1 // An int.
var b = 1.0 // A float64.
var c = getThing()

There’s also a := syntax that avoids the need for var, though I don’t see the need for both in the language:

a := 1 // An int.
b := 1.0 // A float64
d := getThing()

I love type inference via auto in modern C++, and find it really painful to use any language that doesn’t have this. Java feels increasingly verbose in comparison, but maybe Java will get there. I don’t see why C can’t have this. After all, they eventually allowed variables to be declared not just at the start of functions, so change is possible.

Types after names

Go has types after the variable/parameter/function names, which feels rather arbitrary, though I guess there are reasons, and personally I can adapt. So, in C you’d have

Foo foo = 2;

but in Go you’d have

var foo Foo = 2

Keeping a more C-like syntax would have eased C developers into the language. These are often not people who embrace even small changes in the language.

No implicit conversions

Go doesn’t have implicit conversions between types, such as int and uint, or floats and int. This also applies to comparison via == and !=.

So, these won’t compile:

var a int = -2
var b uint = a
var c int = b

var d float64 = 1.345
var e int = c

C compiler warnings can catch some of these, but a) People generally don’t turn on all these warnings, and they don’t turn on warnings as errors, and b) the warnings are not this strict.

Notice that Go has the type after the variable (or parameter, or function) name, not before.

Notice that, unlike Java, Go still has unsigned integers. Unlike C++’s standard library, Go uses signed integers for sizes and lengths. Hopefully C++ will get do that too one day.

No implicit conversions because of underlying types

Go doesn’t even allow implicit conversions between types that, in C, would just be typedefs. So, this won’t compile

type Meters int
type Feet int
var a Meters = 100
var b Feet = a

I think I’d like to see this as a warning in C and C++ compilers when using typedef.

However, you are allowed to implicitly assign a literal (untyped) value, which looks like the underlying type, to a typed variable, but you can’t assign from an actual typed variable of the underlying type:

type Meters int
var a Meters = 100 // No problem.

var i int = 100
var b Meters = i // Will not compile.

No enums

Go has no enums. You should instead use const values with the iota keyword. So, while C++ code might have this:

enum class Continent {
  NORTH_AMERICA,
  SOUTH_AMERICA,
  EUROPE,
  AFRICA,
  ...
};

Continent c = Continent::EUROPE;
Continent d = 2; // Will not compile

in Go, you’d have this:

type continent int

const (
  CONTINENT_NORTH_AMERICA continent = iota
  CONTINENT_SOUTH_AMERICA // Also a continent, with the next value via iota.
  CONTINENT_EUROPE // Also a continent, with the next value via iota.
  CONTINENT_AFRICA // Also a continent, with the next value via iota.
)

var c continent = CONTINENT_EUROPE
var d continent = 2 // But this works too.

Notice how, compared to C++ enums, particularly C++11 scoped enums, each value’s name must have an explicit prefix, and the compiler won’t stop you from assigning a literal number to a variable of the enum type. Also, the Go compiler doesn’t treat these as a group of associated values, so it can’t warn you, for instance, if you forget to mention one in a switch/case block.

Switch/Case: No fallthrough by default

In C and C++, you almost always need a break statement at the end of each case block. Otherwise, the code in the following case block will run too. This can be useful, particularly when you want the same code to run in response to multiple values, but it’s not the common case. In Go, you have to add an explicit fallthrough keyword to get this behaviour, so the code is more concise in the general case.

Switch/Case: Not just basic types

In Go, unlike in C and C++, you can switch on any comparable value, not just values known at compile time, such as ints,  enums, or other constexpr values. So you can switch on strings, for instance:

switch str {
  case "foo":
  doFoo()
case "bar":
  doBar()
}

This is convenient and I guess that it is still compiled to efficient machine code when it uses compile-time values. C++ seems to have resisted this convenience because it couldn’t always be as efficient as a standard switch/case, but I think that unnecessarily ties the switch/case syntax to its original meaning in C when people expected to be more aware of the mapping from C code to machine code.

Pointers, but no ->, and no pointer arithmetic

Go has normal types and pointer types, and uses * and & as in C and C++. For instance:

var a thing = getThing();
var p *thing = &a;
var b thing = *p; // Copy a by value, via the p pointer

As in C++, the new keyword returns a pointer to a new instance:

var a *thing = new(thing)
var a thing = new(thing) // Compilation error

This is like C++, but unlike Java, in which any non-fundamental types (not ints or booleans, for instance) are effectively used via a reference (it just looks like a value), which can confuse people at first by allowing inadvertent sharing.

Unlike C++, you can call a method on a value or a pointer using the same dot operator:

var a *Thing = new(Thing) // You wouldn't normally specify the type.
var b Thing = *a
a.foo();
b.foo();

I like this. After all, the compiler knows whether the type is a pointer or a value, so why should it bother me with complaints about a . where there should be a -> or vice-versa? However, along with type inference, this can slightly obscure whether your code is dealing with a pointer (maybe sharing the value with other code) or a value. I’d like to see this in C++, though it would be awkward with smart pointers.

You cannot do pointer arithmetic in Go. For instance, if you have an array, you can’t step through that array by repeatedly adding 1 to a pointer value and dereferencing it. You have to access the array elements by index, which I think involves bounds checking. This avoids some mistakes that can happen in C and C++ code, leading to security vulnerabilities when your code accesses unexpected parts of your application’s memory.

Go functions can take parameters by value or by pointer. This is like C++, but unlike Java, which always takes non-fundamental types by (non const) reference, though it can look to beginner programmers as if they are being copied by value. I’d rather have both options with the code showing clearly what is happening via the function signature, as in C++ or Go.

Like Java, Go has no notion of const pointers or const references. So if your function takes a parameter as a pointer, for efficiency, your compiler can’t stop you from changing the value that it points to. In Java, this is often done by creating an immutable type, and many Java types, such as String, are immutable, so you can’t change them even if you want to. But I prefer language support for constness as in C++, for pointer/reference parameters and for values initialized at runtime. Which leads us to const in Go.

References, sometimes

Go does seem to have references (roughly, pointers that look like values), but only for the built-in slice, map, and channel types.  (See below about slices and maps.) So, for instance, this function can change its input slide parameter, and that change will be visible to the caller, even though the parameter is not declared as a pointer:

func doThing(someSlice []int) {
  someSlice[2] = 3;
}

In C++, this would be more obviously a reference:

void doThing(Thing& someSlice) {
  someSlice[2] = 3;
}

I’m not sure if this is a fundamental feature of the language or just something about how those types are implemented. It seems confusing for just some types to act differently, and I find the explanation a bit hand-wavy. Convenience is nice, but so is consistency.

const

Go’s const keyword is not like const in C (rarely useful) or C++, where it indicates that a variable’s value should not be changed after initialization. It is more like C++’s constexpr keyword (since C++11), which defines values at compile time. So it’s a bit like a replacement for macros via #define in C, but with type safety. For instance:

const pi = 3.14

Notice that we don’t specify a type for the const value, so the value can be used with various types depending on the syntax of the value, a bit like a C macro #define. But we can restrict it by specifying a type:

const pi float64 = 3.14

Unlike constexpr in C++, there is no concept of constexpr functions or types that can be evaluated at compile time, so you can’t do this:

const pi = calculate_pi()

and you can’t do this

type Point struct {
  X int
  Y int
}

const point = Point{1, 2}

though you can do this with a simple type whose underlying type can be const:

type Yards int
const length Yards = 100

Only for loops

All loops in Go are for loops – there are no while or do-while loops. This simplifies the language in one way compared, for instance, to C, C++, or Java, though there are now multiple forms of for loop.

For instance:

for i := 0; i < 100; i++ {
  ...
}

or, like a while loop in C:

for keepGoing {
  ...
}

And for loops have a range-based syntax for containers such as string, slices or maps, which I’ll mention later:

for i, c := range things {
  ...
}

C++ has range-based for loops too, since C++11, but I like that Go can (optionally) give you the index as well as the value. (It gives you the index, or the index and the value, letting you ignore the index with the _ variable name.)

A native (Unicode) String type

Go has a built-in string type, and built in comparison operators such as ==, !=, and < (as does Java). Like Java, Strings are immutable, so you can’t change them after you’ve created them, though you can create new Strings by concatenating other Strings with the built in operator +. For instance:

str1 := "foo"
str2 := str1 + "bar"

Go source code is always UTF-8 encoded and string literals may contain non-ASCII utf-8 code points. Go calls Unicode code points “runes”.

Although the built-in len() function returns the number of bytes, and the built in operator [] for strings operates on bytes, there is a utf8 package for dealing with strings as runes (Unicode code points). For instance:

str := "foo"
l := utf8.RuneCountInString(str)

And the range-based for loop deals in runes, not bytes:

str := "foo"
for _, r := range str {
  fmt.Println("rune: %q", r)
}

C++ still has no standard equivalent.

Slices

Go’s slices are a bit like dynamically-allocated arrays in C, though they are really views of an underlying array, and two slices can be views into different parts of the same underlying array. They feel a bit like std::string_view from C++17, or GSL::span, but they can be resized easily, like std::vector in C++17 or ArrayList in Java.

We can declare a span like so, and append to it:

a := []int{5, 4, 3, 2, 1} // A slice
a = append(a, 0)

Arrays (whose size cannot change, unlike slices) have a very similar syntax:

a := [...]int{5, 4, 3, 2, 1} // An array.
b := [5]int{5, 4, 3, 2, 1} // Another array.

You must be careful to pass arrays to functions by pointer, or they will be (deep) copied by value.

Slices are not (deep) comparable, or copyable, unlike std::array or std::vector in C++, which feels rather inconvenient.

Slices don’t grow beyond their capacity (which can be more than their current length) when you append values. To do that you must manually create a new slice and copy the old slice’s elements into it. You can keep a pointer to an element in a slice (really to the element in the underlying array). So, as with maps (below), the lack of resizing is probably to remove any possibility of an pointer becoming invalid.

The built in append() function may allocate a bigger underlying array if it would need more than the existing capacity (which can be more than the current length). So you should always assign the result of append() like so:

a = append(a, 123)

I don’t think you can keep a pointer to an element in a slice. If you could, the garbage collection system would need to keep the previous underlying array around until you had stopped using that pointer.

Unlike C or C++ arrays, and unlike operator [] with std::vector, attempting to access an invalid index of a slice will result in a panic (effectively a crash) rather than just undefined behaviour. I prefer this, though I imagine that the bounds checking has some small performance cost.

Maps

Go has a built-in map type. This is roughly equivalent to C++’s std::map (balanced binary trees), or std::unordered_map (hash tables). Go maps are apparently hash tables but I don’t know if they are separate-chaining hash tables (like std::unordered_map) or open-addressing hash tables (like nothing in standard C++ yet, unfortunately).

Obviously, keys in hash tables have to be hashable and comparable. The book mentions comparability, but so few things are comparable that they would all be easily hashable too. Only basic types (int, float64, string, etc, but not slices) or structs made up only of basic types are comparable, so that’s all you can use as a key. You can get around this by using a basic type (such as an int or string) that is (or can be made into) a hash of your value. I prefer C++’s need for a std::hash<> specialization, though I wish it was easier to write one.

Unlike C++’, you can’t keep a pointer to an element in a map, so changing one part of the value means copying the whole value back into the map, presumably with another lookup. Go apparently does this to completely avoid the problem of invalid pointers when the map has to grow. C++ instead lets you take the risk, specifying when your pointer could become invalid.

Go maps are clearly a big advantage over C, where you otherwise have to use some third-party data structure or write your own, typically with very little type safety.

They look like this:

m := make(map[int]string)
m[3] = "three"
m[4] = "four"

Multiple return values

Functions in Go can have multiple return types, which I find more obvious then output parameters. For instance:

func getThings() (int, Foo) {
  return 2, getFoo()
}

a, b := getThings()

This is a bit like returning tuples in modern C++, particularly with structured bindings in C++17:

std::tuple<int, Foo> get_things() {
  return make_tuple(2, get_foo());
}

auto [i, f] = get_things();

Garbage Collection

Like Java, Go has automatic memory management, so you can trust that instances will not be released until you have finished using them, and you don’t need to explicitly release them. So you can happily do this, without worrying about releasing the instance later:

func getThing() *Thing {
  a := new(Thing)
  ...
  return a
}

b := getThing()
b.foo()

And you can even do this, not caring, and not easily even knowing, whether the instance was created on the stack or the heap:

func getThing() *Thing {
  var a Thing
  ...
  return &a
}

b := getThing()
b.foo()

I don’t know how Go avoids circular references or unwanted “leak” references, as Java or C++ would with weak references.

I wonder how, or if, Go avoids Java’s problem with intermittent slowdowns due to garbage collection. Go seems to be aimed at system-level code, so I guess it must do better somehow.

However, also like Java, and probably like all garbage collection, this is only useful for managing memory, not resources in general. The programmer is usually happy to have memory released some time after the code has finished using it, not necessarily immediately. But other resources, such as file descriptors and database connections, need to be released immediately. Some things, such as mutex locks, often need to be released at the end of an obvious scope. Destructors make this possible. For instance, in C++:

void Something::do_something() {
  do_something_harmless();

  {
    std::lock_guard<std::mutex> lock(our_mutex);
    change_some_shared_state();
  }
  
  do_something_else_harmless();
}

Go can’t do this, so it has defer() instead, letting you specify something to happen whenever a function ends. It’s a annoying that defer is associated with functions, not to scopes in general.

func something() {
  doSomethingHarmless()

  ourMutex.Lock()
  defer ourMutex.Unlock()
  changeSomeSharedState()

  // The mutex has not been released yet when this remaining code runs,
  // so you'd want to restrict the use of the resource (a mutex here) to
  // another small function, and just call it in this function.
  doSomethingElseHarmless()
}

This feels like an awkward hack, like Java’s try-with-resources.

I would prefer to see a language that somehow gives me all of scoped resource management (with destructors), reference-counting (like std::shared_ptr<>) and garbage collection, in a concise syntax, so I can have predictable, obvious, but reliable, resource releasing when necessary, and garbage collection when I don’t care.

Of course, I’m not pretending that memory management is easy in C++. When it’s difficult it can be very difficult. So I do understand the choice of garbage collection. I just expect a system level language to offer more.

Things I don’t like in Go

As well as the minor syntactic annoyances mentioned above, and the lack of simple generic resource (not just memory) management, I have a couple of other frustrations with the language.

(I’m not loving the support for object orientation either, but I’ll mention that in a later article when I’ve studied it more.)

No generics

Go’s focus on type safety, particularly for numeric types, makes the lack of generics surprising. I can remember how frustrating it was to use Java before generics, and this feels almost that awkward. Without generics I soon find myself having to choose between lack of type safety or repeatedly reimplementing code for each type, feeling like I’m fighting the language.

I understand that generics are difficult to implement, and they’d have to make a choice about how far to take them (probably further than Java, but not as far as C++), and I understand that Go would then be much more than a better C. But I think generics are inevitable once, like Go, you pursue static type safety.

Somehow go’s slice and map containers are generic, probably because they are built-in types.

Lack of standard containers

Go has no queue or stack in its standard library. In C++, I use std::queue and std::stack regularly. I think these would need generics. People can use go’s slice (a dynamically-allocated array) to achieve the same things, and you can wrap that up in your own type, but your type, can only contain specific types, so you’ll be reimplementing this for every type. Or your container can hold interface{} types (apparently a bit like a Java Object or a C++ void*), giving up (static) type safety.

C++: std::string_view not so useful when calling C functions

string_view does not own string data

C++17 adds std::string_view, which is a thin view of a character array, holding just a pointer and a length. This makes it easy to provide just one method that can efficiently take either a const char*, or a std::string, without unnecessary copying of the underlying array. For instance:

void use_string(std::string_view str);

You can then call that function like so:

use_string("abc");

or

std::string str("abc");
use_string(str);

This involves no deep copying of the character array until the function’s implementation needs to do that. Most obviously, it involves no copying when you are just passing a string literal to the function. For instance it doesn’t create a temporary std::string just to call the function, as would be necessary if the function took std::string.

string_view knows nothing of null-termination

However, though the string literal (“abc”) is null-terminated, and the std::string is almost-certainly  null-terminated (but implementation defined), our use_string() function cannot know for sure that the underlying array is null terminated. It could have been called liked so:

const char* str = "abc"; // null-terminated
use_string(std::string_view(str, 2));  // not 3.

or even like so:

const char str[] = {'a', 'b', 'c'}; //not null-terminated
use_string(std::string_view(str, 3));

or as a part of a much larger string that we are parsing.

Unlike std::string, there is no std::string_view::c_str() which will give you a null-terminated character array. There is std::string_view::data() but, like std::string::data(), that doesn’t guarantee the the character array will be null-terminated. (update: since C++11, std::string::data() is guaranteed to be null-terminated, but std::string_view::data() in C++17 is not.)

So if you call a typical C function, such as gtk_label_set_text(), you have to construct a temporary std::string, like so:

void use_string(std::string_view str) {
  gtk_label_set_text(label, std::string(str).c_str());
}

But that creates a copy of the array inside the std::string, even if that wasn’t really necessary. std::string_view has no way to know if the original array is null-terminated, so it can’t copy only when necessary.

This is understandable, and certainly useful for pure C++ code bases, or when using C APIs that deal with lengths instead of just null termination. I do like that it’s in the standard library now. But it’s a little disappointing in my real world of integrating with typical C APIs, for instance when implementing gtkmm.

Implementation affecting the interface

Of course, any C function that is expected to take a large string would have a version that takes a length. For instance, gtk_text_buffer_set_text(), so we can (and gtkmm will) use std::string_view as the parameter for any C++ function that uses that C function. But it’s a shame that we can’t have a uniform API that uses the same type for all string parameters. I don’t like when the implementation details dictate the types used in our API.

There is probably no significant performance issue for small strings, even when using the temporary std::string() technique, but it wouldn’t be nice to make the typical cases worse, even just theoretically.

In gtkmm, we could create our own string_view type, which is aware of null termination, but we are trying to be as standard as possible so our API is as obvious as possible.

C++ Implementations of “Engineering the Compiler” Pseudocode

Implementing is understanding

I’m gradually reading through the Engineering a Compiler book. I’m enjoying it but after a while I started wondering if I really understood the algorithms and I wanted to make sure of that before going further. So I implemented the algorithms in C++ (C++17, because it’s nice), testing them against the example inputs and grammars from the book. For instance, here is my code to construct, and use, the action and goto tables which define a DFA for a bottom-up table-driven LR(1) parser, along with some test code to use it with a simple expression grammar.

So far I’ve done this for Chapter 2 (Scanners) and Chapter 3 (Parsers) and I’ll try to keep going. It is a useful exercise for me, and maybe it’s useful to someone else reading the book.  Please note that the code will probably be meaningless to you if you haven’t read the book or something similar. On the other hand, the code will probably seem childlike if you’ve studied compilers properly.

Trying to get  the code to work often showed me that I had had only the illusion of fully understanding. Much of the pseudocode in the book is not very clear, even when you adjust your mind to its vaguely mathematical syntax, and to know what it really means you have to closely read the text descriptions, sometimes finding clues several pages back. For instance it’s not always clear what is meant by a particular symbol, and I found at least one conditional check that appeared in the description but not in the code. I would much rather see descriptions inline in the code.

Pseudocode is vague

I’m not a fan of pseudocode either, though I understand the difficulty in choosing a real programming language for examples. It means putting some readers off. But there could at least be some real code in an appendix. The book has some detailed walkthroughs that show that the authors must have implemented the algorithms somehow, so it’s a shame that I couldn’t just look at that code. I wonder what real programming language might be the most compact for manipulating sets of states when dealing with finite state automata states and grammar symbols.

For the parsers chapter this wasn’t helped by the, presumably traditional, ambiguity of the “FIRST” sets terminology. FIRST(symbol), FIRST(symbols), FIRST(production), and FIRST+(production) are all things.

My code isn’t meant to be efficient. For instance, my LR(1) parser table-building code has some awfully inefficient use of std::set as a key in a std::map. The code is already lengthier than I’d like, so I would prefer not to make it more efficient at the cost of readability. I’ve also probably overdone it with the half-constexpr and vaguely-concepty-generic Grammar classes. But I am tempted by the idea of making these fully constexpr with some kind of constexpr map, and then one day having the compiler build the generated (DFA) tables at compile time, so I wanted to explore that just a little.

But the book is good

Despite my complaints about the presentation of the algorithms, so far I’d still recommend the book. I feel like it’s getting the ideas into my head , it’s not really that bad, and as far as I know there’s nothing better. Of course, given that I haven’t read the alternatives, my recommendation shouldn’t mean much.

Also, these days I always write up my notes as a bigoquiz.com quiz, so here are my quizzable compiler notes so far.

Boost Graph Library: modernization

Modern C++ (C++11 and later) can greatly simplify generic templated C++ code. I’ve recently been playing around with modernizing the Boost Graph Library code. The experience is similar to how I modernized the libsigc++ code, though I have not gone nearly into that much depth yet. The BGL is currently a big messy jumble of code that isn’t getting much love, and modernizing it could start to let its accumulated wisdom shine through, while also freeing it of other boost dependencies.

Please note that this is just an experiment in my own fork that even I am not pursuing particularly seriously. These changes are not likely to ever be accepted into the BGL because it would mean requiring modern C++ compilers. Personally, I think that’s the only way to move forward. I also think that Boost’s monolithic release process, and lack of a real versioning strategy, holds its libraries back from evolving. At the least, I think only a small set of generally-useful libraries should be released together, and I think that set should drop anything that’s now been moved into the standard library. BGL seems to have been stagnant for the last decade, so there doesn’t seem to be much to lose.

I’ve modernized the example code (and also tried to remove the “using namespace boost” lines), and done some work to modernize the boost graph code itself.

At the least, liberal use of auto can let you delete lots of ugly type declarations that really exist just to make things compile rather than to provide useful clues to the reader. For instance, auto makes the example code less cluttered with magic incantations. Range-based for loops simplify more code – for instance, in the examples.The simpler code is then easier to understand, letting you see the useful work underneath the boiler plate.

I’ve jumped right into C++17 and used structured bindings (and in the examples) because they are particularly helpful with the BGL API, which has many methods that return the begin and end of a range inside a std::pair. This lets us avoid some more type declarations. However, in the examples, I used a make_range_pair() utility function in many places instead, so I could use a simple range-based for. I guess that the range library would provide a type to use instead, maybe as part of the standard library one day.

I also replaced most uses of the boost type traits (some from boost::mpl) with the std:: equivalents. For instance, std::is_same instead of boost::is_same. It should now be possible to remove the boost::mpl dependency completely with a little work.

I’ve also tried putting all of BGL in the boost::graph namespace, instead of just in the boost namespace, but the API currently expects application code to use “using namespace boost”, to make its generic API work, and this makes that even more obvious.

As a next step, I guess that the boost graph implementation code could be simplified much more by use of decltype(auto). As I was modernizing libsigc++, I sometimes found templates that were used only by specializations that were used only by typedefs that were no longer needed, because they could be replaced by decltype(auto) return types. You have to pull at the threads one by one.

gtkmm 4 progress

gtkmm 4 is alive

We (mostly Kjell Ahlstedt and myself) have been quietly working away on gtkmm 4 and an associated ABI-breaking version of glibmm. We’ve been tracking GTK+ 4 from git master, making sure that gtkmm builds against it, and making various API-breaking or ABI-breaking changes that had been left in the code as TODO comments waiting for a chance like this.

This includes simple ABI-breaking (but not API-breaking) stuff such as adding a base classes to widget esor changing the type sof a method parameters. Many other changes are about updating the code to be more modern C++, trying to do things the right way and avoiding duplication with new API in the C++ Standard library.

These changes pleases us as purists but, honestly, as an application developer it isn’t going to give you interesting new behaviour in your applications. If you too are enthusiastic about the C++ renaissance, porting to the new APIs might feel rewarding and correct, and you’ll have to do it someday anyway to get whatever new features arrive later, but I cannot claim there is a more compelling reason to do the work. You might get shiny new features via GTK+ 4, but so far it feels like a similar exercise in internal cleanup and removal of unloved API.

There are still people stuck on GTK+ 2, because most people only port code when they need to. I don’t think the transition to GTK+ 4 or gtkmm 4 will be any faster. Certainly not until the GTK+ 4 porting advice gets a lots better. Porting gtkmm and glom to GTK+ 4 has not been fun, particularly as the trend for API changes without explanation has only increased.

Anyway, here are some of the more significant changes so far in glibmm-2.54 (a horrible ABI name, but there are reasons) and gtkmm-4.0, both still thoroughly unstable and subject to API change:

(Lots of this is currently on in git master but will be in tarball releases soonish, when there are glib and gtk+ releases we can depend on.)

Deprecated API is gone

Anything that was deprecated in gtkmm 3 (or glibmm-2.4) is now completely removed from gtkmm 4 (or glibmm-2.54). You’ll want to build your application with gtkmm 3 with all deprecated API disabled before attempting to build against gtkmm 4.

In some cases this included deprecated base classes that we couldn’t let you optionally disable without breaking ABI, but now they are really really gone.

This is a perfect example of API changes that make us feel better but which are really not of much direct benefit to you as an application developer. It’s for your own good, really.

Glib::RefPtr is now std::shared_ptr

In gtkmm 3, Glib::RefPtr<> is a reference-counting smart pointer, mostly used to hide the manual GObject reference-counting that’s needed in C. It worked well, but C++11 introduced std::shared_ptr so we saw a chance to delete some code and  make our API even more “standard”. So, Glib::RefPtr is now just an alias for std::shared_ptr and we will probably change our APIs to use std::shared_ptr instead of Glib::RefPtr.

Glib::RefPtr is an intrusive smart pointer, meaning that it needs, and uses, a reference count in the underlying object rather than in the smartpointer itself. But std::shared_ptr is non-intrusive. So we now just take one reference when we instantiate the std::shared_ptr and release that one reference when the last std::shared_ptr is destroyed, via its Deleter callback. That means we always need to instantiate the std::shared_ptr via Glib::make_refptr_for_instance(), but we hide that inside glibmm/gtkmm code, and application developers would rarely need to do this anyway.

std::shared_ptr<> is a familiar type, so using it in our API should make it easier for the average person to reason about our API. And using it makes us part of the wider ongoing conversation in the C++ community about how to indicate and enforce ownership. For instance, we might start taking Thing* parameters instead of std::shared_ptr<Thing> parameters when we know that the called method will not need to share ownership. I mentioned this idea in an earlier post. However, we often cannot assume much about what the underlying C function really does.

This is a big change. Hopefully it will work.

Now uses libsigc++-3.0 instead of libsigc++-2.0

I rewrote libsigc++ for modern C++, using variadic templates and some other modern C++ techniques. It feels like libsigc++-2.0, but the code is much simpler. Compiler errors might be slightly less cryptic. This requires C++14.

Enums are inside related classes

Enums that were only used with a particular class are now inside that class. For instance, Gio::ApplicationFlags is now Gio::Application::Flags. This makes the API slightly clearer. This didn’t need C++11, but it did need an API break.

Enums are now C++11 enum classes

Enums are now declared as “enum class” (scoped enumerations) instead of “enum” (unscoped enumerations), so they cannot be implicitly converted to other types such as bool or int. That lets the compiler find some subtle programmer errors.

The enum values must now be prefixed by the enum name rather than having a prefix as part of the value name. For instance, we now use Gio::Application::Flags::HANDLES_OPEN instead of Gio::Application::FLAGS_HANDLES_OPEN (actually Gio::APPLICATION_FLAGS_HANDLES_OPEN before we put enums inside classes).

Gtk::TreeView and Gtk::TextView now have real const_iterators

This lets us make the API more const-correct, requiring less arbitrary const_casts<>s in application code.

Removed the old intermediate ListHandle/SListHandle/ArrayHandle/SArrayHandle types

Long ago, the gtkmm API used these intermediate types, wrapping glib types such as GList, GSList, and C arrays, so you didn’t need to choose between using a std::list, std::vector, or other standard container. Since gtkmm 3 we decided that this was more trouble than it was worth, and decided to just uses std::vector everywhere, but it’s only now that we’ve been able to remove them from the glibmm, pangomm, and atkmm APIs.

A possible change: Not using Glib::ustring

We are still considering whether to replace uses of Glib::ustring in our API with std::string, maybe just keeping Glib::ustring for when people really want to manipulate UTF-8 “characters”. I would much prefer standard C++ to have real UTF-8 support, for instance letting us step through and insert characters in a UTF-8 string, but that doesn’t look like it will happen in the foreseeable future.

Glib::ustring still wraps useful UTF-8 APIs in glib, in a std::string-like API, so we wouldn’t want to remove it.

Also, currently it’s useful to know that, for instance, a gtkmm method that returns a Glib::ustring is giving us a UTF-8 string (such as Gtk::FileChooser::get_current_name()), rather than something of unknown encoding (such as Gtk::FileChooser::get_filename()). We allow implicit conversions, for convenience, so we can’t use the compiler to check for awareness of these encoding differences, but having it in the method signature still feels nicer than having to read a method’s documentation.