Thursday, January 22, 2015

my thoughts on golang

I've been playing with go in the evenings and over xmas break for about 8 weeks now. This post is about go the language and the tooling. I may write another post about a simple go package for bioinformatics that I've been writing which is under 1000 lines of code.

First, go is boring, and though it is pretty terse, I do miss some things from python, like list comprehensions. Initializing a slice and writing a for loop is easy enough, but comprehensions are something I use all the time in python. Still, I can't argue with the "less is exponentially more" mantra, as I was able to pick up the language very quickly. The tooling is fantastic. My project has dependencies that are wrappers around C libraries, but I can simply do:
    go get
and it just works. The project is only about 1K lines of code, but it compiles in about 0.1 seconds on my laptop. And, when the time comes, I can distribute binaries for common platforms!
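For readers coming from python, this is roughly what the for-loop replacement for a list comprehension looks like. It is a minimal sketch: the Feature type and the startsOn function are hypothetical, made up for illustration, and stand in for the python comprehension `[f.start for f in feats if f.chrom == chrom]`.

```go
package main

import "fmt"

// Feature is a hypothetical record type used only for this illustration.
type Feature struct {
	Chrom string
	Start uint32
}

// startsOn is the for-loop equivalent of the python comprehension
//
//	[f.start for f in feats if f.chrom == chrom]
func startsOn(feats []Feature, chrom string) []uint32 {
	starts := make([]uint32, 0, len(feats)) // initialize the result slice
	for _, f := range feats {
		if f.Chrom == chrom {
			starts = append(starts, f.Start)
		}
	}
	return starts
}

func main() {
	feats := []Feature{{"chr1", 10}, {"chr2", 20}, {"chr1", 30}}
	fmt.Println(startsOn(feats, "chr1")) // [10 30]
}
```

It is three extra lines compared to python, which is the trade-off described above.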

vim-go is awesome! I've had various plugins for python in vim, but I think the combination of types, formatting and interfaces lets vim-go help me to a level I've not seen with python. Beyond the obvious stuff, such as using incorrect types, if I try to use a struct that doesn't satisfy a required interface, it tells me which methods are missing or whether I should be using a pointer receiver. Nearly all of the problems I encounter are caught at edit-time (compile-time) via vim-go. The run-time error messages are pretty readable. My most common one is the nil pointer dereference, which I've learned to track down pretty quickly, or to avoid altogether by doing things in a more go-ish (?) way.

I needed some iterators for this project. Because I hadn't used channels, I figured that function closures would be the fastest option, both in terms of development and performance. I had also seen a benchmark showing that using channels as iterators is quite slow. I initially wrote the iterator as a function closure and then converted it to a channel. That benchmark isn't doing any work inside the iterator, so (presumably) the added channel machinery is all it measures. In my case, I'm doing some parsing in the iterator, so having that done concurrently in a goroutine can speed things up quite nicely. Converting from the closure-based iterator to the channel was simple, again thanks to vim-go and the niceties of the language itself.

In python, I never think about interfaces (well, almost never). In go, that's the obvious thing to do. I did find myself wishing that interfaces could be defined in terms of fields, and not just methods. Initially, writing:

    func (self *Interval) Start() uint32 { return self.start }

was annoying (I wanted to just have 'start uint32'), but, in dealing with genomic data, we sometimes have 1-based and sometimes 0-based coordinates, so for another struct to satisfy the interface I had to have:

    func (self *Gff) Start() uint32 {
            // GFF coordinates are 1-based; convert to 0-based
            return uint32(self.Feature.Start() - 1)
    }

So, I needed the wrapper anyway to normalize the positions. Forcing the conversion to uint32 (self.Feature.Start() is uint64) actually helped me catch some problems. Writing methods like that is simple, and vim-go lets me know when I'm missing methods to satisfy an interface.
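To make the interface idea concrete, here is a sketch under stated assumptions: the Interface, Interval, Gff and gffRecord names are hypothetical, and gffRecord only stands in for a real GFF feature with 1-based, closed, uint64 coordinates. The `var _ Interface = (*Gff)(nil)` lines are a common Go idiom that makes the compiler (and therefore vim-go) report missing methods or a wrong receiver type at build time.

```go
package main

import "fmt"

// Interface: anything with 0-based, half-open coordinates.
// The name and method set are assumptions for this sketch.
type Interface interface {
	Chrom() string
	Start() uint32
	End() uint32
}

// Interval stores 0-based, half-open coordinates directly.
type Interval struct {
	chrom      string
	start, end uint32
}

func (i *Interval) Chrom() string { return i.chrom }
func (i *Interval) Start() uint32 { return i.start }
func (i *Interval) End() uint32   { return i.end }

// gffRecord is a hypothetical stand-in for a GFF feature:
// 1-based, closed coordinates stored as uint64.
type gffRecord struct {
	seqname    string
	start, end uint64
}

// Gff wraps a gffRecord and normalizes it to 0-based, half-open coordinates.
type Gff struct{ rec gffRecord }

func (g *Gff) Chrom() string { return g.rec.seqname }
func (g *Gff) Start() uint32 { return uint32(g.rec.start - 1) } // 1-based -> 0-based
func (g *Gff) End() uint32   { return uint32(g.rec.end) }       // closed end == half-open end

// Compile-time checks: the build fails if either pointer type
// stops satisfying Interface.
var (
	_ Interface = (*Interval)(nil)
	_ Interface = (*Gff)(nil)
)

func main() {
	feats := []Interface{
		&Interval{"chr1", 99, 200},
		&Gff{gffRecord{"chr1", 100, 200}},
	}
	for _, f := range feats {
		fmt.Println(f.Chrom(), f.Start(), f.End()) // both print: chr1 99 200
	}
}
```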

Beyond my initial change from closures to goroutines, a failed attempt to convert from interface{} to a strict Interface (see below), and small tweaks to the capacity in my slice initialization and to the channel buffer size, I have done very little optimization. Even so, my toy project is within 3X of a highly developed C++ project (a python version with minimal features was 20X slower than the C++ version). The compilation speed and speed of development are more than worth it for me. I did some profiling with the easy-to-use tools; my project is very GC heavy, so I hope that the GC changes in 1.5 and 1.6 will bring it even closer to C++.
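Since the project is GC heavy, the slice-capacity tweak is worth illustrating: setting the capacity up front means append never has to reallocate, so there is less garbage to collect. This is a hedged sketch; `grow` and `prealloc` are hypothetical helpers, and `testing.AllocsPerRun` is used only as a convenient way to count heap allocations per call.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot optimize the
// allocations away.
var sink []uint32

// grow appends to a nil slice, letting append reallocate as it fills.
func grow(n int) []uint32 {
	var s []uint32
	for i := 0; i < n; i++ {
		s = append(s, uint32(i))
	}
	return s
}

// prealloc sets the capacity up front, so append never reallocates.
func prealloc(n int) []uint32 {
	s := make([]uint32, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, uint32(i))
	}
	return s
}

func main() {
	const n = 1000
	g := testing.AllocsPerRun(100, func() { sink = grow(n) })
	p := testing.AllocsPerRun(100, func() { sink = prealloc(n) })
	fmt.Printf("allocs/run: grow=%.0f prealloc=%.0f\n", g, p)
}
```

In a real package the same comparison would more usually be written as a benchmark and run with `go test -bench . -benchmem`, with profiles from `-cpuprofile` or `-memprofile` inspected via `go tool pprof`.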

(In go, an Interface is a declaration of a set of methods that allows a sort of duck-typing [or maybe that is exactly duck-typing], and the empty interface 'interface{}' is satisfied by all types.) I've seen that lack of generics is a common complaint. One alternative, using interface{}, can make things slow. I'm using a priority queue to implement merging of sorted streams (a la python's heapq.merge), and the heap implementation in go uses interface{}. I tried converting all occurrences of interface{} to "MyInterface" in the heap.go source file and then removing all of my type assertions, but got negligible improvement. For the most part, I found that Interfaces made my code "generic" enough, but I did have to implement min() and max() for uint32 since those are not provided; that could get annoying if I needed them for many numeric types.

The most common syntax annoyance where I still waste precious brain-power is the range syntax. For channels, it yields a single value; for slices, arrays and strings, it yields the index and the item; for maps, the key and the value. This is becoming 2nd nature, but it contrasts with python, where iterators are a consistent and simple abstraction. It's a minor thing, but I guess it is noticeable in a language where most stuff is pretty unsurprising.
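A sketch of merging sorted streams with container/heap, in the spirit of heapq.merge. The item, mergeHeap, merge and fromSlice names are hypothetical, not the project's code. It shows where interface{} enters: heap.Push and heap.Pop go through the Push(x interface{}) and Pop() interface{} methods, so a type assertion is needed on the way out. It also uses both range forms: over a slice (index, item) and over a channel (single value). Later Go releases added type parameters (1.18) and built-in min and max (1.21).

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is the current head of one sorted input stream.
type item struct {
	val uint32
	src int // index of the stream it came from
}

// mergeHeap implements heap.Interface as a min-heap on val.
type mergeHeap []item

func (h mergeHeap) Len() int            { return len(h) }
func (h mergeHeap) Less(i, j int) bool  { return h[i].val < h[j].val }
func (h mergeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *mergeHeap) Pop() interface{} {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

// merge yields the values of several sorted channels in sorted order.
func merge(streams ...<-chan uint32) <-chan uint32 {
	out := make(chan uint32, 64)
	go func() {
		defer close(out)
		h := &mergeHeap{}
		// Seed the heap with the first value of each stream.
		for i, s := range streams {
			if v, ok := <-s; ok {
				heap.Push(h, item{v, i})
			}
		}
		for h.Len() > 0 {
			it := heap.Pop(h).(item) // type assertion back from interface{}
			out <- it.val
			// Refill from the stream that just gave up its smallest value.
			if v, ok := <-streams[it.src]; ok {
				heap.Push(h, item{v, it.src})
			}
		}
	}()
	return out
}

// fromSlice turns a sorted slice into a closed, buffered channel.
func fromSlice(xs ...uint32) <-chan uint32 {
	ch := make(chan uint32, len(xs))
	for _, x := range xs {
		ch <- x
	}
	close(ch)
	return ch
}

func main() {
	for v := range merge(fromSlice(1, 4, 9), fromSlice(2, 3, 10), fromSlice(5)) {
		fmt.Print(v, " ")
	}
	fmt.Println()
}
```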

Slices just work. I use them heavily and never have to waste brain-cycles on deciding how to use them. I don't know if this is common, but I haven't used arrays, only slices.

Other nits:
1. In other languages I've worked with, I can open a gzipped file and get a single filehandle. With go, I get 2 handles to track and close. I almost wrote a wrapper for zlib's gzfile, but have been trying to focus on getting something working (for now, my code for that is horrible).
2. For the most part, I really like go fmt, but I wish they had chosen 4 space indent.
3. It'd be nice if functions could have default arguments.

My use of the parallel (ahem, concurrent) features has been pretty basic, thus far relying only on select and on range over buffered channels. I haven't used the sync package or even a channel as a sort of lock, but the syntax, even after very little use, feels natural.
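The select-plus-range pattern mentioned above, as a minimal sketch (produce and firstK are hypothetical names): a producer goroutine sends on a buffered channel, the consumer ranges over it, and a select on a done channel lets the producer quit if the consumer stops early instead of blocking forever on a send.

```go
package main

import "fmt"

// produce sends 0..n-1 on a buffered channel. The select lets the goroutine
// exit as soon as done is closed, rather than blocking on a send nobody
// will receive.
func produce(n int, done <-chan struct{}) <-chan int {
	out := make(chan int, 8)
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			select {
			case out <- i:
			case <-done:
				return
			}
		}
	}()
	return out
}

// firstK ranges over the channel, stops after k values, and signals the
// producer to quit by closing done on return.
func firstK(n, k int) []int {
	done := make(chan struct{})
	defer close(done)
	var got []int
	for v := range produce(n, done) {
		got = append(got, v)
		if len(got) == k {
			break
		}
	}
	return got
}

func main() {
	fmt.Println(firstK(1000, 3)) // [0 1 2]
}
```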

I like python because of its simple, concise syntax, the standard library (which IMO is still pretty good despite common opinion), iterators/generators, and the numpy/scipy/statsmodels/sklearn stuff. I think that go makes my code a little nicer, and it seems to run faster for my common uses. It helps that go has the amazing biogo libraries for genomic data. The tooling is phenomenal and the coding has been very enjoyable.

These are my perceptions after a short time with go. I'd be happy to be relieved of any *mis*conceptions.


James Graves said...

For the most part, I really like go fmt, but I wish they had chosen 4 space indent.

Just set your vim to show 4 column tabs instead. That's what I do.

brentp said...

@James, aye, someone has since pointed that out to me:

    set tabstop=4
    set shiftwidth=4

and all is well.

NHO said...

This is exactly what defer is for. Open the handles, put the deferred closes right below them, and forget about tracking.

>1. In other languages I've worked with, I can open a gzipped file and get a single filehandle. With go, I get 2 handles to track and close. I almost wrote a wrapper for zlib's gzfile, but have been trying to focus on getting something working (for now, my code for that is horrible).

brentp said...

@NHO, I guess I could restructure the code, though it's not obvious to me how to do that. As it is, I *do* use defer, but I have to pass the filehandle around until I'm in the function that sets up the goroutine doing the parsing. If all of the logic isn't in one function, that's what is required to use defer.