
We already know that Golang is a blazingly fast language with one of the smoothest learning curves out there. Despite being easy to learn, there are issues that are not trivial to debug in the language, and I feel that memory issues are one of them.

The language uses automated garbage collection, so there should be no memory issues then, right? Well, let's take a look at a real-life example that my team and I encountered.

Figuring out that something stinks

We had spent quite some time in production since our product was released, but our memory issues only became visible once users actually started to use the application. When we reached 10k requests per minute, it became unbearable. Our production environment is a Kubernetes cluster with 6 monolith nodes and a bunch of micro-services. It took about 40 minutes for a node to go down because of the 1 GB memory limit we had previously set as a healthy margin.

As you can see, once a node reached the limit, Kubernetes simply killed and restarted it.

What does it cause?

If there's no memory left to allocate, you are going to experience things like:

  • Increased response time (And I mean REALLY increased times)
  • CPU usage skyrocketing
  • Ungracefully terminated processes (Because of the forced restarts)
  • Increased hardware requirements
  • Consistent alarming from the monitoring system

If your application dies, it should be because of the number of requests you receive, and not because of an unfortunate memory leak. You could carry the load from the hardware side for a while, but these kinds of issues need immediate fixes.

CPU graphs showing the moment of failure

Ridiculous response times

As you can see, we sometimes went up to almost half a minute. Just imagine yourself sitting in front of an application for half a minute doing absolutely nothing.

Putting out fire (Or how you should start debugging)

If you search the internet for Golang memory issues, you will definitely find articles about pprof.

pprof is a tool for profiling your applications. There are a lot of in-depth guides on the internet about how it works, so I'm not going to write about that, but rather about how we used it and what we saw.

First things first, we had to implement a little bit of code using the pprof package. Since we are always concerned about the security of our applications, we implemented it in the following way:
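A minimal sketch of that setup, with the function name, port, and config variable as illustrative stand-ins rather than our actual code:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // side effect: registers the /debug/pprof/* handlers on http.DefaultServeMux
)

// debugEnabled stands in for the debug switch in our configuration settings.
var debugEnabled = true

func startPprofServer() {
    if !debugEnabled {
        return
    }
    go func() {
        // A separate server on an internal-only port, independent of the
        // application's own listener, so the profiling endpoints are never
        // exposed publicly.
        if err := http.ListenAndServe("localhost:6060", nil); err != nil {
            log.Printf("pprof server stopped: %v", err)
        }
    }()
}

func main() {
    startPprofServer()
    // ... the application's real HTTP server would start here ...
    select {}
}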

As you can see, we start a new HTTP server listening on a different port from our application. We do that in a new goroutine, so it works separately from the original application. This is needed because we didn't want to expose these endpoints for public use.

It is a general principle never to expose anything debug-related in a production environment (especially not things like memory profiles). This way we can make sure that unauthenticated people won't be able to receive any data from the pprof endpoints.

One more little touch: we wanted to be able to turn this whole debug feature off. We added a new debug variable to our configuration settings, so despite having the feature in production, we could turn it off any time we wanted.

The practical part

Once the code was ready, we deployed it to our dev environment. We wanted to wait until memory was almost full and then take a snapshot of the heap through the pprof endpoint, to get the clearest possible picture of our memory leak.

The thing is that our dev environment received waaaay less traffic than our prod environment. Do not get me wrong, it still leaked, but it took almost a day for a node to go down. (Fun fact: our dev had only ~0.006% traffic compared to our prod)

We could have waited for the right moment or generated some direct traffic to the endpoints; however, we felt confident enough to deploy it to our prod environment, since these endpoints could be turned off and were only reachable from the internal network.

After it was deployed, we just waited for the notification that a node had reached 90% memory. The steps we needed to execute:

  • SSH into the given node container
  • wget the pprof endpoints from localhost and put the files into a temp directory (see the example after this list)
  • Exit from container
  • Copy files including the binary from remote
  • Run profiling
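Assuming the internal pprof server from the earlier sketch listens on port 6060, the download step might look like this (the endpoint paths are the standard ones registered by net/http/pprof):

wget -O /tmp/goroutine http://localhost:6060/debug/pprof/goroutine
wget -O /tmp/heap http://localhost:6060/debug/pprof/heap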

What profiling looks like

After we got all the files we needed (from the endpoints, plus the binary), we started to investigate. Our main focus was to take a look at the goroutines to see if any were stuck, and after that to examine the heap. This starts with a very easy step:

go tool pprof ./backend ./goroutine

(Notice that the target file is called goroutine; that's because it is the output of the /debug/pprof/goroutine endpoint.)

This launches the interactive mode of the tool. Here you can type help for all the commands; however, we are going to take a look at the output of the top command, which looked like this:

As you can see, the application goroutines spend most of their time in runtime.gopark, which is fine: gopark is the runtime function that parks (suspends) goroutines while they wait, so this is normal scheduler behavior. It is roughly what we expected; we have no problem here.

The heap

Take a look at the main suspect, the heap:

go tool pprof ./backend ./heap

Here, if you have previously installed Graphviz (brew install graphviz), you can use the web command to generate a pretty awesome SVG showing what the application's heap looks like. (You can generate an SVG for your goroutines as well, which is usually pretty interesting.)
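For example, from the same interactive prompt, top lists the largest allocators and web writes the SVG call graph and opens it in your browser:

(pprof) top
(pprof) web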

What we see here is how much memory is actually in use. Go sometimes allocates memory ahead of time, just to speed up future allocations based on its predictions. That means that if your application tends to use a lot of memory, then even if it loses all references to a given memory segment, the garbage collector won't necessarily pick it up instantly, so it will remain "allocated" for some time.

As you can see, runtime.makemap and runtime.mapassign are both directly related to creating maps. (The other entries are related to maps indirectly as well, which is also a clue.) Since we weren't doing anything fancy with regular expressions, our attention shifted from third-party libs back to our own code.

Caching?

What are maps good for? Quick access to data, unique keys for every value, and of course caching. Caching can be intentional or unintentional. After taking another look at our code from this perspective, we found neither caching nor global map objects stored without ever being cleared.

This meant that somehow one of our third-party packages was caching data without our knowing about it. With that in mind, the first suspect was our URL router.

Gorilla mux

We use Gorilla Mux, and it has worked well for us so far. So where can Gorilla cache? There are a couple of places where you can configure caching and storing with Gorilla. You could set it directly in the Router:
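A sketch of what that setting might look like, assuming it refers to the Router's KeepContext field (which controls whether mux clears its per-request data from the global gorilla/context map):

package main

import (
    "log"
    "net/http"

    "github.com/gorilla/mux"
)

func main() {
    r := mux.NewRouter()
    // When KeepContext is false (the default), mux clears the per-request
    // data it stored in the global gorilla/context map after each handler
    // returns, so the router itself does not accumulate entries.
    r.KeepContext = false
    log.Fatal(http.ListenAndServe(":8080", r))
}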

Since we were using Go 1.8 at the time (with Go 1.7 and later, Gorilla Mux stores its route variables in the request's native context rather than in the global gorilla/context map), this one was fine.

You can set caching in the Gorilla context:
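The pattern being described uses gorilla/context's Set, Clear, and ClearHandler functions; a minimal sketch (the handler name and stored value are illustrative) looks like this:

package main

import (
    "log"
    "net/http"

    "github.com/gorilla/context"
)

func handler(w http.ResponseWriter, r *http.Request) {
    // Set stores the value in a package-level map inside gorilla/context,
    // keyed by the *http.Request -- this global map is the hidden cache.
    context.Set(r, "user", "gopher")
    w.Write([]byte("ok"))
}

func main() {
    // ClearHandler wraps the handler and calls context.Clear(r) once the
    // request finishes, removing everything stored for that request.
    log.Fatal(http.ListenAndServe(":8080",
        context.ClearHandler(http.HandlerFunc(handler))))
}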


So we can see two things here. The first is that if we would like to clean up our stored request contexts, we have to wrap the handler in the ClearHandler function.

The second is that Clear itself uses a global mutex to clear the data cached by the Gorilla context. What that means is that this function blocks incoming HTTP requests while it runs.

Both of these functions use delete, which is the built-in function for removing map entries. Despite using it, we found that the garbage collector still wasn't able to collect the data that had been allocated here.

When you start to learn Go, probably one of the first things you learn is that you don't have to free memory manually because of the garbage collector. If you still decide to manage it, you can write code for it yourself (not recommended), or you can rely on the fact that the GC frees memory once you have no more references to a given memory segment.
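To make that reference behaviour concrete, here is a small self-contained sketch; note that whether delete hands a map's internal bucket memory back is a runtime implementation detail (in practice the buckets stay reserved for reuse):

package main

import "fmt"

func main() {
    cache := make(map[int][]byte)
    for i := 0; i < 1000000; i++ {
        cache[i] = make([]byte, 100)
    }

    // delete removes the entries, but the map's internal buckets may stay
    // reserved by the runtime for future growth -- the memory is not freed.
    for k := range cache {
        delete(cache, k)
    }

    // Replacing the map drops the last reference to the old one, so the
    // garbage collector can actually reclaim all of its memory.
    cache = make(map[int][]byte)
    fmt.Println(len(cache)) // 0
}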

So after digging further down into the source code of Gorilla, it turned out that this was not the first application producing these kinds of issues while using Gorilla. The developers, being aware of this situation, implemented the Purge function, which sounds much cooler than it actually is.

As you can see, what happens here is that it takes a maxAge parameter, which defines the maximum lifetime of the contexts we store. If we set it to zero or below, it will always create new maps for the requests. This means the original contexts stored in data and datat lose every reference in the application, so the garbage collector picks them up. However, if we set it to a value above zero, it checks every request context to see whether its lifetime exceeds that value. It blocks every request while it runs, so ideally you don't want to call it too often.

In the worst-case scenario, we delete contexts that are still being used by the application, thereby creating a lot of serious issues. Yay?

The longest a request context needed to sit in memory for us at that time was about half a minute. So we thought: let's double that, set the maxAge to a minute, and we're good to go.
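How the call gets wired up is a judgment call; a minimal sketch, assuming a once-a-minute background sweep and using gorilla/context's actual Purge(maxAge int) function (maxAge is in seconds), might look like this:

package main

import (
    "time"

    "github.com/gorilla/context"
)

func startContextSweeper() {
    go func() {
        for range time.Tick(time.Minute) {
            // Remove request data stored for longer than 60 seconds. Purge
            // takes the same global mutex as Clear, so it briefly blocks
            // incoming requests while it runs.
            context.Purge(60)
        }
    }()
}

func main() {
    startContextSweeper()
    select {} // keep the sketch alive; the real app would serve requests here
}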

We could see the immediate change in our monitoring system.

Memory? Back to normal.

CPU Usage? Never been more optimal.

Summary

If you are trying to find a memory issue in any application, you always have to start with the code you have written. However, if the application is a little bigger and has been running in production for a while, well, you might have to dig deeper.

Golang's memory profiling has never felt easier, and the documentation about it is getting better every year. After seeing the power it brings to the table when it comes to debugging memory issues, I have to say that this is one of the better solutions out there right now.

Experiencing a serious bug like this in a production environment is really nerve-racking, but after figuring out what is wrong, you can finally find the official issue about it.

Written by Imre Racz, tech enthusiast.

The 64-bit ARMv8 core has introduced new instructions for SHA1 and SHA2 acceleration as part of the Cryptography Extensions. We at Minio were curious about the difference these instructions might make, and this turns out to be one of the nicer surprises you get from time to time.

We have been running an ARMv8 server at miniNodes.com and if you look at the CPU info you will see the following:
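On an ARMv8 machine with the Cryptography Extensions, the Features line in /proc/cpuinfo typically looks roughly like this (the exact set of flags varies by core):

$ grep Features /proc/cpuinfo
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32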

As you can see, it nicely lists the sha1 and sha2 features, of which the latter is the topic of this blog post.

As part of the minio/sha256-simd repository, we have added an arm64 Golang assembly version, of which an excerpt is shown below (just one round of many):

We have compared this version against the default implementation that is now available in Go. Without further ado, here are the results that are reported by benchcmp:

This is of course a massive increase, from a (meager) 6 MB/sec to 615 MB/sec (per core). To be fair, the default Go implementation on ARM is not accelerated in any way (unlike, for instance, on Intel CPUs, where there is an assembly version). If it had been written in assembly, the difference would be quite a bit smaller, but in our view still at least a factor of 10x, thanks to the new sha256h, sha256h2, sha256su0, and sha256su1 instructions.
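For context, a throughput benchmark of the kind compared here can be written with Go's standard testing package; this sketch (placed in a _test.go file) hashes with the stock crypto/sha256, whereas the repository benchmarks its own implementation:

package sha256bench

import (
    "crypto/sha256"
    "testing"
)

// BenchmarkSha256_1M reports MB/s via SetBytes, which is the figure
// benchcmp compares between two implementations.
func BenchmarkSha256_1M(b *testing.B) {
    data := make([]byte, 1<<20) // 1 MB buffer; contents don't matter for throughput
    b.SetBytes(int64(len(data)))
    for i := 0; i < b.N; i++ {
        sha256.Sum256(data)
    }
}

Running go test -bench=. once per implementation and feeding the two outputs to benchcmp yields the MB/sec comparison quoted above.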


As you may be able to tell from the name of the minio/sha256-simd repo: yes, we are working on adding SIMD (AVX2, AVX, and SSE flavors) support for SHA256 on Intel, so stay tuned for that. We do not promise a 100x speedup for Intel, though…

Interestingly enough, Intel actually has SHA extensions comparable to the ARM equivalents. Linux 4.4 added support for them, but so far we have not been able to identify any CPUs that will actually run this code. If you do, please let us know; with wider support we would be interested in adding this to minio/sha256-simd.


Finishing off, the asm2plan9s tool that we initially developed to assist with minio/blake2b-simd is being extended with ARM support (in addition to the Intel support that is already available).


And sharp readers may have noticed that there is one more interesting feature of ARMv8: PMULL (polynomial multiplication). This instruction may help very well with Reed-Solomon erasure coding, a technique we use at Minio for the XL version to guarantee additional safety and protection against bit rot. So stay tuned for that as well.