Address and Leak Sanitizers for C++

You’ve heard that in C++ there’s always the risk of leaking memory, and now you’re wondering whether there is a tool that can help you find out if your program contains memory leaks? Luckily, there are some tools, such as LeakSanitizer.

First of all, this is something done at runtime – this sort of functionality won’t find your memory leaks at build time, unfortunately. You can use AddressSanitizer to help you find memory errors, some of which might lead to memory leaks, but LeakSanitizer is explicitly made for finding memory leaks at runtime.

Now, even though it’s a runtime functionality, you will still need to tell your compiler to activate the sanitizer at build time. There are several options to do so.

As I follow the cpp-best-practices/gui_starter_template quite closely, I take advantage of the ENABLE_DEVELOPER_MODE CMake variable, which instructs project_options to enable sanitizers for me. In theory, however, all you really need is to set the flag “-fsanitize=address” at compile time – beware that Apple Clang, as of version 14.0.0, does not yet support this sanitizer (you’ll need clang or gcc, take a look here to see how to install them).
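For instance, if you are not using project_options, a minimal CMake sketch could look like this (the option name ENABLE_ASAN and the target name app are placeholders, not from the template):

if (ENABLE_ASAN)
    # the flag must reach both the compile and the link step
    target_compile_options(app PRIVATE -fsanitize=address -fno-omit-frame-pointer)
    target_link_options(app PRIVATE -fsanitize=address) # target_link_options needs CMake >= 3.13
endif()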

Finally, since this is a runtime option, you will need to set a variable to tell the sanitizer that you want to use this specific feature, in this case “detect_leaks=1”. To do so, set LSAN_OPTIONS before executing your program, e.g.,

LSAN_OPTIONS="detect_leaks=1" ./bin/app

At the end of the program, the leak detector will print some warnings if memory leaks were found.
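To see it in action, here is a minimal, intentionally leaking program that LeakSanitizer should flag:

#include <cstdlib>

int main() {
    // allocated and never freed: the leak detector should report these 64 bytes
    void* leaked = std::malloc(64);
    (void)leaked;
    return 0;
}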

Note: In CLion, you can set sanitizer flags in Preferences | Build, Execution, Deployment | Dynamic Analysis Tools | Sanitizers.

Now, when you run the program in the IDE, the variable will be set, and at the end of the run you will get feedback if any memory leaks were found.

How to install Qt6 for C++ on Mac

Lately I have been working on a hobby project written in C++ and Qt6.

Had I not ditched Conan for Vcpkg, due to their backward-incompatible migration from v1 to v2 which completely broke my build, this post probably wouldn’t exist. 🙂

Anyway, I stumbled upon a quite nasty issue that seems to hit some people using Qt, probably because of the way it is installed.

In fact, after using Conan, I decided to use Homebrew to install Qt, and the behavior is, to say the least, weird.

When you build your project with cmake and package it as an OSX Bundle like in the code:

if (APPLE)
    add_executable(myapp MACOSX_BUNDLE)
    # ...
endif()

Then you get an issue like:

objc[97951]: Class RunLoopModeTracker is implemented in both /usr/local/Cellar/qt/6.3.2/lib/QtCore.framework/Versions/A/QtCore (0x10d0626c8) and /Users/path/to/myapp/build/bin/myapp.app/Contents/Frameworks/QtCore.framework/Versions/A/QtCore (0x11d0ca6c8). One of the two will be used. Which one is undefined.
QObject::moveToThread: Current thread (0x6020000024f0) is not the object's thread (0x60200000e910).
Cannot move to target thread (0x6020000024f0)

You might be loading two sets of Qt binaries into the same process. Check that all plugins are compiled against the right Qt binaries. Export DYLD_PRINT_LIBRARIES=1 and check that only one set of binaries are being loaded.
qt.qpa.plugin: Could not load the Qt platform plugin "cocoa" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

However, if you use macdeployqt, the issue disappears. This, though, means you automagically lose the ability to use the “Run in IDE” functionality, because it just doesn’t work.

The weird thing is that after the build (and before using macdeployqt) the bundle folder contains lots of libraries, including the cocoa plugin it apparently fails to find, the Qt dependencies you need, and also the ones you don’t (e.g., I don’t use QtNetwork, but it appears in the directory as well). So, at least from the look of it, everything should already be in the folder, and there should be no need to load anything from outside.

Finally, after hours of useless debugging, trying to understand which library or plug-in was causing the double loading, I decided to find another way to install Qt.

I chose aqtinstall, the tool behind the GitHub Action that installs Qt in CI environments – which I use on GitHub already. Nonetheless, the method applies to any machine, so why not.

With a couple of simple commands:

$ pip install aqtinstall  # if you don't have aqt yet
$ mkdir qt6
$ cd qt6
$ aqt install-qt mac desktop 6.3.2 clang_64

At this point, you need to set a few environment variables so that, when you run CMake, the Qt libs are found correctly. I recommend putting them in a simple file, e.g., ~/setup-qt.sh, so that you can also load it from your IDE (e.g., in CLion you can load an environment file in Preferences | Build, Execution, Deployment | Toolchains).

$ export Qt6_DIR=/path/to/qt6/6.3.2/macos/
$ export QT_PLUGIN_PATH=/path/to/qt6/6.3.2/macos/plugins
$ export PKG_CONFIG_PATH=/path/to/qt6/6.3.2/macos/lib/pkgconfig
$ export QML2_IMPORT_PATH=/path/to/qt6/6.3.2/macos/qml
$ export PATH="/path/to/qt6/6.3.2/macos/bin:$PATH"
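For reference, with those variables set, a minimal CMakeLists.txt along these lines should pick up Qt6 (target, source and module names are just examples):

find_package(Qt6 REQUIRED COMPONENTS Widgets)
add_executable(myapp MACOSX_BUNDLE main.cpp)
target_link_libraries(myapp PRIVATE Qt6::Widgets)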

Now, you should finally have an environment that works like a charm – without having to rely on the deployment tool.

Software Reliability with Dishwashers

Some weeks ago my dishwasher started leaking water in the front. At first, I thought the water filter was full, and after cleaning it properly, the issue didn’t happen anymore.

Until some days ago.

This time it came back with much more stubbornness, showing errors on the display. The dishwasher is an AEG, and the error code looked like ,10 (yeah, comma included). I googled a bit, and it seems to mean that the dishwasher can’t load any more water. That was quite surprising: just a few minutes after the cleaning program had started, I could clearly hear water flowing through the pipes. Then suddenly silence, and that error message.

The QA side of me forced me to randomly press the buttons on the machine, with some extra persistence on the Reset one, hoping it would just heal itself – who knows, maybe it was overwhelmed; pressing F5 typically works 🙂 Strangely enough, the error code disappeared when I pressed the arrow-down button, totally unexpected, and the program resumed. I was happy, until the dishwasher started leaking water again, and this time another error code appeared: ,30.

I googled again, and I found what seems to be a fail-safe mechanism from the manufacturer to prevent the appliance from leaking too much water – which could be dangerous if there are kids, pets, or cables on the floor, I guess. It seems this feature is called Aquastop.

That was an interesting finding. I googled (again) a bit to understand how this is all connected, and I found a short video explaining in five minutes how Aquastop works. It immediately reminded me of the Circuit Breakers we use as reliability patterns, for example in microservices.

I like to think that this is what happens when a team of smart engineers sits down together and solves a real problem in a creative way. It’s astonishing what we can learn from electrical engineers, or more specifically from the products we use daily, if only we had the time to disassemble stuff and see how it was done. In this case the product obviously can’t heal itself, because maybe the water hose is perforated, it leaks water, etc. However, the gist of it is this: a simple monitoring tool that sends a signal to shut the water inlet valve when there is too much water where it shouldn’t be.

Brilliant.

What’s next? Pingdom for washing machines?

Akka Streaming without backpressure

Recently I bumped into an interesting issue that started like this:

curl: (18) transfer closed with outstanding read data remaining

After googling a bit, it seems that the response entity never entirely reached the client issuing the request; curl failed because it was expecting more data than it received, and then the connection terminated.

How the hell can this ever happen?

It can happen when there is a network interruption somewhere between the client and the service, or when packets are dropped and never retransmitted, and therefore never received.

Internet-facing services are often just a gateway to more services behind them, so the problem could arise anywhere along the chain. A single curl against the public endpoint may actually be processed by several internal services, each responding according to its responsibility:

  • one pushes logs (Log Processing Service; yeah I know, this is not the proper way to collect logs)
  • one takes care of fraud checks and gives us the OK/KO to process the request (Fraud Detection Service)
  • another builds and pushes the actual response (Streaming Data Service), after fetching it from the Data Source

Maybe the Data Source is the culprit. However, if you run curl against this service directly, it works perfectly. Surprisingly, the same curl against the Streaming Data Service doesn’t. In the logs we repeatedly see the following warning coming from the Data Source service:

Dropping the new element because buffer is full and overflowStrategy is [DropHead]

Essentially, if we call the Data Source directly, all good. If we call the Streaming Data Service, we get only part of the data, plus the warning message above.

We dig into the code and find that it uses akka-http to transfer chunked HTTP responses. The server-side code does something like:

val route =
  path("data") {
    val data: Source[MessageType, NotUsed] = getData
    complete(data)
  }

Aha! Behind the scenes, the Source is created like:

Source.actorRef[Any](bufferSize, OverflowStrategy.dropHead)

and the materialized ActorRef is used somewhere else to receive and forward messages: a MessageSender actor sends messages to the actorRef behind the Source. However, in this scenario, there is no way to tell the MessageSender to slow down. A short network problem between the Streaming Data Service and the Data Source is enough to drop part of the response.

Our streaming application should honor the contract between producer and consumer: if the consumer is slow, the producer should not just keep pushing data.

One solution to this problem would be to use Source.actorRefWithAck, so that the MessageSender only sends a message after receiving an Ack from the Source. In other cases, a Source.queue would be more appropriate, as it allows buffering messages and backpressuring the producer by returning results like Enqueued, etc.

For example, the code below sends the next piece of data each time the queue confirms that the previous message was enqueued.

override def preStart(): Unit = {
  self ! DataToSend(initialData)
}

def receive: Receive = {
  case data: DataToSend =>
    // offer the element to the materialized SourceQueue;
    // an implicit ExecutionContext is needed for .map
    queue.offer(data).map {
      case QueueOfferResult.Enqueued    => self ! data.next() // consumer kept up: send the next chunk
      case QueueOfferResult.Dropped     => // nothing to do
      case QueueOfferResult.Failure(ex) => // handle failure
      case QueueOfferResult.QueueClosed => // handle queue closed
    }
}

This code should not be used in production 🙂 It’s just an example of what you can do with queue.offer(…).
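For completeness, a queue like the one above could be materialized along these lines (a sketch: bufferSize, MessageType and the use of preMaterialize, available in recent Akka versions, are assumptions):

val (queue, source) =
  Source
    .queue[MessageType](bufferSize = 100, OverflowStrategy.backpressure)
    .preMaterialize() // needs an implicit Materializer

// `source` can be handed to complete(...), while `queue` goes to the sending actor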

Now our Streaming Data Service will be able to consume all messages, as the producer sends messages at the speed chosen by the consumer.

And the external curl will work out of the box, because all data will be transferred from one node to the other according to each consumer’s speed.


Java concurrency with volatile

One of the main assumptions we developers often make is that programs have some sort of sequential consistency[0]. This is probably because it’s not only easier for us humans to think in terms of consistent sequential steps, but also because we learned it that way in college or at work.

Although we constantly juggle concurrent tasks in daily life, our first thoughts when developing an algorithm or a solution still resemble “we begin with step 1, then step 2, then if this happens, do step 3, or else go to step 4, then end”.

However, reality differs from that, and any programming language that wants to offer the ability to execute concurrent code has to deal with it. Typically, how this mechanism works is defined in the language’s memory model specification.

Enough philosophical thoughts. A couple of days ago I reviewed some code that was supposed to be multi-threaded. It was indeed multi-threaded; however, when I saw the volatile keyword, it rang a bell. So I tried to understand how the code was supposed to work. In the end, it turned out to introduce a race condition, which means that the concurrent code was incorrect and its output depended on the timing of events (a bug like this is obviously difficult to reproduce in a test, but that doesn’t mean it won’t happen). It was quite similar to the following snippet (instead of using a thread pool, the original code was using Akka actors):


import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class VolatileRaceCondition {
    private static final int NTHREADS = 10;

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
        for (int i = 0; i < 100; i++) {
            Runnable worker = new MyRunnable(i);
            executor.execute(worker);
        }
        // This will make the executor accept no new tasks
        // and finish all existing tasks in the queue
        executor.shutdown();
        // Wait until all threads are finished
        executor.awaitTermination(60, TimeUnit.SECONDS);
        System.out.println("Finished all threads");
        for (Integer i : WithGlobalVariable.nums) {
            System.out.println(i);
        }
    }

    private static class WithGlobalVariable {
        public static volatile List<Integer> nums = new ArrayList<Integer>();
    }

    private static class MyRunnable implements Runnable {
        private final int countUntil;

        MyRunnable(int countUntil) {
            this.countUntil = countUntil;
        }

        @Override
        public void run() {
            WithGlobalVariable.nums.add(this.countUntil);
        }
    }
}

If you run it locally, you’ll get different results every time. For example:

Exception in thread "pool-1-thread-7" Finished all threads
1
8
7
6
4
2
9
3
5java.lang.ArrayIndexOutOfBoundsException: 33

10
11
at java.util.ArrayList.add(ArrayList.java:459)
12
at VolatileRaceCondition$MyRunnable.run(VolatileRaceCondition.java:43)13

14 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

15
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)16

17
at java.lang.Thread.run(Thread.java:745)18

19
20
21
22
23
25
24
26
28
27
29
31
32
33
34
null
36
35
37
38
39
40
..

The multi-threaded code is not correct, therefore the output is non-deterministic. In fact, we have N threads, and each of them executes actions on a (volatile) static variable. The Java language specification says this about the volatile modifier:

The Java programming language allows threads to access shared variables (§17.1). As a rule, to ensure that shared variables are consistently and reliably updated, a thread should ensure that it has exclusive use of such variables by obtaining a lock that, conventionally, enforces mutual exclusion for those shared variables.
The Java programming language provides a second mechanism, volatile fields, that is more convenient than locking for some purposes.
A field may be declared volatile, in which case the Java Memory Model ensures that all threads see a consistent value for the variable (§17.4).

The Java Memory Model achieves this 1) by not caching the variable where other threads can’t see it (caching normally makes access quicker), and 2) by not reordering operations on that variable (a common compiler optimization).

This means that volatile variables are only about visibility, not atomicity: you still need locking for mutual exclusion and for read-modify-write operations. The code above calls ArrayList.add() concurrently, each time potentially with different state. Multiple threads are reading, modifying and writing the ArrayList, so each thread may be preempted before or after any of these operations (which is why we get an ArrayIndexOutOfBoundsException):

    public boolean add(E e) {
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }

If we had a data structure that allowed adding an element atomically, volatile would probably be OK, even with multiple threads operating on it concurrently, because a write would be a “single operation” with no dependency on the current value.

One alternative, if we want to use an existing collection, and depending on the use case, is CopyOnWriteArrayList, a thread-safe variant of ArrayList that doesn’t need any external synchronization.
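For instance, a minimal fix to the snippet above would be the following (volatile is no longer needed: the reference never changes, and the collection itself is thread-safe):

private static class WithGlobalVariable {
    // each add() is atomic; readers iterate over a consistent snapshot
    public static final List<Integer> nums =
        new java.util.concurrent.CopyOnWriteArrayList<Integer>();
}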

When to use volatile variables?

As a rule of thumb, follow what B. Goetz writes[1]:

You can use volatile variables only when all the following criteria are met:

• Writes to the variable do not depend on its current value, or you can ensure that only a single thread ever updates the value;

• The variable does not participate in invariants with other state variables; and

• Locking is not required for any other reason while the variable is being accessed.
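A classic example that satisfies all three criteria is a status flag that one thread writes and others read – a minimal sketch:

private static volatile boolean shutdownRequested = false;

public static void requestShutdown() {
    shutdownRequested = true; // this write does not depend on the current value
}

public static void work() {
    while (!shutdownRequested) {
        // do work; the volatile read guarantees the update becomes visible
    }
}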

Lessons learnt:

  • When you see concurrency/multi-threading-related code in code reviews, pay extra attention and try to understand what it does.
  • Thread-safety is a hard topic, and even experienced developers may lack the deep understanding it takes to design thread-safe code. Ask questions, do some research, try to understand as much as you can. Thread-related bugs are really hard to find, especially because of the syndrome: “it won’t happen here, we don’t have the scale of Google”.

External References

[0]: Java Concurrency in practice, p. 338, B. Goetz.

[1]: Java Concurrency in practice, p. 39, B. Goetz.

[2]: Java 8 Memory Model

Akka metrics and traces with kamon.io

Lately I have been working again with Akka, a fantastic framework for building concurrent, fault-tolerant systems.

At first, it came as a surprise to me that, besides Lightbend Telemetry, there was almost nothing “officially developed” for something I consider essential to building a reactive system.

As you may have seen in the previous post, responsiveness without numbers is a bit weird:

– we are cool

– do you have numbers to say that?

– no

– then you are not cool. Not at all 🙂

I am not entirely sure how much a Lightbend subscription would cost – you need to get in touch with sales, and you may get a contract that probably depends on the volume of your apps, number of nodes, etc. – and I am quite sure it would not be as expensive as one might think. Still, I would prefer not to pay for something I consider basic – for me, this is not ancillary.

Enter Kamon.io.

Kamon.io is a quite powerful set of open-source libraries that you can plug into your JVM-based projects to gather metrics, traces, spans, and so forth. It’s based on AspectJ, which may not be the most standard way to do things in Scala, but we have to admit that Akka is another kind of beast. In Scala you might use stackable traits to provide metrics, but in Akka they feel like hacks (see here, for example) – it’s no fun that you can’t really “stack” the receive method. And even then, how would you intercept messages going through system actors? You couldn’t – that would have to be done by the Akka core team.

Now, the library is quite easy to integrate – it takes more time to understand what you actually want to measure – see the quickstart. I am going to skip this part, because it’s already documented.

What I would like to show you is how we collect custom metrics – as this is not documented anywhere.

Custom Metrics

Since we are already using Kamon.io to collect metrics, it might be a good idea to follow the same AspectJ-based approach, so that the final result looks like an extension of the original library, tailored to our needs.

Be aware that you could end up writing something like this every time you want to record a metric:

Kamon.counter("app.orders.sent").increment()

but eventually you’ll get tired of it, considering it will bloat your actors code. It’s like having a logged line for each new request your web server is handling – most of the time, web frameworks provide filters that you can apply before/after some events happened, so there is no need to add a single “log.info” statement for that – just create and apply a filter. If you have many actors and many events to record, extracting the handling part might be a better option.

Now, all you need to do is the following: create a new module in your project to have a dedicated place for handling custom metrics, then create the aspect that will intercept the events and take the corresponding action (in this case, simply incrementing a counter):


package metrics.instrumentation

import org.aspectj.lang.annotation._
import models.messages.CustomActorEvent
import metrics.CustomMetrics

@Aspect
class CustomActorInstrumentation {

  // intercept every message passed to CustomActor's aroundReceive
  @Pointcut("execution(* org.mypackage.actors.CustomActor.aroundReceive(..)) && args(*, msg)")
  def onCustomActorMessagePointcut(msg: Any): Unit = {}

  @Before("onCustomActorMessagePointcut(msg)")
  def onCustomActorMessageHandler(msg: Any): Unit = {
    val customMetrics = CustomMetrics.forSystem("my-system")
    msg match {
      case _: CustomActorEvent => customMetrics.customEvent.increment()
      case _                   => // not a message we want to measure
    }
  }
}

and the CustomMetrics object that wraps all the metrics you want to record – you can find some interesting ways to do it here.
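As a rough sketch (names and the metric key are assumptions, not from the original code), such a wrapper could look like:

package metrics

import kamon.Kamon

class CustomMetrics(system: String) {
  // one counter per event type, namespaced by actor system
  val customEvent = Kamon.counter(s"custom.$system.actor-event")
}

object CustomMetrics {
  def forSystem(system: String): CustomMetrics = new CustomMetrics(system)
}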

Now, CustomActorEvent is a trait. Why do I pattern match on a trait, instead of the actual message received by the actor? As mentioned here:

  • It is a good practice to put an actor’s associated messages in its companion object. This makes it easier to understand what type of messages the actor expects and handles.

Therefore, we define messages inside the companion object and have them extend a trait that can easily live in another package, so that there is no tight coupling between the metrics handler and the actor itself.

One last thing worth mentioning: don’t forget to create the corresponding aop.xml file in your new module, with the content you need:


<!DOCTYPE aspectj PUBLIC "-//AspectJ//DTD//EN" "http://www.eclipse.org/aspectj/dtd/aspectj.dtd">
<aspectj>
  <aspects>
    <aspect name="metrics.instrumentation.CustomActorInstrumentation"/>
  </aspects>
  <weaver>
    <include within="metrics.instrumentation..*"/>
  </weaver>
</aspectj>


You can find very useful information about this configuration in the AspectJ documentation.

Good to Know

You will need the following sbt plugins (in project/plugins.sbt) if you plan to use the approach described above:


addSbtPlugin("io.kamon" % "sbt-aspectj-runner" % "1.1.0")
addSbtPlugin("com.lightbend.sbt" % "sbt-javaagent" % "0.1.4")
addSbtPlugin("com.lightbend.sbt" % "sbt-aspectj" % "0.11.0")


Now, a question I would like to ask you is the following: what metrics are you collecting?

Reactive Systems – Responsiveness

This article is the first of a series that I plan to write about Reactive Systems.

A brief analysis: it’s 2018 now, and we often read words like “react”, “reactive”, and similar. As if the word “reactive” were not ambiguous enough by itself, someone even started baking frameworks named “react.js” – unfortunately completely unrelated to the concept of reactive systems as we conceive them.

Ambiguities aside, a few years ago a group of people decided it was necessary to put together a few non-functional requirements they had learned over their careers to be essential for building good software – and they named this document the Reactive Manifesto.

Granted, I dislike manifestos, because they scream for attention and, due to their PowerPoint-like nature, tend to be misunderstood and often overlooked. I have to admit, though, that if you approach Reactive Systems with genuine curiosity and without preconceptions (oh, yet another manifesto, …), you will surely agree that Reactive is the correct term here. I still wouldn’t have called it a manifesto, and I wouldn’t have asked people to sign it – but these are personal preferences 🙂

So, this document – the manifesto – describes Reactive Systems as responsive, resilient, elastic and message-driven.

[Figure: the four reactive traits – responsive, resilient, elastic, message-driven]

In this article I will focus primarily on the first principle: responsive.

What Does It Mean?

Responsive means that our systems need to respond in a timely manner to offer a smooth experience. What does timely mean? 1 second? 3 seconds? There are tons of non-academic studies showing that if your customers have to wait more than X seconds, then Y% of them will move to a competitor. This is of course only one of the measurements you might be interested in. You could also care about how quickly your brand-new printer starts processing text once it receives a new request. Is it OK to have the user of your service/product wait 10 seconds? Maybe. Everything depends on the use case.

Why Is It So Difficult?

The challenge here is dealing with real systems – stuff that needs to be maintained, deployed, reviewed, etc., not proofs of concept or your Sunday experiments. For example, in the following picture we can see what a real-world architecture based on micro-services (at Netflix) looks like:

[Figure: Netflix’s real-world microservice architecture]

Saying that your system is responsive means you have solved many of the challenges that responsiveness brings along, which are not always easy: legacy systems that are still needed, more micro-services than necessary, network latency, performance overhead in the software and technologies used, unoptimized code, software running on old hardware. This is all part of responsiveness.

Old Hardware? What…?

Before we get into more details, let me share a little observation: since the Cloud took off, I have noticed that we developers are less and less careful about the resources our software needs.

Nowadays, we think in terms of CPU units, which translate differently across cloud vendors (1 vCPU can be a full core or just half of one, more or less – there are lots of AWS/Azure/GCP comparisons out there). With these machines on demand, though, we barely know what processors they have! Who cares? Just give it a t2.large instance and that’s it. Serverless architectures have widened this disconnect between developers and machines even further.

This is certainly part of a broader topic that involves cost optimization, resource consumption, and so forth, yet I consider it important, because it has an impact on responsiveness as well. If you run on old machines, you may be disappointed.

What Is Responsiveness About?

Responsiveness has the noble goal of providing the best usability – it’s not fun to wait 2 minutes for something we think could or should take less. Responsiveness is founded on reliability and availability, and these all serve the goal of creating a valid SLA.

SLA: your contract may state that certain API calls will not take more than 10 seconds on average per day. By having numbers that define upper boundaries (how long should a response take?), we can quickly decide whether some event is exceptional and deserves attention – for example, did the last 10 requests take more than 5 seconds? If so, send an alert.

Availability: the famous 99.many-nines% that lots of cloud vendors offer. This simply measures uptime/total time, i.e., the percentage of time your service is in an operable state.

Reliability: often confused with availability, this is more related to stability and fault tolerance. It measures how long a system performs its function within a given interval. For example, if a service is systematically down for 6 minutes every hour, its availability is 90%, but its reliability – the mean time between failures – is less than one hour, which can be far more telling than the overall availability percentage.

A responsive system must be available and reliable, otherwise it can’t stay responsive. Even responding with an error is certainly better than not responding at all. Also, when we have numbers, we can act on error conditions, we can offer guarantees, and we can sell a service that always returns something.

Why Responsiveness?

Responsiveness is often perceived as an optimization “feature”, like security. The infamous misunderstanding of Donald Knuth’s words, “premature optimization is the root of all evil”, didn’t help here.

Now, I love quality, and I strongly believe it’s the main differentiator between products – why choose X instead of Y, W, or Z. I also see the value in getting stuff done, though. So why don’t we build, from the ground up, a mentality that leads to high-quality products? A mentality that doesn’t procrastinate, that is not lazy, and that believes the product under development will take off and be successful.

More and more often I see that, due to this time-to-market madness, products lack a lot of non-functional features: security, quick responses, usability – depending on the domain, of course. For some reason, we tend to think that non-functional requirements are useless. However, the fact that we have multiple search engines, multiple e-commerce sites, etc., should tell us that time to market matters, yes, but in the long term what matters is also the set of non-functional requirements. You can’t assume your customers will keep using your product just because you were first. Eventually someone will do the same thing better and add a non-functional feature to it, like security, which seems pretty important lately.

Responsiveness also matters because most people access the internet from a mobile device, and when something is slow on a phone, it feels twice as slow as on a laptop – probably because our focus is higher on the little screen.

Long story short: plan for responsiveness as early as possible in your product roadmap. Don’t procrastinate; trust developers and define a threshold with them – it can be as simple as “this call has to take at most X seconds”. Only if you have numbers can you brag about it; otherwise it’s pure speculation.

Reality Check – The Role of Technology

There is a sort of myth telling us that one of the first steps toward responsive services is choosing a great technology. In the Web Services/SaaS world, technologies indeed often seem to be chosen by trend. As if that weren’t enough, there are tons of benchmarks online, like https://www.techempower.com, that are often taken as the starting point for choosing the next framework.

Now, it’s stupidly simple to say my API is responsive, if all your API does is to return a canned response. No framework will disappoint you here, even some old CGI script is able to handle gazillion calls per minute on a modern machine. There are also benchmarks offering some “dynamic” features – like querying. Still, the question I ask myself is how relevant are those benchmarks for what we want to achieve?

I still believe it’s good to have such informative websites, because they give a rough idea about the computing power needed (which could decrease costs, like how big your EC2 instances have to be), yet we have to evaluate properly a technology before falling for it just because it’s in the top-10 fastest/quickest/<superlative-positive-adjective> technology. If you look at the charts on the website mentioned above, as of today, django is in deep troubles compared to almost any other technology out there. However, there are dozens of highly responsive websites using Django, for example instagram, Disqus, pinterest – you can find more here: https://stackshare.io/.

How Do We Achieve Responsiveness?

Having good technologies helps here. The same applies to good code, good design patterns, and so forth. However, if we also manage to implement elasticity and resilience, we are through.

Next article will focus on those two principles.

S3-compatible cloud storage as maven and docker repositories

Let’s suppose you want to work on a personal project. Something small.

Time goes by, and you soon realize that you can’t just work with a local maven repository any longer. So, you start looking around for some Artifactory/Nexus-as-a-Service online. No way, it’s super-expensive. Then a couple of answers on Quora, together with more googling, lead you to cheaper alternatives; here you notice that they all ask for quite some money ($49 per month, for example!) that you frankly don’t want to spend on the tiny bunch of services you are playing with.

So, you think about hosting your own maven repository. Let’s see which cloud provider is the cheapest: AWS, DigitalOcean, and so forth. It seems you need at least a micro instance. Then you need to make it public, so you need a few roles, subnets, internet gateways, security groups, and then you need to make the volume persistent. Holy hell!

A (relatively) Good Alternative

For a few years now, it has been possible to use S3 to store your jars/artifacts. For example, gradle supports publishing to S3 since version 2.4 (release notes). For maven there seems to be at least one plugin (maven-s3-wagon).

At this point, it becomes trivial:

  • Setup AWS account
  • Create IAM user
  • Create bucket `s3://my-mvn-repo.us-east-1.amazonaws.com`
  • Use this URL in your gradle/mvn project, and voilà

Now, what if you don’t want to use AWS, but an alternative service, like DigitalOcean, or Wasabi, or Dreamhost? They are all very valid alternatives. As of today, for example, DigitalOcean requires *only* 5$/month for ~250GB + a lot of egress requests. However, alternatives like Wasabi or Dreamhost may provide a more affordable price, depending on how much storage and egress you need. Please, also consider that this article doesn’t take into account availability-zones, number of data-centers, etc.

Are they really S3-compatible?

Of the three considered, only Wasabi claims to be 100% compatible with S3.

DigitalOcean and Dreamhost, on the other hand, cover the most common features.

What is also important to know is that the tooling is not always clear about its support for S3-compatible alternatives. Eventually, though, with some online tutorials and good documentation, it’s possible to make things work. Sometimes it may require a hack or some workaround.

How to set up a gradle project with Wasabi, for example?

The following gradle fragment gives an idea of how to use Wasabi (or any other S3-compatible alternative):

apply plugin: 'maven-publish'

publishing {
    repositories {
        maven {
            url "s3://my-bucket/releases"
            credentials(AwsCredentials) {
                accessKey "$accessKey"
                secretKey "$secretKey"
            }
        }
    }
}

It’s quite simple, right? Well, if you use the AWS S3 service, your URL will look a bit different (e.g., s3://<bucket>.s3-<region>.amazonaws.com). Will this work right away? Nope. You’ll have to set a property (it only works from command line, it seems):

./gradlew publish -Dorg.gradle.s3.endpoint=https://s3.wasabisys.com

and it’ll work.

What about Docker?

If we can do this for jar files, why can’t we do the same for Docker? After all, the Docker registry is “just” a wrapper around a filesystem (with lots of features, like APIs, authentication, and so on).

Docker is well designed, and this allows us to select a storage driver. More specifically, we are interested in the s3 storage driver.

So, how do we do that?

The following docker-compose will give you an idea:

version: '3'
services:
  registry:
    image: registry:2
    ports:
     - "5000:5000"
    environment:
     - REGISTRY_STORAGE=s3
     - REGISTRY_STORAGE_S3_ACCESSKEY=accessKey
     - REGISTRY_STORAGE_S3_SECRETKEY=secretKey
     - REGISTRY_STORAGE_S3_BUCKET=com.example.docker-registry
     - REGISTRY_STORAGE_S3_REGION=us-east-1
     - REGISTRY_STORAGE_S3_REGIONENDPOINT=https://s3.wasabisys.com/

This registry still needs to run somewhere; however, it can be a transient container that you run on your machine just to pull or push a specific image. Until you need a more thorough infrastructure, you can simply start the container each time you want to pull/push to your S3 bucket. The data stays there, so no worries.

$ docker-compose up
...

In another terminal:

$ docker pull hello-world
$ docker tag hello-world localhost:5000/hello-world
$ docker push localhost:5000/hello-world

Now you should have the hello-world image in your S3 bucket.

Conclusion

With the approach described in this article, it is possible to have a cheap and quick-to-set-up pipeline without hosting your own Nexus/Artifactory or Docker Registry right from the start.

In my opinion, with the right people in the team, this approach can even scale up quite a bit – maybe not to large companies. Small teams, for example, could benefit even further by running Docker on a small droplet/EC2 instance.


Quickstart for a Django project with Docker

In this post we will see how to quickly set up a Django project with Docker, so that it will be less painful to set up a CI pipeline on any environment of your choice (AWS, DigitalOcean, etc.).

This is something that should be done as soon as possible when bootstrapping your project – stop doing things that require a set of pre-installed packages only available on your machine. The earlier you do this, the less painful it will be for everyone – yourself included!

1. Clone and/or Create a Repository

You will need a repository for your project, right? So, either clone an existing one or create a new one.

You can clone the repo I created for this tutorial so that we can refer to the same code:

$ git clone https://github.com/markon/django-rest-quickstart

2. Create environment with Pipenv

Pipenv is the recommended Python packaging tool.

Install it:

$ pip install pipenv

and create a new environment in your repo:

$ pipenv --three # if you want to use python 3

and verify that everything went fine:

Pipfile found at {...}/tutorials/django-rest-quickstart/Pipfile. 
Considering this to be the project home.

If you open the Pipfile, you will find the following content:

$ cat Pipfile
[source]
url = "https://pypi.python.org/simple"
verify_ssl = true

Essentially, a brand new project! However, we know we want to use Django, so let’s add some dependencies:

[packages]
psycopg2 = ">=2.7.3.1"
Django = ">=1.11,<2.0"

[requires]
python_version = "3.6"

For this tutorial we don’t care right now about the specific Django version. Whatever is recent enough should be good – however, you should not do that on production! Many things can be updated between two versions, and having such a file checked in implies that whatever recent version will be taken – don’t use version ranges, unless you know what you are doing.

As we want deterministic builds, we use pipenv lock, which generates a file pinning the specific versions we want to use.

$ pipenv lock
Locking [dev-packages] dependencies...
Locking [packages] dependencies...
Updated Pipfile.lock!

This way, we can easily “freeze” the dependencies, so that everyone else installs exactly the same versions as we do – no more problems like “hey, I am using version 1.7.0.1 while you use 1.7.0.4”. One version for all of us!

3. Create a Dockerfile

Before starting a Django project, we really want to make sure not to skip this important step – creating a Dockerfile:

FROM python:3.6-alpine

ENV PYTHONUNBUFFERED 1

RUN apk add --repository http://dl-cdn.alpinelinux.org/alpine/v3.6/main --no-cache postgresql-dev
RUN apk add --repository http://dl-cdn.alpinelinux.org/alpine/v3.6/main --no-cache gcc
RUN apk add --repository http://dl-cdn.alpinelinux.org/alpine/v3.6/main --no-cache python3-dev
RUN apk add --repository http://dl-cdn.alpinelinux.org/alpine/v3.6/main --no-cache musl-dev

RUN mkdir /code
WORKDIR /code
ADD . /code/

RUN pip3 install pipenv
RUN pipenv --three
RUN pipenv install --deploy --system

EXPOSE 8000

CMD ["python", "manage.py", "runserver"]

To summarize what it does:

  • Use python:3.6-alpine
  • Install PostgreSQL (we will need it for our project)
  • Create a new directory /code inside the container and add the current directory’s content to it
  • Install Pipenv for Python 3
  • Expose port 8000 for our Django web service
  • Run manage.py runserver by default

4. Create a docker-compose.yaml file

Following the official documentation from Docker, let’s now create a docker-compose file:

version: '3'

services:
  db:
    image: postgres
  web:
    build: .
    command: python3 manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    depends_on:
      - db

We mount the current directory as /code, so that we don’t have to build a new Docker image every time we change the code. Be careful, though: if you update your dependencies (e.g., a small change in the Pipfile), the Docker image needs to be rebuilt, because the dependencies are baked into it.

5. Create a Django project

Now we are finally able to create a Django project! Go to the root of your project and run:

 docker-compose run web django-admin.py startproject djangorestapi .

You should now see a djangorestapi folder and a manage.py file in your project.

6. Start coding your app

Finally, we can add some code to build actual features. The example I would like you to refer to is based on the previous post – RBAC with django-rest-framework and django-guardian.

The full code is visible in the repository, as it’s a bit out of scope for this tutorial.

7. Test locally

Normally, you would want to run migrations first.

docker-compose run web python manage.py migrate

then you will be able to run

docker-compose up

and wait until your service is up and running. Then you can play with your app at http://localhost:8000/api/profiles. Please follow the README file to see how to run the examples.

Note: sometimes you may get a ConnectionRefused error, because PostgreSQL starts up more slowly than Django (depends_on only controls start order, it doesn’t wait for readiness). In this case, just re-run the command.
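If the race bothers you, one option (not part of the original setup, just a sketch) is to add a healthcheck to the db service and have web wait for it – note that the condition form of depends_on requires the 2.x compose file format or a recent Docker Compose implementing the Compose Specification:

version: '2.1'

services:
  db:
    image: postgres
    healthcheck:
      # pg_isready ships with the official postgres image
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 5
  web:
    build: .
    command: python3 manage.py runserver 0.0.0.0:8000
    depends_on:
      db:
        condition: service_healthy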


RBAC + ACLs with django-rest-framework and django-guardian

Recently, I have been working on a personal project developed in Django. It was the first time I used django-rest-framework, and I have to say: it’s impressive how easy developing something with it has become.

Usually, RBAC is not trivial to implement with current technologies – lots of times you end up writing custom code here and there. Also, I see more and more developers assigning roles to users and calling some ideal “has_role(whatever_role_here)” method to authorize a user. However, roles and their assignments can change over time: what needs to be checked is the particular permission required to perform the action, not the role. A little hint here: go for a fine-grained, not a coarse-grained, permission check, because you never know: an AdminRole may be allowed to view your billing today, but tomorrow you’ll want to limit that action to an AccountingAdminRole – and checking roles implies code changes!

Unfortunately, RBAC doesn’t allow to set specific permissions for individual objects, and sometimes you really need that – for example, you don’t want to allow any user to edit other users’ information. However, django-rest-framework and django-guardian, which is really the missing Django permission/authorization tool thanks to all the extremely useful shortcuts it provides, are two excellent tools that help you use RBAC effectively and overcome this limitation, so that you can extend the “role-delegation” behavior with custom ACLs (to have per-object permissions). In fact, by using the permissions API together with django-guardian, it’s really easy to implement multiple use cases to authorize your users. This way, you can benefit from RBAC, by assigning user/roles/permissions while at the same time you can use ACLs to assign individual permissions. Finally, although this is not really related to RBAC, there is support for third-party packages to do authentication, like JWT, or OAuth2.

In the following paragraphs we will set up authentication, authorization and filters for the entire project – meaning, for all our API endpoints/models exposed via django-rest-framework. The procedure to set custom authorization/filtering for only a few specific classes is described in the official documentation, and it’s really not difficult to set up.

1. Set Authentication type

For simplicity, we will assume we are using Basic/SessionAuthentication. Let’s start by writing down the following:


REST_FRAMEWORK = {
  'DEFAULT_AUTHENTICATION_CLASSES': (
    'rest_framework.authentication.BasicAuthentication',
    'rest_framework.authentication.SessionAuthentication',
  )
}

Also, we need to add ‘guardian‘ and ‘rest_framework‘ to the list of INSTALLED_APPS.

After that, we can focus on the authorization part (django-rest-framework calls it permissions).

2. Set Authorization classes

One of the best features django-rest-framework provides is the so-called “per-object permissions on models”, implemented via the DjangoObjectPermissions class. For example, if User A creates Post 123 and User B creates Post 456, we want each user to be able to perform actions only on the Post they created – we don’t want User A to mess with Post 456.

By using DjangoObjectPermissions, we can easily map who can do what on which object. However, as the documentation says, you will need to install django-guardian.

Normally, in addition to the operations Django supports out of the box – add, change and delete – you’ll probably want to limit who can view specific objects, and this works so well with the concept of filters!

You will need to add the following code somewhere in your project:


from rest_framework import permissions

class CustomObjectPermissions(permissions.DjangoObjectPermissions):
  """
  Similar to `DjangoObjectPermissions`, but adding 'view' permissions.
  """
  perms_map = {
    'GET': ['%(app_label)s.view_%(model_name)s'],
    'OPTIONS': ['%(app_label)s.view_%(model_name)s'],
    'HEAD': ['%(app_label)s.view_%(model_name)s'],
    'POST': ['%(app_label)s.add_%(model_name)s'],
    'PUT': ['%(app_label)s.change_%(model_name)s'],
    'PATCH': ['%(app_label)s.change_%(model_name)s'],
    'DELETE': ['%(app_label)s.delete_%(model_name)s'],
  }

At this point, we have to set this class in the REST_FRAMEWORK map we have declared above:

REST_FRAMEWORK = {
  ...,
  'DEFAULT_PERMISSION_CLASSES': (
    'module_containing.CustomObjectPermissions',
  ),
}

and add the following to enable the django-guardian backend:

AUTHENTICATION_BACKENDS = (
    'django.contrib.auth.backends.ModelBackend', # default
    'guardian.backends.ObjectPermissionBackend',
)

So far, we have added configuration telling django-rest-framework that we want to be able to use permissions on individual objects. However, we have not yet specified that we want to limit the objects a logged-in user is allowed to view.

3. Add filtering for ‘view’ operations

django-rest-framework recommends using DjangoObjectPermissionsFilter. To do so, we need to add one more entry to the REST_FRAMEWORK map:

REST_FRAMEWORK = {
  ...,
  ...,
  'DEFAULT_FILTER_BACKENDS': (
    'rest_framework.filters.DjangoObjectPermissionsFilter',
  ),
}

4. Add ‘view’ permissions to your models

As already mentioned, Django doesn’t come with a view permission on models. Therefore, we have to add it manually to each model, as the following code shows:

class UserProfile(models.Model):
  user = models.OneToOneField(settings.AUTH_USER_MODEL,
                              on_delete=models.CASCADE,
                              related_name='profile')

  class Meta:
    permissions = (
      ('view_userprofile', 'View UserProfile'),
    )

Once we create a User and give it a UserProfile, the logged-in user will only be able to retrieve their own UserProfile object. Also, don’t forget to create a migration for this and for the models used by django-guardian.
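For example, granting the per-object permission with django-guardian’s assign_perm shortcut looks roughly like this (some_user is illustrative):

from guardian.shortcuts import assign_perm

profile = UserProfile.objects.create(user=some_user)
# give some_user the 'view' permission on this specific object only
assign_perm('view_userprofile', some_user, profile)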

5. Create views and serializers

At the end, of course, we should not forget to create the serializers and views:

class UserProfileViewSet(viewsets.ModelViewSet):
  serializer_class = UserProfileSerializer
  queryset = UserProfile.objects.all()

I leave the serializer implementation up to you. 🙂
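(If you want a starting point, a minimal sketch could look like the following – the field list is an assumption:)

from rest_framework import serializers

class UserProfileSerializer(serializers.ModelSerializer):
  class Meta:
    model = UserProfile
    fields = ('id', 'user')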

Note: a fully functioning sample can be found here: https://github.com/markon/django-rest-quickstart.
