S3-Compatible Cloud Storage as Maven and Docker Repositories

Posted on Feb 4, 2018

Let’s suppose you want to work on a personal project. Something small.

Time goes by, and you soon realize that a local Maven repository is no longer enough. So you start looking around for a hosted Artifactory or Nexus service. No way, it’s super-expensive. A couple of answers on Quora and some more Googling lead you to cheaper alternatives, but they all still ask for quite some money ($49 per month, for example!) that you frankly don’t want to spend on the tiny bunch of services you are playing with.

So, you think about hosting your own Maven repository. Let’s see which cloud provider is the cheapest: AWS, DigitalOcean, and so forth. It seems you need at least a micro instance. Then you need to make it public, so you need a few roles, subnets, internet gateways, and security groups, and then you need to make the volume persistent. Holy hell!

A (relatively) Good Alternative

For a few years now, it has been possible to use S3 to store your jars/artifacts. For example, Gradle has supported publishing to S3 since version 2.4 (release notes). For Maven there seems to be at least one plugin (maven-s3-wagon).

At this point, it becomes trivial:

  • Set up an AWS account
  • Create an IAM user
  • Create a bucket s3://my-mvn-repo.us-east-1.amazonaws.com
  • Use this URL in your gradle/mvn project and voilà
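
As a rough sketch, the first three steps could be scripted with the AWS CLI. The user name `mvn-publisher` and the bucket name `my-mvn-repo` are placeholders of mine, and the IAM user would still need a policy granting access to the bucket, which is left out here. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: create an IAM user, an access key, and a bucket for a Maven repo.
# All names are hypothetical; nothing is executed unless DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"          # dry run: only show the command
    else
        "$@"                 # real run: execute it
    fi
}

setup_repo() {
    user="$1"; bucket="$2"; region="$3"
    run aws iam create-user --user-name "$user"
    run aws iam create-access-key --user-name "$user"
    run aws s3 mb "s3://$bucket" --region "$region"
}

setup_repo mvn-publisher my-mvn-repo us-east-1
```

The access key and secret printed by the second command are what the build tool needs later as credentials.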

Now, what if you don’t want to use AWS, but an alternative service like DigitalOcean, Wasabi, or Dreamhost? They are all very valid alternatives. As of today, for example, DigitalOcean costs only $5/month for ~250GB plus a lot of egress. However, Wasabi or Dreamhost may offer a more affordable price, depending on how much storage and egress you need. Also keep in mind that this article doesn’t take availability zones, number of data centers, etc. into account.

Are they really S3-compatible?

Of the three considered, only Wasabi claims to be 100% compatible with S3.

DigitalOcean and Dreamhost, on the other hand, cover the most common features; check each provider’s compatibility documentation to see for yourself.

It is also important to know that the tooling is not always clear about its support for S3-compatible alternatives. Eventually, though, with some online tutorials and good documentation it’s possible to make things work. Sometimes it takes a hack or a workaround.
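
One way to probe a provider before wiring it into a build is a quick round trip with the AWS CLI, whose `--endpoint-url` option points it at a non-AWS endpoint. The following is only a sketch: the bucket name `my-bucket` and the file `./probe.txt` are placeholders, credentials are assumed to be configured already, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: upload, list, and delete one object against an S3-compatible endpoint.
# Bucket and file names are hypothetical; nothing is executed unless DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

probe_endpoint() {
    endpoint="$1"; bucket="$2"; file="$3"
    key=$(basename "$file")   # object key is the file name without its path
    run aws s3 cp "$file" "s3://$bucket/$key" --endpoint-url "$endpoint"
    run aws s3 ls "s3://$bucket/" --endpoint-url "$endpoint"
    run aws s3 rm "s3://$bucket/$key" --endpoint-url "$endpoint"
}

probe_endpoint https://s3.wasabisys.com my-bucket ./probe.txt
```

If all three commands succeed, the basic object operations that an artifact repository relies on are most likely covered.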

How to set up a Gradle project with Wasabi, for example?

The following Gradle fragment gives an idea of how to use Wasabi (or any other S3-compatible alternative):

apply plugin: 'maven-publish'

publishing {
    repositories {
        maven {
            url "s3://my-bucket/releases"
            credentials(AwsCredentials) {
                accessKey "$accessKey"
                secretKey "$secretKey"
            }
        }
    }
}

It’s quite simple, right? Well, if you use the AWS S3 service, your URL will look a bit different (e.g., s3://<bucket>.s3-<region>.amazonaws.com). Will this work right away with Wasabi? Nope. You’ll have to set a property (it only works from the command line, it seems):

./gradlew publish -Dorg.gradle.s3.endpoint=https://s3.wasabisys.com

and it’ll work.
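
The fragment above reads `$accessKey` and `$secretKey` as project properties, so they have to come from somewhere. A possible sketch, relying on Gradle mapping `ORG_GRADLE_PROJECT_<name>` environment variables to project properties, is shown below. The variable names `S3_ACCESS_KEY` and `S3_SECRET_KEY` are my own choice, and the command is only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: publish to an S3-compatible endpoint without hardcoding credentials.
# S3_ACCESS_KEY / S3_SECRET_KEY are hypothetical names for your own secrets.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

publish() {
    endpoint="$1"
    # Gradle exposes ORG_GRADLE_PROJECT_<name> as the project property <name>.
    export ORG_GRADLE_PROJECT_accessKey="${S3_ACCESS_KEY:-}"
    export ORG_GRADLE_PROJECT_secretKey="${S3_SECRET_KEY:-}"
    run ./gradlew publish "-Dorg.gradle.s3.endpoint=$endpoint"
}

publish https://s3.wasabisys.com
```

Keeping the keys in environment variables avoids committing them to the build script or leaving them in the shell history.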

What about Docker?

If we can do this for jar files, why can’t we do the same for Docker? After all, a Docker registry is “just” a wrapper around a filesystem (with lots of features, like APIs, authentication, and so on).

The Docker registry is well designed and lets us select a storage driver. More specifically, we are interested in the s3 storage driver.

So, how do we do that?

The following docker-compose fragment will give you an idea:

version: '3'
services:
  registry:
    image: registry:2
    ports:
     - "5000:5000"
    environment:
     - REGISTRY_STORAGE=s3
     - REGISTRY_STORAGE_S3_ACCESSKEY=accessKey
     - REGISTRY_STORAGE_S3_SECRETKEY=secretKey
     - REGISTRY_STORAGE_S3_BUCKET=com.example.docker-registry
     - REGISTRY_STORAGE_S3_REGION=us-east-1
     - REGISTRY_STORAGE_S3_REGIONENDPOINT=https://s3.wasabisys.com/

This registry still needs to run somewhere. However, it can be a transient container that you run on your machine just to retrieve a specific image and run it. Until you need a more thorough infrastructure, you can start the container each time you want to pull from or push to your S3 bucket. The data stays in the bucket, so no worries.

To start the registry defined in the fragment above, run:

$ docker-compose up
...

In another terminal:

$ docker pull hello-world
$ docker tag hello-world localhost:5000/hello-world
$ docker push localhost:5000/hello-world

Now you should have the hello-world image in your S3 bucket.
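
Since the registry can be transient, the whole cycle could be wrapped in one small script. This is a sketch under a few assumptions: the compose file from above sits in the current directory, the image already exists locally, and the registry is ready by the time the push starts (a real script may need to wait a moment). The commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: start the registry, push one image, check the catalog, shut down.
# Nothing is executed unless DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

push_once() {
    image="$1"
    run docker-compose up -d                      # registry in the background
    run docker tag "$image" "localhost:5000/$image"
    run docker push "localhost:5000/$image"
    run curl -s http://localhost:5000/v2/_catalog # list repositories it knows
    run docker-compose down                       # the data stays in the bucket
}

push_once hello-world
```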

Conclusion

With the approach described in this article, it is possible to have a cheap and quick-to-set-up pipeline without hosting your own Nexus/Artifactory or Docker registry right from the start.

In my opinion, with the right people on the team, this approach can even scale up quickly, though maybe not for large companies. For example, small teams could benefit even further by running the registry on a small droplet/EC2 instance.
