Two years ago, a small team at Google started working on making Swift the first mainstream language with first-class language-integrated differentiable programming capabilities. The scope and initial results of the project have been remarkable, and general public usability is not very far off.

Despite this, the project hasn’t received a lot of interest in the machine learning community and remains unknown to most practitioners. This can be attributed in part to the choice of language, which has largely been met with confusion and indifference, as Swift has almost no presence in the data science ecosystem and has mainly been used for building iOS apps.

This is unfortunate though, as even a cursory glance at Google’s project will show that it’s a massive and ambitious undertaking, which could establish Swift as a key player in the area. Furthermore, even though we mainly work with Python at Tryolabs, we think that choosing Swift was a superb idea, and decided to write this short™ post to help spread the word about Google’s plans.

But before we get into Swift and what the term differentiable programming actually means, we should first review the current state of affairs.

What is wrong with you, Python?!

Python is by far the most used language in machine learning, and Google has a ton of machine learning libraries and tools written in it. So, why Swift? What’s wrong with Python?

To put it bluntly, Python is slow. Also, Python is not great for parallelism.

To get around these facts, most machine learning projects run their compute-intensive algorithms via libraries written in C/C++/Fortran/CUDA, and use Python to glue the different low-level operations together. For the most part, this has worked really well, but as with all abstractions, it can create some problems. Let’s go over some of those.

External binaries

Calling external binaries for every compute-intensive operation limits developers to working on a small portion of the algorithm’s surface area. Writing a custom way to perform convolutions, for example, becomes off limits unless the developer is willing to step down into a language like C. Most programmers choose not to do so, either because they have no experience with writing low level performant code, or because jumping back and forth between Python’s development environment and some low level language’s environment becomes too cumbersome to justify.

This leads to the unfortunate situation in which programmers are motivated to write as little sophisticated code as they can, and default to calling external library operations. This is the opposite of what’s desirable in an area as dynamic as machine learning, where so much is still not settled, and new ideas are very much needed.

Library abstractions

Having your Python code call lower level code is not as easy as mapping Python’s functions to C functions. The unfortunate reality is that the creators of machine learning libraries have had to make certain development choices in the name of performance, and that has complicated matters a bit. For example, in TensorFlow’s graph mode, which is the only performant mode in the library, your Python code doesn’t generally run when you think it would. Python actually acts as a sort of metaprogramming language for the underlying TensorFlow graph.

The development flow is as follows: the developer first defines the network using Python, then the TensorFlow backend uses this definition to build the network and compile it into a blob whose internals the developer can no longer access. After compilation, the network is finally ready to run, and the developer can start feeding it data for training/inference jobs. This way of working makes debugging quite hard, as you can’t use Python to dig into what’s happening inside your network as it runs. You can’t use something like pdb. Even if you wish to engage in good old print debugging, you’ll have to use tf.print and build a print node into your network, which has to connect to another node in your network, and be compiled before anything can be printed.

More straightforward solutions exist, though. In PyTorch, your code runs imperatively as is expected in Python, and the only non-transparent abstraction is that operations running on the GPU execute asynchronously. This is generally a non-issue, as PyTorch is smart about this and waits for all the asynchronous calls that any user-facing operation depends on to finish before ceding control. Still, this is something to keep in mind, especially with things such as benchmarking.

Industry lag

All these usability problems aren’t just making it more difficult to write code; they are also unnecessarily causing industry to lag behind academia. Several papers have tweaked low level operations used in neural networks, improving accuracy by a few percentage points in the process, and yet industry has taken a long time to adopt them.

One reason for this is that even though these algorithmic changes tend to be quite simple themselves, the tooling problems mentioned above make them extremely difficult to implement. Hence, they may not be deemed worth the effort for what often results in only a 1% improvement in accuracy. This is especially problematic for small machine learning dev shops that usually lack the economies of scale to justify paying for their implementation/integration.

Therefore, companies tend to ignore these improvements until they get added to a library like PyTorch or TensorFlow. This saves them the implementation and integration costs, but it also causes industry to lag behind academia by 1 or 2 years, as the library maintainers can’t be expected to immediately implement the findings of every new paper that is published.

One concrete example of this issue is Deformable Convolutions, which seem to improve the performance of most Convolutional Neural Networks (CNNs). An open source implementation appeared about 2 years ago. Nevertheless, the implementation was cumbersome to integrate into PyTorch/TensorFlow and the algorithm didn’t gain widespread use. Only recently has PyTorch added support for it, and as of yet I am not aware of an official TensorFlow version.

Now, let’s say this happens for several papers that each contribute a 2% performance enhancement. If these gains compound, n such papers amount to a factor of 1.02^n, that is, a (1.02^n - 1) × 100% improvement that the industry could be missing out on for no reason other than inadequate tooling. This is regrettable, considering n could be quite high.
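As a quick sanity check of that arithmetic, here is a small Swift sketch. It assumes, as the paragraph does, that each 2% gain compounds multiplicatively with the others; the function name compoundedImprovement is hypothetical.

```swift
import Foundation  // for pow

/// Total relative improvement, in percent, from `papers` independent gains of
/// `gainPerPaper` each (0.02 means 2%), assuming they compound multiplicatively.
func compoundedImprovement(gainPerPaper: Double, papers: Int) -> Double {
    let factor = pow(1.0 + gainPerPaper, Double(papers))  // 1.02^n for 2% gains
    return (factor - 1.0) * 100.0                          // factor -> percent
}

for n in [1, 5, 10, 20] {
    print(n, compoundedImprovement(gainPerPaper: 0.02, papers: n))
}
```

Under that assumption, ten such papers would compound to roughly a 21.9% improvement, and twenty to roughly 48.6%.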

Speed

Using Python + fast libraries can still be slow in some cases. Yes, for CNNs running classification on images, using Python and PyTorch/TensorFlow will be really fast. What’s more, there is probably not much performance to be gained by coding your whole network in CUDA, as most of the inference time is spent on big convolutions that are already running in well-optimized implementations. This isn’t always the case though.

Networks that consist of many small operations are often the most susceptible to taking performance hits if they are not fully implemented in a low level language. As an example, in a blog post in which he professes his love for using Swift for deep learning, fast.ai’s Jeremy Howard reports that despite using PyTorch’s great JIT compiler, he still couldn’t make a particular RNN work as fast as a version completely implemented in pure CUDA.

Furthermore, Python is not a very good language for cases where latency is important, nor for very low level tasks such as communicating with sensors. The way some companies choose to get around this is to start by developing their models in PyTorch/TensorFlow-Python. In this way, they take advantage of Python’s ease of use when experimenting with and training new models. After this, they rewrite their model in C++ for production purposes. I’m not sure if they rewrite it completely, or if they simply serialize it using PyTorch’s tracing functionality or TensorFlow’s graph mode, and then rewrite the Python code around it in C++. Either way, a lot of Python code would need to be rewritten, which oftentimes is too costly for small companies to do.

All these problems are well known. Yann LeCun, who is widely considered one of the godfathers of deep learning, has stated that there is a need for a new machine learning language. In a Twitter thread, he and PyTorch co-creator Soumith Chintala discussed several languages as possible candidates, with Julia, Swift, and even improving Python being mentioned. On the other hand, fast.ai’s Jeremy Howard seems to be decidedly on the Swift train.

Google accepts the challenge

Lucky for us, Google’s Swift for TensorFlow (S4TF) team took the matter into their own hands. What’s even better, they have been remarkably transparent about the whole process. In an extremely thorough document, they detail the journey that got them to this decision, explaining what languages they considered for the task and why they ended up using Swift.

Most notably, they considered:

  • Go: In the document, they state that Go relies too heavily on the dynamic dispatching that its interfaces provide, and that they would have had to make big changes to the language in order to implement the features they wanted. This goes against Go’s philosophy of being simple and having a small surface area. Conversely, Swift’s protocols & extensions afford a lot of freedom in terms of how static you want your dispatch to be. Also, Swift is already quite a complex language (and getting more complex every day), so making it a bit more complex to accommodate the features that Google wanted wouldn’t pose as big of a problem.

  • C++ & Rust: Google’s targeted user base is people who are used to working in Python for the most part, and who are more interested in spending their time thinking about the model and the data rather than thinking about things like the careful management of memory allocation or ownership. Rust and C++ have a level of complexity and attention to low level detail that is generally not justifiable when doing data science/machine learning development.

  • Julia: If you read any Hacker News or Reddit threads about S4TF, the first comment is usually, “Why didn’t they choose Julia?”. In the previously mentioned document, Google mentions that Julia looked promising too, but they didn’t really provide a solid reason as to why they didn’t go for it. They mentioned that Swift has a much larger community than Julia, which is true, but Julia’s scientific and data science communities are much larger than Swift’s, and these are arguably the communities that would make more use of S4TF. Something to keep in mind is that Google’s team had more expertise in Swift, given that Swift’s creator Chris Lattner started the project, so this probably played a big part in the decision.

  • A new language: I think they said it best in the manifesto: “Creating a language is a ridiculous amount of work”. This would take too long, and machine learning is moving way too fast.

What’s so cool about Swift, then?

In short, Swift allows you to program at a very high level, in an almost Pythonic way, while at the same time being really fast. A data scientist could use Swift in much the same way as they use Python, while someone working in an optimized machine learning library built in Swift could be more careful about how they manage their memory, and could even drop down to the pointer level of abstraction when idiomatic Swift is too restrictive.

Providing a detailed description of the language is probably overkill for the purpose of this article. The official documentation already does a much better job than I ever could. Instead, I’ll describe a few things that I found to be cool about Swift as a new fan of the language, in hopes that this will entice people to try it. The following sections will be an assortment of random cool things about Swift, in no particular order, and with no particular attention paid to their overall significance. After these, I’ll delve into differentiable programming and Google’s big plan for Swift.

Cool thing number one

It’s fast. This is the first thing I tested when I got started with Swift. I wrote a few short scripts to evaluate how well it would fare against Python and C. These tests are not particularly sophisticated, to be honest. They just fill an array with integers and then add them all up. This by itself is not a very thorough way of testing how fast idiomatic Swift is, but I was curious if Swift could ever be as fast as C, not if Swift would always be as fast as C.

For the first comparison, I went with Swift vs Python. I took some artistic liberties with curly brace placement in Swift so that each line is basically doing the same thing in both cases.

import time                    | import Foundation
                               |
result = []                    | var result = [Int]()
for it in range(15):           | for it in 0..<15 {
    start = time.time()        |     let start = CFAbsoluteTimeGetCurrent()
    for _ in range(3000):      |     for _ in 0..<3000 {
        result.append(it)      |         result.append(it)}
    sum_ = sum(result)         |     let sum = result.reduce(0, +)
    end = time.time()          |     let end = CFAbsoluteTimeGetCurrent()
    print(end - start, sum_)   |     print(end - start, sum)
    result = []                |     result = []}

Although their syntax is very similar in this particular snippet, the Swift script proves to be 25 times faster than the Python one. Each outermost loop in the Python script completes in 360μs on average, vs 14μs for Swift. This is quite an improvement.

There are other interesting things to note as well. Namely, + is an operator as well as a function that gets passed to reduce (which I’ll elaborate on later), CFAbsoluteTimeGetCurrent reveals Swift’s quirks regarding legacy Apple namespaces (CF stands for Core Foundation), and the range operators make inclusivity explicit: 0..<15 excludes its upper bound, while 0...15 would include it.
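Here is a small sketch of Swift’s range operators, along with + passed to reduce as in the benchmark. Everything used is standard library; the variable names are only for illustration.

```swift
// Half-open range: includes the lower bound, excludes the upper one.
let halfOpen = Array(0..<5)          // [0, 1, 2, 3, 4]
// Closed range: includes both bounds.
let closed = Array(0...5)            // [0, 1, 2, 3, 4, 5]

// One-sided ranges take their missing bound from the collection they slice.
let letters = ["a", "b", "c", "d"]
let tail = Array(letters[2...])      // ["c", "d"]
let head = Array(letters[..<2])      // ["a", "b"]

// The + operator passed as reduce's combining function.
let total = halfOpen.reduce(0, +)    // 10

print(halfOpen, closed, tail, head, total)
```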

This test doesn’t really tell us how fast Swift can be, though. To find that out, we need to compare it to C. So, that’s what I did, and much to my disappointment, the initial results weren’t good. The version written in C took an average of 1.5μs, which is ten times faster than our Swift code. Uh oh.

To be fair, this isn’t a terribly honest comparison. The Swift script is using a dynamic array, which gets repeatedly reallocated on the heap as it grows. This also means that every append has to check the array’s capacity (and, because of copy-on-write, whether its storage is uniquely referenced). To corroborate this, we can go look at its definition. Swift standard types like Int, Float, and Array are not hardcoded into the compiler; they are structs defined in the standard library. Thus, according to the array’s append definition, we see there’s a lot going on. Knowing this, I evened the playing field by preallocating the array’s memory and using a pointer for filling the array. The resulting script is not much longer:

import Foundation
// Preallocating memory
var result = ContiguousArray<Int>(repeating: 0, count: 3001)
for it in 0..<15 {
    let start = CFAbsoluteTimeGetCurrent()
    // Using a buffer pointer for assignment
    result.withUnsafeMutableBufferPointer({ buffer in
        for i in 0..<3000 {
            buffer[i] = it
        }
    })
    let sum = result.reduce(0, +)
    let end = CFAbsoluteTimeGetCurrent()
    print(end - start, sum)
}

This new code takes 3μs, so it’s now half as fast as C, which is already a good place to be. Just for the sake of completeness, though, I kept profiling the code to find where the remaining difference with the C version came from. It turns out that the reduce method I was using adds some indirection through its nextPartialResult closure parameter, a generalization this code doesn’t need. After rewriting the sum using a pointer, I finally got it to C speed. However, this obviously defeats the purpose of using Swift, since at this point we are just writing more verbose and uglier C. Nevertheless, it’s good to know that you can get C speed if you really need it.
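The pointer-based sum isn’t shown in the post, so the following is only one plausible way to write it, an assumption rather than the author’s code. It walks the array’s storage through an UnsafeBufferPointer with a plain loop and uses the wrapping operator &+=, which skips the overflow trap that + performs (on overflow it silently wraps around instead of crashing). The pointerSum name is hypothetical.

```swift
/// Sums an array by walking its storage through an unsafe buffer pointer.
/// `&+=` wraps on overflow instead of trapping, trading safety for speed.
func pointerSum(_ values: ContiguousArray<Int>) -> Int {
    values.withUnsafeBufferPointer { buffer -> Int in
        var total = 0
        for i in 0..<buffer.count {
            total &+= buffer[i]
        }
        return total
    }
}

// Fill a preallocated array through a mutable buffer pointer, as in the script above.
var data = ContiguousArray<Int>(repeating: 0, count: 3000)
data.withUnsafeMutableBufferPointer { buffer in
    for i in 0..<buffer.count { buffer[i] = 7 }
}
print(pointerSum(data))  // 21000
```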

To sum up: you won’t get C speed with a Python amount of work, but you will get a great tradeoff between the two.

Cool thing número dos

Swift has taken an interesting approach to function signatures. In their most basic form, they are relatively simple:

func greet(person: String, town: String) -> String {
    return "Hello \(person)!  Glad you could visit from \(town)."
}
greet(person: "Bill", town: "Cupertino")

The function signature consists of the parameter names followed by their types; nothing too fancy. The only unusual thing is that Swift requires you to provide the parameter names when calling the function, so you have to write person and town when calling greet, as evidenced by the last line of the snippet above.

Things get more interesting when we introduce something called argument labels into the mix.

func greet(_ person: String, from town: String) -> String {
    return "Hello \(person)!  Glad you could visit from \(town)."
}
greet("Bill", from: "Cupertino")

Argument labels are just what they sound like: they are labels for your function’s parameters, and they are declared before their respective parameter in the function’s signature. In the sample above, from is town’s argument label, and _ is person’s argument label. I used _ for this last label because _ is a special case in Swift that means, “don’t write any label when passing this argument.”

With argument labels, every parameter gets 2 different names: an argument label, which is used for calling the function, and a parameter name, which is used inside the function’s body definition. This may seem a bit arbitrary, but it makes your code easier to read.

If you look at the function signature above, it’s almost like reading English: “Greet person from town.” The function call is just as descriptive: “Greet Bill from Cupertino.” Without argument labels, things become a bit more ambiguous: “Greet person town.” We don’t know what town stands for. Is that the town we are in now? Is that the town in which we are going to meet the person? Or is it the town where the person is originally from? Without argument labels we would have to read the function’s body to know what’s happening, or resort to making the function name or the parameter names longer and more descriptive. This can get complicated if you have lots of parameters, and in my opinion tends to result in uglier code and needlessly long function names. Argument labels are prettier and scale better, and luckily, they are used extensively in Swift.

The third of the cool things

Swift makes extensive use of closures. Therefore, it has some shortcuts to make their usage more ergonomic. This example taken from the language’s documentation highlights how concise and expressive these shortcuts can make Swift look.

Let’s take an array that we want to sort backwards:

let names = ["Chris", "Alex", "Ewa", "Barry", "Daniella"]

The less idiomatic way of doing this would be to use Swift’s sorted method for arrays, and employ a custom function that tells it how to do pairwise order comparison on the array’s elements, like so:

func backward(_ s1: String, _ s2: String) -> Bool {
    return s1 > s2
}
var reversedNames = names.sorted(by: backward)

The backward function compares two items at once, and returns true if the first one should come before the second in the sorted result, and false otherwise. The sorted array method expects such a function as an input in order to know how to sort the array. As a side note, here we can also see the usage of the argument label by, which is oh so beautifully terse.

If we resort to more idiomatic Swift, we find that there is a better way to do this using closures:

reversedNames = names.sorted(by: { s1, s2 in return s1 > s2 } )

The code between { } is a closure that is being defined and passed as an argument to sorted at the same time. If you’ve never heard of them, closures are just unnamed functions which capture their context. A good way to think about them is as Python lambdas on steroids. The keyword in is used to separate the closure’s arguments from its body. More intuitive symbols such as : were already taken for type annotations (the closure’s argument types get automatically inferred from sorted’s signature in this case, so they can be omitted), and we all know naming things is one of the hardest things to do in programming, so we are stuck with the not so intuitive in keyword.

In any case, the code is already looking more concise.

We can, however, do better:

reversedNames = names.sorted(by: { s1, s2 in s1 > s2 } )

We removed the return statement here because, in Swift, single-expression closures implicitly return the value of that expression.

Still, we can go even deeper:

reversedNames = names.sorted(by: { $0 > $1 } )

Swift also provides shorthand argument names, which are implicitly named positional parameters, so in the above case $0 is the first argument, $1 the second, $2 would be the third, and so on. The code is already compact and easy to understand, but we can do better yet:

reversedNames = names.sorted(by: >)

In Swift, the > operator is just a function named >. Therefore, we can pass it to our sorted method, making our code extremely concise and readable.

This also applies to operators like +=, -=, <, >, and ==, and you’ll find their definitions in the standard library. The difference between these functions/operators and normal functions is that the former have been explicitly declared as operators using the infix, prefix, or postfix keywords. For instance, += is declared as an operator in the Swift standard library, and many different types, such as Array and String, provide their own implementation of it.
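Because operators are ordinary functions, they can also be stored and passed around like any other function value. A minimal sketch (the operations table and apply helper are hypothetical names):

```swift
// Each operator is an ordinary function; wrapping it in parentheses
// turns it into a function value of type (Int, Int) -> Int.
let operations: [String: (Int, Int) -> Int] = [
    "+": (+),
    "-": (-),
    "*": (*),
]

/// Looks up an operator by its symbol and applies it; returns nil if unknown.
func apply(_ symbol: String, _ a: Int, _ b: Int) -> Int? {
    operations[symbol].map { op in op(a, b) }
}

if let product = apply("*", 6, 3) {
    print(product)  // 18
}
```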

Of further interest is that we can define our own custom operators. One great example of this is in the GPUImage2 library. The library allows users to load a picture, modify it with a sequence of transformations, and then output it in some way. Naturally, the definition of these sequences of transformations shows up repeatedly in the library, so the library’s creator decided to define a new operator called --> that would be used to chain these transformations together:

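// ImageSource and ImageConsumer are GPUImage2's pipeline protocols
@discardableResult
func --> <T: ImageConsumer>(source: ImageSource, destination: T) -> T {
    source.addTarget(destination)
    return destination
}
infix operator --> : AdditionPrecedence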

In the slightly simplified code above, the --> function is first declared, and then defined as an infix operator. Infix just means that to use the operator, you must place it between its two arguments. This allows you to write code such as the following:

import UIKit
import GPUImage  // module name of GPUImage2
let testImage = UIImage(named:"WID-small.jpg")!
let toonFilter = SmoothToonFilter()
let luminanceFilter = Luminance()
let filteredImage = testImage.filterWithPipeline{input, output in
    input --> toonFilter --> luminanceFilter --> output  // Interesting part
}

The above is shorter and easier to understand than a bunch of chained methods, or a series of source.addTarget(...) calls.
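For readers without GPUImage2 at hand, here is a self-contained sketch of the same pattern. The Stage class is a hypothetical stand-in for the library’s image sources and consumers, not its real API; it only records which stage feeds which, so you can see how the operator wires a chain together.

```swift
/// Toy pipeline stage (hypothetical, not GPUImage2's API):
/// it only records the stages it feeds into.
final class Stage {
    let name: String
    private(set) var targets: [Stage] = []

    init(_ name: String) { self.name = name }

    func addTarget(_ target: Stage) { targets.append(target) }
}

// AdditionPrecedence is left-associative, so a --> b --> c
// groups as (a --> b) --> c.
infix operator --> : AdditionPrecedence

/// Connects `source` to `destination` and returns `destination`,
/// so the next --> in a chain starts from it.
@discardableResult
func --> (source: Stage, destination: Stage) -> Stage {
    source.addTarget(destination)
    return destination
}

let input = Stage("input")
let toon = Stage("toon")
let luminance = Stage("luminance")
let output = Stage("output")
input --> toon --> luminance --> output

print(toon.targets.map { $0.name })  // ["luminance"]
```

Returning the destination is the design choice that makes chaining work: each --> hands the next one the stage it just connected.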

The fourth of the cool things

Previously, I mentioned that the basic Swift types were structs defined in the standard library, and not hardcoded into the compiler as they usually are in other languages. One reason this is useful is that it lets us use a Swift feature called extensions, which allows us to add new functionality to any type, including the basic types. Here is how this can play out:

extension Double {
    var radians: Double {
        return self * (Double.pi / 180)
    }
}
360.radians // -> 6.28319

Though not particularly useful, this example shows how extensible the language is: you can type any number into a Swift interpreter and call any custom method you want on it.
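Extensions also work on generic standard library types, optionally constrained to a particular element type. As a sketch, the following adds a mean method (a hypothetical name, not part of the standard library) to every array of Doubles:

```swift
// A constrained extension: every [Double] gains a mean() method.
extension Array where Element == Double {
    /// Arithmetic mean, or nil for an empty array.
    func mean() -> Double? {
        isEmpty ? nil : reduce(0, +) / Double(count)
    }
}

let losses = [0.5, 0.25, 0.75]
if let average = losses.mean() {
    print(average)  // 0.5
}
```

Returning an optional keeps the empty case explicit instead of silently dividing by zero.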

Last one

On top of having a compiler, Swift also has an interpreter and support for Jupyter Notebooks. The interpreter is particularly great for learning the language, as it allows you to type swift at your command prompt and start trying out code right away, much in the same way you would with Python. On the other hand, the integration with Jupyter Notebooks is awesome for visualizing data, performing data exploration, and writing reports. Finally, when you want to run production code, you can compile it and take advantage of the great optimization LLVM provides.

Google’s master plan

I mentioned quite a few features in the paragraphs above, but there’s one feature that stands apart from the others: Jupyter support is very new, and was in fact added by the S4TF team. This is noteworthy because it gives us an idea of what Google’s state of mind is when working on this project: they don’t just want to create a library for Swift, they want to deeply improve the Swift language itself, along with its tooling, and then build a new TensorFlow library using this improved version of the language.

This point is illustrated best by observing where the S4TF team has been spending most of its time. The majority of the work they’ve done has been on Apple’s Swift compiler repository itself. More specifically, most of the work Google has been doing lies in a dev branch inside the Swift compiler repo. Google is adding new features to the Swift language, first creating and testing them in their own branch and then merging them into Apple’s master branch. This means that the standard Swift language running on iOS devices all around the world will eventually incorporate these improvements.

Now, on to the juicy part: What are the features that Google is building into Swift?

Let’s start with the big one.

Differentiable programming

Lately, there’s been a lot of hype surrounding differentiable programming. Tesla’s director of AI, Andrej Karpathy, has called it Software 2.0, while Yann LeCun has proclaimed: “Deep Learning est mort. Vive Differentiable Programming.” Others claim there will be a need for a whole new set of tools, such as a new Git, new IDEs, and of course new programming languages. Wink wink.

So, what is differentiable programming?

In a nutshell, differentiable programming is a programming paradigm in which your program itself can be differentiated. This allows you to set a certain objective you want to optimize, have your program automatically calculate its own gradient with respect to this objective, and then fine-tune itself by nudging its parameters along this gradient in the direction that improves the objective (against the gradient, when the objective is a loss to minimize). This is exactly what you do when you train a neural network.

The most compelling thing about having a program tune itself is that it allows us to create the sorts of programs we seem to be completely incapable of programming by ourselves. An interesting way to think about this is that your program using its gradients to tune itself for a certain task is better at programming than you are. These past few years have shown that this is indeed true for an ever increasing number of cases, with no clear end to that growth in sight.

A differentiable language

After that really long introduction, it’s finally time to introduce Google’s vision of how native differentiable programming will look in Swift:

func cube(_ x: Float) -> Float {
    return x * x * x
}
let cube𝛁 = gradient(of: cube)
cube(2)   // 8.0
cube𝛁(2)  // 12.0

Here we start by defining a simple function named cube, which returns the cube of its input. Next comes the exciting part: we create the derivative function of our original function, merely by calling gradient on it. There are no libraries or external code being used here, gradient is simply a new function that is being introduced by the S4TF team into the Swift language. This function takes advantage of the changes the team made to Swift’s core, in order to automatically calculate gradient functions.

This is Swift’s big new feature. You can take arbitrary Swift code and, as long as it’s differentiable, automatically calculate its gradient. The code above has no imports or weird dependencies, it’s just plain Swift. If you’ve ever used PyTorch, TensorFlow, or any of the other big machine learning libraries, they all support this feature, but only if you’re using their particular library specific operations. What’s more, working with gradients in these Python libraries is not as lightweight, transparent, or well integrated as it is in plain Swift.

This is a massive new feature of the language and, as far as I can tell, Swift is the first mainstream language that has native support for such a thing.
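Swift’s compiler-integrated differentiation is built around reverse-mode differentiation. Purely to give some intuition for what “automatically computing a derivative” means, the sketch below uses a different and simpler technique, forward-mode automatic differentiation with dual numbers, in plain Swift that needs no differentiation support. The Dual type and the derivative(of:at:) helper are hypothetical.

```swift
/// A dual number: a value together with the derivative carried alongside it.
struct Dual {
    var value: Float
    var derivative: Float

    // Product rule: (uv)' = u'v + uv'
    static func * (lhs: Dual, rhs: Dual) -> Dual {
        Dual(value: lhs.value * rhs.value,
             derivative: lhs.derivative * rhs.value + lhs.value * rhs.derivative)
    }
}

// The same cube function as before, written over Dual instead of Float.
func cube(_ x: Dual) -> Dual {
    x * x * x
}

/// Evaluates `f` at `x` with a seed derivative of 1 and reads off f'(x).
func derivative(of f: (Dual) -> Dual, at x: Float) -> Float {
    f(Dual(value: x, derivative: 1)).derivative
}

print(cube(Dual(value: 2, derivative: 1)).value)  // 8.0
print(derivative(of: cube, at: 2))                 // 12.0
```

Here the user has to rewrite cube over a special type; the point of Swift’s built-in feature is that ordinary Float code gets differentiated as-is.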

To further illustrate how this would look in the real world, the following script is a more thorough example of this new feature, applied to a standard machine learning training workflow:

struct Perceptron: @memberwise Differentiable {
    var weight: SIMD2<Float> = .random(in: -1..<1)
    var bias: Float = 0
    @differentiable
    func callAsFunction(_ input: SIMD2<Float>) -> Float {
        (weight * input).sum() + bias
    }
}
var model = Perceptron()
let andGateData: [(x: SIMD2<Float>, y: Float)] = [
    (x: [0, 0], y: 0),
    (x: [0, 1], y: 0),
    (x: [1, 0], y: 0),
    (x: [1, 1], y: 1),
]
for _ in 0..<100 {
    let (loss, 𝛁loss) = valueWithGradient(at: model) { model -> Float in
        var loss: Float = 0
        for (x, y) in andGateData {
            let ŷ = model(x)
            let error = y - ŷ
            loss = loss + error * error / 2
        }
        return loss
    }
    print(loss)
    model.weight -= 𝛁loss.weight * 0.02
    model.bias -= 𝛁loss.bias * 0.02
}

Again, the above code is all plain Swift with no external dependencies. In this snippet, we see that Google has introduced two new Swift features: callAsFunction and valueWithGradient. The first one is quite simple: it lets us create instances of classes and structs, and then call those instances as if they were functions. Here the Perceptron struct gets instantiated as model, and then model gets called as a function in let ŷ = model(x). When you do this, callAsFunction is the method actually being called. If you’ve ever used Keras or PyTorch models, you know that this is quite a common way of handling models/layers. These two libraries rely on Python’s __call__ method, which dispatches to their call and forward methods, respectively; Swift had no such feature, and thus Google had to add it.

The other interesting new feature in the above script is valueWithGradient. This function returns the resulting value and gradient of a function or closure, evaluated at a particular point. In the case above, the closure we define and use as input for valueWithGradient is actually our loss function. This loss function takes our model as an input, so when we say that valueWithGradient will evaluate our function at a particular point, we mean that it will evaluate our loss function with our model in a particular weight configuration. After we have calculated the aforementioned value and gradient, we print the value (which is our loss), and update our model’s weights using the gradients. Repeat this a hundred times and we have a trained model. You’ll notice that we can access andGateData inside our loss function, which is an example of Swift closures being able to capture their enclosing context.
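To make it clearer what valueWithGradient computes, here is the same AND-gate training loop in plain Swift with the gradients derived by hand. It is a sketch under a few assumptions: it needs Swift 5.2 or later (for callAsFunction) but no differentiation support, it starts from zero weights instead of .random so the run is reproducible, and lossAndGradient is a hypothetical helper playing the role of valueWithGradient.

```swift
// loss = Σ (y - ŷ)² / 2 with ŷ = w·x + b, so by hand:
// ∂loss/∂w = Σ -(y - ŷ)·x   and   ∂loss/∂b = Σ -(y - ŷ)
struct Perceptron {
    var weight: SIMD2<Float> = [0, 0]  // fixed start instead of .random
    var bias: Float = 0

    func callAsFunction(_ input: SIMD2<Float>) -> Float {
        (weight * input).sum() + bias
    }
}

let andGateData: [(x: SIMD2<Float>, y: Float)] = [
    (x: [0, 0], y: 0),
    (x: [0, 1], y: 0),
    (x: [1, 0], y: 0),
    (x: [1, 1], y: 1),
]

/// Returns the loss and its gradient with respect to weight and bias.
func lossAndGradient(_ model: Perceptron)
    -> (loss: Float, weight: SIMD2<Float>, bias: Float) {
    var loss: Float = 0
    var gradWeight = SIMD2<Float>(0, 0)
    var gradBias: Float = 0
    for (x, y) in andGateData {
        let error = y - model(x)   // callAsFunction in action
        loss += error * error / 2
        gradWeight -= error * x    // ∂(error²/2)/∂w = -error · x
        gradBias -= error          // ∂(error²/2)/∂b = -error
    }
    return (loss, gradWeight, gradBias)
}

var model = Perceptron()
var initialLoss: Float = 0
var finalLoss: Float = 0
for step in 0..<100 {
    let (loss, gradW, gradB) = lossAndGradient(model)
    if step == 0 { initialLoss = loss }
    finalLoss = loss
    model.weight -= gradW * 0.02
    model.bias -= gradB * 0.02
}
print(initialLoss, finalLoss)
```

Deriving gradients by hand is manageable for two weights and a bias, but it quickly becomes error-prone for real networks, which is exactly the bookkeeping valueWithGradient takes over.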

Differentiating external code

Another fantastic feature is that not only can we differentiate Swift operations, but we can also differentiate operations in external, non-Swift libraries, if we manually tell Swift what their derivatives are. This means you can use a C library with a very fast implementation of some operation not currently present in Swift, import it into your project, code the derivative, and then use this operation in your big neural network and have things like backpropagation work seamlessly.

What’s more, making this happen is really simple:

import Glibc  // we import pow and log from here (on macOS, use Darwin instead)
func powerOf2(_ x: Float) -> Float {
    return pow(2, x)
}
@derivative(of: powerOf2)
func dPowerOf2d(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
    let value = powerOf2(x)
    let d = value * log(2)  // d/dx 2^x = 2^x · ln 2
    return (value: value, pullback: { v in v * d })
}
powerOf2(3)                // 8
gradient(of: powerOf2)(3)  // 5.545

Glibc is a C library, so the Swift compiler doesn’t know what the derivatives of its operations are. We can give the compiler this information by using @derivative and then use these external operations along with our native operations to form big differentiable networks very easily. In the example, we import pow and log from Glibc and use them to create a powerOf2 function and its derivative.

The current incarnation of the new TensorFlow library for Swift is being built using this feature. The library imports all of its operations from the C API of the TF Eager library, but instead of plugging into TensorFlow’s automatic differentiation system, it specifies the derivative of each basic operation and lets Swift handle it. This isn’t required for all operations, though, as many are compositions of more basic operations, and therefore Swift can automatically infer their derivatives. Basing the current version of the library on TF Eager does, however, have one big downside: TF Eager is really slow, and therefore the Swift version is too. This seems to be a temporary problem which is getting fixed with the incorporation of XLA (through x10) and MLIR.
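The division of labor described here (hand-written derivatives for primitive ops, automatically derived ones for their compositions) can be sketched in a few lines. `primitiveExp` is a hypothetical wrapper standing in for an op imported from a C API, and the code assumes a toolchain with the `_Differentiation` module; this is an illustration of the idea, not how the TensorFlow library itself is written.

```swift
import _Differentiation
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// Stand-in for an imported primitive: Swift cannot see inside the C
// function expf, so it has no derivative until we register one.
func primitiveExp(_ x: Float) -> Float {
    return expf(x)
}

@derivative(of: primitiveExp)
func primitiveExpVJP(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
    let y = primitiveExp(x)
    return (value: y, pullback: { v in v * y })  // d/dx e^x = e^x
}

// A composite op with no hand-written derivative: Swift combines the
// registered rule for primitiveExp with its built-in rules for * and +.
let g = gradient(at: Float(1)) { x in x * primitiveExp(x) + 2 * x }
print(g)  // analytically 2e + 2, roughly 7.4366
```

Only the primitive needed a manual rule; the product rule and the `2 * x` term were handled by the compiler.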

Having said this, using this temporary solution is allowing Google’s devs to work on the Swift TensorFlow API, which is really starting to take shape. This is how a simple model training job looks:

import TensorFlow
let hiddenSize: Int = 10
struct IrisModel: Layer {
    var layer1 = Dense<Float>(inputSize: 4, outputSize: hiddenSize, activation: relu)
    var layer2 = Dense<Float>(inputSize: hiddenSize, outputSize: hiddenSize, activation: relu)
    var layer3 = Dense<Float>(inputSize: hiddenSize, outputSize: 3)
    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        return input.sequenced(through: layer1, layer2, layer3)
    }
}
var model = IrisModel()
let optimizer = SGD(for: model, learningRate: 0.01)
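let (loss, grads) = valueWithGradient(at: model) { model -> Tensor<Float> in
    // firstTrainFeatures / firstTrainLabels: a batch of Iris data loaded earlier
    let logits = model(firstTrainFeatures)
    return softmaxCrossEntropy(logits: logits, labels: firstTrainLabels)
}
print("Current loss: \(loss)")
// Apply one SGD step to the model's weights using the computed gradients
optimizer.update(&model, along: grads)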

As you can tell, it’s very similar to the no-import model training script we previously saw. It has a very PyTorch-like design, which is great.

Python interoperability

One issue that Swift will have to deal with is that its current machine learning and data science ecosystems are still in their infancy. Fortunately, Google is addressing this issue with the inclusion of Python interoperability in Swift. The idea is to make it possible to write Python code inside Swift code, and in this way have access to the huge quantity of great Python libraries.

A typical use case for this would be to train a model in Swift and use Python’s matplotlib to plot the results:

import Python
print(Python.version)
let np = Python.import("numpy")
let plt = Python.import("matplotlib.pyplot")
// let time = np.arange(0, 10, 0.01)
// stride(from:to:by:) excludes the end point, matching np.arange
let time = Array(stride(from: 0, to: 10, by: 0.01)).makeNumpyArray()
let amplitude = np.exp(-0.1 * time)
let position = amplitude * np.sin(3 * time)
plt.figure(figsize: [15, 10])
plt.plot(time, position)
plt.plot(time, amplitude)
plt.plot(time, -amplitude)
plt.xlabel("Time (s)")
plt.ylabel("Position (m)")
plt.title("Oscillations")
plt.show()

It looks like plain old Python with the addition of Swift’s let declarations. This is a code sample provided by Google. The only modification I made was to comment out one Python line and rewrite it in Swift, to see how well the two interface. It’s not as clean as doing it all in Python, since I had to use makeNumpyArray() and Array(), but it works, which is awesome.

Google managed to pull this off by introducing the PythonObject type, which can represent any object in Python. The Python interop project is contained in a single Swift library, so the S4TF team only needed to make a few additions to the Swift language itself, such as features that accommodate Python’s extreme dynamism. With regard to how much Python it supports, I’ve yet to find out how they expect to handle more idiomatic Python elements such as with statements, and I am sure there are other corner cases to consider as well, but still, this is already an amazing feature as-is.
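Two of the language additions behind this are the `@dynamicMemberLookup` (SE-0195) and `@dynamicCallable` (SE-0216) attributes. The toy `DynamicRecord` type below is hypothetical and far simpler than `PythonObject`; it only shows how the compiler rewrites arbitrary member names and keyword-argument calls into ordinary method calls, which is what lets something like `plt.figure(figsize: ...)` compile against a value whose members are only known at runtime.

```swift
// Hypothetical toy type; not PythonObject's actual implementation.
@dynamicMemberLookup
@dynamicCallable
struct DynamicRecord {
    var fields: [String: Int] = [:]

    // record.foo is rewritten by the compiler to record[dynamicMember: "foo"].
    subscript(dynamicMember name: String) -> Int? {
        get { fields[name] }
        set { fields[name] = newValue }
    }

    // record(a: 1, b: 2) is rewritten to
    // record.dynamicallyCall(withKeywordArguments: ["a": 1, "b": 2]).
    // Here it returns the sum of each named field times its argument.
    func dynamicallyCall(withKeywordArguments args: KeyValuePairs<String, Int>) -> Int {
        var total = 0
        for (key, value) in args {
            total += (fields[key] ?? 0) * value
        }
        return total
    }
}

var record = DynamicRecord()
record.width = 3                     // no "width" property is declared anywhere
record.height = 4
print(record.width ?? 0)             // 3
print(record(width: 2, height: 10))  // 3 * 2 + 4 * 10 = 46
```

PythonObject uses the same hooks, but forwards the member name or call to the Python runtime instead of a dictionary.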

While on the subject of Swift’s integration with other languages, one of my initial interests in Swift was to determine how well it would fare with a real-time computer vision task. For this reason, I went looking for a Swift version of OpenCV, and through FastAI’s forum I found a promising OpenCV wrapper called SwiftCV. This library is peculiar, though: OpenCV is written in C++ (and has just deprecated its C API), and Swift can’t currently import C++ code directly (though support is coming). Hence, SwiftCV has had to resort to wrapping OpenCV’s code in a C-compatible subset of C++, and then importing it as C. Only after this could it be wrapped in Swift.
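On the Swift side, that last wrapping step usually means putting an opaque C handle inside a Swift class that owns it. As an analogy that runs without OpenCV, the sketch below applies the same ownership pattern to the C standard library’s `FILE*`; `CFile` is a hypothetical name and this is not SwiftCV’s actual API.

```swift
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// Owns an opaque C handle and releases it automatically.
final class CFile {
    private let handle: UnsafeMutablePointer<FILE>

    // Failable init: the C API signals failure by returning NULL.
    init?(path: String, mode: String) {
        guard let h = fopen(path, mode) else { return nil }
        handle = h
    }

    // ARC calls deinit when the last reference goes away,
    // so the C resource can't leak or be closed twice.
    deinit {
        fclose(handle)
    }

    // Thin forwarding method; Swift bridges String to const char*.
    func write(_ text: String) {
        fputs(text, handle)
    }
}

if let file = CFile(path: "/tmp/cfile_demo.txt", mode: "w") {
    file.write("hello from Swift\n")
}
```

A wrapper over a C shim around C++ code follows the same shape: a create function in `init`, a destroy function in `deinit`, and Swift methods forwarding to the shim’s functions.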

I decided to add video support to SwiftCV, as I needed it and the project didn’t have it at the time. I also wanted to test Swift’s C interop capabilities in a more complex situation than what the tutorials describe. Therefore, I submitted this pull request, which is a useful self-contained example of how Swift’s interop with C++ through a C wrapper looks. The process was painless, even for a Swift beginner such as myself, so props to the Swift devs for that.

Current state of the project

Even after all the praise I have showered on the S4TF project, I have to admit that it is still not ready for general production usage. The new APIs are still changing, the performance of the new TensorFlow library is still not great, and even though its data science ecosystem is growing, it’s still in its infancy. On top of that, Linux support is still flaky, with only Ubuntu being officially supported at the moment. That said, a lot of work is going into fixing all of these issues promptly.

Google is working hard on performance gains, including the recent addition of x10 and the efforts to get MLIR up to par. Also, several projects originating at Google aim to replicate much of the Python data science ecosystem in Swift, such as SwiftPlot, the Pandas-like Penguin, and the Scikit-learn-like swiftML, to name a few.

What is most surprising, though, is that Apple is moving Swift in the same direction as Google. On their roadmap for Swift’s next major version, they’ve made growing the Swift software ecosystem on non-Apple platforms their primary objective. This is reflected in Apple’s support for several projects, like the Swift Server Work Group, the NumPy-like Numerics, an official language server that runs on Linux, and the work being done to port Swift to Windows.

Furthermore, Sylvain Gugger from Fast.ai is currently building a Swift version of FastAI, and Jeremy Howard has included lessons in Swift in their massively popular online course. Also, the first academic papers built on S4TF-based libraries are starting to get published!

Conclusion

In my personal opinion, while Swift has a huge chance of becoming a key player in the machine learning ecosystem, there are still risks. The biggest risk is that, in spite of its flaws, Python really is good enough for a huge portion of machine learning tasks. The inertia might be too large for many people who are already comfortable with Python and see no reason to switch to another language. Additionally, Google has a well-deserved reputation for dropping large projects, and some key departures from the S4TF project are leaving people worried.

Having provided these disclaimers, I still think that Swift is a great language, and the new additions are so innovative that it’s bound to eventually find its place in the machine learning community. Therefore, if you want to contribute to a project with enormous growth potential, now is a great time to start. Things are still not very established, there are a lot of tools that still need creating, and a small personal project now could become a huge community project in the future as the Swift machine learning ecosystem continues to grow.
