Let's learn about Rust C bindings and FFI
Bridging worlds
For a while now I have wondered how Rust and C interoperate, for no other reason than curiosity and wanting to figure out how the damn thing works.
Therefore I ventured into the exciting world of the Foreign Function Interface, or FFI. It's a powerful feature that allows Rust to communicate with other programming languages, and a great way to understand this is to write an interface yourself. And so I did. I will be deconstructing the work I did over midsummer this year, creating a simple C-library for "Hello, World!" messages and then writing Rust bindings for it.
Since I really did not know what the hell I was doing initially, I ended up using Gemini Code Assist to bridge the gaps, or chasms, in my knowledge during the exercise.
FFI
In simple terms, the Foreign Function Interface (FFI) is a mechanism that allows code written in one programming language to call, and be called by, code written in another language. In the context of Rust, FFI is most commonly used to interoperate with C libraries. This is because C has a well-defined Application Binary Interface (ABI), which makes it a common denominator for interoperability between many languages.
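To make the C ABI point concrete, here is a minimal sketch (not part of the original project) that calls strlen from the platform C library directly from Rust. It assumes a target where Rust links the system C library by default (for example Linux or macOS) and Rust 1.82 or newer for the unsafe extern block syntax; c_strlen is a hypothetical helper name.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declaration of a function that already exists in the platform C library.
// The signature must match the C prototype: size_t strlen(const char *s);
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: builds a NUL-terminated copy of `s` and asks C for its length.
/// Returns None if `s` contains an interior NUL byte, which a C string cannot represent.
fn c_strlen(s: &str) -> Option<usize> {
    let c_string = CString::new(s).ok()?;
    // SAFETY: `c_string` is a valid, NUL-terminated buffer that outlives the call.
    Some(unsafe { strlen(c_string.as_ptr()) })
}
```

The unsafe block is confined to a single call, and the SAFETY comment records the assumption that makes it sound; the same pattern of a safe function around an unsafe call reappears in the hello-rs crate later in the post.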
Why is FFI a Big Deal?
Rust's FFI opens up a world of possibilities:
- Leveraging Existing Ecosystems: You don't have to reinvent the wheel. There are countless high-quality and battle-tested libraries written in C. With FFI, you can use them directly in your Rust projects.
- Performance-Critical Code: While Rust is already incredibly fast, you might have a legacy C library that's highly optimized for a specific task. FFI allows you to call that library from your Rust code, getting the best of both worlds.
- Gradual Modernization: FFI is perfect for gradually migrating a legacy codebase to Rust. You can start by rewriting small parts of your application in Rust and use FFI to bridge the old and new code.
A "Hello, World!" Example
My "hello world" code is available in its entirety on GitHub. While FFI, as I understand it, also supports calls in the other direction (for example C-code calling a Rust function), in this exercise we only expose C-functions to Rust code.
A C-library
First, we need a simple C-library and a function that the C-library will expose.
In src/hello.h we define a simple C-function called "hello":
// Writes a greeting into the provided buffer.
// Returns the number of bytes written (excluding the null terminator), or -1 on error.
int ;
And in src/hello.c we implement the "hello" function itself:
int
This function takes pointers to the name and to the buffer it modifies. It then uses sprintf to construct the string "Hello Name" and writes it into the buffer, after which the function returns the number of bytes written, or -1 in case of an error. Pretty standard thread-safe C-code stuff, I think.
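The buffer contract described above (write into a caller-provided buffer, return the byte count or -1) can be sketched on the Rust side of the boundary as a function with the C ABI. This is purely illustrative and rests on assumptions: the signature with a separate buffer_size parameter, the greeting format "Hello, <name>!" and the name hello_stub are all hypothetical, not the author's C code. An extern "C" function like this is also roughly where the opposite direction mentioned earlier (C calling into Rust) would start; exporting it under an unmangled symbol name would additionally need #[no_mangle] (written #[unsafe(no_mangle)] in the 2024 edition).

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Hypothetical Rust stand-in for the C hello function, assuming the signature
/// int hello(const char *name, char *buffer, size_t buffer_size).
/// Writes "Hello, <name>!" plus a NUL terminator into `buffer` and returns the
/// number of bytes written (excluding the NUL), or -1 on error.
pub unsafe extern "C" fn hello_stub(
    name: *const c_char,
    buffer: *mut c_char,
    buffer_size: usize,
) -> c_int {
    // Reject null pointers instead of dereferencing them.
    if name.is_null() || buffer.is_null() {
        return -1;
    }
    // SAFETY: the caller promises `name` points to a NUL-terminated string.
    let name = unsafe { CStr::from_ptr(name) }.to_bytes();

    let mut greeting = Vec::with_capacity(name.len() + 8);
    greeting.extend_from_slice(b"Hello, ");
    greeting.extend_from_slice(name);
    greeting.push(b'!');

    // Refuse to write if the text plus its NUL terminator would not fit,
    // or if the length cannot be reported as a C int.
    if greeting.len() >= buffer_size || greeting.len() > c_int::MAX as usize {
        return -1;
    }
    // SAFETY: the caller promises `buffer` is valid for `buffer_size` bytes,
    // and we just checked that greeting.len() + 1 <= buffer_size.
    unsafe {
        std::ptr::copy_nonoverlapping(greeting.as_ptr() as *const c_char, buffer, greeting.len());
        *buffer.add(greeting.len()) = 0;
    }
    greeting.len() as c_int
}
```

The explicit size check is the main difference from an unbounded sprintf: without a buffer size, the callee has no way to know when it would write past the end of the caller's memory.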
Rust "sys" crate
In Rust we usually have a crate with the suffix -sys that exposes the C-code to Rust via FFI bindings, in this case hello-sys. These bindings can be written by hand, but there is also the bindgen crate that automates this process.
To simplify proceedings a bit, I created a wrapper.h header file that #includes the hello.h header file. In real-life projects this is where you would include all the other headers you need. Depending on your use case you could use other approaches here, but this seems to be a somewhat "standard" way of managing which C-functions get included.
Then in the hello-sys/build.rs file we tell Cargo how to generate the bindings via the bindgen crate:
Lots of things to unpack here, but most importantly:
- It compiles hello.c and its function as the hello_c_lib artifact via the cc crate.
- We generate the bindings for our C-library and, with .allowlist_function(), specify that we only want to include the hello function instead of potentially everything present in the headers. This is somewhat redundant as we only have a single function defined in the hello.h header, but in real life you might want to limit the generated bindings to just a subset of C-functions, for example.
- We annotate the Cargo build process with cargo:rustc-link-lib=static=hello_c_lib so that rustc will link the hello_c_lib artifact into the resulting Rust binary.
- We also tell the Cargo build process to rebuild the code if the contents of the .c or .h files change.
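The two Cargo directives in the last bullets are just lines that a build script prints to standard output, which Cargo then reads. The sketch below isolates that part; cargo_directives and print_directives are hypothetical helper names, the file paths in the usage are placeholders, and the C compilation (cc) and bindgen steps a real build.rs needs are deliberately left out so the example runs without dependencies.

```rust
/// Hypothetical helper: collects the Cargo directives described above so they
/// can be printed from build.rs. A real build script would first compile the C
/// code (the post uses the cc crate) and run bindgen; those steps are omitted here.
fn cargo_directives(lib_name: &str, watched_files: &[&str]) -> Vec<String> {
    // Ask rustc to link the static archive built from the C sources.
    let mut lines = vec![format!("cargo:rustc-link-lib=static={}", lib_name)];
    // Ask Cargo to rerun the build script when any of these files change.
    for path in watched_files {
        lines.push(format!("cargo:rerun-if-changed={}", path));
    }
    lines
}

/// In build.rs this would be called from main(); Cargo parses the directives from stdout.
fn print_directives(lines: &[String]) {
    for line in lines {
        println!("{}", line);
    }
}
```

As far as I know, cc's compile() already prints the link-lib (and link-search) directives for the archive it builds, so an explicit link-lib line mostly documents intent.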
As a final step, to incorporate the generated bindings into our Rust crate, we need to include the generated code as part of our library. hello-sys/src/lib.rs therefore contains:
// This brings the generated FFI declarations into our crate.
// The code will be located at $OUT_DIR/bindings.rs
// We must disable some lints that are commonly triggered by C code.
include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
Now when the hello-sys crate is built, it contains very unsafe generated code plus the compiled C-library, which does not bring any of Rust's benefits to the table yet. This is handled by creating a safer library crate that acts as an interface between the developer, the sys crate and the C-code.
Rust crate
Naming conventions aside, this crate uses the hello-sys crate created earlier and, via unsafe {} code, implements user-friendly functions on top of our C-code. So instead of calling the bindgen-generated functions directly, the developer uses the "safer" and vetted code from this hello-rs crate.
In hello-rs/src/lib.rs:
As we can see, we still need to use unsafe {} code, but regardless of what you might have read on the Internet, it's not wrong to use unsafe in Rust when you do it right. Hell, Rust's standard library uses a lot of unsafe code. Most importantly, in cases like this there is no way around it without fully re-implementing the C-library functionality natively in Rust, which would miss the point of this exercise altogether. Real life is ugly and sometimes you just need to be reasonably unsafe.
So, if we are being unsafe, how can we mitigate the risks? As we can see from the code snippet above, I created a Rust function hello that calls the unsafe hello function provided by the sys crate, but wraps it with sanity checks and converts between C data types and native Rust data types while staying as close to the C-library's behavior as possible. This way any documentation written for the C-code that describes parameters and function behavior stays valid for both the C and Rust implementations.
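The wrapping pattern described above can be sketched without the author's C library by wrapping a C function with a similar "write a greeting into a buffer, return the byte count or a negative error" shape: snprintf from the platform C library. This is a stand-in, not the actual hello-rs code; greet and GreetError are hypothetical names, and the sketch assumes a Unix-like target that links the system C library and Rust 1.82 or newer for the unsafe extern syntax. It shows the three jobs of a safe wrapper: validating input (interior NUL bytes), sizing the buffer correctly, and turning C return codes into a Rust Result.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// snprintf from the platform C library; variadic, like printf.
unsafe extern "C" {
    fn snprintf(buf: *mut c_char, size: usize, format: *const c_char, ...) -> c_int;
}

/// Errors the safe wrapper reports instead of risking undefined behavior.
#[derive(Debug, PartialEq)]
pub enum GreetError {
    /// The name contained a NUL byte, which a C string cannot carry.
    InteriorNul,
    /// The C call reported a failure or behaved unexpectedly.
    CallFailed,
}

/// Safe wrapper: &str in, String out, no raw pointers exposed to the caller.
pub fn greet(name: &str) -> Result<String, GreetError> {
    let c_name = CString::new(name).map_err(|_| GreetError::InteriorNul)?;
    let format = CString::new("Hello, %s!").expect("format contains no NUL");

    // First call with a null buffer and size 0 only measures the output length.
    // SAFETY: C99 permits a null buffer when size is 0; both strings are NUL-terminated.
    let needed = unsafe { snprintf(std::ptr::null_mut(), 0, format.as_ptr(), c_name.as_ptr()) };
    if needed < 0 {
        return Err(GreetError::CallFailed);
    }

    // Second call writes into a buffer with room for the text plus the NUL terminator.
    let mut buffer = vec![0u8; needed as usize + 1];
    // SAFETY: `buffer` is valid for `buffer.len()` bytes and outlives the call.
    let written = unsafe {
        snprintf(buffer.as_mut_ptr() as *mut c_char, buffer.len(), format.as_ptr(), c_name.as_ptr())
    };
    if written != needed {
        return Err(GreetError::CallFailed);
    }

    buffer.truncate(written as usize); // drop the NUL terminator
    // The name was valid UTF-8 and the format is ASCII, so this should not fail.
    String::from_utf8(buffer).map_err(|_| GreetError::CallFailed)
}
```

Measuring first and allocating second means the wrapper never has to guess a buffer size, so long names are handled instead of being truncated or overflowing.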
In the end, when we call the C-library via unsafe code we should be doing things in a sane way, but how can we make sure our assumptions about "safeness" are correct? Well, by adding some test cases, of course:
There is roughly twice as much test code as implementation code for the Rust hello function. Writing unsafe code is only unsafe if you don't know what your code is doing or don't care enough to validate your assumptions. But since we can only write test cases for situations we are aware of, no test suite is ever perfect, and this is how we get the blue screens, core dumps and sad Macs from time to time. When this happens, fixes are applied and test cases are written to cover that error state and to safeguard against future regressions.
Bonus: Fuzzing
To oversimplify, fuzzing is a technique of repeatedly feeding garbage input to something that processes it in some way, in the hope of triggering a crash, and then analyzing the error state and creating a fix and a test case for it.
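The core loop can be illustrated without any fuzzing framework. The dependency-free sketch below is not what the post uses (the fuzz_target! setup described next looks like the cargo-fuzz layout, where libFuzzer generates inputs guided by code coverage, which is far smarter). It simply generates pseudo-random byte strings with a small xorshift generator and checks two invariants of the Rust-to-C string conversion that a wrapper like hello-rs relies on: CString::new fails exactly when the input contains a NUL byte, and otherwise C sees the same bytes back. XorShift and fuzz_cstring_roundtrip are hypothetical names.

```rust
use std::ffi::{CStr, CString};

/// Tiny xorshift64 generator so the example needs no crates (not cryptographic).
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Feeds `rounds` random byte strings through CString and panics if an
/// invariant breaks. Returns how many inputs were rejected for containing a NUL.
fn fuzz_cstring_roundtrip(seed: u64, rounds: usize) -> usize {
    let mut rng = XorShift(seed);
    let mut rejected = 0;
    for _ in 0..rounds {
        let len = (rng.next_u64() % 64) as usize;
        // "Garbage" input: arbitrary bytes, occasionally including NUL.
        let input: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        match CString::new(input.clone()) {
            Err(_) => {
                // Invariant 1: rejection happens only for inputs with a NUL byte.
                assert!(input.contains(&0), "rejected input without NUL: {:?}", input);
                rejected += 1;
            }
            Ok(c_string) => {
                assert!(!input.contains(&0), "accepted input with NUL: {:?}", input);
                // Invariant 2: reading it back as C would (up to the NUL) yields the same bytes.
                // SAFETY: `c_string` is NUL-terminated and alive for this scope.
                let back = unsafe { CStr::from_ptr(c_string.as_ptr()) };
                assert_eq!(back.to_bytes(), &input[..]);
            }
        }
    }
    rejected
}
```

A panic in either assertion plays the role of the "crash" a fuzzer is looking for; the failing input is printed so it can be turned into a regular test case.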
Since we can only write test cases for the inputs and outputs we know of at the time of writing our code, our test suites are only as good as our imagination or guidelines, so it does not hurt to double-check. So, in addition to creating a comprehensive test suite in Rust, I also added fuzzing support for the hello-rs crate.
hello-rs/fuzz/fuzz_targets/fuzz_hello.rs implements a fuzzing target for our hello Rust function:
fuzz_target!;
As I understand it, this simple fuzzer should make us aware of any user input to the Rust hello function (which calls hello in the sys crate, which in turn binds to the C-library function hello) that causes a panic or a crash. I left it running for a few days and was unable to get a crash out of my code; I guess that means it's relatively good, or my fuzzing target is just broken, or I got lucky and the generated data never triggered an error.
Conclusion
I think I now understand FFI on a superficial level and can create a more complex real-life example next, or at the very least debug existing crates a little better. I also need to check whether I can use this approach in embedded development; I have a camera module that I would like to use from a Raspberry Pi Pico W, but the vendor only provides a C-library for interfacing with the module, and so far manually initializing and using it from the embassy-rs environment has been a tricky proposition.
Fun times were had, new things learned, so a win-win.