For quite some time, I have been bothered by this thought: Individual programming languages (C++, Rust, Go, etc.) are traditionally viewed as walled gardens. If your main() function is written in C++, you had better find yourself C++ libraries like Qt to build the rest of your codebase with. Do you want to use Flutter to build your app’s user interface? Get ready to build the logic in Flutter, too. Do you really want to use that Rust library to make your application safer? You get to either rewrite the whole app in Rust or build an ugly extern "C" wrapper around it that won’t fit well in your object-oriented C++ code.
This has been the standard view on using multiple programming languages for many years. However, I’ve decided that this view is fundamentally flawed, because every compiled language uses the same set of concepts when it is compiled:
- Code is split up into functions that can be reused.
- Functions are identified by a string generated from the function name in the source code. For example, g++ generates _Z3foov as the identifier for void foo(). This string is always reproducible; for example, both Clang and GCC on Linux follow the Itanium C++ ABI convention for mangling function names.
- Functions are called by storing all parameters to that function at specific locations (registers or stack slots dictated by the calling convention) and then using a call instruction or equivalent to move control to the function. For example, to call void foo() from earlier, the compiler converts a C++ statement foo(); into the assembly call _Z3foov. The assembler then replaces call with the appropriate opcode, and the linker replaces _Z3foov with the location of the first instruction identified by _Z3foov.
- Functions return by storing their return value (if they have one) at a specific location and then using a ret instruction or equivalent.
- Classes and structs can be boiled down to a collection of primitive types (although some classes do have vtables).
- Class methods are just another function that happens to take a pointer to the class object as the first parameter. In other words, when you write this:
class Foo
{
    void foo(int bar);
    int baz;
};
your code actually compiles to something that is better represented this way:
class Foo
{
    int baz;
};
void foo(Foo *this, int bar);
Since every compiled programming language uses the same concepts to compile, why can’t they just interact?
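The last bullet can be checked in ordinary C++. The following is only a sketch: the names add_to_baz, add_to_baz_free, and method_matches_free_function are made up for illustration, and the explicit parameter is called self because this is a reserved word in real C++.

```cpp
// A class with one data member and one method, mirroring the bullet above.
class Foo
{
public:
    int baz = 0;
    int add_to_baz(int bar) { return baz + bar; }
};

// Hand-written equivalent: a free function that takes the object pointer
// explicitly as its first parameter (hypothetical name).
int add_to_baz_free(Foo *self, int bar)
{
    return self->baz + bar;
}

// Hypothetical helper: calls both forms on the same object and reports
// whether they produce the same result.
bool method_matches_free_function(int baz, int bar)
{
    Foo f;
    f.baz = baz;
    return f.add_to_baz(bar) == add_to_baz_free(&f, bar);
}
```

Both forms read the same field through the same pointer; the method syntax just hides the first argument.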
Example
Before we go any further, I’d like to give an example of what we want to achieve:
// file: main.cpp
#include "rustmodule.h"
// or in an ideal C++ 20 world:
// import rustmodule;
int main()
{
    foo();
    return 0;
}
// file: rustmodule.h
#pragma once
// this is defined in Rust
void foo();
// file: rustmodule.rs
pub fn foo() {
    println!("Hello from Rust");
}
We want to be able to compile those files and get an executable file that prints Hello from Rust to stdout.
Now let’s look at why this won’t just work out of the box.
Name mangling, data layout, and standard libraries
The first reason that compiled programming languages can’t just interact with each other is also the most obvious one: syntax. C++ compilers don’t understand Rust, and Rust compilers don’t understand C++. Thus neither language can tell what functions or classes the other is making available.
Now, you might be saying “But if I use a C++ .h file to export functions and classes to other .cpp files, certainly I could make a .h file that tells C++ that there is a Rust function fn foo() out there!” If you did say (or at least think) that, congratulations! You are on the right track, but there are some other less obvious things we need to talk about.
The first major blocker to interoperability is name mangling. You can certainly make a .h file with a forward declaration of void foo();, but the C++ compiler will then look for a symbol called _Z3foov, while the Rust compiler will have mangled fn foo() into _ZN10rustmodule3foo17hdf3dc6f68b54be51E. Compiling the C++ code starts out OK, but once the linking stage is reached, the linker will not be able to find _Z3foov since it doesn’t exist.
Obviously, we need to change how the name mangling behaves on one side or the other. We’ll come back to this thought in a moment.
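To see what a mangled symbol encodes, you can ask the C++ runtime to reverse it. This is a minimal sketch that assumes a GCC or Clang toolchain, where abi::__cxa_demangle from <cxxabi.h> is available; the demangle helper itself is a made-up name.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang toolchains)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the human-readable form of an Itanium C++ ABI
// symbol, or the input unchanged if the runtime cannot demangle it.
std::string demangle(const std::string &symbol)
{
    int status = 0;
    char *readable = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr)
    {
        return symbol;
    }
    std::string result(readable);
    std::free(readable);  // the returned buffer is allocated with malloc
    return result;
}
```

Calling demangle("_Z3foov") gives back foo(), which is the name the C++ side of our example asks the linker for. The Rust symbol carries a module path and a hash instead, so the two sides never agree on a name by themselves.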
The second major blocker is data layout. Put simply, different compilers may treat the same struct declaration differently by putting its fields at different locations in memory.
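As a rough illustration of what "layout" means here, the sketch below asks a C++ compiler where it placed each field of a struct. Sample and its fields are hypothetical. The exact numbers depend on the platform, so the checks only cover properties C++ guarantees for a simple struct like this one.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Hypothetical struct, used only to illustrate padding and field placement.
struct Sample
{
    char tag;
    int value;
    char flag;
};

// Byte offset of each field as chosen by this C++ compiler.
constexpr std::size_t tag_offset = offsetof(Sample, tag);
constexpr std::size_t value_offset = offsetof(Sample, value);
constexpr std::size_t flag_offset = offsetof(Sample, flag);

// C++ keeps these fields in declaration order and pads for alignment.
static_assert(tag_offset < value_offset && value_offset < flag_offset,
              "fields stay in declaration order");
static_assert(value_offset % alignof(int) == 0, "value is aligned for int");
```

Rust's default representation makes no promise about field order, so a Rust struct with the same three fields may not match these offsets unless it is declared with #[repr(C)].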
The third and final blocker I want to look at here is standard libraries. If you have a C++ function that returns an std::string, Rust won’t be able to understand that. Instead, you need to implement some sort of converter that will convert C++ strings to Rust strings. Similarly, a Rust Vec object won’t be usable from C++ unless you convert it to something C++ understands.
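One possible shape for the C++ half of such a converter is to reduce the string to a pointer and a length, which any language can describe. This is a sketch under assumptions, not how any particular tool does it: StrSlice, to_slice, and from_slice are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical language-neutral view of string data: a pointer plus a length.
// It borrows the bytes and does not own them.
struct StrSlice
{
    const char *ptr;
    std::size_t len;
};

// Expose a std::string as a borrowed slice. The slice is only valid while
// the string is alive and unmodified.
StrSlice to_slice(const std::string &s)
{
    return StrSlice{s.data(), s.size()};
}

// Copy the borrowed bytes back into an owning std::string.
std::string from_slice(StrSlice slice)
{
    return std::string(slice.ptr, slice.len);
}
```

The receiving side still has work to do. Rust strings must be valid UTF-8, for instance, so a Rust converter would have to validate the bytes before treating them as a string.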
Let’s investigate how we can fix the first problem, name mangling.
extern "C" and why it sucks
The easy way is to use the extern "C" feature that nearly every programming language has:
// file: main.cpp
#include "rustmodule.h"
// or in an ideal C++ 20 world:
// import rustmodule;
int main()
{
    foo();
    return 0;
}
// file: rustmodule.h
#pragma once
extern "C" void foo();
// file: rustmodule.rs
#[no_mangle]
pub extern "C" fn foo() {
    println!("Hello from Rust");
}
This actually will compile and run (assuming you link all the proper standard libraries)! So why does extern "C" suck? Well, by using extern "C" you give up features like these:
- Function overloads
- Class methods
- Templates
It’s possible to create wrappers around the extern "C" functions to crudely emulate these features, but I don’t want complex wrappers that provide crude emulation. I want wrappers that directly plumb those features and are human readable! Furthermore, I don’t want to have to change the existing source, which means that the ugly #[no_mangle] pub extern "C" must go!
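To make the "crude emulation" concrete, here is a small sketch of what restoring function overloads on top of extern "C" tends to look like. The names add_i32, add_f64, and add are invented for this example, and the extern "C" functions are defined in C++ here only as stand-ins for functions that would really be exported by another language.

```cpp
// Stand-ins for functions exported by another language. Each needs its own
// unmangled name, because extern "C" symbols cannot be overloaded.
extern "C" int add_i32(int a, int b) { return a + b; }
extern "C" double add_f64(double a, double b) { return a + b; }

// Hand-written C++ wrappers that bring overload syntax back for callers.
inline int add(int a, int b) { return add_i32(a, b); }
inline double add(double a, double b) { return add_f64(a, b); }
```

Every overload costs one uniquely named export on one side and one forwarding wrapper on the other, and class methods and templates need even more scaffolding than this.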
Enter D
D is a programming language that has been around since 2001. Although it is not source compatible with C++, it is similar to C++. I personally like D for its intuitive syntax and great features, but for gluing Rust and C++ together, D stands out for two reasons: extern(C++) and pragma(mangle, "foo").
With extern(C++), you can tell D to use C++ name mangling for any symbol. Therefore, the following code will compile:
// file: foo.cpp
#include <iostream>
void bar();
void foo()
{
    std::cout << "Hello from C++\n";
    bar();
}
// file: main.d
import std.stdio;
extern(C++) void foo();
extern(C++) void bar()
{
    writeln("Hello from D");
}
void main()
{
    foo();
}
However, it gets better: we can use pragma(mangle, "foo") to manually override name mangling to anything we want! Therefore, the following code compiles:
// file: main.d
import std.stdio;
pragma(mangle, "_ZN10rustmodule3foo17h18576425cfc60609E") void foo();
pragma(mangle, "bar_d_function") void bar()
{
    writeln("Hello from D");
}
void main()
{
    foo();
}
// file: rustmodule.rs
pub fn foo() {
    println!("Hello from Rust");
    unsafe {
        bar();
    }
}
extern {
    #[link_name = "bar_d_function"]
    fn bar();
}
With pragma(mangle, "foo") we can not only tell D how Rust mangled its function, but also create a function that Rust can see!
You might be wondering why we had to tell Rust to override mangling of bar(). It’s because Rust apparently won’t apply any name mangling to bar() for the sole reason that it is in an extern block; in my testing, not even marking it as extern "Rust" made any difference. Go figure.
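You also might be wondering why we can’t use Rust’s name mangling overrides instead of D’s. Well, Rust’s #[link_name] only applies to function forward declarations inside an extern block, so it can’t make a function defined in Rust masquerade as a C++ function. (Rust does offer #[export_name] for definitions, but using it means editing the Rust source, which is exactly what we are trying to avoid.)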
Using D as the glue
We can now use D to glue our basic example together:
// file: main.cpp
#include "rustmodule.h"
// or in an ideal C++ 20 world:
// import rustmodule;
int main()
{
    foo();
    return 0;
}
// file: rustmodule.h
#pragma once
// this is in Rust
void foo();
// file: rustmodule.rs
pub fn foo() {
    println!("Hello from Rust");
}
// file: glue.d
@nogc:
// This is the Rust function.
pragma(mangle, "_ZN10rustmodule3foo17h18576425cfc60609E") void foo_from_rust();
// This is exposed to C++ and serves as nothing more than an alias.
extern(C++) void foo()
{
    foo_from_rust();
}
In this example, when main() calls foo() from C++, it is actually calling a D function that can then call the Rust function. It’s a little ugly, but it’s possibly the best solution available that leaves both the C++ and Rust code in pristine condition.
Automating the glue
Nobody wants to have to write a massive D file to glue together the C++ and Rust components, though. In fact, nobody even wants to write the C++ header files by hand. For that reason, I created a proof-of-concept tool called polyglot that can scan C++ code and generate wrappers for use from Rust and D. My eventual goal is to also wrap other languages, but as this is a personal project, I am not developing polyglot very quickly and it certainly is nowhere near the point of being ready for production use in serious projects. With that being said, it’s really amazing to compile and run the examples and know that you are looking at multiple languages working together.
Next up
I originally planned to write on this topic in one blog post, but there are a lot of interesting things to cover, so I will stop here for now. In the next installment (part 2) of this series, we will take a look at how we can overcome the other two major blockers to language interoperability. Part 3 is also available.