cowlark.com :  Linux Binaries on Windows :  Stuff That Is Broken

Linux Binaries on Windows

Stuff That Is Broken

Published: 2016 November 3

Lots of stuff in LBW is broken. Quite a lot of it I don't know how to fix. Can you help?

Windows Vista & Windows 7

Right now LBW only works on Windows XP with Interix 3.5.

LBW is known not to work on Windows 7 (with Interix 6.0), and I do not know why. I have done some debugging, and it would appear that a whole bunch of stuff doesn't work: mmap() fails with EIO errors that look random but happen consistently, new processes suffer memory corruption when starting, etc.

My only real development machine runs Windows XP. I would dearly love for someone to look into why LBW doesn't work on other versions. Given that Interix is a pain to install on Windows XP and trivial to install on Windows 7, it's a pity LBW doesn't work there.

Plus, I have no access to a Vista machine, so I have no idea how things stand there...

%gs and segmentation

Linux uses the %gs register to identify the currently running thread. It does this by creating a 4GB-long segment descriptor in the GDT (via the set_thread_area() system call) whose base address is the thread's descriptor block.

This allows the process to do things like:

mov eax, [gs:0]

...to load the 32-bit word at the start of the descriptor block into %eax. On the register-starved ia32 architecture this is drastically faster than the alternatives, since no general-purpose register has to be set aside to hold the thread pointer.

Unfortunately Windows won't let me create GDT entries. It will let me create LDT entries using miscellaneous undocumented Windows NT kernel calls, but they're not quite good enough: Windows enforces a size limit on them to stop them extending above about 0x7ff00000.

The issue here is that Linux processes also do things like:

mov eax, [gs:0xfffffffc]

...to load the 32-bit word immediately before the start of the thread descriptor block. It can do this because address arithmetic in a 4GB-long segment wraps around, so adding 0xfffffffc is equivalent to subtracting 4.

But Windows won't let me create a 4GB LDT entry.
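The difference between the two cases can be modeled in a few lines of C. This is a simplified sketch, not LBW code: it treats every access as a single byte, ignores granularity and privilege checks, and the struct and function names are hypothetical. An expand-up data segment faults when the offset exceeds its limit; otherwise the linear address is base plus offset, truncated to 32 bits.

```c
#include <stdint.h>

/* Hypothetical model of an x86 expand-up data segment.
   'limit' is the highest valid offset (0xffffffff for a 4GB segment). */
struct segment {
    uint32_t base;
    uint32_t limit;
};

/* Returns 1 and stores the linear address if 'offset' is within the
   limit; returns 0 where the CPU would raise a fault instead. */
int seg_translate(const struct segment *seg, uint32_t offset, uint32_t *linear)
{
    if (offset > seg->limit)
        return 0;
    /* Unsigned 32-bit addition wraps modulo 2^32, like the CPU's
       linear-address calculation, so 0xfffffffc behaves as -4. */
    *linear = seg->base + offset;
    return 1;
}
```

With a limit of 0xffffffff, offset 0xfffffffc lands 4 bytes below the base. If the limit is instead clamped so that base plus limit stays below about 0x7ff00000 (an assumption about how the Windows restriction plays out), ordinary positive offsets still work but 0xfffffffc falls outside the limit and faults.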

What I'm doing instead is leaving %gs set to 0. Because 0 is the null selector, this causes a fault (strictly, a general protection fault) every time the Linux process tries to execute an instruction that involves %gs. I can examine the code that it tried to execute, generate a fragment of equivalent code that does not use %gs, and run that instead, before returning to the process.

This works, but it's dog slow: faults are expensive, and, cripplingly, Linux code assumes that %gs references are cheap, so it thinks nothing of doing them in inner loops.
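To make the fault-handler step concrete, here is a minimal sketch that recognises just two encodings: the moffs32 forms of mov with a %gs override (65 A1 disp32 and 65 A3 disp32). It is an illustration under that narrow assumption; a real handler needs a general x86 decoder, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical description of one decoded %gs-relative access. */
struct gs_access {
    uint32_t linear;   /* address the instruction meant to touch */
    int is_store;      /* 1 for mov [gs:disp32], eax; 0 for the load */
    unsigned length;   /* instruction length, used to advance %eip */
};

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes only:
     65 A1 disp32   mov eax, [gs:disp32]
     65 A3 disp32   mov [gs:disp32], eax
   'gs_base' is the thread descriptor address the 4GB segment would have
   had. Returns 1 on success and 0 for any other instruction. */
int decode_gs_moffs(const uint8_t *code, uint32_t gs_base, struct gs_access *out)
{
    if (code[0] != 0x65)                      /* GS segment-override prefix */
        return 0;
    if (code[1] != 0xA1 && code[1] != 0xA3)
        return 0;
    out->is_store = (code[1] == 0xA3);
    /* Wraps modulo 2^32, matching a 4GB segment based at gs_base. */
    out->linear = gs_base + read_le32(code + 2);
    out->length = 6;
    return 1;
}
```

A handler built on this would perform the load or store at 'linear' itself, add 'length' to the saved %eip, and resume the process.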

Where possible, I patch the original code to jump straight to the translated fragment so that later executions avoid the fault; but the patch needs five bytes (the size of a jump instruction), and frequently there isn't room.
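The five-byte figure comes from jmp rel32 (opcode E9), whose displacement is measured from the end of the jump. The sketch below, with hypothetical function names, shows the encoding and the size check. For comparison, the six-byte 65 A1 disp32 form of mov eax, [gs:disp32] has room for such a jump, but a register-indirect form like 65 8B 00 (mov eax, [gs:eax]) is only three bytes long.

```c
#include <stdint.h>

/* Encodes "jmp rel32" at address 'site' so that it lands on 'target'.
   The displacement is relative to the end of the 5-byte instruction. */
void encode_jmp_rel32(uint8_t out[5], uint32_t site, uint32_t target)
{
    uint32_t rel = target - (site + 5);   /* wraps, so backward jumps work too */
    out[0] = 0xE9;
    out[1] = (uint8_t)rel;
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* Patching in place is only safe if the jump fits inside the instruction
   being replaced; otherwise it overwrites the start of the next one,
   which might itself be the target of some other jump. */
int can_patch_in_place(unsigned insn_length)
{
    return insn_length >= 5;
}
```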

Does anyone know a way to make Windows create a 4GB GDT or LDT entry? Preferably one that doesn't involve a custom kernel driver.

mmap()

Linux makes huge use of the mmap() system call. This maps a file into the process's address space, so that a region of memory becomes a view of the file. It's used all over the place, from loading code to copying files.

Interix supports mmap() --- I would not even have attempted this if it hadn't. Unfortunately, Windows and Linux have rather different mmap() semantics.

The big issue is that Linux allows mmap()ing on any 4kB page boundary, whereas Windows requires 64kB boundaries (its allocation granularity).

This becomes a big problem when it comes to loading code. Linux applications are loaded at 0x08048000, which is not 64kB-aligned, so I cannot mmap() them there. It gets worse when it comes to shared libraries: ld.so assumes that it can map a file to an arbitrary address and then map a 4kB page immediately after it.
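The alignment arithmetic is simple because both granularities are powers of two. A small sketch (helper names hypothetical) checks the i386 load address against each:

```c
#include <stdint.h>

#define LINUX_PAGE_SIZE  0x1000u   /* 4kB */
#define WIN_GRANULARITY  0x10000u  /* 64kB allocation granularity */

/* 'granule' must be a power of two, so masking tests alignment. */
int is_aligned(uint32_t addr, uint32_t granule)
{
    return (addr & (granule - 1)) == 0;
}

/* Rounds 'addr' down to the previous multiple of 'granule'. */
uint32_t align_down(uint32_t addr, uint32_t granule)
{
    return addr & ~(granule - 1);
}
```

0x08048000 passes the 4kB test but fails the 64kB one: the nearest address a Windows view could start at is align_down(0x08048000, WIN_GRANULARITY), which is 0x08040000, 32kB too early. The ld.so case fails the same way; for instance, if a library segment of 0x13000 bytes is mapped at a 64kB-aligned 0x40000000 (illustrative numbers), the page ld.so then wants at 0x40013000 is not 64kB-aligned.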

What I've got, therefore, is a ghastly mess of code that attempts to work out whether it's possible to mmap() the file directly or whether it has to allocate RAM and physically load the file data into it. While it currently appears to work, there are certain combinations of flags that won't work --- MAP_SHARED|MAP_FIXED to an address that is not 64kB-aligned, for example. It's also slow and uses lots of RAM.
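The shape of that decision can be sketched as follows. This is a deliberately simplified, hypothetical version, not LBW's actual logic: it ignores MAP_ANONYMOUS, protections, and partial overlaps, and it assumes the Windows rule applies to both the view address and the file offset, as it does for MapViewOfFileEx. The LX_ flag values are Linux's i386 mmap() constants.

```c
#include <stdint.h>

/* Linux i386 mmap() flag values. */
#define LX_MAP_SHARED   0x01
#define LX_MAP_PRIVATE  0x02
#define LX_MAP_FIXED    0x10

#define WIN_GRANULARITY 0x10000u

enum map_strategy {
    STRAT_DIRECT,      /* hand the request straight to the host mmap() */
    STRAT_COPY,        /* allocate memory and read the file data into it */
    STRAT_UNSUPPORTED  /* no faithful way to emulate the request */
};

static int aligned64k(uint32_t x)
{
    return (x & (WIN_GRANULARITY - 1)) == 0;
}

enum map_strategy choose_strategy(uint32_t addr, uint32_t offset, int flags)
{
    /* Without MAP_FIXED the address is only a hint, so an aligned
       address can be picked instead of the one requested. */
    int addr_ok = !(flags & LX_MAP_FIXED) || aligned64k(addr);

    /* Both the view address and the file offset must be 64kB multiples. */
    if (addr_ok && aligned64k(offset))
        return STRAT_DIRECT;

    /* A private copy never sees other processes' writes to the file, so
       it is only an acceptable substitute for MAP_PRIVATE mappings. */
    if (flags & LX_MAP_SHARED)
        return STRAT_UNSUPPORTED;
    return STRAT_COPY;
}
```

Under these assumptions, loading the executable at 0x08048000 with MAP_PRIVATE|MAP_FIXED falls back to copying, while MAP_SHARED|MAP_FIXED at an unaligned address ends up unsupported, matching the failure case described above.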

Does anyone know a way to make Windows map files using 4kB granularity?

clone() and futex()

Linux's threading primitives all boil down to just two system calls: clone(), which starts a new thread or process, and futex(), which is a basic synchronisation primitive.

Interix has neither of these.

Right now I only support clone() enough to make fork() work. Trying to create a thread will fail. futex() contains just enough stub support to make glibc start up, and no more.

I believe that it is not possible to implement futex() on Windows, simply because synchronisation works so differently on the two platforms: there is no equivalent Windows NT primitive, and I cannot emulate futex() because it has to check a value in memory and go to sleep as a single atomic operation.

Can anyone prove me wrong?

I do have a backup plan, which is to provide a replacement Linux pthreads library that calls out to the Interix pthreads library to do the work; but this is ugly and won't help with static binaries.
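To make the atomicity requirement concrete, here is a minimal sketch of the semantics FUTEX_WAIT and FUTEX_WAKE must provide, built from a pthreads mutex and condition variable. It rests on strong assumptions: every waiter is a thread in one address space (real futexes also work between processes on shared memory), there are no timeouts, and the function names are hypothetical. It shows where the atomicity comes from, namely a lock that every waiter and waker must take; it is not a claim that the same trick fits LBW's situation.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* One global lock and condition variable for all futex addresses:
   simple, and only meaningful for threads sharing one address space. */
static pthread_mutex_t futex_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t futex_cond = PTHREAD_COND_INITIALIZER;
static unsigned long futex_generation;   /* incremented by every wake */

/* FUTEX_WAIT: sleep only if *uaddr still holds 'expected'. The check and
   the decision to sleep happen under futex_lock, and a waker must take
   the same lock, so no wake can slip in between them. */
int emu_futex_wait(atomic_int *uaddr, int expected)
{
    pthread_mutex_lock(&futex_lock);
    if (atomic_load(uaddr) != expected) {
        pthread_mutex_unlock(&futex_lock);
        return -EAGAIN;                   /* value already changed */
    }
    unsigned long gen = futex_generation;
    while (gen == futex_generation)       /* ignore spurious condvar wakeups */
        pthread_cond_wait(&futex_cond, &futex_lock);
    pthread_mutex_unlock(&futex_lock);
    return 0;
}

/* FUTEX_WAKE: this sketch wakes every waiter on every address, which
   callers must treat as a spurious wakeup; the real call wakes at most a
   given number of waiters on one address. */
void emu_futex_wake(atomic_int *uaddr)
{
    (void)uaddr;
    pthread_mutex_lock(&futex_lock);
    futex_generation++;
    pthread_cond_broadcast(&futex_cond);
    pthread_mutex_unlock(&futex_lock);
}
```

A waker stores the new value first and then calls emu_futex_wake(); a waiter that read the old value is guaranteed to be asleep on the condition variable by the time the wake acquires the lock.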

Signals

LBW's signal handling is a broken mess. Right now it only works by accident.

Linux supports 64 signals (31 conventional ones and 33 real-time signals). Interix supports only 32, and, what's more, it doesn't support any of Linux's signal-handling extensions, such as sigaltstack() or SA_SIGINFO.

I have a horrible feeling I'm going to have to implement a complete interprocess signal-handling layer on top of Interix's own.

However, I don't actually know much about signals. Can anyone offer insight?
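As a starting point for such a layer, the Linux side of the bookkeeping is straightforward: a signal set is a 64-bit mask in which signal n is bit n-1. The sketch below (names hypothetical) separates the signals an emulation layer would have to queue and deliver itself. It assumes Interix's 32-entry set covers signals 1 to 31; even for those, the numbering may not match Linux's, which this sketch ignores.

```c
#include <stdint.h>

#define LX_NSIG       64   /* Linux signals are numbered 1..64 */
#define HOST_MAX_SIG  31   /* assumption: highest signal Interix can carry */

/* Linux-style signal set: signal n is bit n-1. */
typedef uint64_t lx_sigset;

int lx_sigaddset(lx_sigset *set, int sig)
{
    if (sig < 1 || sig > LX_NSIG)
        return -1;
    *set |= (uint64_t)1 << (sig - 1);
    return 0;
}

int lx_sigismember(const lx_sigset *set, int sig)
{
    if (sig < 1 || sig > LX_NSIG)
        return -1;
    return (int)((*set >> (sig - 1)) & 1);
}

/* Keeps only the signals above HOST_MAX_SIG, which have no host
   counterpart and would have to be emulated entirely by LBW. */
lx_sigset lx_emulated_only(lx_sigset set)
{
    return set & ~(((uint64_t)1 << HOST_MAX_SIG) - 1);
}
```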

File handles

Linux supports large files, with 64-bit lengths and offsets.

Interix does not, even though Windows NT does. As a result, trying to use files bigger than 4GB (and probably 2GB) is going to work very badly.
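On i386, Linux programs seek with the _llseek() system call, which passes the 64-bit position as two 32-bit halves (offset_high and offset_low). The sketch below joins them for an absolute position and checks whether the result can be handed to a host whose file offset is assumed, hypothetically, to be a signed 32-bit value; that assumption is what makes 2GB, rather than 4GB, the likely breaking point.

```c
#include <errno.h>
#include <stdint.h>

/* Joins the two halves of an absolute _llseek() position and converts it
   to a host offset assumed to be a signed 32-bit value. Returns 0 on
   success, or -EOVERFLOW if the position is 2GB or more. */
int lx_offset_to_host(uint32_t offset_high, uint32_t offset_low, int32_t *host_off)
{
    uint64_t off = ((uint64_t)offset_high << 32) | offset_low;
    if (off > (uint64_t)INT32_MAX)
        return -EOVERFLOW;   /* not representable in the host's offset type */
    *host_off = (int32_t)off;
    return 0;
}
```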

I could use the Windows NT kernel file manipulation functions directly, thus working around the Interix limit... if I knew the Windows NT file handle.

Does anyone know how to get the Windows NT file handle from an Interix file descriptor?