C++11
One thing I liked about LISP was the ability to generate code at runtime. C++ cannot do the same without some black magic (such as LLVM and a JIT). However, things are getting better with the new standard.
One of the most powerful and useful features of the new standard is constant expressions. They let the developer write expressions that are evaluated during compilation. You can see them as an evolved form of inlining.
For example:
constexpr int max(int i, int j)
{
return i > j ? i : j;
}
int main(int, char**)
{
int m = max(1, 4); // will translate to: int m = 4;
return 0;
}
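The variable m is initialized directly to 4, and no comparison is performed at runtime. Strictly speaking, the standard only guarantees compile-time evaluation where a constant expression is required (for example, if m were itself declared constexpr); here the compiler is merely allowed, and in practice very likely, to fold the call.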
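To make the difference between "may be folded" and "must be folded" concrete, here is a minimal sketch. The names max_of and runtime_max are hypothetical (max_of only avoids a clash with std::max); the point is that a constexpr variable or a static_assert forces the compiler to do the work.

```cpp
// C++11: a constexpr function usable both at compile time and at runtime.
constexpr int max_of(int i, int j)
{
    return i > j ? i : j;
}

// A constexpr variable must be initialized by a constant expression,
// so the comparison cannot survive until runtime.
constexpr int m = max_of(1, 4);

// static_assert only accepts constant expressions, which proves the point.
static_assert(m == 4, "max_of(1, 4) is evaluated by the compiler");

// The same function still works with values only known at runtime.
int runtime_max(int a, int b)
{
    return max_of(a, b);
}
```

Declaring m as constexpr is the design choice that matters here: without it, folding is an optimization, not a guarantee.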
How does it help us?
By greatly improving readability.
Do you remember macros like these?
#define GET_ARRAY_SIZE(array) (sizeof(array) / sizeof(*(array)))
#define STATIC_ASSERT(expression) typedef char static_assertion[(expression)?1:-1]
STATIC_ASSERT(GET_ARRAY_SIZE(anArray) > 10);
In C++11, you may prefer:
template<typename T, size_t SIZE>
static constexpr size_t getArraySize(T (&) [SIZE]) { return SIZE; }
static_assert(getArraySize(anArray) > 10, "The array is too small !");
Much sexier, in my opinion.
Constant expressions can’t be complicated: in C++11, the body of a constexpr function can only consist of a single return statement. If C++ had no templates, they would have been fairly pointless. However, combined with templates, they allow constructions that were not possible before, as in the previous example.
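The single-return restriction is less limiting than it sounds, because recursion and the conditional operator are still available. The following sketch is my own illustration, not code from the kernel; factorial, FixedBuffer and Buffer24 are hypothetical names.

```cpp
#include <cstddef>

// C++11 restricts a constexpr body to one return statement, so loops are
// written as recursion and branches as the conditional operator.
constexpr unsigned long factorial(unsigned n)
{
    return n <= 1 ? 1UL : n * factorial(n - 1);
}

// Combined with a template, the result can size an array at compile time.
template <typename T, std::size_t N>
struct FixedBuffer
{
    T data[N];
    static constexpr std::size_t size() { return N; }
};

typedef FixedBuffer<char, factorial(4)> Buffer24;
static_assert(Buffer24::size() == 24, "4! == 24");
```

This is the kind of construction that previously required template metaprogramming or a macro.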
The C++ ABI
C++ has some runtime requirements we don’t usually care about in hosted software development. For example, GCC adds some code when you create a static variable local to a function (calls to guard functions such as __cxa_guard_acquire) to ensure that its initialization is thread-safe.
I won’t explain how to provide that support; I simply followed the C++ article on OSDEV to set everything up.
I use a C++ function to call all the global variables constructors:
typedef void (*CTOR)();
extern "C"
{
extern void* start_ctors;
extern void* end_ctors;
}
void __call_constructors()
{
size_t nb_ctors = (size_t(&end_ctors) - size_t(&start_ctors))
/ sizeof(void*);
CTOR* addr = reinterpret_cast<CTOR*>(&start_ctors);
for (size_t i = 0; i < nb_ctors; ++i, ++addr)
{
(*addr)();
}
}
The __call_constructors function has to be called from main, before any global object is used.
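For readers who want to see the loop in action outside a kernel, here is a hosted sketch. It assumes nothing about the real linker script: a plain array (ctor_table, a hypothetical name) stands in for the region between start_ctors and end_ctors, and init_first and init_second are made-up constructors.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*CTOR)();

// Hypothetical stand-ins for two global constructors.
static int initialized_count = 0;
static void init_first()  { ++initialized_count; }
static void init_second() { ++initialized_count; }

// In the kernel, the linker script places this table between the
// start_ctors and end_ctors symbols; here a plain array plays that role.
static CTOR ctor_table[] = { init_first, init_second };

// Walks [begin, end) and invokes every entry, like __call_constructors.
void call_constructors(CTOR* begin, CTOR* end)
{
    for (CTOR* it = begin; it != end; ++it)
    {
        (*it)();
    }
}

// Runs the simulated table and reports how many constructors were called.
int run_simulated_constructors()
{
    const std::size_t count = sizeof(ctor_table) / sizeof(ctor_table[0]);
    initialized_count = 0;
    call_constructors(ctor_table, ctor_table + count);
    assert(initialized_count == 2);
    return initialized_count;
}
```

Passing the bounds as a pointer pair keeps the loop independent of where the table comes from, which is what makes it testable on a host.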
Kernel types
I’m a clumsy person and I make mistakes easily.
However, since I know it, I try to put safeguards wherever I can. That’s why the first two classes I created after getting a “Hello world” on my board were Address and Register.
Address
The Address class holds an address and checks every access we’re doing on it.
Basically, an address has 4 properties:
- The size of the data at the address;
- Is it readable;
- Is it writable;
- Is it volatile.
With this information, we can start to generate the actual pointer type.
The 4 properties are template parameters:
template<size_t PointedSizeInBytes = 4, bool Writeable = true, bool Volatile = false, bool Readable = true>
class Address
{
public:
Address(size_t address)
: address_(address)
{}
// ....
};
Is it readable?
typedef typename Traits::static_if<Readable, T, void>::type is_readable_type;
Is it volatile?
typedef typename Traits::static_if<Volatile, typename Traits::add_volatile<is_readable_type>::type, is_readable_type>::type is_volatile_type;
Is it writable?
typedef typename Traits::static_if<Writeable, is_volatile_type, typename Traits::add_const<is_volatile_type>::type>::type is_writable_type;
And we create our pointer:
typedef typename Traits::add_pointer
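The kernel uses its own Traits namespace because no standard library is available there. On a hosted compiler, the same chain can be sketched with `<type_traits>`, assuming that Traits::static_if behaves like std::conditional and that T is the pointee type. The wrapper name PointerTypeFor is hypothetical.

```cpp
#include <type_traits>

template <typename T, bool Writeable, bool Volatile, bool Readable>
struct PointerTypeFor
{
    // Not readable: fall back to void so that dereferencing fails to compile.
    typedef typename std::conditional<Readable, T, void>::type is_readable_type;

    // Volatile: qualify the pointee so accesses are never optimized away.
    typedef typename std::conditional<Volatile,
        typename std::add_volatile<is_readable_type>::type,
        is_readable_type>::type is_volatile_type;

    // Not writable: qualify the pointee with const.
    typedef typename std::conditional<Writeable,
        is_volatile_type,
        typename std::add_const<is_volatile_type>::type>::type is_writable_type;

    // Finally, turn the qualified pointee into a pointer.
    typedef typename std::add_pointer<is_writable_type>::type type;
};

static_assert(std::is_same<PointerTypeFor<unsigned char, true, true, true>::type,
                           volatile unsigned char*>::value,
              "writable volatile register");
static_assert(std::is_same<PointerTypeFor<unsigned, false, false, true>::type,
                           const unsigned*>::value,
              "read-only memory");
static_assert(std::is_same<PointerTypeFor<unsigned, false, false, false>::type,
                           const void*>::value,
              "neither readable nor writable");
```

Each property only adds or removes one qualifier, so the three typedefs can be read independently.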
When we instantiate an Address instance, we can check the last property, the pointee size:
template <typename T>
Address::Address(T* pointer)
: address_(size_t(pointer))
{
static_assert(sizeof(T) == PointedSizeInBytes, "Incompatible pointer type");
}
Or, more likely, we can check it when we retrieve the pointer:
template<typename T>
typename GetPointerType<T>::type Address::getAddress()
{
static_assert(sizeof(T) == PointedSizeInBytes, "Cannot cast the pointer, invalid size");
return reinterpret_cast<typename GetPointerType<T>::type>(address_);
}
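To show the size check on its own, here is a reduced, hosted sketch. SizedAddress is a hypothetical class that keeps only the pointee-size property and drops the readable, writable and volatile flags; it is not the kernel's Address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t PointedSizeInBytes = 4>
class SizedAddress
{
public:
    // Checked when the address is built from a typed pointer.
    template <typename T>
    explicit SizedAddress(T* pointer)
        : address_(reinterpret_cast<std::uintptr_t>(pointer))
    {
        static_assert(sizeof(T) == PointedSizeInBytes, "Incompatible pointer type");
    }

    // Checked again when the pointer is retrieved.
    template <typename T>
    T* get() const
    {
        static_assert(sizeof(T) == PointedSizeInBytes, "Cannot cast the pointer, invalid size");
        return reinterpret_cast<T*>(address_);
    }

private:
    std::uintptr_t address_;
};

// Writes through the checked pointer and reads the value back.
std::uint32_t roundtrip_demo()
{
    std::uint32_t fake_register = 0;
    SizedAddress<4> address(&fake_register);
    *address.get<std::uint32_t>() = 0xCAFEu;
    // address.get<std::uint8_t>() would fail to compile: 1 byte != 4 bytes.
    assert(fake_register == 0xCAFEu);
    return fake_register;
}
```

Because both checks are static_assert, a size mismatch never reaches the board: the build simply fails.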
Register
A processor wouldn’t be as useful without peripheral devices. Devices are designed to interact with the environment. However, they need to be instructed to do so.
To communicate with the devices we need to access their registers. Registers are mapped in the address space, which is specified in the OMAP4460 documentation (remember the UART3?).
Let’s take the UART as an example.
The UART allows reading and writing data from and to another device (in our case, the RS-232 connection between my computer and the board). First, it needs to be initialized: the baud rate, the parity, and some other things need to be set. But I took the lazy way and let U-Boot do it for me (actually, I didn’t manage to reinitialize the UART device properly).
So, we can use it right away!
As we saw, we need to write to an 8-bit register to send data, but if we write too much data the device won’t send any more (I think the device enters some error state and needs to be reinitialized). If we take a look at the UART documentation (included in the OMAP4460 PDF), we can see that there is a register that indicates the state of the transmit queue: UART_LSR. The documentation also indicates at which address this register is available: 0x48020014.
Furthermore, it says:
5 TX_FIFO_E
Read 0x0: Transmit hold register (TX FIFO) is not empty.
Read 0x1: Transmit hold register (TX FIFO) is empty.
So, we must wait until we can read a 1 before sending any characters.
Let’s rewrite our puts function:
void puts(const char* data)
{
volatile uint8* UART3_BASE = (volatile uint8*) 0x48020000;
volatile uint8* UART3_LSR = (volatile uint8*) 0x48020014;
while (*data)
{
while (((*UART3_LSR) & (1<<5)) == 0);
*UART3_BASE = *data++;
}
}
I don’t know about you, but I make mistakes easily when I’m programming with bit masks. (To be entirely honest, I did not try the above code again; I took it from my Mercurial history.)
I rewrote this code using an abstraction for managing registers. Now I can rewrite the above code like this:
enum LSRRegisterPart
{
RX_FIFO_E,
RX_OE,
RX_PE,
RX_FE,
RX_PI,
TX_FIFO_E,
TX_SR_E,
RX_FIFO_STS
};
typedef KLib::Register<unsigned char,
UARTBaseAddress
, KLib::RegisterPart<RX_FIFO_E, 0>
, KLib::RegisterPart<RX_OE, 1>
, KLib::RegisterPart<RX_PE, 2>
, KLib::RegisterPart<RX_FE, 3>
, KLib::RegisterPart<RX_PI, 4>
, KLib::RegisterPart<TX_FIFO_E, 5>
, KLib::RegisterPart<TX_SR_E, 6>
, KLib::RegisterPart<RX_FIFO_STS, 7>> LSRRegister;
enum DataRegisterPart { THR = 0, RHR = 0, DLL = 0 };
typedef KLib::Register<unsigned char,
UARTBaseAddress,
KLib::RegisterPart<THR, 0, 8>> DataRegister;
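void puts(const char* data)
{
// The register is named thr so that it does not clash with the data parameter.
static DataRegister thr(0x48020000);
static LSRRegister lsr(0x48020014);
while (*data)
{
// Wait until the transmit FIFO is empty.
while (lsr.getValue<LSRRegisterPart::TX_FIFO_E, uint8>() == 0);
thr.writeValue(*data++);
}
}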
How does it work?
A register is defined by an address where we can read and/or write data of a given size. Quite often, a register is also divided into several parts, each defined by a name and by start and end bit offsets, much like a plain structure has fields of different sizes.
The RegisterPart argument defines one of those parts and makes it easy to manipulate the underlying range of bits. Combined with the Address class, this is a solid tool for handling most register accesses, including those that rely on mask and logical arithmetic.
RegisterPart is defined as follows:
template<size_t ID, size_t OFFSET_START, size_t OFFSET_END = OFFSET_START + 1>
struct RegisterPart
{
static constexpr size_t RegisterID = ID;
static_assert(OFFSET_END > OFFSET_START, "OFFSET_START >= OFFSET_END");
template<typename ReturnType, typename T>
static ReturnType getValue(T reg)
{
static_assert(OFFSET_START < getSizeInBits(T()), "OFFSET_START > reg");
static_assert(OFFSET_END <= (getSizeInBits(T())), "OFFSET_END > reg");
static_assert((OFFSET_END - OFFSET_START) <= sizeof(ReturnType) * 8, "Invalid return type");
return (ReturnType)sliceByte(reg, OFFSET_START, OFFSET_END);
}
template <typename T, typename U>
static void setValue(T reg_addr, U value)
{
static_assert(Traits::is_const<typename Traits::remove_pointer<T>::type>::value == false,
"The pointer is const, the register is not writable");
auto old_value = *reg_addr;
constexpr unsigned int mask = ((1 << (OFFSET_END - OFFSET_START)) - 1) << OFFSET_START;
*reg_addr = old_value ^ ((old_value ^ (value << OFFSET_START)) & mask);
}
template <typename T, typename U>
static void setSlicedValue(T reg_addr, U value)
{
value = sliceByte(value, OFFSET_START, OFFSET_END);
RegisterPart::setValue(reg_addr, value);
}
};
The sliceByte function is simply:
template <typename NumericType>
constexpr size_t sliceByte(NumericType value, size_t start, size_t end)
{
return (value >> start) & ((1 << (end - start)) - 1);
}
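A few worked values make the offsets easier to read: start is the first bit of the slice and end is one past the last bit. The function is repeated here only so that the block compiles on its own (with std::size_t spelled out); the sample value 0xB5 is my own choice.

```cpp
#include <cstddef>

template <typename NumericType>
constexpr std::size_t sliceByte(NumericType value, std::size_t start, std::size_t end)
{
    return (value >> start) & ((1 << (end - start)) - 1);
}

// 0xB5 is 1011 0101 in binary (bit 7 on the left, bit 0 on the right).
static_assert(sliceByte(0xB5, 0, 4) == 0x5, "bits 0..3: low nibble");
static_assert(sliceByte(0xB5, 4, 8) == 0xB, "bits 4..7: high nibble");
static_assert(sliceByte(0xB5, 5, 6) == 1, "bit 5 alone is set");
static_assert(sliceByte(0xB5, 6, 7) == 0, "bit 6 alone is clear");
```

The half-open range [start, end) matches the default OFFSET_END = OFFSET_START + 1 used for one-bit parts.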
The complex bitmask arithmetic in setValue is borrowed from this website. I modified it to allow the OFFSET_START shift.
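The xor-based merge is easier to trust once it is run on a few values. This sketch isolates the expression from setValue in a free function; mergeBits is a hypothetical name, and fixing the register type to a 32-bit unsigned integer is an assumption made for the example.

```cpp
#include <cstdint>

// Replaces bits [start, end) of old_value with the low bits of value.
constexpr std::uint32_t mergeBits(std::uint32_t old_value, std::uint32_t value,
                                  unsigned start, unsigned end)
{
    // (old ^ shifted) marks the bits that differ; the mask keeps only the
    // differences inside the field; the final xor flips exactly those bits.
    return old_value ^ ((old_value ^ (value << start))
                        & (((1u << (end - start)) - 1u) << start));
}

static_assert(mergeBits(0xFFu, 0x0u, 2, 5) == 0xE3u, "clears bits 2..4 only");
static_assert(mergeBits(0x00u, 0x5u, 4, 8) == 0x50u, "writes 0101 into bits 4..7");
static_assert(mergeBits(0xA5u, 0x3u, 1, 3) == 0xA7u, "other bits are untouched");
```

Compared with the usual clear-then-or sequence, this form needs the mask only once and never leaves the field temporarily zeroed in the computed value.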
The next step is to put all the RegisterPart together in one Register, with the least possible runtime overhead.
To do so, I decided to split the Register class into two parts:
- one for the C++ stuff;
- one for the actual public interface.
The part that handles “the C++ stuff” is basically a proxy towards RegisterPart. It is responsible for selecting the proper RegisterPart and calling the desired function.
The Register class is declared as follows:
template<typename RegisterUnderlyingType, typename Address, typename... RegisterValues>
struct Register : private detail::Register<RegisterUnderlyingType, RegisterValues...>
, private NonCopyable
{
typedef detail::Register<RegisterUnderlyingType, RegisterValues...> Base;
typedef typename Address::template GetPointerType<RegisterUnderlyingType>::type PointerType;
//...
public:
Register(Address addr)
: address_(addr)
{}
template <size_t RegisterValueIdx, typename ReturnType>
ReturnType getValue();
template <size_t RegisterValueIdx, typename U>
RegisterHolderRAII setValue(U value);
template <size_t RegisterValueIdx, typename U>
RegisterHolderRAII setSlicedValue(U value);
void writeValue(RegisterUnderlyingType value)
{
static_assert(Traits::is_const<typename Traits::remove_pointer<PointerType>::type>::value == false,
"The pointer is const, the register is not writable");
auto reg_addr = address_.template getAddress<RegisterUnderlyingType>();
*reg_addr = value;
}
};
The set/get*Value member functions are intentionally left blank for now.
As I said, the code responsible for selecting the proper register part lives in a separate class. Please welcome detail::Register:
namespace detail {
template<typename RegisterUnderlyingType, typename... RegisterValues>
struct Register : public RegisterValues...
{
template <size_t RegisterValueIdx, typename ReturnType, typename T>
ReturnType getValue(T value)
{
//...
}
};
}
I decided to inherit from every RegisterValues type. The trick is to be able to cast this to the proper base type to select the wanted part.
To solve this problem, C++ implicit conversion kicks in! A three-line function is enough to trigger the cast:
template <size_t idx, size_t S, size_t E>
static constexpr RegisterPart<idx, S, E>& get(RegisterPart<idx, S, E>& ref)
{ return ref; }
Basically, if you specify at least one template argument explicitly, the compiler can deduce the remaining ones from the function argument. Here, we know idx, so the compiler retrieves the others for us by looking for the matching RegisterPart base class. The code going into the detail::Register::getValue implementation finally boils down to:
// value is the value we are going to split according to the offset specified in
// the RegisterPart instance we are selecting
return detail::get<RegisterValueIdx>(*this).template getValue<ReturnType>(value);
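The selection trick can be tried in isolation. The sketch below is a stripped-down model, not the kernel code: Part, PartSet, selectPart and the PartId values are hypothetical names, and Part only exposes its width so that the deduced offsets are observable.

```cpp
#include <cassert>
#include <cstddef>

template <std::size_t ID, std::size_t START, std::size_t END = START + 1>
struct Part
{
    static constexpr std::size_t width() { return END - START; }
};

// Only ID is given explicitly; START and END are deduced from the single
// Part<ID, ...> base class that the argument converts to.
template <std::size_t ID, std::size_t START, std::size_t END>
constexpr Part<ID, START, END>& selectPart(Part<ID, START, END>& ref)
{
    return ref;
}

// Inherits from every part, like detail::Register does.
template <typename... Parts>
struct PartSet : Parts...
{
};

enum PartId { STATUS_PART = 0, MODE_PART = 1 };
typedef PartSet<Part<STATUS_PART, 0>, Part<MODE_PART, 1, 4> > DemoParts;

// Selects the MODE part (bits 1..3) and returns its width in bits.
std::size_t mode_width()
{
    DemoParts parts;
    const std::size_t width = selectPart<MODE_PART>(parts).width();
    assert(width == 3);
    return width;
}
```

No lookup table or recursion over the parameter pack is needed: the derived-to-base conversion does the search during template argument deduction.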
The others are all very simple:
template <size_t RegisterValueIdx, typename T, typename U>
void setValue(T reg, U value)
{
detail::get<RegisterValueIdx>(*this).setValue(reg, value);
}
template <size_t RegisterValueIdx, typename T, typename U>
void setSlicedValue(T reg, U value)
{
detail::get<RegisterValueIdx>(*this).setSlicedValue(reg, value);
}
And then we can finish the implementation of the Register class:
template <size_t RegisterValueIdx, typename ReturnType>
ReturnType getValue()
{
return Base::template getValue<RegisterValueIdx, ReturnType>(*address_.template getAddress<RegisterUnderlyingType>());
}
template <size_t RegisterValueIdx, typename U>
RegisterHolderRAII setValue(U value)
{
RegisterHolderRAII ret(*this, address_.template getAddress<RegisterUnderlyingType>());
ret.template setValue<RegisterValueIdx>(value);
return ret;
}
template <size_t RegisterValueIdx, typename U>
RegisterHolderRAII setSlicedValue(U value)
{
RegisterHolderRAII ret(*this, address_.template getAddress<RegisterUnderlyingType>());
ret.template setSlicedValue<RegisterValueIdx>(value);
return ret;
}
RegisterHolderRAII permits batched writing (atomic in the sense that the register is written only once): the setValue/setSlicedValue calls are cached and written to the register only when the RegisterHolderRAII is destroyed.
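Since RegisterHolderRAII itself is not shown, here is a minimal sketch of the idea under my own assumptions. CachedWrite is a hypothetical, simplified holder for an 8-bit register: it reads the register once, accumulates field updates in a local copy, and writes back once in its destructor.

```cpp
#include <cassert>
#include <cstdint>

class CachedWrite
{
public:
    explicit CachedWrite(volatile std::uint8_t* reg)
        : reg_(reg), cached_(*reg)
    {}

    // Moving transfers the pending write so that it happens exactly once.
    CachedWrite(CachedWrite&& other)
        : reg_(other.reg_), cached_(other.cached_)
    {
        other.reg_ = nullptr;
    }

    CachedWrite(const CachedWrite&) = delete;
    CachedWrite& operator=(const CachedWrite&) = delete;

    // The only place where the hardware register is written.
    ~CachedWrite()
    {
        if (reg_) { *reg_ = cached_; }
    }

    // Updates bits [start, end) of the cached copy; returns *this to chain.
    CachedWrite& set(unsigned start, unsigned end, std::uint8_t value)
    {
        const unsigned mask = ((1u << (end - start)) - 1u) << start;
        cached_ = static_cast<std::uint8_t>(
            cached_ ^ ((cached_ ^ (unsigned(value) << start)) & mask));
        return *this;
    }

private:
    volatile std::uint8_t* reg_;
    std::uint8_t cached_;
};

// Shows that the register only changes when the holder goes out of scope.
std::uint8_t cached_write_demo()
{
    volatile std::uint8_t fake_register = 0x00;
    {
        CachedWrite holder(&fake_register);
        holder.set(0, 1, 1).set(4, 8, 0xA);
        assert(fake_register == 0x00); // nothing has been written yet
    }
    return fake_register; // the single write-back stored 0xA1
}
```

Note that this groups several field updates into one store; it does not protect against an interrupt modifying the register between the initial read and the final write.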
A few remarks
While the code behind Register is certainly not trivial, it removes a lot of complexity in a lot of other places. I think this is a more than acceptable trade-off.
Moreover, the compiler catches one more possible mistake: if you create two RegisterPart with the same ID and try to access one of them, the compiler will raise an error indicating that the implicit cast is ambiguous.
I did not implement a writeValue in RegisterPart (I don’t need it for now), but for non-readable addresses, we will try to dereference a void* and the compiler will issue an error.
I kept Address and Register in a library separate from the kernel, which allowed me to write unit tests.
Conclusion
I hope this was clear enough and that you understood why I’ve chosen C++11 to experiment with kernel programming.
The compiler does a really good job on inlining the calls; in the end almost only the actual register write remains in the assembly code.
Of course, that’s not the only performance factor, and I didn’t try to run any benchmark (performance isn’t my current concern). But I have had pretty decent results so far.
The next post will be about setting up the MMU. I’m currently motivating myself to write a page allocator and a memory allocator, but the design isn’t clear in my head yet.
After that, I will try to implement interrupts and post about it too.
Where can I find the code?
I will publish the whole source code a bit later. It’s a bad excuse, but I have to clean it up a little first, most notably the logging part.
Thanks
I would like to thank Louis for the time he took to review this post!
Little remarks
Since I decided to write my post with markdown, the inlined code will be less sexy. It is easier for me to work on it offline and easier for the reviewers.


