My Public Notepad

Friday, 16 December 2011

Finite State Machine in C++

In my article "Finite State Machine in C" I gave a short introduction to Finite State Machines (FSM) and demonstrated two possible implementations of the Car state machine model.

In this article I will show C++ implementation of the same model, by using State Pattern. All state classes are derived from an abstract base class CState and each of them implements public method HandleEvent(EVENT evt). CSMManager class is a State Machine Manager and represents the core of our State Machine: it receives events and dispatches them to the current state for handling. Its private member m_pCurrState is a pointer to the CState base class but it always points to actual state objects. State transition is implemented as its reassignment to a different state object (object's address). This happens when some event occurs. Behaviour of this model is event-driven and, in contrast to C implementations, state transition control is not centralized here - it is not the State Manager who takes care of which state is going to be the next one. Current state decides on that itself, depending on its current conditions and events it receives.

Events.h:

States.h:

State.h:

StateTurnedOff.h:

StateTurnedOff.cpp:

StateIdle.h:

StateIdle.cpp:

StateMoving.h:

StateMoving.cpp:

StateInvalid.h:

StateInvalid.cpp:

SMManager.h:

SMManager.cpp:

main.cpp:

Output:

CStateTurnedOff::OnEntry()
Event received: EVENT_IGNITE
   Whoooa! I'm turned on!
CStateTurnedOff::OnExit()

CStateIdle::OnEntry()
Event received: EVENT_ACCELERATE
   Yipee! I'm accelerating!
CStateIdle::OnExit()

CStateMoving::OnEntry()
Event received: EVENT_BRAKE
   Whoops! Was I too fast?
CStateMoving::OnExit()

CStateIdle::OnEntry()
Event received: EVENT_TURN_OFF
   That was probably enough...
CStateIdle::OnExit()

CStateTurnedOff::OnEntry()
CStateTurnedOff::OnExit()

Note that State Manager's member which refers to the current state (m_pCurrState) is not of reference type (CState&) but a pointer (CState*). This is one of the cases where we MUST use pointer instead of reference because we want to have a variable which reffers to different objects throughout the execution and as re-seating the reference is not allowed, the only option is using a pointer. Please refer Parashift's FAQ on References and this SO question.

The reason for introducing m_prevStateID is that sometimes state machine (or some of its states) needs to know what was its previous state. Variable which keeps track of the previous state should not be of type reference (m_prevState : CState&) or pointer (m_pPrevState : CState*) as current state should not be able to access (members) of other states. It is therefore enough if it holds only the ID of the previous state.

Note that base abstract class CState declares OnEntry() and OnExit() methods - those which are executed when entering and leaving state. Current state executes HandleEvent(EVENT evt) each time it receives some event.

Friday, 9 December 2011

Endianness: how does CPU read bytes from memory?

The simplest example is enough to give you the idea: CPU needs to read 2-byte data type from memory. The following example shows how do Big and Little endian machines read variable of type unsigned short from memory:

main.cpp

Output:

Memory layout:
arr[0] = 0xab
arr[1] = 0xcd
val (simple cast on little endian host) = 52651
val (as interpreted by little endian cpu) = 52651
val (as interpreted by big endian cpu) = 43981

Thursday, 8 December 2011

Host endianness and data transfer over the network

Network components talk to each other by sending messages which are simply arrays of bytes. In order to understand them, parties in conversation need to know the communication protocol which defines message format and the length, order and the meaning of its parts.

Typically, message would comprise header and payload. Header can contain information about message itself, protocol version and information about the sender and receiver. Payload is actually information that sender wants to pass to receiver.

The simplest and shortest message one host can send is a message of a 1-byte length. In this case, protocol only needs to define how is this byte treated - as a character, signed or unsigned number. For example, if protocol says that message contains value of type unsigned char, and the message is 0x8b, receiver will treat this as a positive integer, of value 139. If that was a value of signed char, receiver would understand that this is a negative integer, -117.

There is one problem for messages made of two or more bytes. Bytes are send and received in the same order they are written in the sending buffer. But the way how are bytes copied from register to memory (buffer) and vice versa can be different on different hosts and this depends on their endianness. If sender has a big endian (BE) CPU and receiver has a small endian (SE) CPU, receiver might interpret received values in a wrong way.

Let's look at the case when the message comprises of 2-byte integer value, let's say of type unsigned short. This type has a range of values between 0 and 65535 (0x0000 and 0xffff). If BE sender wants to send value 0xabcd (43981) it will copy this value from registry to buffer keeping the same byte order and buffer will be like this: | 0xab | 0xcd |. Most significant byte (MSB) is at the lower address in memory. The other side will receive bytes in the same order. When copying bytes to the registry, BE receiver will treat the byte from the lowest memory address as the MSB and put it first so the registry will filled with bytes in the same order they are in the memory (0xabcd) and everything would be fine. But LE receiver will treat byte from the lowest address as the Least Significant Byte (LSB) and put it at the last position in the registry - it would swap the order and read received value as 0xcdab (52651), which is wrong! Sender should know the endianness of the client so can send bytes in the correct order, but that is impractical.

Solution to this is a simple rule: sender should always send bytes in big endian order (network byte order) and receiver should always convert received bytes from network to its own byte order. This makes sending and receiving code portable.

Both Windows and *NIX networking frameworks offer helper functions which are able to convert integers from host to network byte order and vice versa. They are:

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

Obviously, if host has network byte order (big endian), no conversion would take place, no matter whether it is on the sending or receiving side.

Sending and receiving buffers can be declared as char or unsigned char arrays. Values transported could be of signed or unsigned types. I made a set of several utility functions that insert and export values of desired integer types into/out from sending/receiving (probably socket) buffers. Prior to inserting, values are converted to network byte order (big endian) and after extraction, values are converted from network to host byte order. Tests prove that hton/ntoh functions can be applied both to signed and unsigned types as all they do is actually swapping bytes (if necessary).

NetBuffUtilCore.h:

NetBuffUtil.h:

NetBuffUtil.cpp:

main.cpp:

Output:

unsigned char buff
Original (unsigned short): 43981
Received val = 43981

char buff
Original (unsigned short): 43981
Received val = 43981

unsigned char buff
Original (short): -31234
Received val = -31234

char buff
Original (short): -31234
Received val = -31234

unsigned char buff
Original (unsigned long): 2882343476
Received val = 2882343476

char buff
Original (unsigned long): 2882343476
Received val = 2882343476

unsigned char buff
Original (long): -1107401523
Received val = -1107401523

char buff
Original (long): -1107401523
Received val = -1107401523

To avoid dependency on Winsock library, I implemented a function which swaps bytes for a given type (well, template should be constrained to only integer types...):

EndiannessUtil.h:

main.cpp:

So far, we were focused on transfer of integer types. What if message payload needs to contain strings, or, mashup of strings and integers?

Let's say that we need to send some ASCII string and some unsigned long number. Protocol should define message payload like this:

|L0|L1|S1|S1|S2|........|SK|N0|N1|N2|N3|

|L0|L1| - 2 bytes for unsigned short value that defines string length (K bytes)
|S1|S1|S2|........|SK| - string (K bytes)
|N0|N1|N2|N3| - 4 bytes for unsigned long number

Both integers should be converted to the network byte order prior to writing into sending buffer. But string does not need to be changed - that is ASCII string and each character is placed in a single byte. Receiving side will first read 2 bytes of payload, extract string length (K), allocate memory for string (K bytes) and then read (copy) next K bytes from receiving buffer into the string buffer. After that, receiver will read next 4 bytes and convert them from network byte order before passing it for further processing.

If sending Unicode string, we need to take care about endianness again as some of its characters use two or more bytes. Our protocol will define encoding applied (e.g. UTF-8 or UTF-16) but this time sender needs to send additional information as well - its endianness. This information is contained in Byte Order Mark (BOM) sequence which is prepended to our string. BOM helps Unicode decoder on the client side to decide whether to swap or not bytes for multi-byte characters.

Links and references:
htons(), htonl(), ntohs(), ntohl() (Beej's guide)
htons function (MSDN)
htonl function (MSDN)
ntohl function (MSDN)
ntohs function (MSDN)
Linux functions
Encodings and Unicode (Python)
Byte Order (Codecs)