My Public Notepad: string


Friday, 18 December 2015

String interpolation in C# 6

With C# 6 we can finally wave goodbye to the cumbersome, C-style way of injecting values into a string. Previously, the format string and its arguments were kept separate, and numeric indexes were used to place each argument at the right position inside the string. That was an error-prone process.


var id = 123;
var name = "abc";
var s = string.Format("id = {0}, name = {1}", id, name);

The new interpolation syntax allows variables to be injected directly into the string:


var id = 123;
var name = "abc";
var s = $"id = {id}, name = {name}";

It is also possible to inject the string result of more complex expressions:


// Task has no parameterless constructor, so pass it an (empty) action.
var task = new Task(() => { });
var taskExceptionReport = 
   $"Task.Exception: {((task.Exception == null) ? "null" : task.Exception.ToString())}";

Further reading:
Interpolated Strings (MSDN)
Bill Wagner: "A Pleasant New C# Syntax for String Interpolation"

Thursday, 8 December 2011

Host endianness and data transfer over the network

Network components talk to each other by sending messages, which are simply arrays of bytes. To understand them, the parties in a conversation need to know the communication protocol, which defines the message format: the length, order and meaning of its parts.

Typically, a message comprises a header and a payload. The header can contain information about the message itself, the protocol version and the sender and receiver. The payload is the actual information the sender wants to pass to the receiver.

The simplest and shortest message a host can send is 1 byte long. In this case, the protocol only needs to define how this byte is to be treated: as a character, or as a signed or unsigned number. For example, if the protocol says that the message contains a value of type unsigned char and the message is 0x8b, the receiver will treat it as the positive integer 139. If it were a value of type signed char, the receiver would interpret it as the negative integer -117 (in two's complement, 139 - 256).
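To make the two interpretations concrete, here is a minimal C++ sketch that reads the same byte 0x8b both ways. The helper names as_unsigned_byte and as_signed_byte are hypothetical; the signed version copies the byte into a std::int8_t, which (where it exists) is guaranteed to use two's complement.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Interpret a received byte as an unsigned 8-bit value (0..255).
inline unsigned as_unsigned_byte(unsigned char byte) {
    return byte;
}

// Interpret the same byte as a signed 8-bit value (-128..127).
// memcpy copies the raw bit pattern; std::int8_t is two's complement,
// so the byte 0x8b becomes -117.
inline int as_signed_byte(unsigned char byte) {
    std::int8_t value;
    std::memcpy(&value, &byte, 1);
    return value;
}
```

For example, as_unsigned_byte(0x8b) returns 139 and as_signed_byte(0x8b) returns -117.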

There is a problem with messages made of two or more bytes. Bytes are sent and received in the same order in which they are written in the sending buffer, but the way bytes are copied from a register to memory (the buffer) and back can differ between hosts, depending on their endianness. If the sender has a big endian (BE) CPU and the receiver has a little endian (LE) CPU, the receiver might interpret received values incorrectly.

Consider a message that consists of a 2-byte integer value, say of type unsigned short. This type has a range of values between 0 and 65535 (0x0000 and 0xffff). If a BE sender wants to send the value 0xabcd (43981), it copies this value from the register to the buffer keeping the same byte order, so the buffer looks like this: | 0xab | 0xcd |. The most significant byte (MSB) is at the lower memory address. The other side receives the bytes in the same order. When copying the bytes into a register, a BE receiver treats the byte at the lowest memory address as the MSB and puts it first, so the register is filled with bytes in the same order they have in memory (0xabcd) and everything is fine. But an LE receiver treats the byte at the lowest address as the least significant byte (LSB) and puts it at the last position in the register. It effectively swaps the order and reads the received value as 0xcdab (52651), which is wrong! The sender could send the bytes in the order the receiver expects, but that would require knowing the receiver's endianness, which is impractical.
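Both receivers can be simulated on a single machine by assembling the value with shifts instead of relying on the host CPU. In this sketch (function names hypothetical), read_be16 treats the byte at the lower address as the MSB, as a BE receiver would, and read_le16 treats it as the LSB, as an LE receiver would:

```cpp
#include <cassert>
#include <cstdint>

// BE interpretation: the first byte (lowest address) is the most significant.
inline std::uint16_t read_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// LE interpretation: the first byte (lowest address) is the least significant.
inline std::uint16_t read_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

With the buffer | 0xab | 0xcd |, read_be16 yields 0xabcd (43981) and read_le16 yields 0xcdab (52651), matching the two cases described above.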

The solution is a simple rule: the sender always sends multi-byte values in big endian order (network byte order), and the receiver always converts received values from network byte order to its own (host) byte order. This makes both the sending and the receiving code portable.

Both the Windows (Winsock) and *NIX networking libraries offer helper functions that convert integers from host to network byte order and vice versa. They are:

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);
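A minimal sketch of how these functions are typically used around a byte buffer, assuming a POSIX system where they are declared in <arpa/inet.h> (on Windows they come from <winsock2.h>). The helper names put_u16 and get_u16 are hypothetical; memcpy is used so that the buffer does not need to be suitably aligned for a uint16_t:

```cpp
#include <arpa/inet.h>  // POSIX; on Windows include <winsock2.h> instead
#include <cassert>
#include <cstdint>
#include <cstring>

// Sender side: convert to network byte order, then copy the raw bytes
// into the sending buffer.
inline void put_u16(unsigned char* buf, std::uint16_t host_value) {
    const std::uint16_t net_value = htons(host_value);
    std::memcpy(buf, &net_value, sizeof net_value);
}

// Receiver side: copy the raw bytes out of the receiving buffer, then
// convert them from network to host byte order.
inline std::uint16_t get_u16(const unsigned char* buf) {
    std::uint16_t net_value;
    std::memcpy(&net_value, buf, sizeof net_value);
    return ntohs(net_value);
}
```

After put_u16(buf, 0xabcd) the buffer holds | 0xab | 0xcd | on every host, and get_u16(buf) returns 43981.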

If the host already uses network byte order (big endian), these functions perform no conversion, on both the sending and the receiving side.
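One way to check this on a given machine is to inspect the first byte of a known 16-bit value and compare the result with what htons does. This is a sketch assuming a POSIX system; the function names are hypothetical:

```cpp
#include <arpa/inet.h>  // POSIX; on Windows include <winsock2.h> instead
#include <cassert>
#include <cstdint>
#include <cstring>

// True if the host stores the most significant byte at the lowest address.
inline bool host_is_big_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01;
}

// True if htons leaves the value unchanged, i.e. performs no conversion.
inline bool htons_is_identity() {
    return htons(0xabcd) == 0xabcd;
}
```

The two functions should agree on any host. Since C++20, std::endian::native from <bit> answers the same question at compile time.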

Sending and receiving buffers can be declared as char or unsigned char arrays, and the transported values can be of signed or unsigned types. I wrote a set of utility functions that insert values of the desired integer types into a sending buffer and extract them from a receiving buffer (typically a socket buffer). Before insertion, values are converted to network byte order (big endian); after extraction, they are converted from network to host byte order. Tests show that the hton/ntoh functions can be applied to both signed and unsigned types, since all they do is swap bytes (when necessary).

NetBuffUtilCore.h:

NetBuffUtil.h:

NetBuffUtil.cpp:

main.cpp:

Output:

unsigned char buff
Original (unsigned short): 43981
Received val = 43981

char buff
Original (unsigned short): 43981
Received val = 43981

unsigned char buff
Original (short): -31234
Received val = -31234

char buff
Original (short): -31234
Received val = -31234

unsigned char buff
Original (unsigned long): 2882343476
Received val = 2882343476

char buff
Original (unsigned long): 2882343476
Received val = 2882343476

unsigned char buff
Original (long): -1107401523
Received val = -1107401523

char buff
Original (long): -1107401523
Received val = -1107401523
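As an illustration of the kind of insert/extract helpers described above (a sketch, not the NetBuffUtil code itself), the following templates build the big-endian representation with shifts instead of calling hton/ntoh, so they do not depend on the host byte order. The names put_be and get_be are hypothetical. Signed values go through the unsigned type of the same width, which preserves the bit pattern; a char buffer can be passed via reinterpret_cast<unsigned char*>(buf).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Write value into buf in network byte order (big endian), MSB first.
template <typename T>
void put_be(unsigned char* buf, T value) {
    static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                  "integer types only");
    using U = typename std::make_unsigned<T>::type;
    U u = static_cast<U>(value);  // same bits, unsigned type
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        buf[sizeof(T) - 1 - i] = static_cast<unsigned char>(u & 0xffu);  // LSB goes last
        u = static_cast<U>(u >> 8);
    }
}

// Read a value of type T stored in network byte order from buf.
template <typename T>
T get_be(const unsigned char* buf) {
    static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                  "integer types only");
    using U = typename std::make_unsigned<T>::type;
    U u = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        u = static_cast<U>((u << 8) | buf[i]);  // first byte ends up most significant
    }
    // Converting back to a signed T is implementation-defined before C++20
    // for out-of-range values; mainstream compilers wrap (two's complement).
    return static_cast<T>(u);
}
```

Round-tripping the values from the output above (43981, -31234, 2882343476, -1107401523) through put_be and get_be with matching 16- and 32-bit types returns the original values.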

To avoid a dependency on the Winsock library, I implemented a function template that swaps the bytes of a value of a given type (ideally, the template should be constrained to integer types only):

EndiannessUtil.h:

main.cpp:
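A minimal sketch of such a byte-swapping template, with the restriction to integer types enforced by static_assert. The names swap_bytes and to_network are hypothetical and not taken from EndiannessUtil.h; to_network swaps only when a runtime probe detects a little endian host:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reverse the byte order of an integer value.
template <typename T>
T swap_bytes(T value) {
    static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                  "swap_bytes supports integer types only");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    T result;
    std::memcpy(&result, bytes, sizeof(T));
    return result;
}

// Host-to-network conversion without Winsock: swap only on little endian hosts.
template <typename T>
T to_network(T value) {
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    const bool little_endian = (first_byte == 1);
    return little_endian ? swap_bytes(value) : value;
}
```

C++23 adds std::byteswap in <bit>, which performs the same operation for integer types.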

So far, we have focused on transferring integer types. What if the message payload needs to contain strings, or a mix of strings and integers?

Let's say that we need to send an ASCII string and an unsigned long number (4 bytes). The protocol could define the message payload like this:

|L0|L1|S1|S2|........|SK|N0|N1|N2|N3|

|L0|L1| - 2 bytes: unsigned short value that defines the string length (K bytes)
|S1|S2|........|SK| - string (K bytes)
|N0|N1|N2|N3| - 4 bytes: unsigned long number

Both integers should be converted to network byte order before being written into the sending buffer. The string, however, does not need to be changed: it is an ASCII string, and each character occupies a single byte. The receiving side first reads 2 bytes of the payload, extracts the string length (K), allocates memory for the string (K bytes) and then copies the next K bytes from the receiving buffer into the string buffer. After that, the receiver reads the next 4 bytes and converts them from network byte order before passing the value on for further processing.
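The sending and receiving steps just described can be sketched as follows. The Payload struct and the encode_payload/decode_payload names are hypothetical; the number is fixed at 4 bytes (std::uint32_t) because the width of unsigned long varies between platforms, and the integers are written byte by byte with shifts, which produces network byte order without calling htons/htonl:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Payload layout: |L0|L1| length K (big endian), |S1|..|SK| ASCII, |N0..N3| number (big endian).
struct Payload {
    std::string text;
    std::uint32_t number;
};

std::vector<unsigned char> encode_payload(const Payload& p) {
    if (p.text.size() > 0xffff) throw std::length_error("string too long");
    std::vector<unsigned char> out;
    const std::uint16_t len = static_cast<std::uint16_t>(p.text.size());
    out.push_back(static_cast<unsigned char>(len >> 8));     // L0 (MSB)
    out.push_back(static_cast<unsigned char>(len & 0xff));   // L1 (LSB)
    out.insert(out.end(), p.text.begin(), p.text.end());     // ASCII bytes copied as-is
    for (int shift = 24; shift >= 0; shift -= 8)             // N0..N3, MSB first
        out.push_back(static_cast<unsigned char>((p.number >> shift) & 0xff));
    return out;
}

Payload decode_payload(const std::vector<unsigned char>& in) {
    if (in.size() < 2) throw std::runtime_error("truncated length field");
    const std::size_t len = (static_cast<std::size_t>(in[0]) << 8) | in[1];  // K
    if (in.size() < 2 + len + 4) throw std::runtime_error("truncated payload");
    Payload p;
    const auto first = in.begin() + 2;
    p.text.assign(first, first + static_cast<std::ptrdiff_t>(len));  // copy K bytes
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < 4; ++i)
        n = (n << 8) | in[2 + len + i];  // network -> host order
    p.number = n;
    return p;
}
```

Encoding {"hello", 2882343476} produces 11 bytes: 0x00 0x05, the five ASCII characters, then 0xab 0xcd 0x12 0x34.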

When sending a Unicode string, we may need to take care of endianness again. In encodings whose code units are wider than one byte, such as UTF-16 or UTF-32, each code unit is a multi-byte integer and is therefore subject to byte order; UTF-8 is byte-oriented and has no byte order issue. The protocol defines the encoding used, but for UTF-16 and UTF-32 the sender may also need to communicate its byte order. This information is carried by the Byte Order Mark (BOM) sequence prepended to the string. The BOM helps the Unicode decoder on the receiving side decide whether or not to swap the bytes of each code unit.
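A minimal sketch of how a receiver might use the BOM for UTF-16: the first two bytes FE FF indicate big endian and FF FE little endian, and each following pair of bytes is assembled into a 16-bit code unit accordingly. The function name is hypothetical, and surrogate pairs are not combined into code points here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decode BOM-prefixed UTF-16 bytes into 16-bit code units.
std::vector<std::uint16_t> utf16_units_from_bom_bytes(const std::vector<unsigned char>& bytes) {
    if (bytes.size() < 2 || bytes.size() % 2 != 0)
        throw std::runtime_error("not a BOM-prefixed UTF-16 sequence");
    bool big_endian = false;
    if (bytes[0] == 0xfe && bytes[1] == 0xff)
        big_endian = true;                    // FE FF: big endian
    else if (bytes[0] == 0xff && bytes[1] == 0xfe)
        big_endian = false;                   // FF FE: little endian
    else
        throw std::runtime_error("missing BOM");
    std::vector<std::uint16_t> units;
    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = big_endian ? bytes[i] : bytes[i + 1];
        const unsigned lo = big_endian ? bytes[i + 1] : bytes[i];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return units;
}
```

For example, "Aé" encoded as FE FF 00 41 00 E9 (big endian) and as FF FE 41 00 E9 00 (little endian) yields the same code units, 0x0041 and 0x00E9.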

Links and references:
htons(), htonl(), ntohs(), ntohl() (Beej's guide)
htons function (MSDN)
htonl function (MSDN)
ntohl function (MSDN)
ntohs function (MSDN)
Linux functions
Encodings and Unicode (Python)
Byte Order (Codecs)

Tuesday, 23 August 2011

Lexical string comparison in NSIS

Standard (built-in) string comparison operators are:

case-insensitive: ==, !=, StrCmp
case-sensitive: StrCmpS

They are only capable of telling whether two strings are equal or not. They don't provide information on which string is greater or less, if strings differ.

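The LogicLib header (LogicLib.nsh) extends this set of comparison operators with the following ones:

case-insensitive: S<, S<=, S>, S>=
case-sensitive: S==, S!=

Note that the capital S prefix does not by itself mean case-sensitive: S== and S!= are case-sensitive, while S<, S<=, S>, S>= compare strings lexically without regard to case. LogicLib's plain <, <=, >, >= operators compare integers, not strings.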

Here are some tests:

!include LogicLib.nsh

Section "StringComparisonTests"

   StrCpy $0 "abc"
   StrCpy $1 "abc"
   StrCpy $2 "Abc"
   StrCpy $3 "bbc"

   ; case-insensitive equality
   ${If} $0 == $1
      DetailPrint "Equal to $1"
   ${Else}
      DetailPrint "Not equal to $1"
   ${EndIf}

   ${If} $0 == $2
      DetailPrint "Equal to $2"
   ${Else}
      DetailPrint "Not equal to $2"
   ${EndIf}

   ; case-sensitive comparison (LogicLib operator)
   ${If} $0 S== $2
      DetailPrint "Equal to $2"
   ${Else}
      DetailPrint "Not equal to $2"
   ${EndIf}

   ${If} $0 == $3
      DetailPrint "Equal to $3"
   ${Else}
      DetailPrint "Not equal to $3"
   ${EndIf}

   ; case-insensitive lexical comparison (LogicLib operator)
   ${If} $0 S< $2
      DetailPrint "Less than $2"
   ${Else}
      DetailPrint "Greater than or equal to $2"
   ${EndIf}

   ${If} $0 S< $3
      DetailPrint "Less than $3"
   ${Else}
      DetailPrint "Greater than or equal to $3"
   ${EndIf}

SectionEnd

Output:

Equal to abc
Equal to Abc
Not equal to Abc
Not equal to bbc
Greater than or equal to Abc
Less than bbc

Wednesday, 10 August 2011

MFC and strings

The MFC framework uses CString and LPCTSTR as its string data types, so code that uses MFC should follow this convention and use these types. Parts of the code that may be shared and portable (for example, data/model or engine/controller parts) should use the standard C++ string type (std::string). Conversion between MFC and standard strings is easy.

There is an MFC convention for passing strings as function arguments and returning them:
- string as a function input parameter: use LPCTSTR, e.g. SetName(LPCTSTR pszName);
- string as a function output parameter: use CString&, e.g. GetName(CString& sName);
- string as a function return type: use const CString&, e.g. const CString& GetName();

Useful links and references:
Strings: CString Argument Passing
CString (MFC)
