Newline neutrality with ifstream

I recently got distracted and started porting my old game engine to use a new platform-independent rendering interface I devised. I'm pretty psyched about the rendering system: it fully encapsulates a graphics SDK and all its associated resources (vertex and index buffers, textures, etc.). By deriving from the pure virtual renderer and resource base classes, I can write a new renderer class that targets a specific platform/SDK, plug it in, and have it work without modifying a single line of code.

Part of this old game engine was a resource management class that could load 3D Studio MAX ASCII exported scenes/objects. These are .ASE files, a versatile and simple format used by a surprisingly wide range of game engines. ASE files can be used when building Doom3 and Quake3 / Quake4 levels, since the format has native support in GtkRadiant, the design tool from id Software and Loki Software, so it is a generally accepted format for storing game content. The ASE format can contain material specifications, vertices, faces (vertex indices), vertex colors, texture coordinates, and even animations with full interpolation parameters, just to name a few. Being ASCII, it is rather easy to parse and modify.

In the ASE format, each line of the file serves a single purpose. It is either an opening or closing tag for a field, or an element in a field. It made sense then when I first wrote the import code to parse the file on a line-by-line basis using ifstream::getline(...). This worked great in Windows, and I had no issues with my implementation. Flash forward to now when I am rewriting this code to work in Linux, but using files created in Windows. I never thought the disparities of a 'newline' between platforms would ever cause such a problem.

The ifstream::getline function was specifically designed to get a single line from a file without including the newline or delimiting character(s). The newline character ('\n') is the delimiting character by default, as it serves its purpose most of the time. ifstream::getline(...) reads in all characters up to the delimiter, then extracts and discards the delimiter. The problem arises when considering the definition of a newline on different platforms.

In Windows, a newline in a text file is a carriage-return, line-feed pair. It is actually two characters with ASCII decimal values 13 and 10 (0x0D, 0x0A in hex), often referred to as a CRLF combo, and text-mode streams on Windows translate that pair into a single '\n' as they read. In Linux, a newline is simply a line-feed (dec 10, hex 0x0A) and no translation takes place. So when calling ifstream::getline(...) in Linux, you read a line up to the delimiter (newline by default), and if your file was created in Windows only the last half of the CRLF combo is discarded, while the first half is read in as the last character of your string variable. Any literal comparison made against this string variable then fails in Linux while working fine in Windows.
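To make the stray character concrete, here is a minimal sketch. It uses a std::istringstream so that no platform newline translation gets in the way, and firstLineAsRead is a made-up helper name, not anything from the engine. The "*SCENE {" text is only a sample line.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the first line of `text` exactly as
// std::getline sees it when no newline translation takes place.
std::string firstLineAsRead(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    std::getline(in, line);   // extracts up to '\n' and discards only the '\n'
    return line;
}
```

Feeding it "*SCENE {\r\n" gives back a 9-character string that still ends in '\r', so a comparison against the 8-character literal "*SCENE {" fails, while the same text with a bare "\n" compares equal.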

I thought for sure there was some elegant way to overcome this and attain platform-independent file parsing without forcing a complete rewrite of the import code. This turned out not to be the case. The ifstream::getline function is handy, but by doing you the favor of automatically extracting and discarding the delimiter, it somewhat cripples its own versatility. My goal was twofold. First, obviously, I wanted to be able to read these ASE files with 0x0D,0x0A line delimiters in Linux without having the 0x0D on the end of each line I read. That alone would be simple enough, but I also wanted code that would work regardless of platform combination. This meant reading Windows files in Linux, Linux files in Windows, Windows files in Windows and Linux files in Linux.

This added a bit of complexity to the requirements. First off, ifstream::getline(...) on its own would not do, because what ends up in the line buffer depends on the platform doing the reading. When reading a Linux file in Windows, my worry was that it would keep reading from the stream, looking for a CRLF combo, and would likely read the entire file without finding one. I have not tested that case, and since text-mode streams on Windows normally pass a lone line-feed through as '\n', it may well just work. When reading a Windows file in Linux, as I found out, it reads the line including the 0x0D, then stops at and discards the 0x0A. Setting aside Linux files in Windows, the following implementation was elegant, and worked in all other situations:

 
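#include <cstring>
#include <fstream>
using namespace std;

int main()
{
	char szLineBuff[256];
	ifstream fsIn("test.ase", ios::in);

	// Note that failbit is set if get() fails to
	// extract any characters before the delimiter
	// (a blank line, for example), and the loop ends.
	// Testing the stream itself rather than good() keeps
	// a final line that has no trailing newline.
	while(fsIn.get(szLineBuff, sizeof(szLineBuff), '\n'))
	{
		// Discards the '\n' that get() left in the stream.
		// A Windows '\x0D' read in Linux is already in the buffer.
		fsIn.ignore();

		// Cleans Windows line in Linux
		size_t nLen = strlen(szLineBuff);
		if(nLen > 0 && szLineBuff[nLen-1] == '\x0D')
			szLineBuff[nLen-1] = '\0';

		// Process szLineBuff...
	}
	return 0;
}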
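
The same idea can be sketched with std::string instead of a fixed char buffer, which sidesteps both the buffer-size limit and the blank-line failbit problem. This is only a sketch: getLineNoCR and readAllLines are names made up for illustration, and it still relies on '\n' as the delimiter, so it shares whatever text-mode assumptions the loop above makes.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one line and drops a single trailing '\r'.
// Returns false once nothing more can be read.
bool getLineNoCR(std::istream& in, std::string& line)
{
    if (!std::getline(in, line))
        return false;
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);
    return true;
}

// Collects every line of a stream using the helper above.
std::vector<std::string> readAllLines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (getLineNoCR(in, line))
        lines.push_back(line);
    return lines;
}
```

Passing an ifstream opened on the ASE file to readAllLines would yield the lines with any trailing 0x0D removed, and blank lines come back as empty strings instead of ending the loop.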
 

Now I recognize that opening the file in binary mode (ios::binary) would make the '\n' be interpreted as simply '\x0A' on both Windows and Linux. My problem with this is that the file isn't binary. I'm stubborn and think this ASCII text file should be read in text mode, and that there should be a simple way to interpret newlines from any platform on any other platform. There must be a simple and elegant solution out there somewhere...
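
For comparison, the binary-mode route might look something like the sketch below. Once no translation happens, the reader can simply accept "\n", "\r\n", or even a lone "\r" as a line ending. readLineAnyEnding is a hypothetical name, and this has not been run against real ASE files.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one line from an untranslated (binary-mode)
// stream, accepting "\n", "\r\n", or a lone "\r" as the line ending.
// Returns false only when there is nothing left to read.
bool readLineAnyEnding(std::istream& in, std::string& line)
{
    line.clear();
    char c;
    bool gotAny = false;
    while (in.get(c))
    {
        gotAny = true;
        if (c == '\n')
            return true;
        if (c == '\r')
        {
            if (in.peek() == '\n')   // CRLF: swallow the LF as well
                in.get(c);
            return true;
        }
        line += c;
    }
    return gotAny;   // final line with no terminator, or nothing at all
}
```

The stream would be opened with ios::in | ios::binary and handed to the helper in a while loop. The cost is exactly the one complained about above: the text file is being treated as raw bytes.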

I'll be honest and say I stopped there, as my code worked for 3/4 of the scenarios I laid out. I have yet to find a Linux native source of ASE files, so the last scenario (Linux files on Windows) hasn't become important for me yet. If anyone has a better solution for this, or knows one that would work in all scenarios while still being somewhat elegant, I'd love to hear about it.

Also, I know I promised a follow-up on the NDS WiFi programming. I only got a little distracted; the example applications are nearly finished. You'll definitely like it when it comes.
