Portable Memory Mapping C++ Class
posted by Stephan Brumme
Parsing Files The Easy Way
Recently I had to do a lot of file loading and parsing. It can be especially tricky to come up with a fast solution if you have to jump around within these files. Seeking and proper buffering were my biggest problems.
Memory mapping is one of the nicest features of modern operating systems: after opening a file in memory-mapped mode you can treat the file as a large chunk of memory and use plain pointers. The operating system takes care of loading the data on demand (!) into memory - utilizing caches, of course. When using my C++ class MemoryMapped it's really easy:
Windows is completely different from Linux when it comes to opening a file in memory-mapped mode. The class MemoryMapped hides all the OS specific stuff in only two files: MemoryMapped.h and MemoryMapped.cpp.
They compile without any warning with GCC 4.7 and Visual C++ 2010. I haven't tried other compilers but they should be able to handle it, too, even when they are a bit older.
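To give a rough idea of what the class has to hide on the Linux side, here is a minimal sketch based on the POSIX calls open, fstat and mmap. The helper names mapWholeFile and unmapFile are made up for this illustration and are not part of MemoryMapped. On Windows the same steps go through CreateFile, CreateFileMapping and MapViewOfFile instead.

```cpp
#include <cstddef>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical helper (not part of MemoryMapped): maps a whole file read-only.
// Returns nullptr on failure or for empty files; on success "size" holds the file size.
const unsigned char* mapWholeFile(const char* filename, std::size_t& size)
{
  size = 0;

  int fd = ::open(filename, O_RDONLY);
  if (fd == -1)
    return nullptr;

  // mmap needs the length up front; a length of zero is rejected
  struct stat info;
  if (::fstat(fd, &info) == -1 || info.st_size <= 0)
  {
    ::close(fd);
    return nullptr;
  }

  const std::size_t length = static_cast<std::size_t>(info.st_size);
  void* view = ::mmap(nullptr, length, PROT_READ, MAP_SHARED, fd, 0);

  // the mapping stays valid after the descriptor is closed
  ::close(fd);
  if (view == MAP_FAILED)
    return nullptr;

  size = length;
  return static_cast<const unsigned char*>(view);
}

// Hypothetical helper: releases a mapping created by mapWholeFile.
void unmapFile(const unsigned char* data, std::size_t size)
{
  if (data)
    ::munmap(const_cast<unsigned char*>(data), size);
}
```

After a successful call the returned pointer can be indexed like an ordinary array, and pages are only read from disk when they are touched for the first time.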
Download
Latest release: September 17, 2013, size: 2552 bytes, 100 lines
CRC32: 5d202964
MD5: 6efd1a7cea536fbd88cf4f02b4c95bcf
SHA1: f7d0c73a035262f9264724e1ba5d31b50c504c98
SHA256: 2fe563f3d9c24d563ce25c5cc2ffb9c7d2115782fe3a9ebf465d0a9a9a22c9f3
Latest release: November 4, 2015, size: 6.0 kBytes, 322 lines
CRC32: 6aab600a
MD5: 643a883c9aa720a3f39f068f9dcaf463
SHA1: 0ecfc2cb380a7c9b1e023d3889bd3d9a05375fe6
SHA256: d9cad2e388bae4cc2a00105f4841b785a306c78817b2a85fa943a5059aa4eb73
If you encounter any bugs/problems or have ideas for improving future versions, please write me an email: create@stephan-brumme.com
License
This code is licensed under the zlib License:
This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.
Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Changelog
- version 2
  - latest and greatest
  - November 4, 2015
  - fixed bug in close()
  - Git tag portable_memory_mapping_v2
- version 1
  - September 17, 2013
  - initial release
  - Git tag portable_memory_mapping_v1
Pro and Cons
The code can be used in a variety of environments:
- it supports Linux and Windows
- it supports 32 and 64 bit CPUs
- it supports large files (>2GB)
- Read-only access to files
Interface At A Glance
You can open a file in the MemoryMapped constructor or by calling the open method.
The file is automagically closed in the destructor or by calling close.
Note: it's a good habit to verify that isValid returns true after the desired file has been opened.
Here is a shortened version of MemoryMapped.h:
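The following is not the actual MemoryMapped.h but a minimal stand-in that only mirrors the behaviour described above: open in the constructor or via open, close in the destructor or via close, and isValid to check the result. The class name MappedFileSketch, its members and the POSIX-only implementation are assumptions made for this illustration; the real header also covers Windows and may differ in every detail.

```cpp
#include <cstddef>
#include <fcntl.h>     // open
#include <string>
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Illustrative stand-in, NOT the real MemoryMapped class: POSIX only, read-only.
class MappedFileSketch
{
public:
  MappedFileSketch() : data_(nullptr), size_(0) {}
  explicit MappedFileSketch(const std::string& filename) : data_(nullptr), size_(0)
  {
    open(filename);
  }
  ~MappedFileSketch() { close(); }

  // a mapping has exactly one owner, so copying is disabled
  MappedFileSketch(const MappedFileSketch&) = delete;
  MappedFileSketch& operator=(const MappedFileSketch&) = delete;

  // maps the whole file, returns true on success
  bool open(const std::string& filename)
  {
    close();
    int fd = ::open(filename.c_str(), O_RDONLY);
    if (fd == -1)
      return false;

    struct stat info;
    if (::fstat(fd, &info) == 0 && info.st_size > 0)
    {
      const std::size_t length = static_cast<std::size_t>(info.st_size);
      void* view = ::mmap(nullptr, length, PROT_READ, MAP_SHARED, fd, 0);
      if (view != MAP_FAILED)
      {
        data_ = static_cast<const unsigned char*>(view);
        size_ = length;
      }
    }
    ::close(fd); // the mapping survives closing the descriptor
    return isValid();
  }

  // unmaps the file, safe to call more than once
  void close()
  {
    if (data_)
      ::munmap(const_cast<unsigned char*>(data_), size_);
    data_ = nullptr;
    size_ = 0;
  }

  bool        isValid() const { return data_ != nullptr; }
  std::size_t size()    const { return size_; }

  // unchecked access, only call when isValid() is true and offset < size()
  unsigned char operator[](std::size_t offset) const { return data_[offset]; }

private:
  const unsigned char* data_;
  std::size_t          size_;
};
```

With such a class, a file is opened with `MappedFileSketch file("data.bin");`, checked with `file.isValid()` and then read byte by byte via `file[i]`.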
Large Files On Small Computers
Since memory mapping loads pages only on demand, you can usually map the whole file. However, this is not possible for large files (>2GB) on 32 bit systems. Then you have to implement your own algorithm and call remap whenever the file position you are looking for is not currently mapped into memory. For example:
Of course, you don't have to worry about that on 64 bit Linux or Windows.
Demo Program mywcl
I need the Unix tool wc daily at work. Well, to be precise, I use wc -l.
The idea behind wc -l is pretty simple: count all line endings.
If your file is completely mapped to memory, the core routine becomes a simple for-loop:
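A minimal sketch of such a loop, assuming the mapped file is available as a pointer plus a size; the function name countLines is chosen for this example and need not match the original mywcl source:

```cpp
#include <cstddef>

// Counts all '\n' bytes in a buffer, e.g. a completely memory-mapped file.
std::size_t countLines(const unsigned char* data, std::size_t size)
{
  std::size_t lines = 0;
  for (std::size_t i = 0; i < size; i++)
    lines += (data[i] == '\n'); // the comparison yields 0 or 1
  return lines;
}
```

Just like wc -l, this counts newline characters, so a last line without a trailing line ending is not included.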
The full program is only 45 lines long:
Maybe you have noticed the #pragma in front of the for-loop.
This simple line (in addition to the -fopenmp compiler option) enables multi-core line counting:
If the file is already cached in memory then my code outperforms the good old wc -l:
(test data: first 1 GByte of Wikipedia)
To be fair, this situation is the only one where my code is faster than wc -l ...
Whenever the file has to be read from disk or on single-core machines wc -l beats me easily.
Moreover, wc accepts data from STDIN (standard input) which is handy for piping. mywcl on the other hand only works with files.
Here are the performance timings on my Raspberry Pi: (test data: first 100 MByte of Wikipedia)
Download
Latest release: September 18, 2013, size: 980 bytes, 45 lines
CRC32: bcf9eca4
MD5: 9a20c6af8e41c92b601397632dbf795e
SHA1: 8dc3d5e35cfe54c4958f2bcd96973db8ddcab4e2
SHA256: 216f321fd2ec55752b19cd7ccdb7e3c6bf804c55a5b9fdb56ad1ed4fa15cea98