Unsafe

There are a few ways to work with memory in Java:

  • Byte buffer (heap and direct)
  • Random access file (FileChannel)
  • Memory-mapped file
  • Unsafe
  • SHM (on Linux)

And which one is the fastest?

First, let’s describe the problem we’re trying to solve with ‘raw’ memory. It might be a cache or a data store for a large set of objects whose lifecycle is complicated enough that we don’t want the GC to know about them. Or it might just be a stream that we want to write directly to a socket. Either way, we can split the data into fixed-size chunks (as a hard disk does) and treat each chunk as an atomic unit.

Chunk size: 1 KB (a 1024-byte array)
Chunks: 1 000 000 (about 1 GB in total)
Target: Amazon EC2 large instance, Ubuntu 12.04
Each counted operation is one chunk write plus one chunk read.
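
Because every chunk has the same size, chunk i starts at byte i × chunkSize, which is why the snippets below cast to long before multiplying. The small sketch that follows uses made-up class and constant names. It shows that the 1 000 000 × 1 KB test set is 1 024 000 000 bytes, still under Integer.MAX_VALUE, and what happens to an int offset once a store grows past 2 GB:

```java
public class ChunkOffsets {
    static final int CHUNK_SIZE = 1024;        // 1 KB chunks, as in the test setup
    static final int CHUNK_COUNT = 1_000_000;  // number of chunks in the test

    // Byte offset of a chunk, computed in long arithmetic so large ids cannot overflow.
    static long offsetOf(long chunkId) {
        return chunkId * CHUNK_SIZE;
    }

    // The same computation in int arithmetic: silently wraps once the offset exceeds 2^31 - 1.
    static int intOffsetOf(int chunkId) {
        return chunkId * CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long total = (long) CHUNK_SIZE * CHUNK_COUNT;
        System.out.println("total bytes: " + total);                        // 1024000000
        System.out.println("fits in int: " + (total <= Integer.MAX_VALUE)); // true
        int bigId = 3_000_000;                   // a store of roughly 3 GB
        System.out.println(offsetOf(bigId));     // 3072000000
        System.out.println(intOffsetOf(bigId));  // -1222967296 (wrapped around)
    }
}
```

This is the arithmetic behind the 2 GB ceiling discussed at the end: anything addressed by an int position stops working once offsets pass Integer.MAX_VALUE.
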

Byte buffer

Both direct and heap byte buffers are tested. First we move the buffer’s position to the chunk’s offset, then read or write the array.

private final ByteBuffer map;   // ByteBuffer.allocate(...) (heap) or ByteBuffer.allocateDirect(...) (direct)

public int putChunk(byte[] chunk, long offset) throws IOException {
    int freeChunk = findFreeChunk();            // slot allocator, not shown
    long pos = (long) freeChunk * chunkSize;
    map.position((int) pos);                    // buffer positions are ints, hence the 2 GB limit
    map.put(chunk, (int) offset, chunk.length - (int) offset);
    return freeChunk;
}

public byte[] getChunk(long id) throws IOException {
    byte[] bytes = new byte[chunkSize];
    long pos = id * chunkSize;
    map.position((int) pos);
    map.get(bytes);                             // copies chunkSize bytes into the new array
    return bytes;
}

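
A self-contained sketch of the same idea, showing how the heap and direct variants might be created. The class name, the bump-pointer stand-in for findFreeChunk() and the simplified signatures (no offset argument) are assumptions for illustration, not the benchmark’s actual code:

```java
import java.nio.ByteBuffer;

// Hypothetical minimal chunk store backed by a single ByteBuffer.
public class BufferChunkStore {
    private final ByteBuffer map;
    private final int chunkSize;
    private final int chunkCount;
    private int nextFree = 0; // hypothetical bump allocator standing in for findFreeChunk()

    BufferChunkStore(int chunkSize, int chunkCount, boolean direct) {
        this.chunkSize = chunkSize;
        this.chunkCount = chunkCount;
        int capacity = Math.multiplyExact(chunkSize, chunkCount); // throws if it would pass 2 GB
        // allocateDirect: off-heap memory; allocate: a byte[] on the Java heap
        this.map = direct ? ByteBuffer.allocateDirect(capacity) : ByteBuffer.allocate(capacity);
    }

    private int findFreeChunk() {
        if (nextFree >= chunkCount) throw new IllegalStateException("store is full");
        return nextFree++;
    }

    public int putChunk(byte[] chunk) {
        if (chunk.length > chunkSize) throw new IllegalArgumentException("chunk too large");
        int id = findFreeChunk();
        map.position(id * chunkSize); // relative put writes at the current position
        map.put(chunk);
        return id;
    }

    public byte[] getChunk(int id) {
        byte[] bytes = new byte[chunkSize];
        map.position(id * chunkSize);
        map.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        for (boolean direct : new boolean[] {false, true}) {
            BufferChunkStore store = new BufferChunkStore(1024, 16, direct);
            byte[] data = new byte[1024];
            data[0] = 42;
            int id = store.putChunk(data);
            System.out.println((direct ? "direct" : "heap") + " -> " + store.getChunk(id)[0]);
        }
    }
}
```

Since putChunk and getChunk move the buffer’s shared position, a store like this is not thread-safe without external locking.
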
Random access file

With RandomAccessFile we work through its FileChannel:

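private final FileChannel channel;   // e.g. new RandomAccessFile(file, "rw").getChannel()

public int putChunk(byte[] chunk, long offset) throws IOException {
    int freeChunk = findFreeChunk();
    long pos = (long) freeChunk * chunkSize;
    // positional write: takes a long position and leaves the channel's own position alone
    channel.write(ByteBuffer.wrap(chunk, (int) offset, chunk.length - (int) offset), pos);
    return freeChunk;
}

public byte[] getChunk(long id) throws IOException {
    byte[] bytes = new byte[chunkSize];
    long pos = id * chunkSize;
    // wrap the result array so the bytes read actually end up in what we return
    ByteBuffer dst = ByteBuffer.wrap(bytes);
    while (dst.hasRemaining()) {
        // read() may return fewer bytes than requested; -1 means end of file
        if (channel.read(dst, pos + dst.position()) < 0) break;
    }
    return bytes;
}
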
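
A runnable sketch of a FileChannel-backed store, under the assumption that the caller picks the slot id (the benchmark’s findFreeChunk() is not shown in the post). The class name ChannelChunkStore is hypothetical:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical chunk store doing positional I/O through a RandomAccessFile's channel.
public class ChannelChunkStore implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileChannel channel;
    private final int chunkSize;

    ChannelChunkStore(Path path, int chunkSize) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
        this.channel = file.getChannel();
        this.chunkSize = chunkSize;
    }

    public void putChunk(long id, byte[] chunk) throws IOException {
        if (chunk.length > chunkSize) throw new IllegalArgumentException("chunk too large");
        // positional write: blocks until every byte is written; the file grows as needed
        channel.write(ByteBuffer.wrap(chunk), id * chunkSize);
    }

    public byte[] getChunk(long id) throws IOException {
        byte[] bytes = new byte[chunkSize];
        ByteBuffer dst = ByteBuffer.wrap(bytes);
        long pos = id * chunkSize;
        while (dst.hasRemaining()) {
            int n = channel.read(dst, pos);
            if (n < 0) break; // past end of file: the rest stays zero
            pos += n;
        }
        return bytes;
    }

    @Override
    public void close() throws IOException {
        channel.close();
        file.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".dat");
        try (ChannelChunkStore store = new ChannelChunkStore(tmp, 1024)) {
            byte[] data = new byte[1024];
            data[1023] = 9;
            store.putChunk(5, data);
            System.out.println(store.getChunk(5)[1023]); // 9
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Positions passed to read and write are long, so unlike the ByteBuffer variants this store is not capped at 2 GB; every call, however, goes through a system call, which is a plausible reason for its much lower throughput in the results.
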
Random access file (mmap)

The same RandomAccessFile, but mapped into memory:

private final MappedByteBuffer map;   // e.g. channel.map(FileChannel.MapMode.READ_WRITE, 0, size)

public int putChunk(byte[] chunk, long offset) throws IOException {
    int freeChunk = findFreeChunk();
    long pos = (long) freeChunk * chunkSize;
    map.position((int) pos);             // a single mapping is limited to Integer.MAX_VALUE bytes
    map.put(chunk, (int) offset, chunk.length - (int) offset);
    return freeChunk;
}

public byte[] getChunk(long id) throws IOException {
    byte[] bytes = new byte[chunkSize];
    long pos = id * chunkSize;
    map.position((int) pos);
    map.get(bytes);
    return bytes;
}

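
The post does not show how the MappedByteBuffer is created. One plausible way, sketched below with a hypothetical helper mapChunks, is to size the file with RandomAccessFile.setLength and map it READ_WRITE; force() is the single call that flushes the mapped region back to the file:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedChunks {
    // Hypothetical helper: sizes the file to chunkSize * chunkCount bytes and maps all of it.
    static MappedByteBuffer mapChunks(Path path, int chunkSize, int chunkCount) throws IOException {
        long size = (long) chunkSize * chunkCount;
        if (size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("one mapping cannot exceed Integer.MAX_VALUE bytes");
        }
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            raf.setLength(size);
            // the mapping stays valid after the channel and file are closed
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-chunks", ".dat");
        tmp.toFile().deleteOnExit();
        MappedByteBuffer map = mapChunks(tmp, 1024, 4);
        byte[] chunk = new byte[1024];
        chunk[0] = 7;
        map.position(2 * 1024);   // chunk #2
        map.put(chunk);
        map.force();              // write the dirty pages back to the file in one call
        System.out.println(Files.readAllBytes(tmp)[2 * 1024]); // 7
    }
}
```
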
Random access file (SHM)

The same RandomAccessFile approach, but the file lives in Linux shared memory (/dev/shm/shm1), so memory is accessed as if it were a file.
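
No code is given for the SHM variant, so here is a sketch. It assumes the store is an ordinary file under /dev/shm (a RAM-backed tmpfs on most Linux distributions) accessed through the usual RandomAccessFile channel, and it falls back to the system temp directory where /dev/shm does not exist. The file name and helper names are hypothetical. Because the file outlives the code that wrote it, a later run can reopen it and read the chunk back, which is what the second open in main imitates:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShmChunks {
    // Use the RAM-backed tmpfs at /dev/shm when present, otherwise the temp directory.
    static Path shmPath(String name) {
        Path shm = Paths.get("/dev/shm");
        Path dir = Files.isDirectory(shm) ? shm : Paths.get(System.getProperty("java.io.tmpdir"));
        return dir.resolve(name);
    }

    static void writeChunk(Path path, long id, byte[] chunk) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            ch.write(ByteBuffer.wrap(chunk), id * chunk.length); // long position: no 2 GB cap
        }
    }

    static byte[] readChunk(Path path, long id, int chunkSize) throws IOException {
        byte[] bytes = new byte[chunkSize];
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            ByteBuffer dst = ByteBuffer.wrap(bytes);
            long pos = id * chunkSize;
            while (dst.hasRemaining()) {
                int n = ch.read(dst, pos);
                if (n < 0) break; // end of file
                pos += n;
            }
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        Path path = shmPath("shm1-demo");
        byte[] chunk = new byte[1024];
        chunk[0] = 1;
        writeChunk(path, 7, chunk);                      // write, then close the file
        System.out.println(readChunk(path, 7, 1024)[0]); // a fresh open reads it back: 1
        Files.delete(path);
    }
}
```
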

Unsafe

Copy memory directly between off-heap memory and the byte array:

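private Unsafe unsafe;   // sun.misc.Unsafe, obtained via reflection on its "theUnsafe" field
private long memStart;   // base address, e.g. from unsafe.allocateMemory(chunkSize * chunkCount)

public int putChunk(byte[] chunk, long offset) {
    int freeChunk = findFreeChunk();
    long pos = memStart + (long) freeChunk * chunkSize;
    // copy at most one chunk, and never read past the end of the source array
    long length = Math.min(chunkSize, chunk.length - offset);
    unsafe.copyMemory(chunk, Unsafe.ARRAY_BYTE_BASE_OFFSET + offset, null, pos, length);
    return freeChunk;
}

public byte[] getChunk(long id) {
    byte[] bytes = new byte[chunkSize];
    long pos = memStart + id * chunkSize;
    unsafe.copyMemory(null, pos, bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET, chunkSize);
    return bytes;
}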
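
A self-contained version of the Unsafe store, sketched under a few assumptions: sun.misc.Unsafe is reachable by reflection on its private theUnsafe field (an unsupported internal API that newer JDKs warn about), the block is allocated with allocateMemory, and the class name and explicit bounds checks are illustrative additions:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical off-heap chunk store built on sun.misc.Unsafe.
public class UnsafeChunkStore implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long memStart;   // base address returned by allocateMemory
    private final int chunkSize;
    private final long chunkCount;

    UnsafeChunkStore(int chunkSize, long chunkCount) {
        this.chunkSize = chunkSize;
        this.chunkCount = chunkCount;
        long size = chunkSize * chunkCount;   // long-sized: not limited to 2 GB
        this.memStart = UNSAFE.allocateMemory(size);
        UNSAFE.setMemory(memStart, size, (byte) 0); // allocateMemory returns uninitialized memory
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Unsafe does no bounds checking, so a bad id must be rejected here.
    private long address(long id) {
        if (id < 0 || id >= chunkCount) throw new IndexOutOfBoundsException("chunk " + id);
        return memStart + id * chunkSize;
    }

    public void putChunk(long id, byte[] chunk) {
        if (chunk.length > chunkSize) throw new IllegalArgumentException("chunk too large");
        UNSAFE.copyMemory(chunk, Unsafe.ARRAY_BYTE_BASE_OFFSET, null, address(id), chunk.length);
    }

    public byte[] getChunk(long id) {
        byte[] bytes = new byte[chunkSize];
        UNSAFE.copyMemory(null, address(id), bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET, chunkSize);
        return bytes;
    }

    @Override
    public void close() {
        UNSAFE.freeMemory(memStart); // off-heap memory is never garbage collected
    }

    public static void main(String[] args) {
        try (UnsafeChunkStore store = new UnsafeChunkStore(1024, 16)) {
            byte[] data = new byte[1024];
            data[100] = 5;
            store.putChunk(3, data);
            System.out.println(store.getChunk(3)[100]); // 5
        }
    }
}
```

Without the check in address(), an out-of-range id would silently read or write arbitrary native memory, and forgetting close() leaks the whole block.
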

Result

JMH takes care of warm-up and the rest of the benchmarking details.

Test               Throughput (ops/ms, higher is better)
BufferDirectTest   2076.198
BufferHeapTest     2080.842
RAFDirectTest       147.714
RAFMMapTest        2024.531
RAFSHMTest         2015.644
UnsafeTest         2242.726

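As you can see, Unsafe is a little faster (about 8% ahead of the direct ByteBuffer), but the difference is small. The one clear outlier is RAF through a FileChannel, roughly 14 times slower than the rest. My conclusions: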

If your data set doesn’t exceed 2 GB, choose ByteBuffer; its API is pleasant to work with. If you also need persistence, a memory-mapped file can flush your memory to disk with a single API call (MappedByteBuffer.force()). Unfortunately, mmap performance is hard to predict, so you can’t count on consistent read/write times.

If you need a large data set (more than 2 GB) and you’re on Linux, choose SHM without hesitation. It offers good speed, persistence across JVM restarts (though not across reboots, since /dev/shm lives in RAM) and the convenient RandomAccessFile API.

And, of course, the fastest and most flexible way to work with memory is Unsafe. But it should be your last resort, because it is also the hardest to debug and maintain.

The benchmark code is on GitHub.

2 thoughts on “Unsafe”

  1. Hi Alexey,
    Thanks for the article. This is exactly what I was looking for to decide what should be the default method for buffer management in wonderdb.
