Saturday, July 30, 2011

RaleighFS to enter the in-memory key-value store market

A couple of people asked me about RaleighFS and why it is called a File-System instead of a Database. The answer is that the project started back in 2005 as a simple Linux kernel file-system and has since evolved into something different.

Abstract Storage Layer
I like to say that RaleighFS is an Abstract Storage Layer, because its main components are designed to be pluggable. For example, the namespace can be flat or hierarchical, and the other objects don't notice the difference.
  • Store Multiple Objects with different Types (HashTable, SkipList, Tree, Extents, Bitmap, ...)
  • Each Object has its own on-disk format (Log, B*Tree, ...).
  • Observable Objects - Get Notified when something changes.
  • Flexible Namespace & Semantics for Objects.
  • Various Plain-Text & Binary Protocol Support (Memcache, ...)
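
To make the "pluggable" and "observable" ideas a bit more concrete, here is a minimal sketch in C. It is not taken from the RaleighFS sources: object_plugin_t, object_t, object_write and the "counter" type are all hypothetical names invented for this illustration. The point is only that the core talks to a table of function pointers, so it never needs to know which concrete object type sits behind it, and that a change can trigger a notification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plug-in interface; these names are illustrative only
 * and are not taken from the RaleighFS sources. */
typedef struct object_plugin {
    const char *name;
    int  (*write)(void *state, long value);   /* mutate the object */
    long (*read)(const void *state);          /* query the object */
} object_plugin_t;

typedef void (*observer_fn)(const char *key, void *user_data);

typedef struct object {
    const char            *key;
    const object_plugin_t *plugin;
    void                  *state;
    observer_fn            observer;          /* optional, may be NULL */
    void                  *observer_data;
} object_t;

/* A trivial "counter" object type: write adds, read returns the sum. */
int counter_write(void *state, long value) {
    *(long *)state += value;
    return 0;
}

long counter_read(const void *state) {
    return *(const long *)state;
}

const object_plugin_t counter_plugin = {
    "counter", counter_write, counter_read
};

/* The core only uses the plug-in table, never a concrete type. */
int object_write(object_t *obj, long value) {
    int rc;
    assert(obj != NULL && obj->plugin != NULL);
    rc = obj->plugin->write(obj->state, value);
    if (rc == 0 && obj->observer != NULL)
        obj->observer(obj->key, obj->observer_data);  /* notify on change */
    return rc;
}

long object_read(const object_t *obj) {
    assert(obj != NULL && obj->plugin != NULL);
    return obj->plugin->read(obj->state);
}

/* Example observer: counts how many change notifications arrived. */
void count_changes(const char *key, void *user_data) {
    (void)key;
    ++*(int *)user_data;
}
```

Adding another object type in this sketch means filling in another object_plugin_t table; object_write and object_read stay untouched.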

A New Beginning...
A few weeks ago, I decided to rewrite and refactor some of the code, stabilize the API and, this time, try to bring the file-system and the network layer close to a stable release.
The first steps are:
  • Release a functional network layer as soon as I can.
  • Provide a pluggable protocol interface.
  • Implement memcache-compatible and other protocols.
So, these first steps are all about networking and, unfortunately, this means dropping the sync part and keeping just the in-memory code (the file-system flushes on memory pressure).
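
A pluggable protocol interface could look roughly like the sketch below. Everything in it is an assumption made for illustration: request_t, protocol_t and the parse functions are hypothetical names, and both wire formats are heavily simplified. A real memcache "set" also carries flags, an expiry time and a byte count, with the data on the following line, and Redis clients normally speak RESP rather than the inline form shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Protocol-neutral request. All names here are hypothetical. */
typedef enum { REQ_GET, REQ_SET } req_type_t;

typedef struct request {
    req_type_t type;
    char key[64];
    char value[64];   /* used by REQ_SET only */
} request_t;

/* One table of callbacks per wire protocol. */
typedef struct protocol {
    const char *name;
    /* Returns 0 on success, -1 if the line cannot be parsed. */
    int (*parse)(const char *line, request_t *out);
} protocol_t;

/* Shared helper: "<verb> <key>" is a get, "<verb> <key> <value>" a set. */
static int parse_verbs(const char *line, const char *get_verb,
                       const char *set_verb, request_t *out) {
    char verb[8];
    int n;
    assert(line != NULL && out != NULL);
    out->key[0] = out->value[0] = '\0';
    n = sscanf(line, "%7s %63s %63s", verb, out->key, out->value);
    if (n == 2 && strcmp(verb, get_verb) == 0) {
        out->type = REQ_GET;
        return 0;
    }
    if (n == 3 && strcmp(verb, set_verb) == 0) {
        out->type = REQ_SET;
        return 0;
    }
    return -1;
}

/* Simplified memcache-like text: "get <key>" / "set <key> <value>". */
int memcache_parse(const char *line, request_t *out) {
    return parse_verbs(line, "get", "set", out);
}

/* Simplified Redis inline commands: "GET <key>" / "SET <key> <value>". */
int redis_inline_parse(const char *line, request_t *out) {
    return parse_verbs(line, "GET", "SET", out);
}

/* The server picks a table per connection; the core sees only request_t. */
const protocol_t protocols[] = {
    { "memcache-text", memcache_parse },
    { "redis-inline",  redis_inline_parse },
};
```

Both handlers produce the same request_t for equivalent commands, which is what lets the rest of the system ignore the client's wire format.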

Current Status:
As of today, some code is available on GitHub under the raleighfs project.
  • src/zcl contains the abstraction classes and some tools that are used by every piece of code.
  • src/raleighfs-core contains the file-system core module.
  • src/raleighfs-plugins contains all the file-system's pluggable objects and semantic layers.
  • src/raleigh-server currently contains the entry point to run a memcache-compatible server (memcapable text protocol) and a Redis get/set interface server. The in-memory storage is confined to engine.{h,c} and is currently based on a Chained HashTable, a Skip List, or a Binary Tree.
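
For readers unfamiliar with the term, a chained hash table keeps a linked list of entries per bucket. The sketch below is a generic textbook version and not the code in engine.{h,c}: the engine_* names, the fixed bucket count and the djb2 hash are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64   /* arbitrary size chosen for this sketch */

typedef struct entry {
    char *key;
    char *value;
    struct entry *next;   /* next entry in the same bucket chain */
} entry_t;

typedef struct engine {
    entry_t *buckets[NBUCKETS];
} engine_t;

/* Copy a string into freshly allocated memory. */
static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) memcpy(p, s, n);
    return p;
}

/* djb2 string hash, used here only as a simple well-known choice. */
static unsigned long hash_key(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Insert or overwrite. Returns 0 on success, -1 on allocation failure. */
int engine_set(engine_t *e, const char *key, const char *value) {
    entry_t **slot, *it;
    char *v;
    assert(e != NULL && key != NULL && value != NULL);
    slot = &e->buckets[hash_key(key) % NBUCKETS];
    v = copy_string(value);
    if (v == NULL) return -1;
    for (it = *slot; it != NULL; it = it->next) {
        if (strcmp(it->key, key) == 0) {   /* key exists: replace value */
            free(it->value);
            it->value = v;
            return 0;
        }
    }
    it = malloc(sizeof(*it));
    if (it == NULL) { free(v); return -1; }
    it->key = copy_string(key);
    if (it->key == NULL) { free(v); free(it); return -1; }
    it->value = v;
    it->next = *slot;                      /* push at the head of the chain */
    *slot = it;
    return 0;
}

/* Returns the stored value, or NULL if the key is absent. */
const char *engine_get(const engine_t *e, const char *key) {
    const entry_t *it;
    assert(e != NULL && key != NULL);
    for (it = e->buckets[hash_key(key) % NBUCKETS]; it != NULL; it = it->next)
        if (strcmp(it->key, key) == 0) return it->value;
    return NULL;
}

/* Free every entry and leave the table empty. */
void engine_clear(engine_t *e) {
    size_t i;
    assert(e != NULL);
    for (i = 0; i < NBUCKETS; i++) {
        entry_t *it = e->buckets[i];
        while (it != NULL) {
            entry_t *next = it->next;
            free(it->key);
            free(it->value);
            free(it);
            it = next;
        }
        e->buckets[i] = NULL;
    }
}
```

A skip list or a binary tree could sit behind the same set/get functions, which is presumably why the storage can be swapped without touching the server.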

How it Works
As I said before, the entry point is the ioloop, which allows clients to interact with the file-system's objects through a specified protocol. Each "protocol handler" parses its own format, converts it to the file-system one, and enqueues the request to a RequestQ that dispatches the request to the file-system. When the file-system has the answer, it pushes the response into the RequestQ and the client is notified. The inverse process is then applied: the file-system protocol is parsed and converted into the client one.
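
The round trip described above can be sketched in a few lines of single-threaded C. This is an illustration under stated assumptions, not the actual RequestQ: the queue is a fixed-size ring buffer, the "file-system" is a single key/value slot, and every name (fs_request_t, requestq_push, tiny_fs_execute, ...) is hypothetical. The replies follow the memcache text protocol ("STORED", "VALUE <key> <flags> <bytes>", "END").

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_CAPACITY 8   /* arbitrary size for this sketch */

/* Internal, protocol-neutral messages. All names are hypothetical. */
typedef struct fs_request {
    int  is_set;           /* 1 = set, 0 = get */
    char key[32];
    char value[32];        /* used by set only */
} fs_request_t;

typedef struct fs_response {
    int  found;            /* 1 if the key currently holds a value */
    char value[32];
} fs_response_t;

/* Fixed-size FIFO ring buffer standing in for the RequestQ. */
typedef struct request_queue {
    fs_request_t items[QUEUE_CAPACITY];
    size_t head;           /* index of the oldest request */
    size_t count;          /* number of queued requests */
} request_queue_t;

void requestq_init(request_queue_t *q) {
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

int requestq_push(request_queue_t *q, const fs_request_t *req) {
    if (q->count == QUEUE_CAPACITY) return -1;   /* queue full */
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = *req;
    q->count++;
    return 0;
}

int requestq_pop(request_queue_t *q, fs_request_t *out) {
    if (q->count == 0) return -1;                /* queue empty */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return 0;
}

/* Stand-in "file-system": a single key/value slot. */
typedef struct tiny_fs {
    int  used;
    char key[32];
    char value[32];
} tiny_fs_t;

void tiny_fs_execute(tiny_fs_t *fs, const fs_request_t *req,
                     fs_response_t *resp) {
    if (req->is_set) {
        fs->used = 1;
        strcpy(fs->key, req->key);
        strcpy(fs->value, req->value);
    }
    resp->found = fs->used && strcmp(fs->key, req->key) == 0;
    resp->value[0] = '\0';
    if (resp->found) strcpy(resp->value, fs->value);
}

/* Convert the internal response back into memcache text-protocol form. */
void format_memcache_reply(const fs_request_t *req, const fs_response_t *resp,
                           char *out, size_t size) {
    if (req->is_set)
        snprintf(out, size, "STORED\r\n");
    else if (resp->found)
        snprintf(out, size, "VALUE %s 0 %lu\r\n%s\r\nEND\r\n", req->key,
                 (unsigned long)strlen(resp->value), resp->value);
    else
        snprintf(out, size, "END\r\n");
}

/* Drain the queue; `reply` ends up holding the answer to the last request. */
int requestq_dispatch(request_queue_t *q, tiny_fs_t *fs,
                      char *reply, size_t size) {
    fs_request_t req;
    fs_response_t resp;
    int handled = 0;
    while (requestq_pop(q, &req) == 0) {
        tiny_fs_execute(fs, &req, &resp);
        format_memcache_reply(&req, &resp, reply, size);
        handled++;
    }
    return handled;
}
```

In a real server the dispatch step would run from the ioloop and write each reply to its own client socket; here a single reply buffer keeps the example short.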


Having the RequestQ has a few advantages. The first is that you can easily wrap a protocol to communicate with the file-system. The second is that the RequestQ can dispatch requests to different servers. Another advantage is that the RequestQ can operate as a Read-Write Lock for each object, allowing the file-system to use fewer locks...
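
One way to picture the "RequestQ as a Read-Write Lock" idea is a small per-object gate that the dispatcher consults before handing a request to the file-system. The sketch below is an assumption about how this could work, not a description of the RaleighFS code: object_gate_t, gate_try_begin and gate_end are hypothetical names, the dispatcher is assumed to be single-threaded (so no atomics are needed), and writer starvation is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Per-object admission state kept by the (hypothetical) dispatcher. */
typedef struct object_gate {
    int active_readers;   /* reads currently running on the object */
    int writer_active;    /* 1 while a write is running */
} object_gate_t;

/* Try to start a request on the object.
 * Returns 1 if it may run now, 0 if it must stay queued. */
int gate_try_begin(object_gate_t *g, int is_write) {
    assert(g != NULL);
    if (g->writer_active) return 0;              /* a writer excludes everyone */
    if (is_write) {
        if (g->active_readers > 0) return 0;     /* writers wait for readers */
        g->writer_active = 1;
    } else {
        g->active_readers++;                     /* readers share the object */
    }
    return 1;
}

/* Mark a previously admitted request as finished. */
void gate_end(object_gate_t *g, int is_write) {
    assert(g != NULL);
    if (is_write)
        g->writer_active = 0;
    else if (g->active_readers > 0)
        g->active_readers--;
}
```

Because the decision is taken in one place before dispatch, the objects themselves would not need their own locks for this kind of exclusion.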


For more information ask me at theo.bertozzi (at) gmail.com.