SiriDB

  • Posted on: 29 September 2017
  • By: TransceptorTech...

SiriDB is an open source time series database, developed by a Dutch startup company.

Jeroen, one of our software developers, started developing SiriDB and created its skeleton in December 2014. For our infrastructure monitoring system we needed a system that could make data such as memory, disk and CPU usage clearly visible. At that point we were using Google’s BigQuery; while BigQuery is good at processing large amounts of data, showing this data in graphs was not its strong point. We needed a database that could process large amounts of input data and was equally good at retrieving (and aggregating) that data.
We developed the proof of concept in Python during Christmas 2014 and then continued working on SiriDB for another year. During that year we developed our own query language and improved the scalability of the database. The result was a version that could be scaled on the fly (by adding pools) and that stored data robustly. We achieved this robustness by running two servers in each pool: when both servers are online they share the load, which improves performance, and when one is turned off the database keeps functioning. This lets us install SiriDB updates while it is running, without disrupting operations.

With this version in place, we started thinking about how to further increase performance and reduce memory usage. Python adds considerable overhead to both. Where speed was critical, for example in the aggregation functions, we had already written the code in native C, so we decided to rewrite the entire database in C as well. After one more year, we completed the first version of SiriDB written entirely in C.

How does SiriDB work?
A SiriDB cluster consists of at least one server. When your database grows, you can set up a new server to expand your current database with a new pool. When a second pool is created, SiriDB automatically distributes the existing data evenly over the two pools. This all happens in the background while the database keeps functioning. The process can be repeated, and SiriDB redistributes its data each time. The algorithm ensures that data only moves to the newly added pool and is never moved back to a 'previous' pool: when you go from two to three pools, each of the two existing pools moves a third of its data to the new pool.
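The redistribution rule above can be illustrated with a small slot table. This is a minimal sketch under stated assumptions, not SiriDB's actual code: it assumes series names are hashed into a fixed number of slots and that each slot is owned by one pool. The slot count 8192, the CRC32 hash and the functions `build_lookup` and `pool_for_series` are all hypothetical. When the pool count grows to n, every n-th slot of each existing pool is handed to the new pool, so data only ever moves to the new pool.

```python
import zlib

LOOKUP_SIZE = 8192  # hypothetical number of slots (an assumption for this sketch)


def build_lookup(num_pools, size=LOOKUP_SIZE):
    """Map every slot to a pool id, adding pools one at a time (hypothetical)."""
    table = [0] * size  # with a single pool, pool 0 owns every slot
    for new_pool in range(1, num_pools):
        total = new_pool + 1   # number of pools after adding new_pool
        seen = [0] * total     # slots visited so far, per existing pool
        for slot, pool in enumerate(table):
            seen[pool] += 1
            # Every total-th slot of an existing pool moves to the new pool,
            # so each old pool gives away about 1/total of its slots and no
            # slot ever moves between two of the old pools.
            if seen[pool] % total == 0:
                table[slot] = new_pool
    return table


def pool_for_series(name, table):
    """Hash a series name to a slot and return the pool owning that slot."""
    return table[zlib.crc32(name.encode("utf-8")) % len(table)]
```

With the hypothetical 8192 slots, `build_lookup(2)` gives each pool 4096 slots, and `build_lookup(3)` leaves 2731, 2731 and 2730 slots: roughly a third of each existing pool moves to pool 2, while no slot changes owner between pools 0 and 1. Because the table depends only on the number of pools, any server could compute the owning pool of a series name without storing the series of other pools.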
A SiriDB server only has knowledge of the series in its own pool. For any other series, it only knows in which pool that series would be stored if it exists. That way, queries and inserts can be sent efficiently to the correct server. To make SiriDB robust, each pool can be given two servers. As soon as you add a second server to a pool, all required data is synchronized to it in the background. When this process has finished, the new server is fully functional and is used to answer queries and handle inserts.
We parse queries with libcleri (a parser we developed ourselves), and data is sent internally using QPack (our own message serializer).
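The routing and replication described above can be sketched as follows. This is a hypothetical illustration, not SiriDB's implementation: the `Router` class, the server names, the CRC32 slot lookup and the round-robin choice between the two servers of a pool are all assumptions. A series name is mapped to its pool, and the request is sent to whichever server of that pool is online.

```python
import zlib


class Router:
    """Hypothetical router: series name -> pool -> an online server of that pool."""

    def __init__(self, lookup, pools):
        self.lookup = lookup  # slot index -> pool id (assumed fixed-size table)
        self.pools = pools    # pool id -> list of one or two server names
        self.online = {s for servers in pools.values() for s in servers}
        self._next = {pool: 0 for pool in pools}  # round-robin position per pool

    def pool_for(self, series):
        """Computable on any server, without storing other pools' series."""
        slot = zlib.crc32(series.encode("utf-8")) % len(self.lookup)
        return self.lookup[slot]

    def server_for(self, series):
        """Pick an online server of the owning pool, alternating when both are up."""
        pool = self.pool_for(series)
        candidates = [s for s in self.pools[pool] if s in self.online]
        if not candidates:
            raise RuntimeError(f"pool {pool} has no online server")
        index = self._next[pool] % len(candidates)
        self._next[pool] += 1
        return candidates[index]
```

When both servers of a pool are online, requests alternate between them, which spreads the load; if one server is taken offline (for example while it is being updated), `server_for` keeps returning the remaining server, and it only fails when the whole pool is unavailable.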

Stand
We’d like to share our knowledge with the community: tell people how we started developing SiriDB and which decisions we faced, discuss with other open source fanatics what they would have done, and listen to their ideas.
At the stand we’ll break down some popular time series measurements and show how they can be implemented in SiriDB. We’ll also look at the various ways SiriDB can be applied in different environments, and we’ll explain and demonstrate a SiriDB datasource plugin for Grafana. We believe the knowledge we gained while developing SiriDB can be very useful for other (open source) projects as well.

Contact: Pea Smeets
Website: http://siridb.net/
Edition Year: 2017