
Cloud Event Processing - Analyze, Sense, Respond

Colin Clark




Building a Back Testing Platform for Algorithmic Trading

Let’s first examine what market data looks like

In this continuing series, I am examining thoughts and specific implementation details around building a back-testing platform for algo trading.  Eventually, we'll see where complex event processing fits in and how to implement it.

Appendix to Part One – The Data Format

Rather than looking at various database solutions first and then trying to define the problem in terms of those solutions, let's first examine what market data looks like.  In its simplest form, market data looks like this (there's usually a little more, but this is fine for our purposes):

  • Date: the date of the market data.
  • Time: when during that date the quote occurred.
  • Sequence #: most quote or trade streams include a sequence number.
  • Symbol: which security the data is for.
  • Best Bid: the best bid (we're going to concern ourselves with BBO data for this series; it's easier).
  • Best Bid Size: how much someone wants to buy at the Best Bid.
  • Best Offer: the best offer.
  • Best Offer Size: how much someone wants to sell at the Best Offer.
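The fields above can be modeled as a simple record.  Here's a minimal Python sketch; the field names and types are my assumptions for illustration, and real feeds vary:

```python
from dataclasses import dataclass

# A minimal BBO quote record matching the fields listed above.
# Types are illustrative assumptions; real feeds differ.
@dataclass
class Quote:
    date: str        # date of the market data, e.g. "2010-11-05"
    time_ms: int     # time of the quote, as milliseconds since midnight
    sequence: int    # feed-assigned sequence number
    symbol: str      # which security this quote is for
    bid: float       # best bid price
    bid_size: int    # size wanted at the best bid
    offer: float     # best offer price
    offer_size: int  # size offered at the best offer
```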

Consider this chart:

[Figure: Market Data]

If we break the data down, we can successively see how it might be arranged on disk for subsequent reading.  We want to read the data very quickly.  If we were using a standard relational database, it's easy to see that we might be replicating some unnecessary data during reads.  And if we use a typical columnar database, we can see that there are chunks that could be read together, increasing throughput.

For example, for any given millisecond (Time) in a quote feed, there may be more than one symbol with a quote.  In fact, that's quite common, so replicating the timestamp is superfluous.  If we had a table for a date's worth of data, we'd have a Time column repeated throughout the table.  There's no reason to do that.

Looking again at the data, we can see that, for a given time, there might be multiple quotes available for multiple symbols.  We’d like to read those in order as a little group.  By organizing the data on disk as a flattened multi-dimensional map of maps, we would:

  1. Start with a given day (our table),
  2. Start with a time (our row),
  3. Read each quote in sequence # order (our column),
  4. Process (do something),
  5. Increment the time, and go to #2 above until we run out of data (lather, rinse, repeat),
  6. Put the $ in the bank.
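The steps above can be sketched over an in-memory version of the map-of-maps layout.  This is a hypothetical, illustrative structure (a real platform would stream it from disk); all names and values are made up:

```python
# Hypothetical in-memory map-of-maps for one day:
# time (ms) -> sequence # -> quote tuple.  Values are illustrative.
day_table = {
    1000: {1: ("IBM", 145.10, 300, 145.12, 200),   # symbol, bid, bid size, offer, offer size
           2: ("MSFT", 26.50, 500, 26.51, 400)},
    1001: {3: ("IBM", 145.11, 100, 145.12, 200)},
}

def replay(table, process):
    """Walk the day in time order, and each time bucket in sequence # order,
    handing each quote to a user-supplied process() callback."""
    for time_ms in sorted(table):               # step 2: next time (our row)
        for seq in sorted(table[time_ms]):      # step 3: sequence # order (our column)
            process(time_ms, seq, table[time_ms][seq])  # step 4: do something
```

Replaying the whole structure this way visits every quote exactly once, in time order and then sequence order within each time.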

If we could write this data structure to disk as we get it from the quote feed, and had fast enough disk, we could keep up with the feed.  If we needed to create some indexes on the data, we could easily do that as well.  We’d simply create another table that would hold an inverted list of time and sequence #’s by symbol.  If we want to process a day’s worth of data, we’re all set.  If we want to process a symbol, or group of symbols, we’re all set.
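The inverted list of time and sequence #'s by symbol could be built with a single pass over the day.  A sketch, assuming the day's quotes are held as a nested map of time → sequence # → quote tuple, with the symbol as the tuple's first field (all of this is illustrative, not a committed format):

```python
from collections import defaultdict

def build_symbol_index(day_table):
    """Invert a day's table (time -> sequence # -> quote tuple) into
    symbol -> [(time, sequence #), ...], so one symbol's quotes can be
    replayed in order without scanning the whole day."""
    index = defaultdict(list)
    for time_ms in sorted(day_table):
        for seq in sorted(day_table[time_ms]):
            symbol = day_table[time_ms][seq][0]  # assumption: symbol is the tuple's first field
            index[symbol].append((time_ms, seq))
    return dict(index)
```

Because the pass runs in time and sequence order, each symbol's list comes out already sorted, ready for replaying a single symbol or a group of symbols.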

So, to summarize, we need a hybrid approach.  In some places, we want rows of data – storing columns of data via a unique key.  In our case, that’s the Time column above for a given day.  The row above is Time, the column (or Super Column) is the Quote for a Symbol.  The Super Column’s key is the Sequence #.  Can anyone guess which database might fit nicely for this use case?

In my next post, I'll describe a formalized data structure and its implementation.  I might even include a little code for all you #NoSQL guys and gals out there.

Thanks for reading!


More Stories By Colin Clark

Colin Clark is the CTO for Cloud Event Processing, Inc. and is widely regarded as a thought leader and pioneer in both Complex Event Processing and its application within Capital Markets.

Follow Colin on Twitter at http://twitter.com/EventCloudPro to learn more about cloud-based event processing using map/reduce, complex event processing, and event-driven pattern matching agents.  You can also send topic suggestions or questions to colin@cloudeventprocessing.com.
