Cloud Event Processing - Analyze, Sense, Respond

Colin Clark


Standalone CEP Is Dead - Long Live the Database

If your CEP solution doesn’t have a good persistence story, you’re toast

In days of old, when CEP didn't exist and we called it ESP, or Event Stream Processing, the whole value proposition that most vendors in the space espoused was: "We don't have to write stuff to the database to process it. And that makes us really fast!"


What made me start thinking about this was all the stirring lately in the in-memory database space. Hasso Plattner (SAP's chairman) has been working on this for quite some time, and last week he announced some fairly startling news: SAP is planning to use a combined row/column, in-memory store for everything. Because the underlying database is so fast, a lot of results that previously had to be pre-calculated no longer do. That can cut database size by an order of magnitude: small enough to fit in memory. How? Let me explain.


A company's OLTP (online transaction processing) environment is where the money is made. It's typically a row-based, transaction-oriented, ACID-compliant store. Vendors like Oracle, Sybase, and Microsoft dominate here, with a growing segment of PostgreSQL and MySQL use. A business needs speed and transactions here: you want to know whether someone has bought something or not, and it needs to happen quickly so that customers don't get upset and go somewhere else. The data models are normalized, key-based, and sparse.

When a company wants to analyze sales, or costs, or whatever, it typically extracts all the data from the OLTP environment, transforms and loads it into a data warehouse, and may update its OLAP environment at the same time. Updating the OLAP environment involves taking all of the transaction data from the OLTP environment and exploding it into huge fact tables with corresponding dimension tables, plus lots and lots of pre-aggregated results. This is all so that end-user OLAP tools can spin the data to give analysts a way to analyze all the OLTP data: questions like "let me see sales by product by region by quarter," or planning questions like "what happens if we raise salaries by 5%?" This creates a lot of data, and all that data takes more space. All because companies don't want to mess with the OLTP environment, and because disk is slow.
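To make the fact/dimension idea concrete, here is a minimal star-schema sketch using Python's built-in SQLite in an in-memory database. The table and column names are purely illustrative, not from any real warehouse, but the query is exactly the "sales by product by region by quarter" rollup described above:

```python
import sqlite3

# Hypothetical star schema: one fact table of sales transactions
# plus a product dimension, held entirely in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                              name TEXT, region TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER,
                              quarter TEXT, amount REAL);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "East"), (2, "Gadget", "West")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, "Q1", 100.0), (1, "Q1", 50.0), (2, "Q2", 75.0)])

# "Sales by product by region by quarter" -- the classic OLAP question,
# computed on the fly instead of being pre-aggregated on disk.
rollup = conn.execute("""
    SELECT p.name, p.region, f.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name, p.region, f.quarter
""").fetchall()
```

When the store is fast enough to run this aggregation at query time, the pre-computed rollup tables it replaces simply never get materialized, which is where the order-of-magnitude size reduction comes from.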


Most of the so-called innovations in relational database technology exist because disk is slow. Normalized tables, indexes, etc. all exist so that data can be moved around as fast as possible, because disk is slow. And when relational databases were invented, disk was really slow. But if an in-memory database is fast enough to not only handle the needs of an OLTP environment but also produce exploded fact tables and compute dimensional analysis on the fly, we don't need all that extra disk space. Sure, we still need transactions, because we need to know when something happened. But that can be done in memory now too.


But if in-memory databases are that fast now, what happens to CEP's value proposition? Every meaningful CEP-based system I've worked on in the last five years has involved some form of persistence somewhere. It seems you can't really separate the two. So why not combine them, if in-memory databases are now fast enough to support this kind of behavior?


CEP is fine for solving some problems, but typically not those involving either a lot of data or a lot of compute. Most CEP systems involve taking a high-velocity data stream, decorating it with some fairly simple calculations, and then doing something when a trigger fires. A perfect example of this is algorithmic, or high-frequency, trading. We haven't seen CEP in sophisticated derivative or fixed-income environments because of the compute required; that doesn't fit the CEP model of yesteryear.
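That enrich-then-trigger pattern can be sketched in a few lines. This is a generic illustration, not any vendor's engine: the stream is decorated with a rolling average, and an event is emitted when the average crosses a threshold (the window size and threshold are arbitrary example values):

```python
from collections import deque

def moving_average_trigger(prices, window=3, threshold=105.0):
    """Decorate a price stream with a rolling mean and fire when the
    mean exceeds the threshold: the simple enrich-then-trigger shape
    most CEP deployments take."""
    buf = deque(maxlen=window)  # sliding window over the stream
    for price in prices:
        buf.append(price)
        avg = sum(buf) / len(buf)
        if len(buf) == window and avg > threshold:
            yield price, avg  # the trigger "fires"

# A toy tick stream; in practice this would be a live, high-velocity feed.
ticks = [100, 102, 104, 108, 112, 110]
alerts = list(moving_average_trigger(ticks))  # [(112, 108.0), (110, 110.0)]
```

Note how little compute per event there is; that is exactly why this model struggles once the per-event work looks like derivative pricing rather than a running sum.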

Using massively parallel processing databases and map/reduce solves a very big problem in this area: data affinity. In a grid- or cloud-based compute model, it's easy to saturate the network moving data from where it lives to where it needs to be processed. If a compute process is broken down on a map/reduce foundation, the compute runs where the data lives, and the results are then bubbled up, so not only does the compute get done faster, but there's less chance of saturating the network as well.
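A single-process sketch of that idea, with hypothetical data: each partition stands in for data resident on one node. The map phase aggregates each partition locally, so only the small partial results would ever cross the network; the reduce phase bubbles them up into one global answer:

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each list stands in for the data living
# on one node of the grid.
partitions = [
    ["AAPL", "MSFT", "AAPL"],
    ["MSFT", "GOOG"],
    ["AAPL"],
]

# Map phase: each node counts its own partition locally, so the raw
# events never leave the node -- only these small Counters do.
local_counts = [Counter(p) for p in partitions]

# Reduce phase: combine the per-node partial results into one aggregate.
totals = reduce(lambda a, b: a + b, local_counts)
# totals: Counter({'AAPL': 3, 'MSFT': 2, 'GOOG': 1})
```

The shape is the same whether the "nodes" are processes on one box or machines in a cloud; what changes is only how the partial results are shipped between phases.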


That seems like a good idea to me. Maybe the resulting latency isn't low enough for things like algo trading, but then most applications don't require that kind of latency. On the plus side, you get to take advantage of virtual, cloud-based resources: compute, storage, and network. That way, you can dynamically add more compute when you need it, as your business grows or to accommodate spikes. And my bet is that the latency problem isn't far from being solved, even for algo trading applications.


So if your CEP solution doesn't also have a good persistence story, you're toast. And if your database solution doesn't have a good CEP story, you're toast. I know vendors in both spaces who are not being considered for opportunities because they're missing one of those critical components. Vendors like Oracle, Sybase, and now SAP agree with me. And customers do too. Customers are always right, right?


Thanks for reading.


More Stories By Colin Clark

Colin Clark is the CTO for Cloud Event Processing, Inc. and is widely regarded as a thought leader and pioneer in both Complex Event Processing and its application within Capital Markets.

Follow Colin on Twitter at http://twitter.com/EventCloudPro to learn more about cloud-based event processing using map/reduce, complex event processing, and event-driven pattern matching agents. You can also send topic suggestions or questions to colin@cloudeventprocessing.com.