Cloud Event Processing - Analyze, Sense, Respond

Colin Clark


Latest Blogs from Colin Clark
As much as I disagree with much of what Curt Monash writes, he did actually ask a good question recently in his post, “Renaming CEP… or not.” Without getting into a rehash of the hash over there, let’s look at … Continue reading →
Google started using MapReduce about 10 years ago.  Somewhere between then and now, Doug Cutting decided that he could copy it while at Yahoo, and Hadoop was born.  Doug now works at a company named Cloudera that bills itself as … Continue reading →
First off, the names: Cloud Event Processing & DarkStar.  Not your typical Wall Street technology naming convention.  Add to that the chance to team up again with Colin Clark on his decades-long quest to make sense out of massive volumes of “big data” – in real time.  B...
I recently became aware of an emerging practice most likely being implemented by clearing companies at the low end of the capitalization spectrum offering a unique solution to the recent Market Access Rules. NO UN-FILTERED DIRECT ACCESS What the SEC is trying to do is remove, or reduc...
As we prepare to implement our Market Data repository to facilitate algo development and back-testing, you should have downloaded Cassandra and installed it by now.  What, you haven’t?  Well, click here, get it done and then come back for some fun.  To get things up and running o...
In this continuing series, I am examining thoughts and specific implementation details around building a back-testing platform for algo trading.  Eventually, we’ll see where complex event processing plays and how to implement it. Rather than looking at various database solutions ...
In this series, I’m going to outline, in general, how to build a back-testing platform for the creation, tweaking, and subsequent execution of algorithms used in electronic trading. Part One – The Data I recently made some comments on Vertica’s blog in regard to what I cons...
Well, it would appear that Michael Stonebraker may have hung up his research hat and joined the marketing team. First, read this, “Will the Real Column Stores Please Stand Up?” And now read my reply, which Vertica has yet to approve on their blog. Mike, You’re describ...
Some predictions for 2011.  In no particular order of importance. 1. CEP – The Feature There are a couple of things going on here.  The most important being that Mark Palmer is writing blog posts about Richard Tibbetts writing blog posts on the Tabb Group’s site about w...
I spend a significant amount of my time keeping up with advances in processing high velocity big data.  Over the last year, I’ve watched the NoSQL camp grow a lot.  And now, some folks are even forecasting a market approaching $2 Billion USD by 2015. The last time I saw that kind...
Much to the SEC’s consternation, the recent report detailing the causes of the May 6th, 2010 Flash Crash has failed to indict High Frequency Trading as the cause. BUT WE WANT THE MONEY How does all of this tie in with the SEC’s bid to build a huge consolidated audit trail? ...
Here are the slides from my recent presentation at the OMG group’s Capital Markets Symposium.
I was recently asked “What problems does CEP solve that cannot be solved with smart coding, a columnar database or a whopping great grid?” via a LinkedIn group for Complex Event Processing.  Here’s the link.  I think I understand the question, and if I do, it’s ...
“Pirates and startups?” you ask.  “Colin’s definitely lost it – too much time in ND.”  But wait, read a little bit of this and then tell me what you think. I love to scuba dive.  And I love history.  Much of the scuba diving I do is in the Caribbean ...
In this video I show how to use DarkStar to filter the Twitter stream.
This from the book, “Hadoop – the Definitive Guide,” “This, in a nutshell, is what Hadoop provides: a reliable shared storage and analysis system.” BUT I THOUGHT IT WAS ABOUT BIG DATA? It is, but Hadoop is not designed, at least today, for anything other t...
In the past year or so, I’ve heard from many skeptics – people who didn’t believe that Event Processing could be successfully deployed in the cloud.  Granted, most of these folks represented firms actively engaged in providing the High Frequency Trading (Algo Trading)...
I’ll be providing a demonstration of high speed data mining at OMG’s Event Processing Symposium focusing on Capital Markets this upcoming October 6, 2010 in NYC.  Here’s the link. Overview Using SAX, we’re going to simulate a high velocity market data feed using...
The examples that I've shown so far have been illustrated using Excel. But if we were serious about using SAX in a real world scenario, we'd most probably be processing some type of streaming data. SAX has application anywhere there's a bunch of highly dimensional, continuous data bein...
Once we've normalized the data, we can apply PAA. I picked time divisions of an hour, and averaged the normalized price information. You can see the normalized price data and resulting buckets, as computed via PAA, in the chart at the right. There's something important to notice ...
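For readers who would like to see the normalize-then-PAA step in code rather than in a chart, here is a minimal sketch in Python. It is an illustration only, not the code from the post: the function names, the sample prices, and the bucket size of four samples (standing in for an hourly division) are all assumptions made for the example.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat series carries no shape information; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def paa(values, bucket_size):
    """Piecewise Aggregate Approximation: average each run of bucket_size points."""
    return [
        mean(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]


# Hypothetical prices: eight samples, reduced to two buckets of four samples each.
prices = [10.0, 10.2, 10.1, 10.4, 10.8, 10.6, 10.3, 10.1]
buckets = paa(z_normalize(prices), bucket_size=4)
print(buckets)
```

Each bucket is just the mean of the normalized points that fall inside it, so eight points become two values here. With real data, the bucket boundaries would presumably follow timestamps rather than a fixed sample count.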
Lately, I’ve been working on some interesting projects involving not just the usual suspects of stream processing, but data mining within high velocity time series. In conjunction with that effort, I’ve been doing a lot of research in the areas of symbolic representation, dimension re...
How many times have you thought to yourself, “Self, I’d really like to take a look at that wonderful, does everything that I need, server-based product” only to realize that you don’t have a machine, and if you did have a machine, you don’t have the OS bec...
We’re working with a customer who’d like to send us information using the FIX protocol.  FIX is used in electronic trading for sending orders and receiving executions from brokers, ECNs, and exchanges. DARKSTAR SPEAKS FIX DarkStar, our cloud-based, distributed event ...
CEP isn’t really about low latency.  The ability to do things quickly is important, just as in any system – especially those systems that grow and need to handle a lot of information.  Doing things quickly means doing things efficiently.  And doing things efficiently means ...
In days of old, when CEP didn’t exist, and we called it ESP, or Event Stream Processing, the whole value proposition that most vendors in the space espoused was, “We don’t have to write stuff to the database to process it. And that makes us really fast!” What made me start thinking a...
In a recent blog post, Tim Bass blames CEP for much of the world’s problems.  You can read his post here. You can read my response here: Tim, Lots of misinformation and knee jerk reactions are out there regarding HFT. Unfortunately, this is another one of them. Your posts are us...
There is a potential need for using a NoSQL database for storing some of the information above. Several come to mind - MongoDB & CouchDB for document storage, Cassandra for inverted indices, etc. We've even got some of these running with DarkStar right now, consuming raw informa...
Everyone’s busy abstracting resources in the cloud – making resources like compute, storage and network available dynamically, based upon demand. But we need more services on top of that. When an application is deployed, wouldn’t it be neat if things like messaging, protocol hand...
In our last post, we looked at how to make bad map/reduce code better map/reduce code.  A natural fallout from breaking tweets down into words is the ability to build an inverted index to facilitate searching tweets by key words. It’s All in the Tweet Given the tweets, “@e...
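To make the inverted index idea concrete, here is a minimal sketch in Python. It is not the map/reduce code from the post: it runs in a single process, and the function name, the tweet ids, and the sample tweet text are all hypothetical. It only shows the shape of the structure: each word points at the set of tweets that contain it.

```python
from collections import defaultdict


def build_inverted_index(tweets):
    """Map each lowercased word to the set of tweet ids whose text contains it.

    `tweets` is assumed to be a dict of {tweet_id: text}.
    """
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        # Naive whitespace tokenization; a real pipeline would likely also
        # strip punctuation and handle @mentions, #hashtags, and URLs.
        for word in text.lower().split():
            index[word].add(tweet_id)
    return index


# Hypothetical sample tweets, keyed by id.
sample_tweets = {
    1: "cloud event processing",
    2: "event streams in the cloud",
}
index = build_inverted_index(sample_tweets)
print(sorted(index["cloud"]))
```

Searching by key word is then a dictionary lookup, and a multi-word search is a set intersection over the per-word results.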
In the Twitter project we’ve been working on, one of the maps we’re running breaks the text of a Tweet down into words.  Because we can’t assume that any data will be available for access via a database, etc., we attach a couple of values that we’re intere...
Our recent work with Twitter will be our first deployment in this cloud infant. Using DarkStar, we’re able to analyze the Twitter feed to sense and respond to opportunities and threats. This relatively simple project includes components of event driven agents, NoSQL storage, CEP slidin...
So you can run your app in a data center – does this make your app cloud aware? I guess so, but it certainly doesn’t take advantage of elastic resources. And chances are, that ‘cloud aware’ app you’ve got running in your private, public, or hybrid cloud doesn’t have a pricing model ...
I reread this article from time to time just to make sure that I stay within some boundaries – 21 Experts Define Cloud Computing. Among the 21, there are a couple that I really like; I’m going to cite a few of them over the next few days, and tell you what I like and don’t like about ...
I’ll be speaking at the OMG’s Event Processing Symposium 2010 this coming May.  We’ll be demonstrating streaming map/reduce, complex event processing, and advanced visualization (DarkStar & Telescope) with at least one specific use case to demonstrate how to use E...
This is a treemap visualization (from Panopticon) hooked up to some static data as crunched by our TwitYourl project, our first foray into Cloud Event Processing together.
Cloud Event Processing – Where’s The Data? There are several things left to cover in our #TwitYourl project.  One of the most glaring absences so far is storage – where do we put all of these Tweets?  What happens if we’d like to make changes to our RuleBots and...
First, we need a tool that will configure and provision any number of nodes in our cloud. There are several vendors that have products in this space and I’m not going to talk about them here (yet). Secondly, and more importantly, we need an architecture that is layered on top of the ...