Cloud Event Processing - Analyze, Sense, Respond

Colin Clark



Top Stories by Colin Clark

Well, it would appear that Michael Stonebraker may have hung up his research hat and joined the marketing team. First, read this, “Will the Real Column Stores Please Stand Up?” And now read my reply, which Vertica has yet to approve on their blog. Mike, You’re describing an implementation, not an algorithm.  An analogy would be saying that anything that didn’t look like Hadoop wasn’t map/reduce. Column stores exist because disk is slow – not because they’re some new and magical way to store data.  As a researcher, you’re well aware of the fact that it is impossible to prove that any solution is optimal given any problem set.  All one can do is optimize (potentially) the application of a particular implementation to a problem set and hope for the broadest application. Vertica’s product has been optimized to a particular problem.  Other vendors have chosen potentially ... (more)

Real CEP – Cloud Event Processing

Ok, so if you’ve been following along, you’ll already know that we’ve installed RabbitMQ, written a Twitter OnRamp that subscribes to a sample Twitter feed and publishes it on RabbitMQ, and written a RuleBot that takes the tweet and breaks it into chunks, putting the chunks back onto the bus. And now I’m going to show the Esper code that gives us the top URLs in the last X seconds found in the Twitter feed.  Here’s a screenshot. Here’s some output in text mode (easier to read): Event received: {URL=http:\\cnn.com, Total=731} Event received: {URL=http:\\colinclarkeventprocessing.com, Tot... (more)
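To make the "top URLs in the last X seconds" idea concrete, here is a minimal sketch in plain Python. This is not the Esper code from the post; it is a hand-rolled sliding window, and the class name `TopUrls`, its methods, and the sample URLs are all made up for illustration. It assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class TopUrls:
    """Hypothetical helper: counts URLs seen in the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, url), oldest first
        self.counts = Counter()  # url -> occurrences still inside the window

    def on_url(self, ts, url):
        # Record one URL sighting at time `ts` (seconds).
        self.events.append((ts, url))
        self.counts[url] += 1

    def top(self, now, n=10):
        # Expire everything that has fallen out of the window, then rank.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_url = self.events.popleft()
            self.counts[old_url] -= 1
            if self.counts[old_url] == 0:
                del self.counts[old_url]
        return [{"URL": url, "Total": total}
                for url, total in self.counts.most_common(n)]


window = TopUrls(window_seconds=10)
window.on_url(0, "http://example.com/a")
window.on_url(1, "http://example.com/b")
window.on_url(2, "http://example.com/a")
early = window.top(now=5, n=2)
late = window.top(now=11, n=2)
```

Here `early` ranks both URLs, while `late` only sees the sighting at second 2, because the first two have aged out of the 10-second window. A CEP engine does this bookkeeping for you from a declarative query.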

Writing Bad Code in Map/Reduce

In the Twitter project we’ve been working on, one of the maps we’re running breaks the text of a Tweet down into words. Because we can’t assume that any data will be available for access via a database, etc., we attach a couple of values that we’re interested in for later analysis to the word, attach a 1, and emit the tuple. This is an example of what the tuple looks like: "TimeZone", "Location", "ScreenName", "Word", 1 This map is produced from a tweet that contains many words, so crunching the Tweet down will result in many of the above values being duplicated. "TimeZone", "Locati... (more)
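A minimal sketch of a map step like the one described, in plain Python rather than any particular map/reduce framework. The function name `map_tweet` and the dictionary keys (`time_zone`, `location`, `screen_name`, `text`) are assumptions made for illustration, not the project's actual field names.

```python
from typing import Iterator, Tuple

WordTuple = Tuple[str, str, str, str, int]


def map_tweet(tweet: dict) -> Iterator[WordTuple]:
    """Emit one (time_zone, location, screen_name, word, 1) tuple per word."""
    for word in tweet["text"].split():
        # The three context values are repeated on every tuple, which is
        # exactly the duplication the excerpt points out.
        yield (tweet["time_zone"], tweet["location"],
               tweet["screen_name"], word, 1)


sample_tweet = {
    "time_zone": "Eastern Time (US & Canada)",
    "location": "New York",
    "screen_name": "example_user",
    "text": "cloud event processing",
}
tuples = list(map_tweet(sample_tweet))
```

A three-word tweet yields three tuples, each carrying its own copy of the time zone, location, and screen name.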

Why CEP in the Cloud Makes Sense

CEP isn’t really about low latency.  The ability to do things quickly is important, just as in any system – especially those systems that grow and need to handle a lot of information.  Doing things quickly means doing things efficiently.  And doing things efficiently means less money spent on hardware.  Theoretically anyway. SO WHAT IS REALLY COOL ABOUT CEP? CEP gives one the ability to submit queries like “select symbol, avg(shares) from trade_stream group by symbol over 5 minutes emit every 1 minute.”  The CEP engine would consume this query, and then start returning an average of sh... (more)
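To show roughly what an engine has to do behind a query like that, here is a minimal sketch in plain Python. It is not how any particular CEP engine implements windows; the class `WindowedAverage`, its methods, and the sample trades are hypothetical, and it assumes trades arrive in timestamp order with times given in seconds.

```python
from collections import defaultdict, deque


class WindowedAverage:
    """Hypothetical helper: average shares per symbol over a time window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.trades = deque()  # (timestamp, symbol, shares), oldest first

    def on_trade(self, ts, symbol, shares):
        # Called once per event on the trade stream.
        self.trades.append((ts, symbol, shares))

    def emit(self, now):
        # Called on a timer (for example every 60 seconds) to produce output.
        cutoff = now - self.window_seconds
        while self.trades and self.trades[0][0] <= cutoff:
            self.trades.popleft()
        totals = defaultdict(lambda: [0, 0])  # symbol -> [share sum, count]
        for _, symbol, shares in self.trades:
            totals[symbol][0] += shares
            totals[symbol][1] += 1
        return {symbol: total / count
                for symbol, (total, count) in totals.items()}


stream = WindowedAverage(window_seconds=300)
stream.on_trade(0, "IBM", 100)
stream.on_trade(60, "IBM", 300)
stream.on_trade(120, "MSFT", 50)
first = stream.emit(now=180)
second = stream.emit(now=330)
third = stream.emit(now=600)
```

With these trades, `first` averages both IBM trades, `second` has lost the IBM trade from second 0, and `third` is empty because nothing is left in the five-minute window. The point of the query form is that you state the window and the emit interval and the engine does this bookkeeping.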

DarkStar Filters Twitter Stream in Real Time

In this video I show how to use DarkStar to filter the Twitter stream ... (more)