Logging: Unsexy, Important, and now Usable.

Using logs is now as easy as producing them.

Logging is perhaps the least sexy part of writing software, yet it’s surprisingly important. There’s lots of boring stuff that causes heated arguments (like whitespace, build systems, and version control). But no-one has impassioned, snarky blogs or tweets about what format your logs should be in. Much like cell phones or the Internet, we don’t realize how we need them until it’s too late.

We’re drowning in information from our servers, especially when dealing with distributed/Big Data systems, which are growing incredibly fast. Even when building small server clusters, my co-founder Nick and I found it hard to learn what was going on. So we chatted: “Well, what’s the biggest step forward in web access in the past 15 years? Search!” In a few brief moments, we set about sketching up some ideas for a generalized “log search” engine, and ended up naming it “LogSearch” (we’re literalists).

Logging is something engineers have always needed, because software (no matter how experienced you are) is notoriously unpredictable. When you’ve got more than 10 computers, it’s difficult to keep track of what’s going on day-to-day, and even worse when Something Goes Wrong. When the inevitable crisis comes, you don’t want to spend a few hours poking through log files on 30 different machines. You want to be a hero and find the problem in 5 minutes, then get on with your life. Or maybe you want to take a longer view of things — run some interesting statistics on your logs, and study usage patterns.

Let’s ponder the implications of that for a second. What if you could log every action taken in your application, and analyze it to see exactly how your application was used? Delicious.

All of us at Drawn to Scale have dealt with distributed systems for a while. We like to build stuff we would use. That’s why we built something that can:

  • Be absurdly easy to use in the cloud or in a datacenter
  • Easily ingest any amount of log data (hundreds of TBs)
  • Find structured fields (like timestamps)
  • Search these logs (we usually don’t know what we’re looking for when something goes wrong)
  • Provide the actual text from the log itself
  • Build trend charts (e.g., how many errors/day?)
  • Send alerts when something looks “strange”
  • Run Hadoop/MapReduce scripts to provide interesting analytics
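
To make the “trend charts” and “strange” alerts items a bit more concrete, here is a minimal sketch in Python. It is not how LogSearch does it. The function names, the (day, level) record shape, and the 1.5-standard-deviation threshold are all assumptions made up for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

def errors_per_day(records):
    """Count ERROR records per day; records are (day, level) tuples (assumed shape)."""
    return Counter(day for day, level in records if level == "ERROR")

def strange_days(counts, threshold=1.5):
    """Return days whose error count sits more than `threshold` standard
    deviations above the mean. The threshold is an arbitrary illustration."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    avg, spread = mean(values), pstdev(values)
    if spread == 0:
        return []  # every day looks the same, nothing stands out
    return sorted(day for day, n in counts.items() if (n - avg) / spread > threshold)

# Made-up sample data: four quiet days and one noisy one.
records = []
for day, n_errors in [("2010-03-01", 1), ("2010-03-02", 1), ("2010-03-03", 1),
                      ("2010-03-04", 1), ("2010-03-05", 20)]:
    records += [(day, "ERROR")] * n_errors + [(day, "INFO")] * 3

counts = errors_per_day(records)
alerts = strange_days(counts)
```

The per-day counts are what you would feed to a chart, and the days in `alerts` are the ones worth sending a message about. A real system would want a smarter notion of “strange” than a single threshold.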

If you had a log search engine, with various utilities built on top of it, you’d have an easy way to see what’s going on in the apps that drive your business. You could…

  • Be aware of problems *before* they cause critical downtime
  • Monitor trends so alerts can be sent for odd activities
  • Examine long-term utilization patterns to optimize software
  • Quickly diagnose when machines crash or become flaky
  • Look smart and “in touch” with your systems!
  • Save piles of money!
  • Make piles of money!

Isn’t Monitoring Software Enough?

Engineers and ops people have a fantasy of coming into work, pulling up a dashboard on their web browser, and understanding everything that is going on. We want to be like Geordi in Star Trek, who can see everything on the Enterprise with a few finger presses. I wish there was such a thing! The closest we can get is ganglia, which monitors OS-level events (it can be hacked for apps). It’s pretty cool, but only so useful. Monitoring requires serious engineering effort – even if planned from the very beginning (it rarely is), there’s quite a bit of code that must be written and maintained. Monitoring services need to run on each box, and the central servers become a system of their own that needs to be reliable, redundant, and robust.

Logs, however, are a piece of cake. Here’s what usually happens.

  1. Put a line of code similar to Log.Error("Something horrible happened!"); in your application
  2. The message and a Timestamp appear in a /logs directory on the disk
  3. Logs are cleaned up every few days by an automated process
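
The three steps above can be sketched with Python’s standard logging module. This is only an illustration under assumptions: the `make_logger` helper, the `app.log` filename, and the three-day retention are invented here, and a temporary directory stands in for the /logs directory.

```python
import logging
import logging.handlers
import os
import tempfile

def make_logger(log_dir, name="app"):
    """Hypothetical helper: write timestamped messages to <log_dir>/<name>.log."""
    os.makedirs(log_dir, exist_ok=True)
    # Step 3: roll the file over daily and keep only the last 3 old files,
    # so cleanup happens automatically (the retention period is an assumption).
    handler = logging.handlers.TimedRotatingFileHandler(
        os.path.join(log_dir, name + ".log"), when="D", interval=1, backupCount=3
    )
    # Step 2: every message is written with a timestamp and a level.
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger

# A temp directory stands in for /logs so the sketch runs anywhere.
log_dir = os.path.join(tempfile.mkdtemp(), "logs")
logger = make_logger(log_dir)

# Step 1: one line of code in your application.
logger.error("Something horrible happened!")
```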

Since logs are so simple to implement, it’s easy for us lazy engineers to toss them into everything we do. The problem is that once they grow beyond a few hundred MB, it takes forever to search them for what you want, and it’s even harder when they’re spread across multiple machines!

Why it’s Hard

So it’s easy to fill logs with lots of tasty info. But it’s hard to make sense of it all, because it’s semi-structured text. This means that in order to find what you want, you need to explore the data: you need to search it. The only tool most of us can use is grep, which is tedious and manual if you need to troubleshoot several dozen logs on each box. There’s also interesting information in your log that isn’t just raw text: dates, server addresses, error types, and more. So you have to build something to parse meaning out of it.
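
As a rough idea of what “parsing meaning out of it” looks like on a single box, here is a small Python sketch. It assumes a made-up line format of timestamp, level, then message. Real logs vary a lot, and the regular expression, function names, and sample lines are illustrative only, not anything from LogSearch.

```python
import re
from datetime import datetime

# Assumed line format: "YYYY-MM-DD HH:MM:SS LEVEL free-form message"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Turn one log line into a dict of structured fields, or None if it doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%Y-%m-%d %H:%M:%S")
    return fields

def search(lines, term):
    """Case-insensitive search over the message text of every parseable line."""
    term = term.lower()
    parsed = (parse_line(line) for line in lines)
    return [rec for rec in parsed if rec and term in rec["message"].lower()]

# Made-up sample lines.
sample = [
    "2010-03-01 12:00:01 INFO Server started on 10.0.0.5",
    "2010-03-01 12:04:17 ERROR Connection refused by 10.0.0.9",
    "not a structured line",
]
hits = search(sample, "refused")
```

This works for one file on one machine. The hard part is doing the same thing across dozens of boxes and hundreds of gigabytes.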

You end up having to design a distributed system just to parse and handle logs. There are commercial tools out there which claim to do this, but they fail at large scale.

Drawn to Scale has built LogSearch from the beginning to handle Big Data.

How We Do It

The Drawn to Scale BigSearch platform is built on a ton of distributed and NoSQL tools, combined with quite a bit of Secret Sauce. Our goal is to provide an end-to-end data storage, search, and serving platform. To do this, however, we need to know what the system does, and prove its capabilities. That’s why LogSearch is a great tool to put on top of BigSearch.

We run in the cloud or in your datacenter. Running our system in the cloud (specifically, AWS) is rather simple: spin up a few instances with our machine image, and install a script on each of your boxes to find the logs and push them onto the LogSearch cluster. Since every component of our infrastructure is scalable, it can handle as much data as is thrown at it.

Everything is seamlessly processed with Hadoop, stored in HBase (a clone of Google’s BigTable), indexed, and made searchable in a few minutes. All you have to do is go to a webpage and search for what you want to find in the logs. In the near future, you’ll be able to upload MapReduce, Pig, or Hive scripts to provide more complex analytics.

Are you excited? (I am, I wrote a whole friggin’ article.) Want to learn more? Follow me on Twitter at @lusciouspear, or drop me a line at info@drawntoscalehq.com. We’d love to show you a demo, or get your ideas on what would make this tool Really Awesome and Indispensable.
