Monitoring with Ganglia


Written by Ganglia designers and maintainers, this book shows you how to collect and visualize metrics from clusters, grids, and cloud infrastructures at any scale. Want to track CPU utilization from 50,000 hosts every ten seconds? Ganglia is just the tool you need, once you know how its main components work together. This hands-on book helps experienced system administrators take advantage of Ganglia 3.x.

Learn how to extend the base set of metrics you collect, fetch current values, see aggregate views of metrics, and observe time-series trends in your data. You’ll also examine real-world case studies of Ganglia installs that feature challenging monitoring requirements.

  • Determine whether Ganglia is a good fit for your environment
  • Learn how Ganglia’s gmond and gmetad daemons build a metric collection overlay
  • Plan for scalability early in your Ganglia deployment, with valuable tips and advice
  • Take data visualization to a new level with gweb, Ganglia’s web frontend
  • Write plugins to extend gmond’s metric-collection capability
  • Troubleshoot issues you may encounter with a Ganglia installation
  • Integrate Ganglia with the sFlow and Nagios monitoring systems

Contributors include: Robert Alexander, Jeff Buchbinder, Frederiko Costa, Alex Dean, Dave Josephsen, Peter Phaal, and Daniel Pocock. Case study writers include: John Allspaw, Ramon Bastiaans, Adam Compton, Andrew Dibble, and Jonah Horowitz.

Product Details

ISBN-13: 9781449329709
Publisher: O'Reilly Media, Incorporated
Publication date: 11/30/2012
Pages: 256
Sales rank: 1,075,871
Product dimensions: 6.90(w) x 9.10(h) x 0.60(d)

About the Author

Matt Massie open-sourced Ganglia in 2000 while working as a Staff Researcher at the University of California, Berkeley. He designed Ganglia to monitor a shared computational grid of clusters distributed across the United States for scientific research. In 2010, he contributed a chapter on cluster monitoring to the O'Reilly book "Web Operations: Keeping the Data On Time" by John Allspaw and Jesse Robbins. Matt is currently a software engineer at Cloudera focused on Apache Hadoop enterprise management and monitoring.

Bernard Li is a High Performance Computing (HPC) Systems Engineer at Lawrence Berkeley National Laboratory. He is currently one of the maintainers of the Ganglia project. He has been involved with HPC since 2003 and has worked on Open Source projects such as OSCAR, SystemImager and Warewulf.

Brad Nicholes is a member of the Apache Software Foundation and is currently working as a Consultant Software Engineer for NetIQ. In addition to being a committer on the Apache HTTPD and APR projects, Brad is a developer and one of the administrators of the Ganglia project. As a developer on the Ganglia project, Brad developed and introduced the C/C++ and Python metric module interface into Ganglia 3.1.x. He also developed and contributed several of the initial metric modules that currently ship with Ganglia. Brad attended school at the University of Utah and Brigham Young University and holds a degree in Computer Science.

Vladimir Vuksan (Broadcom) has worked in technical operations, systems engineering, and software development for over 15 years. Prior to Broadcom, he worked at Mocospace, Rave Mobile Safety, Demandware, and the University of New Mexico, implementing high-availability solutions and building tools to make managing and running infrastructure easier.

Table of Contents

Conventions Used in This Book;
Using Code Examples;
Safari® Books Online;
How to Contact Us;
Chapter 1: Introducing Ganglia;
1.1 It’s a Problem of Scale;
1.2 Hosts ARE the Monitoring System;
1.3 Redundancy Breeds Organization;
1.4 Is Ganglia Right for You?;
1.5 gmond: Big Bang in a Few Bytes;
1.6 gmetad: Bringing It All Together;
1.7 gweb: Next-Generation Data Analysis;
1.8 But Wait! That’s Not All!;
Chapter 2: Installing and Configuring Ganglia;
2.1 Installing Ganglia;
2.2 Configuring Ganglia;
2.3 Postinstallation;
Chapter 3: Scalability;
3.1 Who Should Be Concerned About Scalability?;
3.2 gmond and Ganglia Cluster Scalability;
3.3 gmetad Storage Planning and Scalability;
Chapter 4: The Ganglia Web Interface;
4.1 Navigating the Ganglia Web Interface;
4.2 The gweb Search Tab;
4.3 The gweb Views Tab;
4.4 The gweb Aggregated Graphs Tab;
4.5 The gweb Compare Hosts Tab;
4.6 The gweb Events Tab;
4.7 The gweb Automatic Rotation Tab;
4.8 The gweb Mobile Tab;
4.9 Custom Composite Graphs;
4.10 Other Features;
4.11 Authentication and Authorization;
Chapter 5: Managing and Extending Metrics;
5.1 gmond: Metric Gathering Agent;
5.2 Base Metrics;
5.3 Extended Metrics;
5.4 Extending gmond with Modules;
5.5 Extending gmond with gmetric;
5.6 How to Choose Between C/C++, Python, and gmetric;
5.7 XDR Protocol;
5.8 Java and gmetric4j;
5.9 Real World: GPU Monitoring with the NVML Module;
Chapter 6: Troubleshooting Ganglia;
6.1 Overview;
6.2 Useful Resources;
6.3 Monitoring the Monitoring System;
6.4 General Troubleshooting Mechanisms and Tools;
6.5 Common Deployment Issues;
6.6 Typical Problems and Troubleshooting Procedures;
Chapter 7: Ganglia and Nagios;
7.1 Sending Nagios Data to Ganglia;
7.2 Monitoring Ganglia Metrics with Nagios;
7.3 Displaying Ganglia Data in the Nagios UI;
7.4 Monitoring Ganglia with Nagios;
Chapter 8: Ganglia and sFlow;
8.1 Architecture;
8.2 Standard sFlow Metrics;
8.3 Configuring gmond to Receive sFlow;
8.4 Host sFlow Agent;
8.5 Troubleshooting;
8.6 Using Ganglia with Other sFlow Tools;
Chapter 9: Ganglia Case Studies;
9.1 Tagged, Inc.;
9.2 SARA;
9.3 Reuters Financial Software;
9.4 Lumicall (Mobile VoIP on Android);
9.5 Wait, How Many Metrics? Monitoring at Quantcast;
9.6 Many Tools in the Toolbox: Monitoring at Etsy;
Advanced Metric Configuration and Debugging;
Module Metric Definitions;
Advanced Metrics Aggregation and You;
Debugging with gmond-debug;
Ganglia and Hadoop/HBase;
Introducing Hadoop and HBase;
Configuring Hadoop and HBase to Publish Metrics to Ganglia;
