Distributed Testing--How Does Your Project Do It
From Google Summer of Code Mentor Wiki
Problem/Overview
Validate correct behavior and semantics (e.g., change visibility, cache consistency) in a complex distributed application; in our case, a filesystem.
Typical deployments at institutional sites involve tens of fileservers and thousands of clients. The largest known deployment within an organization has hundreds of fileservers and up to 2000 clients per fileserver, with expected growth to 5000 per fileserver over the next 1-2 years.
This seems to introduce the problems of orchestrating the activities of (lots of) clients and synthesizing results from many of them (and from the servers, which we're also interested in).
Admission: we aren't currently testing at this scale, or with much sophistication. We would like to learn about techniques or frameworks that projects have used to solve related problems, or to discuss suggestions for development.
Objectives
- Large scale validation under realistic conditions. Would like to correlate views of the system on many clients, and verify assertions across groups of clients participating in distributed operations.
- To a lesser degree, distributed load generation and performance measurement--maybe mostly as a "ground state" of activity while specific transactions (distributed test cases) are verified
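As an illustration of "correlating views" across clients, here is a minimal Python sketch. It assumes each client reports which version of a file it currently sees; the `Observation` record, the client names, and the `/afs/...` paths in the usage note are hypothetical and not part of any existing framework. The function groups reports by path and flags every path on which clients disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One client's view of one file (hypothetical report format)."""
    client: str   # identifier of the reporting client
    path: str     # file the client examined
    version: int  # data version the client observed


def inconsistent_paths(observations):
    """Return {path: {version: [clients]}} for paths where clients disagree."""
    by_path = {}
    for obs in observations:
        # Group clients by the version they saw, per path.
        by_path.setdefault(obs.path, {}).setdefault(obs.version, []).append(obs.client)
    # A path is consistent when every client reported the same version.
    return {path: views for path, views in by_path.items() if len(views) > 1}
```

For example, given reports from clients `c1`, `c2`, and `c3` about `/afs/x`, where only `c3` saw an older version, the result contains `/afs/x` with `c3` listed under that older version. How the reports reach the collector (agents, a message queue, batch job output) is left open here.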
Plausible-Sounding Approaches
- agent based?
- message queue approach (e.g., AMQP, Erlang [Tsung], something else)?
- job queueing (e.g., GridEngine and relatives)?
Stuff I Know About
- GoogleFS testing framework (http://googletesting.blogspot.com/2008/05/performance-testing-of-distributed-file.html)
  - I don't know _much_ about it, and it appears specific to the Google environment
- Ad-hoc tools at LLNL, which probably used batch submission for activation and results collection, because the targets were cluster nodes using Lustre
- PlanetLab
  - not clear it's solving this problem
- Tsung - a rather interesting tool written in Erlang, currently mostly an HTTP and XMPP test framework
  - Would $clients allow tons of their hosts to run it? Would I?
---
CTest / CDash
Subscription system (Jim) for distributed build and test cycles, with a very intuitive GUI.
CDash dashboard based on CTest (i.e., CMake)
- coordinates reporting on test cycles running on distributed clients
- continuous build
- supports Windows and many Unix clients
Matt: Looks very interesting, especially because there is interest in (a real need for) reworking the OpenAFS build system, which is different on Unix and Windows and does not scale to handle multiple build targets on the same platform (lwp vs. pthreads, 32- and 64-bit compiles and libraries, etc.)
Open issue for us: the need for "synchronized tests", i.e., tests that involve coordination between multiple clients at one or more points during the course of a test, e.g., propagation of changes, consistency tests, etc.
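To make the idea of a synchronization point concrete, here is a small Python sketch. It is only a single-process stand-in: threads play the role of clients, a dictionary plays the role of the shared filesystem, and `threading.Barrier` plays the role of whatever cross-host coordination mechanism a real harness would need. All names (`run_synchronized_test`, `testfile`, `v2`) are made up for illustration.

```python
import threading


def run_synchronized_test(num_clients=4, timeout=5.0):
    """Client 0 writes, all clients meet at a barrier, then every client reads."""
    store = {}    # stand-in for the shared filesystem
    results = {}  # client id -> value observed after the sync point
    barrier = threading.Barrier(num_clients)

    def client(cid):
        if cid == 0:
            store["testfile"] = "v2"   # the single writer makes its change
        barrier.wait(timeout=timeout)  # sync point: no client reads before the write
        results[cid] = store.get("testfile")

    threads = [threading.Thread(target=client, args=(i,)) for i in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

After the run, the test would assert that every entry in `results` equals the written value. In a real deployment the barrier would have to span hosts, which is exactly the part that still needs a framework or a protocol.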
Bart: Showed CMake on Windows, using one of his projects in Eclipse.
CMake
- has Eclipse and Visual Studio integration
- has NSIS driver
- MSI driver? No
Links:
Synchronized testing
CTest does not have any specific provisions for testing a distributed system. Any program that returns with success or failure can be used as a test, however. Handling the synchronized testing of the distributed system is then a matter of writing appropriate test programs that can be called using CTest. The procedure for adding a simple test is described here
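As a rough sketch of such a test program, the Python below checks that a set of paths all contain an expected value and reports the outcome through its return code, relying on CTest's default convention that exit status 0 means pass and nonzero means fail. The script name, argument layout, and the idea of checking one file through several mount points are assumptions made for illustration only.

```python
import sys


def check_visibility(paths, expected):
    """Return a list of (path, problem) for each path whose content != expected."""
    failures = []
    for path in paths:
        try:
            with open(path, "r") as f:
                content = f.read()
        except OSError as exc:
            failures.append((path, f"unreadable: {exc}"))
            continue
        if content != expected:
            failures.append((path, "unexpected content"))
    return failures


def main(argv):
    """Return 0 on success, 1 on test failure, 2 on bad usage."""
    if len(argv) < 3:
        print("usage: check_visibility.py EXPECTED PATH [PATH...]", file=sys.stderr)
        return 2
    failures = check_visibility(argv[2:], argv[1])
    for path, problem in failures:
        print(f"FAIL {path}: {problem}")
    return 1 if failures else 0

# When used as a standalone script, finish with: sys.exit(main(sys.argv))
```

A program like this could be registered with CMake's `add_test` command; any coordination between clients would have to happen inside the program itself, since CTest only sees the final exit status.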

