From time to time, software will fail to perform and scale as desired. The Developers usually have an idea of the problem but may not have precise details. More importantly, the Developers may not have a good idea of what to fix. In such cases, running a Benchmark is a good idea.
A Benchmark is a test of the performance and scalability of the software system. Benchmarks answer questions like:
For small-scale software, Benchmarks are pretty simple. A performance profiler, e.g. Profile from Dynamic Memory Solutions, can be used to determine where a particular process is spending its time. For scalability, the system can be flooded with records until something snaps.
Large-scale software is much more complex. A software system may contain dozens of major pieces. Consider all the complexity of using a mobile phone to make a purchase via the web browser. At least three giant systems are involved: the telephone network, the internet and the credit card processing. Each of those systems is composed of hundreds of subsystems. If your browsing is slow and times out often, which system is to blame? Benchmarks can help to answer this question because they can determine the maximal load each system can carry.
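As a rough illustration of flooding a small system with records, here is a minimal sketch that pushes a batch of records through a processing function and times the run. The names `processRecord`, `BenchmarkResult` and `runBenchmark` are made up for this example, and `processRecord` is only a stand-in for real work. It assumes a C++11 compiler for `std::chrono`.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the work the real system does per record.
inline std::uint64_t processRecord(std::uint64_t record)
{
    return record * 2654435761u + 1;
}

struct BenchmarkResult
{
    std::size_t records;    // number of records processed
    double seconds;         // elapsed wall-clock time
    std::uint64_t checksum; // keeps the compiler from discarding the work
};

// Push `count` records through processRecord and time the whole run.
inline BenchmarkResult runBenchmark(std::size_t count)
{
    typedef std::chrono::steady_clock Clock;
    std::uint64_t checksum = 0;
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < count; ++i)
        checksum ^= processRecord(i);
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return BenchmarkResult{count, elapsed.count(), checksum};
}
```

Dividing `records` by `seconds` gives a throughput figure. Raising `count` until the figure drops off or the process falls over is the small-scale version of finding where something snaps.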
When setting up a Benchmark there are several elements to consider:
When setting up a Benchmark for a complex system, keep these caveats in mind.
Benchmarks are a great way to gain focus on a project. Gaining deep insight into performance, scalability and Architecture often results in modest changes with large ROI.
In my last article, we were discussing some elements of low latency, highly scalable systems. Such systems are very complicated but have some common elements.
We covered latency and scalability in the last article. Let's dig into Robustness.
Robustness is the ability of a software system to continue operating under adverse conditions. In hardware systems, Robustness is termed Fault-Tolerant. There are many parallels between hardware and software but big differences as well.
Robustness under predictable error conditions is relatively straightforward. Modern realtime systems must be robust under all conditions, which is much more complicated. Some examples:
As Software Engineers, we are used to thinking about processes having trouble. Network trouble is often considered in designs. I’ve never seen a software design document that discussed the more catastrophic errors. In a very real sense, there is nothing that can be done in software about these issues. How is a software system supposed to cope under such duress?
The answer lies in the complex field of Disaster Recovery (DR). It used to be that DR was a reactive strategy for failure of some part of a software/hardware system. In modern systems, DR and Robustness are merging into a proactive strategy. In this model, Robustness is used to avoid a disaster altogether.
Some common strategies:
Robustness is a key facet of any modern real time system. As Effective Software Engineers, we should strive to consider Robustness at all points in the Architecture.
An overview of the Process of creating software.
I’ve just started a new project in the Finance sector. The goal is to lower the latency of a trading system into the low microseconds range. The goal is very challenging when compared to traditional Telecom sector latencies.
I’ve come across some new software and hardware that I find very interesting.
I’m interested to see what Architecture decisions will be required to meet the stiff latency goals. The hardware is cool stuff but the software is my key interest. I’ll keep you updated as the project progresses.
As software systems become more integrated into every aspect of our lives, there is increasing demand for low latency, highly scalable systems. Telecommunications, gaming, social networking and financial trading systems are good examples.
Many of these systems are ‘messaging’ based. In this context, a ‘message’ is a small bit of work that the system must process. Depending on the market segment, the contents of the message will vary. Typically, a particular message doesn’t require much processing but the volume of messages is huge.
Let's explore some of the facets of such systems.
When we throw all these requirements into a single project, Effective Software Engineers start to get excited! Whole books are written on each of these areas but let's hit the highlights.
How is Low Latency achieved?
How is High Scalability achieved?
The best case is hardware based scalability. In this model, new hardware is added as the volume grows. In a perfect world, the software system integrates the new hardware into the pipelines automatically. The scalability problem has largely been solved for comparatively simple cases e.g. a web search. Based on these working models, even more complex software systems are being fielded with massive scalability. This is one area where Software Engineering has made big strides.
Enough for now, I’ll hit the other areas in my next rant :)
C and C++ have a serious flaw in heap management. The whole concept of a programmer driven, globally managed heap is broken. The design of the malloc libraries has led to countless hours of debugging and billions of dollars of losses. Random heap corruption is the worst issue but Memory Leaks are a major irritant as well.
Before digging in, let's define the parts. The heap is a large block of memory that may be used dynamically during a process lifetime. Allocations in the heap are variable sized and may be non-sequential. Unlike the stack, the heap may become fragmented as interior blocks are deallocated.
In C, blocks in the heap are allocated with the malloc library. The ‘new’ operator in C++ is typically a wrapper on malloc with some additional code to call the constructor. The allocation strategy is actually fine. The compiler handles the details so it is hard to screw up a call to malloc/new.
A memory leak occurs when the last pointer to a block in the heap is lost before free or delete is called. Over time, a leak will cause a program to run out of memory and/or start page swapping. Leaks can be very hard to find as there is no immediate consequence.
A big problem is ownership. When a block is allocated, the programmer must understand the deallocation strategy. This maxim must be true every time. Memory Leaks are generally created when this rule is broken. If ownership is unclear then the block will be leaked or, worse, deallocated twice. Understanding ownership is the key to avoiding memory leaks.
There are several cases for ownership.
In-function ownership is the simplest. The programmer just needs to call delete at the end of the function/method.
int foo()
{
    Object* obj = new Object();
    // other stuff …
    delete obj; // released in the same function that allocated it
    return 0;
}
Loop ownership gets a bit more complicated. In the fragment below a leak will occur when an error is encountered.
int foo()
{
    Object* obj = NULL;
    while (true)
    {
        obj = new Object;
        …
        if (error)
            break; // leak: obj is never deleted on this path
        delete obj;
    }
    return 0;
}
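One way to close the leak on the error path, assuming a C++11 compiler, is to let a `std::unique_ptr` own the object inside the loop body. This is a sketch only: the `live` counter, the `iterations` and `failAt` parameters and this version of `Object` are invented here so the behaviour can be observed.

```cpp
#include <memory>

// Hypothetical type that counts live instances so leaks are observable.
struct Object
{
    static int live;
    Object() { ++live; }
    ~Object() { --live; }
};
int Object::live = 0;

// Runs up to `iterations` passes and bails out on pass `failAt`.
// Returns the number of passes completed without error.
int foo(int iterations, int failAt)
{
    int completed = 0;
    for (int i = 0; i < iterations; ++i)
    {
        std::unique_ptr<Object> obj(new Object);
        if (i == failAt)
            break; // obj is destroyed on the way out, so nothing leaks
        ++completed;
    } // obj is also destroyed at the end of every normal pass
    return completed;
}
```

The explicit delete disappears, so there is no path through the loop that can forget it.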
Return value ownership can be more complicated. The value can be returned via parameter or return value.
Object* foo(Object*& returnObj)
{
    Object* obj = new Object();
    returnObj = new Object();
    return obj;
}
int main()
{
    …
    Object* otherObj = NULL;
    Object* localObj = foo(otherObj);
    …
    return 0; // both objects leaked
}
In both of these cases, the ‘main’ function must deallocate the objects at some point. This case is probably the most common cause of leaks. Oftentimes the function is in a library and the programmer doesn’t clearly understand the ownership situation.
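The ownership hand-off can be made visible in the signature. The sketch below assumes C++11 and swaps the raw pointers for `std::unique_ptr`, which is my substitution and not something a library author is obliged to offer. The `live` counter and the `useFoo` helper are hypothetical additions for illustration.

```cpp
#include <memory>

// Hypothetical type that counts live instances so leaks are observable.
struct Object
{
    static int live;
    Object() { ++live; }
    ~Object() { --live; }
};
int Object::live = 0;

// Both results come back as unique_ptr, so the caller plainly owns them.
std::unique_ptr<Object> foo(std::unique_ptr<Object>& returnObj)
{
    returnObj.reset(new Object());
    return std::unique_ptr<Object>(new Object());
}

// Returns how many Objects are alive while both results are in scope.
int useFoo()
{
    std::unique_ptr<Object> otherObj;
    std::unique_ptr<Object> localObj = foo(otherObj);
    return Object::live;
} // both objects are deleted here without any explicit call
```

When the library cannot be changed, the same effect comes from wrapping the raw pointers in a `std::unique_ptr` as soon as they are returned.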
A smaller but common case is a complex ctor/dtor combination.
class Object
{
public:
    Object() { subObject = new SubObject; }
    ~Object() { } // subObject is never deleted
private:
    SubObject* subObject;
};
In this case, subObject will leak every time the destructor is called for an Object instance. The error is obvious in this case but real world examples can be confounding in complexity.
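A minimal sketch of the repaired class follows. The destructor releases what the constructor allocated, and copying is disabled so that two instances can never delete the same block. The `live` counter on `SubObject` is a hypothetical addition so the effect can be checked.

```cpp
// Hypothetical member type that counts live instances.
struct SubObject
{
    static int live;
    SubObject() { ++live; }
    ~SubObject() { --live; }
};
int SubObject::live = 0;

class Object
{
public:
    Object() : subObject(new SubObject) { }
    ~Object() { delete subObject; } // release what the ctor allocated
private:
    // Declared but not defined: copying would lead to a double delete.
    Object(const Object&);
    Object& operator=(const Object&);
    SubObject* subObject;
};
```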
Another common case for memory leaks is multiple execution paths. In complex code, the deallocation of a block may be inadvertently skipped. Error handling is often involved in this case. If the error is very uncommon then only a small leak will result. If the error repeats, however, a high volume system can exhaust or overtax memory resources in just a few minutes.
For complex code that has poor Cohesion, leaks via multiple execution paths can be a major problem to find and fix. Error cases must be carefully evaluated to avoid this situation. The various ‘smart’ classes can help but are no silver bullet.
In complex programs, finding memory leaks via code inspection is basically hopeless. An automated tool like Dynamic Memory Solutions Leak Check is a must. The tool should be used in Unit Test cases and in full end to end testing. This step is usually skipped, and the leaks then show up in production environments.
There is a computer science concept called a Memory Pool. In a Memory Pool, allocation is done as in a heap but deallocation is done all at once.
Structuring your C++ classes to allocate from a Memory Pool can be a great way to avoid leaks. This idea works if a bunch of objects are created, used and then can be destroyed as a group. The key is that the Memory Pool always keeps a pointer to each block until the objects are destroyed.
For C, it is even easier since no dtors need be called. In this case, the programmer can allocate from the pool but does not call a deallocate routine. Once the pool is ready to be discarded, a single call frees all the blocks at once.
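The idea can be sketched in a few lines. The `MemoryPool` class below is hypothetical and deliberately simple: it tracks each block it gets from malloc and frees them together. A production pool would more likely carve blocks out of a few large chunks, and objects with dtors would need those called before the release.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal pool: hands out blocks one at a time, frees them all together.
class MemoryPool
{
public:
    MemoryPool() { }
    ~MemoryPool() { releaseAll(); }

    // Allocate a block and record it so the pointer can never be lost.
    void* allocate(std::size_t size)
    {
        blocks.push_back(static_cast<void*>(0)); // reserve the slot first
        void* block = std::malloc(size);
        if (!block)
        {
            blocks.pop_back();
            return 0;
        }
        blocks.back() = block;
        return block;
    }

    // Free every block handed out so far in a single call.
    void releaseAll()
    {
        for (std::size_t i = 0; i < blocks.size(); ++i)
            std::free(blocks[i]);
        blocks.clear();
    }

    std::size_t blockCount() const { return blocks.size(); }

private:
    MemoryPool(const MemoryPool&);            // not copyable
    MemoryPool& operator=(const MemoryPool&); // not copyable
    std::vector<void*> blocks;
};
```

Callers never free an individual block. They call `releaseAll()` once the group is finished, or simply let the pool go out of scope.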
Memory Leaks in C/C++ are a real irritant. Use the techniques described here and your team can mitigate the problem.
I’ve always disliked exception handling in C++. The language gives only a half baked implementation. The implementation has several shortcomings.
I now have another reason to dislike exceptions. I recently ran across a way to bumble a call to new. It involves the broken exception handling in C++ constructors.
It seems that a pointer assignment is skipped if a ctor throws an exception. The new expression never yields a value in that case, so the pointer keeps whatever it held before. I am sure this is documented somewhere in the specification but it is clearly counter-intuitive. ‘new’ should guarantee assignment of NULL on failure, perhaps by assigning the pointer twice, once to NULL and once to the return value.
Here is an example:
#include <iostream>
#include <string>
using namespace std;
class broken
{
public:
    broken() { throw string("ctor failed"); }
    int value;
};
int main(int argc, char* argv[])
{
    broken* brokenPtr = (broken*)0xdeadbeef;
    try
    {
        brokenPtr = new broken; // assignment is skipped on the throw!
    }
    catch (...)
    {
        cout << brokenPtr << endl;
        if (brokenPtr)
        {
            // brokenPtr still set to 0xdeadbeef!
            brokenPtr->value = 10; // may crash
            delete brokenPtr; // corrupts heap
        }
    }
    return 0;
}
Following the maxim ‘initialize all variables’ will avoid this trouble. In the above example, I should have set brokenPtr to NULL.
In my real world example, the situation was more complex, with the variable being used in a loop. In this case, the stale value was the previous (freed) value of the pointer. I fixed the issue by explicitly assigning the pointer to NULL just before the call to new. It looks odd in the code to have consecutive lines assigning to the same variable.
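The loop case looks roughly like the sketch below. It is a reconstruction for illustration, not the original code: the `Fragile` class, the `failureLeftPointerNull` function and its `passes` and `failAt` parameters are all invented here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical class whose constructor fails on request.
class Fragile
{
public:
    explicit Fragile(bool fail) : value(0)
    {
        if (fail)
            throw std::string("ctor failed");
    }
    int value;
};

// Returns true if a ctor failure occurred and the catch block saw NULL.
// Returns false if no pass failed.
bool failureLeftPointerNull(int passes, int failAt)
{
    Fragile* ptr = NULL;
    for (int i = 0; i < passes; ++i)
    {
        try
        {
            ptr = NULL; // reset first: the next line may never assign
            ptr = new Fragile(i == failAt);
            ptr->value = i;
            delete ptr; // ptr now holds a stale (freed) address
        }
        catch (...)
        {
            // Thanks to the reset, ptr is NULL here and not the
            // freed address left over from the previous pass.
            return ptr == NULL;
        }
    }
    return false;
}
```

Without the `ptr = NULL;` line, a failure on the second or later pass would leave `ptr` pointing at the block freed on the previous pass.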
Java’s looking better every day :)
The number of steps in a properly engineered software project is truly astonishing. It really is no surprise that steps are skipped and projects fail. Here is a list of most of the steps that I have been involved in:
Books have been written on each of these steps. No person can be an expert in the full Software Engineering Process. The Effective Software Engineer is familiar with all of the pieces, however, and works to ensure success at every point.