Testing and concurrency


Our team is currently working with a client on a medium sized, medium complexity Java application which has quite low test coverage. We are introducing characterisation tests
to snapshot functionality. These will give us the confidence to refactor away technical debt and extend the application without regression. One of the problems we are experiencing is the concurrent nature of the application. I have worked on applications in the past which supported very high concurrency without issue, but this application is different. I have not fully thought through why it differs, but there are some obvious points:

  • This application spawns threads in Java code a lot. In previous applications we have always avoided this complexity by utilising somebody else's thread pool code.
  • I am used to stateless service classes which operate on domain objects. The stateless service classes obviously have no concurrency issues and the domain objects can be protected using synchronisation blocks. This application seems to have a lot more stateful objects that interact (this is anecdotal; I have not analysed the code specifically for this attribute).

One of the first refactorings we are looking at is to remove all the Thread.sleep calls from test classes. The CI server reports a significant number of test failures which turn out to be false positives, and in many cases the use of Thread.sleep is to blame. I have seen two slightly different uses of Thread.sleep in the test code.

  1. The test spawns a thread which calls some method of the class under test whilst the main test thread interacts with the class under test in some other way. The main test thread calls Thread.sleep to ensure that the second thread has time to complete its processing before the test verifies the post-conditions.
  2. The class under test contains some internal thread spawning code. The test thread again needs to execute a Thread.sleep to remove the chance of a race condition before firing the asserts.

Both these approaches suffer from the same problems.

  • The Thread.sleep might be long enough to allow the second thread to complete its processing on one machine (e.g. the developer's high-spec workstation) but not long enough on a heavily loaded, differently configured, usually more resource constrained CI server. Under certain load situations the test fails; under others it works. The use of Thread.sleep has made the test non-deterministic.
  • Often the response to the above problem is to make the sleep longer. Yesterday I saw a very simple test which took over thirteen seconds to execute. Most of that test duration was sleeps. Refactoring to remove the sleeps resulted in a test that executed in 0.4 seconds. Still a slowish test but a vast improvement. The last application I worked on had 70% coverage with 2200 tests. If each one had taken thirteen seconds to execute then a test run would have taken almost eight hours. In reality that suite took just over a minute on my workstation to complete. You can legitimately ask a developer to run a test suite which takes one minute before every checkin and repeat that execution on the CI server after checkin. The same is not true of a test suite that takes eight hours. You are probably severely impacting the team's velocity and working practices if even the build before checkin takes eight minutes. There are very few excuses for tests with arbitrary delays built into them.

To resolve both issues we introduce a CountDownLatch.

Where the test spawns a thread, the latch is decremented inside the spawned thread and where the test code had a sleep a latch.await(timeout) is used. We always specify a timeout to prevent a test that hangs in some odd situation. The timeout can be very generous, e.g. ten seconds where before a one second sleep was used. The latch will only wait until the work is done in the other thread and the race condition has passed. On your high spec workstation it might well not wait at all. On the overloaded CI server it will take longer, but only as long as it needs. A truly massive delay is probably not a great idea as there is a point where you want the test to fail to indicate there is a serious resource issue somewhere.
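
As an illustration, here is a minimal sketch of the first case (the test spawns the thread), assuming JUnit 4 and using a hypothetical Counter class to stand in for the class under test:

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.junit.Test;

public class CounterTest {

    // Hypothetical class under test: a trivially synchronised counter.
    static class Counter {
        private int value;
        synchronized void increment() { value++; }
        synchronized int value() { return value; }
    }

    @Test
    public void incrementMadeOnAnotherThreadIsSeenByTheTest() throws Exception {
        final Counter counter = new Counter();
        final CountDownLatch done = new CountDownLatch(1);

        new Thread(new Runnable() {
            public void run() {
                counter.increment();
                done.countDown(); // signal that the work has completed
            }
        }).start();

        // Generous timeout, but the latch releases as soon as countDown() runs.
        assertTrue("worker thread did not complete in time",
                done.await(10, TimeUnit.SECONDS));
        assertEquals(1, counter.value());
    }
}
```

The await call returns as soon as countDown has been called, so the test wastes no time on a fast machine and still passes on a slow one.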

Where the class under test spawns a thread (an anti-pattern I suspect) then we amend the code so it creates a latch which it then returns to callers. The only user of this latch is the test code. Intrusive as it is, it is often the only way to safely test the code without more significant refactoring. 
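
A sketch of this second, more intrusive case might look something like the following; the class name and the single worker thread are purely illustrative:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical class under test which spawns its own thread. The returned
// latch exists only so that tests can wait for the background work to finish.
public class AsyncPublisher {

    private volatile String lastPublished;

    public CountDownLatch publish(final String message) {
        final CountDownLatch done = new CountDownLatch(1);
        new Thread(new Runnable() {
            public void run() {
                lastPublished = message; // the real work would happen here
                done.countDown();        // signal completion to any waiting test
            }
        }).start();
        return done;
    }

    public String lastPublished() {
        return lastPublished;
    }
}
```

The test then calls publish(...).await(10, TimeUnit.SECONDS) before firing its asserts; production callers are free to ignore the returned latch.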

There are some larger issues here. Is the code fundamentally wrong in its use of threading? Should it be recoded to use a more consistent and simple concurrency model and rely more on third party thread pool support?

At risk of straying from my comfort zone of simple, pragmatic software delivery, deep down I have never been very happy about the implications of complicated multi-threaded code for automated testing. You can write a class augmented with a simple and straightforward test class which verifies the class's operation and illustrates its use. You can apply coverage tools such as Emma and Cobertura which can give a measure of the amount of code under test and even the amount of complexity that is not being tested. I am not convinced it is always possible to write simple tests that 'prove' that a class works as expected when multiple threads are involved (note I say always and simple).

I do not know of any tools that can give you an assurance that your code will always work no matter what threads are involved. Perhaps a paradigm shift such as that introduced by languages like Scala and Erlang will remove this issue?

There is some good advice available regarding testing concurrent code and I am sure lots of very clever people have spent lots of time thinking this through, but it's certainly not straight in my head yet.

Importance of real time performance monitors

Decent log entries are essential. On our current project we aim to write enough data to allow off-line analysis of performance and usage plus errors. I am also an advocate of more immediate and accessible runtime information. Log analysis is great but sometimes you need empirical data right away. On our current project we use the Java MBean facility. These MBeans can be easily accessed in a graphically rich way using tools like JConsole or VisualVM.

We have a couple of different types of analyzer which we expose through MBeans. One simply records how many times an event has occurred in a short time period. Another calculates a real-time average, again across a short time period. For example, we have analyzers which record the length of time it takes to make a call to a particular downstream application. Each duration is recorded and an average over the last ten seconds is reported via the MBean. This calculation has been implemented to be very efficient from a CPU perspective, since 99.999% of the time the average is discarded before anybody bothers to look at it. Originally we were only using two or three of these average analyzers in the system. As developers found them useful they were placed around every single external interaction and we suddenly found ourselves with several thousand per application. These used about 25% of the heap and consumed significant CPU resource. The analyzer was then optimized and now consumes negligible resources.
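
For illustration only, a minimal sketch of this kind of analyzer might look like the code below (this is not our actual implementation; the class and attribute names are invented). Recording a sample is cheap, and the average is only computed when somebody reads the MBean attribute:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Standard MBean interface. In a real project this would be a public interface
// in its own source file; the JMX spec requires the management interface to be public.
interface CallTimerMBean {
    double getAverageDurationMillis();
}

// Hypothetical analyzer: call durations are recorded cheaply on the hot path and
// the rolling average is only calculated when somebody reads the MBean attribute.
public class CallTimer implements CallTimerMBean {

    private static final long WINDOW_MILLIS = 10000L; // ten second window

    private static final class Sample {
        final long recordedAt;
        final long durationMillis;
        Sample(long recordedAt, long durationMillis) {
            this.recordedAt = recordedAt;
            this.durationMillis = durationMillis;
        }
    }

    private final Deque<Sample> samples = new ArrayDeque<Sample>();

    // Hot path: just store the sample, no calculation.
    public synchronized void record(long durationMillis) {
        samples.addLast(new Sample(System.currentTimeMillis(), durationMillis));
    }

    // Read path: discard stale samples, then average what is left in the window.
    public synchronized double getAverageDurationMillis() {
        long cutoff = System.currentTimeMillis() - WINDOW_MILLIS;
        while (!samples.isEmpty() && samples.peekFirst().recordedAt < cutoff) {
            samples.removeFirst();
        }
        if (samples.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (Sample sample : samples) {
            total += sample.durationMillis;
        }
        return (double) total / samples.size();
    }
}
```

An instance is then registered with the platform MBean server (ManagementFactory.getPlatformMBeanServer().registerMBean(timer, objectName)) and the attribute shows up in JConsole or VisualVM.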

I have personally been a little disappointed that our operations team have not made as much use of this facility as I expected. They are happy with their existing log analysis tools. As a team, we have questioned whether our investment in MBeans is worthwhile. We concluded that it was: even though the Ops team don't use it in Production, the development group rely on the data exposed through JMX for troubleshooting, especially in system test, for monitoring load tests and as a quick way to gauge the health of Production.

Last week I was reminded again how useful this immediately accessible data is. After a system restart Production was doing 'something funny'. We had ambiguous portents of doom and various excited people considering some fairly drastic remedial action, including switching off a production system which was serving several thousand users. The fear was that something in the affected system might be placing unbearable demands on downstream applications. This seemed unlikely as we have many layers of throttles and queues to prevent just such an occurrence, but there was something odd going on. The first port of call for the developers was the log files. With several thousand transactions being performed a second there were a lot of log lines whizzing past. Panic began to creep in as it was impossible to discern what, if anything, was going on in the explosion of data. I was able to walk over to my workstation and bring up VisualVM. In about thirty seconds I could see that right at that very moment we were sending a great many messages but well within the tolerances we had load tested against. I was able to use VisualVM's graphing function to track various data and within a minute or so could see that there was an unexpected correlation between two sets of events. (The number of messages sent to mobile phones and the number of identification requests made to a network component were drawing the same shaped graph, with a slight lag between the first and second sets of data and an order of magnitude difference in volume.) Again, these events were both within tolerances. Yes, something unexpected was occurring. No, it was not going to kill the system right now. We went to lunch instead of pulling the plug.

The data we collected pointed us in the right direction and we were able to find, again using VisualVM, that a database connection pool had been incorrectly set to a tenth of its intended size. The Ops guys made some tuning changes to the configuration based on what we had discovered. The application stayed up through the peak period.

In summary, log files are essential but there is still a need for real-time, pre-processed data available via an easy-to-access channel. MBeans hit the spot in the Java world. Developers should not be scared of calculating real-time statistics, like average durations, on the fly. They do need to make sure that the system does not spend a disproportionate amount of resources monitoring itself rather than delivering its function.

Concrete problems when developers opt out of TDD

We have two major classifications of automated test in common use:
  • Acceptance tests which execute against the application in its fully deployed state.
  • Unit tests which typically target a single class and are executed without instantiating a Spring container.
The acceptance tests are written in language which should make them accessible outside of the development team. They are used to measure completeness, automatically test environments and provide regression tests. Their usefulness is widely accepted across the team and they tend to be very long-lived, i.e. tests that were written a year ago against a particular API are relevant today and will continue to be relevant as long as that API is supported in production. The unit tests are written by developers and will almost certainly never be read by anybody other than the developers or possibly the technical leads. I program using TDD as I find it a natural way to construct software. I personally find that the tests are most useful as I am writing the code, like scaffolding. Once the code is stabilized the tests still have a use but are no longer as critical. A refactoring of the application in some future sprint may see those tests heavily amended or retired. They are not as long-lived as the acceptance tests.

I have been reflecting on the usefulness of, and investment in, test code for as long as I have been doing TDD. I have come to the conclusion that whilst acceptance tests are non-negotiable on projects where I have delivery responsibility, perhaps unit tests for TDD are not mandatory in certain situations. I have worked with several developers who are very, very good and simply do not see the value in TDD as it is contrary to their own, very effective, development practices. I know in my team right now a couple of the very best developers do not use TDD the way everybody else does. Education and peer pressure have had no effect. They are delivering high quality code as quickly as anybody else. It's hard to force them to do differently - especially when some of them pay lip service to TDD and do have a high test coverage count. I know that they write those tests after they write their code.

In the last few weeks I came across a couple of concrete examples where TDD could have helped those developers deliver better code. In the future I will try and use these examples to persuade others to modify their practice.

1. Too many calls to downstream service.

The application in question has a mechanism for determining the identity of a client through some network services. Those network services are quite expensive to call. The application endeavors to call them as infrequently as is safe and to cache the identity once it is resolved. We recently found a defect where one particular end point in the application was mistakenly making a call to the identity services. It was not that the developer had made the call in error; it was that the class inheritance structure effectively defaulted to making the call, so it happened without the developer realizing. The identity returned was never used. I suspect that this code was not built using TDD. If it had been, the developer would have mocked out the identity service (it was a dependency of the class under construction) but would not have set any expectation that the identity service would be called, so the spurious call would have been flagged by the test. The use of mocks not only to specify what your code should be calling but what it should not be calling is extremely useful. It encourages that top down (from the entry point into the system) approach where you build what you need when you need it.
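
As a sketch of that second use of mocks (asserting what must not be called), here is roughly what such a test could look like, assuming a Mockito 1.x style API and JUnit 4; the IdentityService and WidgetEndpoint names are invented for illustration:

```java
import static org.mockito.Matchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;

import org.junit.Test;

public class WidgetEndpointTest {

    // Hypothetical expensive collaborator.
    interface IdentityService {
        String resolveIdentity(String clientToken);
    }

    // Hypothetical class under test.
    static class WidgetEndpoint {
        private final IdentityService identityService;
        WidgetEndpoint(IdentityService identityService) {
            this.identityService = identityService;
        }
        String listWidgets() {
            // This end point does not need the caller's identity.
            return "widgets";
        }
    }

    @Test
    public void listingWidgetsDoesNotResolveIdentity() {
        IdentityService identityService = mock(IdentityService.class);
        WidgetEndpoint endpoint = new WidgetEndpoint(identityService);

        endpoint.listWidgets();

        // The mock is used to assert what the code must NOT do.
        verify(identityService, never()).resolveIdentity(anyString());
    }
}
```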

It's likely that the defect would never have been introduced had the developer been using TDD. As it is, we have an application which is making a large number (and it is a large number) of irrelevant calls to a heavily contended resource. We now have to schedule a patch to production.

Coincidentally, there was an acceptance test for this service, which was passing. This highlights a deficiency in our acceptance tests that we have to live with. They test the 'what' but not the 'how'. The tests were running against a fully deployed application which had downstream services running in stub mode. The test proved that functionally the correct result was returned, but it had no way of detecting that an additional spurious call to another service had been made during the process.

2. Incorrect error handling

In a recent refactoring exercise we came across a piece of code which expected a class it was calling to throw an exception whenever it had an error processing a request. The error recovery in the code in question was quite elaborate and important. Unfortunately, the class being called never threw an exception in the scenarios in question. It returned a status object which indicated whether corrective action needed to be taken. (It was designed to be used in conjunction with asynchronous message queues, where throwing an exception would have introduced unnecessary complexity.) The developer could easily have used mock objects, set an expectation that the exception would be thrown, and the problem would have remained. But if TDD were being used and the developer was working top down, then the expected behavior of the mocks would have guided the implementation of the downstream classes. Nothing is foolproof but I think this manner of working should have caught this quite serious error.
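
To make that concrete, here is a rough sketch (with invented names, again assuming Mockito and JUnit 4) of a test written against the collaborator's real contract, i.e. a status object rather than an exception. Stubbing the status return forces the caller to handle it:

```java
import static org.junit.Assert.assertTrue;
import static org.mockito.Matchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class MessageHandlerTest {

    // Hypothetical collaborator: reports failure via a status object, not an exception.
    interface MessageSender {
        SendStatus send(String payload);
    }

    static class SendStatus {
        final boolean retryRequired;
        SendStatus(boolean retryRequired) { this.retryRequired = retryRequired; }
    }

    // Hypothetical class under test: must inspect the status, not catch exceptions.
    static class MessageHandler {
        private final MessageSender sender;
        MessageHandler(MessageSender sender) { this.sender = sender; }
        boolean handle(String payload) {
            SendStatus status = sender.send(payload);
            return status.retryRequired; // corrective action is driven by the status
        }
    }

    @Test
    public void failureReportedThroughStatusObjectTriggersRecovery() {
        MessageSender sender = mock(MessageSender.class);
        when(sender.send(anyString())).thenReturn(new SendStatus(true));

        MessageHandler handler = new MessageHandler(sender);

        assertTrue("handler should schedule corrective action", handler.handle("a message"));
    }
}
```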

More subjective problems

I have also noted two other potential consequences of having some developers opt out of TDD. I do note that some developers on the team produce code that is more complex than others. It is fine from a cyclomatic complexity perspective but when you try and understand what it is doing you find yourself with a higher WTF count than you would expect. I think (again this is subjective, I have not gathered any empirical evidence) that a lot of the complexity comes from a lack of cohesion in the code. Logic is spread around in a way which made sense to the original developer as they had internalized all the classes concerned. That logic is not obvious to a new pair of eyes. If you are using TDD then this encourages cohesion in classes because it focuses the mind on what the class is responsible for before the developer has to worry about how it delivers those responsibilities.

This is a very subjective point and I would happily agree that several of the team members who do use TDD occasionally produce nasty code. My gut feeling however, is that it happens less often.

One final problem with some of the high flyers not using TDD is that bad practices tend to propagate through the team just as quickly as good ones. I have caught a couple of new joiners following a bad example or simply not using TDD because the developer they look to as a mentor is not evangelizing about the technique, because they themselves do not buy into the practice. This is a shame as those new joiners often have a greater need of the rigor that TDD imposes than the more experienced developers.

Anti-pattern: The release vehicle.

At my current client site you cannot get a piece of compiled code into production unless you can find an appropriate 'release vehicle', i.e. a planned high ceremony release of the component which has been officially prioritised, scheduled and funded. (Note: The same does not apply to non-compiled code such as JSPs or XML templates containing complex XPath expressions).

Somebody very clever, who probably had a beard (Grady Booch?), once said that "Regular releases into production are the lifeblood of the software development process." I agree. My current client also seems to be in agreement but cannot extract themselves from the constraints of their existing processes.

The client in question has a successful agile adoption. Walking round the development teams you see task boards, burn downs and SCRUM meetings. Go to a management meeting and you'll hear them talk about two week iterations and the importance of continuous integration. At a strategic level, the organisation (which is very large) is still waterfall orientated. This has implications for the way in which work is financed. Funds for the development, testing and deployment of a certain application are released on waterfall inspired milestones. This, in conjunction with a legacy of long development cycles, has led to this 'release vehicle' anti-pattern.

The organisation is unwilling to make a deployment of a component into production unless there is a named and funded change request which covers its release. Activities within development, possibly funded internally as 'business as usual', do not have such CRs. Therefore, a development activity such as refactoring for technical debt reduction or improving performance might get engineering buy in but will not get released into production until some CR happens to touch the same application.

It is common to see refactorings made which then sit in source control for literally months as they wait for an excuse to go live. Medium to low priority defects or useful CRs which lack very high prioritisation from marketing never get executed because the programme manager does not have a release identified for the change.

The application suite can appear inert to external parties as it takes a considerable period for changes to make it through the full release cycle. This erodes confidence. If I was a product owner and saw that a team was taking six months to execute my minor change I am not going to be inclined to believe that the same team can turn around my big important changes quickly. I am going to be looking for other mechanisms to get my changes into production and earning money quickly. Once I find a route that works I am going to keep using it.

Why do people like the release vehicle?
  • It is the way the whole software lifecycle as exposed to the rest of the organisation works. The QA team don't test a component unless they have funding from marketing. Marketing won't be paying for something that has no role in a prioritised proposition. The Operations team won't support the deployment activities for our component if they don't have the cash from the same marketing team.
  • It looks like it is easier to manage for PMs. Releases (because they are infrequent) are a big deal, involving lots of noise, planning and disruption to everyday working patterns.
  • It appears to reduce infrastructure costs. It costs resource to make a release unless every aspect including testing and operational deployment is fully automated (and even then there is potential cost, dealing with failures etc.). It costs resource to automate a manual build process. Engineers appreciate that fully automated build processes are a priority because in the end they reduce costs and increase agility. It is that age-old problem of trying to convince not just the build team, but the build team's manager and the build team's manager's manager, that it is worth diverting resource in the short term to fix a problem in order to make a saving in the long term.
This is a symptom of our strategic failure to get agile adopted beyond the development group. Until we do so, we will continue to hit these sorts of issues.

What we should do instead:

We should schedule frequent (bi-weekly, ideally more frequent) updates in production from the trunk of source control for every component. We should not need an excuse for a release. The release process should be as cheap as possible, i.e. automated build, regression test, deployment and smoke test. The code in the trunk is supposed to always be production ready and the automated tests should keep it that way.

If we achieve this we should:
  • Reduce complexity in branch management (no merging changes made months ago).
  • Avoid a massive delay between development and deployment which is not cost effective and makes support very hard.
  • Increase our perceived agility and responsiveness.
  • Enable refactoring to improve non-functionals (stability, latency, dependency versions, capacity).
  • Prevent a release from being a 'special occasion' which requires significant service management ceremony.
If you release all the time everybody knows how to release. If you release twice a year, every release involves re-educating the teams involved on deployment, load testing, merging and so on. This increases the cost and the risk of failure.

Note: Having frequent, regular, low ceremony releases is greatly eased by having a fully automated build and deploy process but you can have one without the other. As stated above, having such a build process makes regular deployments to production cost effective but is an enabler rather than the justification for this change to working practice.

Tale of two SCRUM stand ups

I walked past two teams doing their daily SCRUM standup today. Both teams claim to be agile. I didn't join in (even as a chicken) but just observed for a minute or so.

The first team was sitting down in a breakout area. Their body language spoke volumes. There was not one single participant maintaining eye contact with anybody else. Two people were playing on their phones. One developer had his head in his hands. Most had bored expressions. The team leader who is also the SCRUM master was the only person who spoke for the entire time I watched.

The second team was standing in a space near their desks. They were gathered round a task board which appeared to be up to date and was the focus of several of the individuals' updates. One person spoke at a time. Almost everybody appeared to be paying attention to whomever was speaking. Most updates were short and concise. A couple rambled on.

Other than both teams calling their meeting a SCRUM I could see no similarities.

As our agile adoption has spread beyond the original teams I suppose it is inevitable that, as the experience gets spread a little thinner, people will simply label their existing activities with agile sounding names. Often we have no clear remit in those teams to supply a mentor, and to try to offer advice would result in rebuttal as team leaders guard their territory. Does this matter? Is there a risk that these teams who are not practicing agile correctly will diminish and discredit agile in the eyes of our programme managers? This is sounding a bit like an excuse for an Agile Inquisition going round checking that no team is using Agile's name in vain. That cannot be a good thing either.

Value and cost in hardware and software

Neal Ford's presentation on emerging architecture contained a reference to the controversial Intel practice in the 1990s regarding the 386-SX math co-pro. Intel were producing two variants of the 386 chip at the time:
  • The 386-DX which featured a math co-processor on board, allowing faster execution of the floating point maths required in financial applications, graphics packages and so on.
  • The 386-SX which was cheaper but did not feature the math co-processor and therefore was less 'powerful'. It had enough to differentiate itself from the previous generation of 286 chips but was regarded as distinctly inferior to its more expensive sibling.

This all sounded fair enough until it became public that the 386-DX and 386-SX shared the same manufacturing process including the construction of the maths co-processor. Where the process differed was that at some point the math copro in the SX chip was destroyed via some mechanical process. Suddenly the perception of the SX went from a lower spec product to a broken product. In Neal Ford's presentation he described the whole process as Intel selling customers their trash, like the SX was a defective unit being offloaded to unsuspecting users as a working chip.

At a very low technical level Neal's statement is true, but not at a commercial or practical level. Intel engineers were given a change in requirements by their marketing department: produce a chip that is going to enter the budget market to complement but not fully compete against the 386-DX. They looked at their system and determined that the most efficient way to achieve this end was to 're-configure' existing 386-DX chips. This was likely much, much cheaper than setting up a whole new production line and testing the brand new chip it produced. To do otherwise would be against the pragmatic engineering ideals that Agile is supposed to champion. Flip the argument around: should we redesign the entire process from scratch, achieving the same end result at much higher cost, just so that we can claim the chip contains no parts it doesn't need? Maybe we find this so objectionable because a chip is a tangible entity and we are used to associating (rightly or wrongly) the cost of raw materials and manufacturing with the value of such items. Maybe we don't factor in the cost of design and marketing, which I suspect are massive for a complex consumer electronic product like a cutting edge CPU.

This pattern raised its head again, but with less fuss, a couple of years ago when HP shipped high end servers with multiple CPUs. Some of the CPUs were disabled upon delivery. If the customer's processing requirements increased over time (which they always do) then they could pay HP, who could then remotely enable the additional processors without the customer incurring the cost of an engineer on site, downtime for installation and so on. Again, this early step towards today's processing-on-demand cloud computing concept raised some eyebrows. Why should customers pay for something that was going to cost the supplier nothing? Again, this is a preoccupation with the physical entity of the manufactured component. If the additional CPUs had been sitting idle in an HP server farm rather than at the customer's site, and purchasing them involved work being sent across the network, my suspicion is that nobody would have had any objections.

We use a UML design tool at my current client site called Visual Paradigm. It has a number of editions, each with a different cost. It has a very flexible license purchase system which we have taken advantage of. We have a large number of standard level licenses because the features that this edition gives will support most of our users most of the time. Occasionally we need some of the features from a more expensive edition. It's not that only one or two individuals require these features; we all need them, just very occasionally. The Visual Paradigm license model supports this beautifully. We have a couple of higher edition licenses. On the rare occasion that users need the extra features they just start the program in the higher edition mode. As long as no other users connected to our license are using the higher edition at that time, there is no issue. The similarity with the examples above is that there is only one installation. We don't need to install a different program binary every time we switch edition. We love this as it makes life easy. I am sure Visual Paradigm like it as well, as it simplifies their build and download process.

To me the two scenarios, software and hardware, appear pretty much identical. Everybody appreciates that the cost of creating a copy of a piece of software is so close to zero that it is not worth worrying about. Therefore we don't mind when a supplier gives us a product with bits disabled until we make a payment and get a magic key. It's harder to think of hardware in the same way: that the build cost doesn't matter, and that there might be no difference in manufacturing costs for two products with very different customer prices. The cost of delivering the product, like delivering software, includes massive costs which have nothing to do with the creation of the physical artifact.

Maybe this wasn't the point in the above presentation but I guess the thing that startled me was that my natural inclination was to immediately associate the value with the tangible item. In my head this is all getting mixed up with free (as in speech) software and the idea that it is unproductive and unethical to patent / own / charge for ideas.

Agile 2009

I have spent the week at the Agile2009 conference in Chicago. This annual conference, now in its eighth year, is the premier international gathering for agilists. It caters for a whole spectrum of experience, from newcomers to the discipline to the gurus who are leading the way.

I attended some very thought provoking sessions as well as presenting my own experience report on techniques for technical architecture in an agile context. My colleague from Valtech US, Howard Deiner, battled hardware and network issues to present a well received demonstration of Continuous Integration. Both sessions got reasonable attendances in the face of stiff competition from other presentations being held in parallel. Both received very positive feedback from attendees (mine scored 80% for command of topic and 78% overall). I also got lots of positive feedback for my session in conversations with conference attendees throughout the week. This was very much appreciated.

My presentation is backed up by an IEEE report which was published in the conference proceedings. The report's premise is that incumbent waterfall software development processes force technical architects into a position of isolation and ineffectiveness (the ivory tower). The challenge I (and many, many other TAs) have faced is how to deliver the guarantees of technical correctness and consistency that clients (especially those moving from waterfall to agile) demand when some of the most widely used conventional techniques for architecture have been discredited. I am thinking primarily of the emphasis placed on up front detailed design and architectural review.

The report details architectural problems during the scale up of a previously successful agile project. It then describes and evaluates a number of techniques employed on the project to deliver the technical architecture without ascent of the ivory tower. The conclusions include the argument that documentation is not an effective tool for technical governance and that the architect must target activities which bring them closer to the actual implementation. This mirrors Neal Ford's point in his Emerging Architecture presentation that we need to accept that the real design is the code, not the summaries and abstractions of the code presented via the numerous tools (UML, narrative documents, whiteboard sessions) at our disposal. Other conclusions include the identification of automated tests as an architect's, not just a tester's, most effective tool for delivering a correct solution. The paper also identifies that soft skills around communication and people management, often anathema to the conventional architect, are critical to success. Finally the report concludes that utilizing the most cost effective techniques (rather than just the most technically powerful) was key. (That does not mean you cannot justify the use of expensive techniques, just that they may only be justifiable on the most important components in the system.)

Agile 2009 was a great balance of real world experiences (such as my session) and more philosophical, academic sessions. There was also the chance to listen to some insightful keynotes and take part in some exciting expert sessions which challenged the way we work. It is always easier to learn in a community of professionals with real experience and this was definitely the case at this conference. I learned as much over dinner and in break out sessions as I did in the formal seminars.

I am going to blog what I learned in some of the sessions over the next couple of days, possibly earlier as I am stuck at Chicago O'Hare for eleven hours after a 'mechanical issue' with our plane!

Use of best of breed open source

Over the last two years of my current project I have noticed a recurring pattern. On several occasions we have identified an implementation pattern which commonly appears on many enterprise projects. That pattern is common enough that there is a well known (i.e. at least one person in the team has heard of it) open source solution which appears to be recognised by the community as being best of breed. In order to reduce risk and increase our velocity we use that open source component, possibly making changes to the design to more effectively incorporate the ready to run package. The theory (and one I fully buy into) being that by using the open source library we free up time to concentrate on the parts of our solution which are truly unique and require bespoke software.

The recurring pattern I see is that on at least four occasions the best of breed package has proven to be severely sub-optimal. What is worse is that most of the time these deficiencies appear when we move into high volume load test in a cluster. It seems only then that we discover some limitation. Typically this is caused by a particular specialism required for our application which then exercises some part of the library that is not as commonly utilised as others and is therefore less stable. Sometimes the limitation is so bad that the library has to be refactored out before launch; on other occasions the issue becomes a known restriction which is corrected at the next release. All of the significant refactorings have involved replacement of the large, generic, well known library with a much smaller, simpler, bespoke piece of code.

I am undecided whether this is a positive pattern or not. On one hand, using the standard component for a short period helped us focus on other pieces of code. On the other, the identification of issues consumed significant resource during a critical (final load test) period. The answer probably is that it is okay to use the standard component as long as we put it under production stresses as quickly as possible. We then need to very carefully take account of the effort being consumed and have an idea of the relative cost of an alternative solution. When the cost of the standard component begins to approach the cost of the bespoke one then we must move swiftly to replace it. The cost should also factor in maintenance. We need to avoid the behaviour where we sit round looking at each other repeating "This is a highly regarded piece of software, it can't be wrong, it must be us." for prolonged periods (it's okay to say this for a couple of hours, it could be true). I used to work for a well known RDBMS provider. I always felt that the core database engine was awesomely high quality and that anybody who claimed to have found a defect was probably guilty of some sloppy engineering. I knew however, from painful experience, that you did not have to stray far from the core into the myriad of supported options and ancillary products to enter a world of pure shite. The best of breed open source components are no different.

Some of the problem components:

ActiveMQ (2007) - We thought we needed an in memory JMS solution and ActiveMQ looked like an easy win. It turned out that at that release the in-memory queue had a leak which required a server restart every ten to fifteen days. It also added to the complexity of the solution. It was replaced by very few lines of code utilising the Java 5 concurrency package. I would still go back to it for another look, but only if I was really sure I needed JMS.
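
For flavour, the replacement amounted to little more than the kind of thing below (a simplified sketch, not our production code), with a bounded queue from java.util.concurrent doing the buffering:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory replacement for the JMS queue.
public class InMemoryMessageQueue<T> {

    // Bounded capacity gives us simple back pressure if consumers fall behind.
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<T>(10000);

    // Blocks if the queue is full.
    public void publish(T message) throws InterruptedException {
        queue.put(message);
    }

    // Returns null if nothing arrives within the timeout.
    public T consume(long timeout, TimeUnit unit) throws InterruptedException {
        return queue.poll(timeout, unit);
    }
}
```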

Quartz (2007) - The bane of our operations team's life, as it would not shut down cleanly when under load and deployed as part of a Spring application. Replaced by the Timer class and some home grown JDBC.
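
The Timer based replacement was roughly the shape of the sketch below (invented names, and the home grown JDBC persistence is omitted):

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch of the java.util.Timer replacement: a repeating task
// that polls for due jobs.
public class SimpleScheduler {

    private final Timer timer = new Timer("job-poller", true); // daemon thread

    public void start(final Runnable pollForDueJobs, long periodMillis) {
        timer.schedule(new TimerTask() {
            public void run() {
                pollForDueJobs.run();
            }
        }, 0, periodMillis);
    }

    // Cancelling a java.util.Timer is straightforward, which was the point.
    public void stop() {
        timer.cancel();
    }
}
```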

Quartz (2009) - Once bitten, twice shy? Not us! The shutdown issue was resolved and we needed a richer scheduling tool. Quartz looked like the ticket and worked well during development and passed the limited load testing we were able to do on workstations. When we moved up to the production sized hardware and were able to put realistic load through we discovered issues with the RAMJobStore that were not present with the JDBC store (which we didn't need). It just could not cope with very large (100 000+) numbers of jobs where new jobs were being added and old ones deleted constantly.

Security compliance without empirical evidence

As the project nears the final delivery I am having to complete a statement of compliance for group security (did you feel a shiver as you read that? it was justified). One of the values I have tried to instil is that we don't do any documentation or formal design without a clearly defined audience. When we do identify a subject that does need to be formally recorded I am keen that it is done well. The OAUTH interactions between components are one of those few key areas.

The OAUTH sequence diagram was correctly checked into the UML repository and was pretty good. Looking at it I was suddenly struck by a deep sense of unease. How was I supposed to know whether the implementation sitting on our servers bears any relation to the work of art being displayed on my screen? What value is my statement without real knowledge that we are secure? I know this is something I have known for years and bang on about to anybody who will listen, but it was a startling moment to be sitting there looking at the design and being asked to make a formal statement about its realisation without empirical evidence. I already knew from an audit of the acceptance test suite (end to end, automated, in-container tests) that one of the omissions was anything that exercised OAUTH. I decided that one of my priorities for tomorrow will be the completion of that test and that I won't be making a statement of compliance without it.

Valtech Blog is live

The Valtech blog is live, yay! It incorporates posts from various Valtech consultants and covers all things Agile and software development in general. My submissions are featured so no more swearing...

SpringSource in the cloud?

SpringSource has been acquired by VMWare!

Interesting that VMWare paid more for SpringSource than Redhat paid for JBoss, even in the midst of the recession.

Spring has got to be the tool of choice for the development of enterprise Java applications right now. I wonder if in the near future deploying Spring applications to the VMWare cloud (or using Spring beans already deployed in the cloud) will be as easy as deploying to Tomcat or Jetty?

Professional Services Day

We had an excellent professional services day today. Rather than the previous format of set talks and dry presentations, the organiser used the Open Spaces pattern. He drew up a grid with time slots and rooms, then invited the attendees to write a single topic suggestion on a card and put it in a slot. When everybody who wanted to had put a card in, he invited a second and then a third round.
The result was a surprisingly large number of different sessions. I was able to make it through the day without having to sit through anything dull or non-applicable. The large number of sessions meant that the groups were small and everybody got to interact. Good stuff!

Dangers of executable examples over dry documentation

My team is responsible for several web service APIs that are used by external parties. The standard pattern for educating users about these interfaces was to provide an API document which listed the calls possibly with the occasional example. I have always found this approach to be less than useful and not particularly accessible.

I prefer an approach where the API specification is very much example based. This makes more sense for RESTful web services than for a programmatic API, as there is no accepted mechanism for declaring such an interface. The same is not true if you are declaring, for example, a Java interface.

We supplement the document with JMeter projects pointed at a reference system which allow these interfaces to be exercised by the consumer and provides a step by step demonstration.

This has been an excellent mechanism which has resulted in very quick uptake of our interfaces but yesterday I was provided with an example of a potential pitfall:

We provided an API which upgraded widgets. The version was supplied in the manifest of the software package being delivered and was used again in the query string to the upgrade service. The reference data supplied by the system put the current version into a title metadata field so that testers could easily see that the upgrade had been performed. The version in the title matched exactly the version in the manifest.

Weeks after delivery and demonstration it stopped working. Why? The client was using the title rather than the version metadata it was supposed to use. Since we had used the same value for both, the end to end use case appeared to work. As soon as they started using real data, where the title was no longer equal to the version, everything broke.

A valuable lesson here for me to relearn (I have hit it before in similar contexts). Always make sure reference and test data attributes are not interchangeable with each other without an error occurring (and being detected). I should have made the title attribute 'The version of this widget is <x>'. That would have steered the client developer away from error and would have caused a validation error the first time he tried to submit it to our application.

In container vs out of container testing

Ever since I started with Test Driven Development back in early 2004 I have always advocated out-of-container testing and regarded any test scheme that requires deployment before execution as flawed.
I still think this is largely true and there are many many good reasons why having a fast, automated and comprehensive test suite which runs without going near the full stack is essential.
I have come to believe that this is not the whole story though. Testing by humans is an expensive and time consuming activity, often executed poorly and with debatable results. Often the best thing you get out of a test team is somebody to blame if it doesn't work in production. That sounds harsh, but how many times have you heard the development team's response to a defect be 'what the hell were the testers doing for two weeks?'?
If you really want to be able to remove humans from the equation then unless you are very confident that nothing in your execution environment can affect the functionality of your software then you need to create automated tests which run against a deployment in-container.
The added advantage to in-container tests is that they can be used to seed load tests which reflect the user stories being delivered rather than some cobbled together random sample by the expensive performance consultant the week before go live.
We are writing in container tests for a RESTful web service. The tests are developed using a home rolled framework designed to be as readable as possible to the test and analysis teams. The tests are driven by the user stories being delivered and are very easy to validate against the API documents we publish.

Death by developer

If you commit suicide by forcing a police officer to kill you, this is 'death by cop'. We have some stakeholders external to the team who seem committed to performing death by developer. Just because a project is agile and adaptive to change doesn't mean that stakeholders can keep on changing their minds right up to the day before a go live without any impact on delivery...

Removing database dependency from tests

At a recent internal tech day, a project I had previously been involved in gave a short report about some serious issues they were having with their test suite. They made heavy use of DBUnit to create a complex data set in an Oracle database. This was being set up and torn down for a great many of the tests. The result was that the CI build could not run the test suite automatically. Instead the lead developer would manually initiate a test run overnight (I believe it was actually taking the entire night to run). Several of my colleagues had had similar problems and suggested various solutions that had been successful on their projects. These included the use of a dedicated super fast integration test machine and the running of tests in parallel (hard to do for database dependent tests - there is often contention on the number of database connections allowed, and any tests which extend beyond transaction boundaries interfere with each other).

The solution employed on my project (and advocated in many places, see links at bottom) successfully overcame these issues and is as follows.

On my project, we have a large set of unit tests (close to 2000 tests). These are pure unit tests in that there is no container (not even Spring) and everything is mocked other than the class under test. There is no database dependency. These tests execute in under 5 seconds on our development PCs (which are dual quad core Ubuntu boxes). We also have a smaller, but very important, set of integration and regression tests which execute inside the Spring container, some of which use as much of the application stack as possible. Only external system dependencies are mocked or stubbed. These tests did, for the first year of the project, use the database.

What we found (and I experienced this on the above troubled project as well) was that these tests were too fragile to be included in the automated build. They execute too slowly, cannot be run concurrently (as they access the same db resource) and if the build system dies it can leave a messy database which can be difficult to clear up. Even after several attempts to build robust setup and teardown scripts we still could not get integration tests that were reliable enough to run automatically. An even bigger problem was that the integration test suite would not execute on the build machine in less than half an hour. We build every time somebody commits and track failures religiously. Individuals are pressured never to break the build. A thirty minute build is unacceptable for us. This meant the integration tests were only run manually and with the best will in the world, people forgot to run them before checking in and they very quickly started showing false negatives. The broken window syndrome kicked in and the set of working integration tests diminished rapidly. We wasted a lot of time fixing the test suite in the first 12 months of the project.

I did some analysis of defects and was able to highlight to the client that some serious defects had made it out of development and cost serious time and effort in system test and UAT because of the lack of end to end test coverage which should have been provided by the regression test suite. Using this as ammunition it was easy to justify spending some time finding a solution. There is a well understood solution to this sort of issue and it involves the use of an in-memory hash table database replacement. I got one of the developers to implement a version of the DAOs which wrote objects to a hash table instead of the real database. As far as the users of the DAO were concerned, the system was working as normal. We already had some specific integration tests which did test the DAO database implementation, so we were very confident they worked and did not need to be tested all the time. The hash table database is easy to set up (we always had a rule that all non-DAO tests had to perform database setup using the DAO methods rather than using a script solution like DBUnit, which I dislike). Tearing down the hash table DB is very, very easy. The regression test suite which was so slow it was unusable when hooked into Oracle now executes in 12 seconds. It has no external dependencies and so has been included in our automated CI build with great success. The DAO implementation test suite continues to be run manually. Come to think of it, I haven't run it for a few weeks so I bet it's broken!
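
The shape of the in-memory DAO is roughly as follows (a simplified sketch with invented names, shown in one listing for brevity; ours was wired into the Spring test context in place of the JDBC implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical DAO interface shared by the Oracle and in-memory implementations.
interface CustomerDao {
    long save(Customer customer);
    Customer findById(long id);
    void deleteAll();
}

// Hypothetical domain object.
class Customer {
    final String name;
    Customer(String name) { this.name = name; }
}

// Test-only implementation backed by a map instead of the real database.
class InMemoryCustomerDao implements CustomerDao {

    private final Map<Long, Customer> store = new ConcurrentHashMap<Long, Customer>();
    private final AtomicLong idGenerator = new AtomicLong();

    public long save(Customer customer) {
        long id = idGenerator.incrementAndGet();
        store.put(id, customer);
        return id;
    }

    public Customer findById(long id) {
        return store.get(id);
    }

    // Tearing down between tests is a one-liner.
    public void deleteAll() {
        store.clear();
    }
}
```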

I should stress as well that we partitioned the tests from day 1 (unit, acceptance, externally dependent, integration and regression). The unit tests were always fast and we never had to disable them in the CI build.

This pattern does not work so well if you have data in your database which gets there without going through your DAO layer (e.g. reference data loaded through scripts, bulk loads via SQLLDR). This means you do not have a ready made way of creating all test fixtures. I would advocate the creation of a test helper DAO in this case to create those additional data structures that you require.

See this book for details of this approach.

Distractions

We are still getting distracted from our main task. Two of the four developers have spent more than 50% of their time looking at a production memory leak. I am spending a lot of time doing interviews and attending meetings. All of it is important but it is impeding progress badly.

Welcome to the waterfall

We have made some progress in the last week of development but it has not been as I envisaged. We have spent a lot of time talking through the use cases and that has helped the dev team understand some of the issues facing us. Those use cases have been typed up using our standard template but they lack structure.

I traveled overseas for a technical alignment meeting. 05:30 start. Back home at 21:30. Shattered. All for thirty minutes in which I stepped through one of the sequence diagrams. It was helpful, as for that one use case we now have agreement on the design. Half an hour of value for an extended day (which took me another day to recover from) is not startlingly efficient.

What we have failed to do is produce the product backlog. Instead we have identified some massive issues around complexity and spent days trying to thrash them out. This has also involved presenting back to the stakeholders and trying to educate them on the relative cost in complexity in comparison to the possibly limited returns in functionality. We failed to make any impression and were told that they would reconsider but that the requirements would stand unchanged. I am not sure who is doing something wrong here.

We are now slotting back into the feasibility process that the client's financial processes mandate. We are producing very short summaries of the options, the impact on required functionality and the potential risk (expressed as a percentage). We are doing this for the entire scope of the project (9 months). This does not feel agile at all as we are having to sketch out designs for the entire system. We cannot get round this as completion of the feasibility is a payment milestone.

I am also finding that the feasibility document in its current form has been overloaded with very detailed design narrative. The narrative captures the developers' thought process throughout the exercise. It makes the classic mistake of documenting parts of the solution we have since discounted, because the author has a significant emotional investment in those words and cannot bear to see them all deleted. The detailed design narrative also needs to go (hopefully). I am thinking of it as necessary scaffolding for the developers' thought process. It helped them organize their thoughts and was useful. Now it needs to be binned as it is already out of date and has no audience.

Technical interviews

Executing a decent technical interview is hard. Often you only have an hour or less with a complete stranger to somehow assess technical skills which might take days or even weeks to fully appreciate if you were working with them on a project. The urge to make a snap decision and then apply post rationalisation can be strong. It's very easy to come up with some excuse and kick a candidate out if you have any doubts. It is better to reject someone at the interview stage than to hire them and then let them go when they cannot deliver, when the damage to both parties is significant. Conversely, every interviewee should be considered an opportunity. You might be talking to the next rising star who helps make your team a success. You should try not to blow it. Candidates should be respected at the very least. They have taken the time and are undergoing the stress of an interview. Care should be taken not to humiliate them.

I have been dropped into many technical interviews with little or no notice or chance to prepare. I used to find myself falling back on a technique I have been on the receiving end of many times during interviews: experience roulette. You look at the candidate's CV. You pick a project that sounds similar in technology or problem domain to one you have previously experienced. You feel that you could ask a fairly detailed technical question on this subject that a candidate with good understanding could answer. You ask the question. After a few minutes you realise that the candidate's experience is very different to yours for this particular topic. You can't tell whether he is talking garbage or not. All you have proved is that you have different experiences.

Sometimes, and I try very, very hard not to fall into this trap, interviews turn into an experiment in ESP. I.e. the interviewer thinks of something really hard, asks a couple of questions and drops a few very vague clues. The candidate is expected to read your mind and reel off the exact solution that you thought of. I have seen this done so many times. It is embarrassing if you find yourself paired with a colleague who thinks this is a great way to evaluate another human being's reasoning skills. I have been in situations where fellow interviewers pick up some dusty technical book they have had on their desk for a year (probably as an attempt to convince passers by that they actually read technical books or have any idea about the subject matter). They pick a random page, make a few notes, then wander into the interview and trot out a question based on the summary of the section and expect to get back a verbatim response. E.g. (this really happened), "What is the most important concept in Object Orientated programming?" they will ask. The candidate thinks and gives a reasonably considered personal opinion. The interviewer shakes his head sadly and motions to his notes, which indicate that polymorphism was considered to be the most important feature in the section of the book he was reading. When this happened I asked the interviewer "Why is that?" and failed to get a coherent answer. No party walked away from that interview with a warm feeling.

I have sat in interviews where the interviewer gets a real kick out of asking the candidate a question they cannot answer, or out of presenting their own solution to the problem as being far superior. I have noticed that this is most common when someone senior is present. When this has happened to me I have just felt that these losers were wasting my time.

Interviews are a two-way street. Sure, one of the lanes is much busier than the other, but the interviewer should be selling their company and their role, not using the process as an excuse for legitimised bullying.

These days I try and follow a simple formula which works for me. It might not work for everybody and is certainly not applicable to every technical role in the world, but at least it gives both parties a fighting chance of exchanging some pertinent information.

Before we start, I make sure I know how to pronounce the candidate's name, that they have been offered a drink and that they have had an opportunity to use the bathroom. If at all possible I get a quiet room where we will not be interrupted; if it has a whiteboard, all the better. I have interviewed and been interviewed in noisy coffee shops and the like. It looks unprofessional and is not conducive to a good candidate performance if they are self-consciously shouting over the rest of the customers.

When we get started, I first briefly describe the organisational unit or project and my role within it. I do this to give the candidate some context in which to present their experience, and so that they know at what level of technical detail they should be pitching their answers. I want to hear concrete, well considered responses, not meaningless buzzwords or over-inflated project experiences. Letting the candidate know this (without being rude) is sometimes necessary. The temptation is to spend too long setting the context. Often I have seen colleagues (and I am sure I have done it myself) spend more than half the allotted time talking about their pet project or favourite rant. I try and keep this section to no more than ten minutes in an hour-long interview.

Secondly I ask the candidate to talk about any experience they have that they feel is relevant. I warn them that I may interrupt to ask for clarifications or to explore further some interesting aspect of what they are saying. Under normal circumstances I hate interrupting people, but when you have a very short time in which to extract the relevant information, being polite serves no one's best interests. Sometimes a candidate gets over-defensive or dwells too deeply on a point that is not communicating anything of interest to me. I try and gently move the conversation along, sometimes pointing out my objectives for the interview and the time left. There is a risk of experience roulette here, but I find that just letting candidates talk about what they think is important is often very illuminating.

Finally, and I try and allow enough time for this section, I like to do something practical. Depending on the role this can be a whiteboard-based design exercise, a pair programming session around a laptop, or whatever seems appropriate. I have prepared a set of canned exercises which I have refined over many executions.

If I am interviewing a candidate for the consultancy I work for, they will usually claim to have Java, Test Driven Development, Eclipse and similar sorts of skills. When they do this I feel empowered to sit them down in front of a laptop and have them work through some artificially simple problem. Often I find that if I type and they tell me what to do, they do better and get more out of the exercise. It can be a nasty shock to try and use somebody else's development environment with only a couple of minutes' preparation time, and role playing a pair programming scenario can be a way to reduce this shock. Often the exercises I set have tests included. Sometimes those tests fail, and this gives a clue to one of the possible solutions. I should stress that I try and use the exercise as a vehicle for the candidate to demonstrate the way they think and approach the problem rather than expecting them to finish the code.

My favourite exercise is based around the Money pattern and allows a candidate to demonstrate good O-O design, TDD and hands-on problem solving. I have used it maybe fifteen times. Only one person has actually got the code working and ticked all the possible boxes. Several people have demonstrated excellent skills without getting close to any sort of completion and have gone on to be excellent team members. A few people have been thrown completely and failed to make anything of the exercise. This is a shame for both of us, as I then have no positive data to base my decision on and they fail to progress. Occasionally it exposes a complete bullshit artist who has blagged their way through to this interview stage. This happened on one memorable occasion when I was doing the third and final interview for a permanent hire. The guy had got through an HR screening interview and impressed a fellow consultant. I had found him hard to pin down in the first sections of the interview but had been generally impressed with his broad experience. We started the practical and it became very clear that he did not have a clue about the tools that he claimed to be expert in and to use on a daily basis. After a few minutes' fumbling he refused to continue. I terminated the interview and saved myself an hour of my life.
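
To give a flavour of the shape such an exercise might take, here is a minimal sketch of a Money-style starting point. It is illustrative only: the class, the methods and the JUnit test below are assumptions for the purposes of this post, not my actual canned exercise.

    // Money.java -- a sketch of an immutable, currency-aware value object.
    import java.math.BigDecimal;
    import java.util.Currency;

    public final class Money {

        private final BigDecimal amount;
        private final Currency currency;

        public Money(BigDecimal amount, Currency currency) {
            this.amount = amount;
            this.currency = currency;
        }

        // In the version handed to a candidate this might simply throw
        // UnsupportedOperationException so that the supplied test fails
        // and hints at the behaviour expected.
        public Money add(Money other) {
            if (!currency.equals(other.currency)) {
                throw new IllegalArgumentException(
                    "Cannot add " + other.currency + " to " + currency);
            }
            return new Money(amount.add(other.amount), currency);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Money)) {
                return false;
            }
            Money that = (Money) o;
            // compareTo ignores scale, so 15.0 GBP equals 15.00 GBP.
            return amount.compareTo(that.amount) == 0
                && currency.equals(that.currency);
        }

        @Override
        public int hashCode() {
            // stripTrailingZeros keeps hashCode consistent with the
            // scale-insensitive equals above.
            return 31 * amount.stripTrailingZeros().hashCode() + currency.hashCode();
        }
    }

    // MoneyTest.java -- one of the canned JUnit tests supplied with the exercise.
    import java.math.BigDecimal;
    import java.util.Currency;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class MoneyTest {

        @Test
        public void addsAmountsInTheSameCurrency() {
            Currency gbp = Currency.getInstance("GBP");
            Money five = new Money(new BigDecimal("5.00"), gbp);
            Money ten = new Money(new BigDecimal("10.00"), gbp);
            assertEquals(new Money(new BigDecimal("15.00"), gbp), five.add(ten));
        }
    }

The interesting conversations tend to come from the design decisions a sketch like this leaves open: whether equality should be scale-sensitive, what adding mixed currencies should do, and whether the constructor should reject nulls.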

Very often I interview for senior architect roles on behalf of clients, where candidates do not have hands-on coding experience and will not be expected to 'get their hands dirty'. In these situations I have a couple of role-play scenarios where I describe an ambiguous use case, sketch out some boxes that represent consumers and providers of services, and leave a big white space in the middle. I invite the candidate to fill the white space with their solution. I pretend to be a technically vague marketing representative or a shoot-from-the-hip coder (sometimes we interview in pairs and take one role each). If I am the marketeer I inject requirements as we go; if I am playing the coder I try to ask all the questions a coder would ask if they wanted to realise the design the architect is describing. If the candidate is missing something, or sometimes if they appear to be having too easy a time of it, I suggest incorrect or over-elaborate solutions and wait to be shot down. I keep going back to the original requirements to make sure they have been fulfilled.

Getting the scenario right for these exercises is tricky, and I have to admit to significantly refining the problems over time. Striking the balance between a problem that consumes too much time and one that is over-simplistic is a challenge. The best problems I have are based on real project experiences, where I can add elements in order to increase the complexity if the candidate is doing well.

I have found that you can have a run of candidates who make a real hash of these practical problems. You begin to doubt the validity of the exercise and start to reassess candidates you at first dismissed. In my experience the run of dross is usually followed by a few stars who re-affirm your faith in our profession and in the exercise. Whether this is just the luck of the draw, or reflects the HR department or agency reacting correctly to feedback on candidates, I do not know (it should be the latter but I suspect the former in many cases).

I find the practical exercises useful and actually enjoy watching people solve the puzzles, especially if they do so with panache. At the very least, doing a practical exercise gives you concrete evidence to support your decision if you are called upon to justify it.

These experiences are mine and may not work well for everybody. I have found them useful and have consistently hired good people. I find that this approach delivers much better results than quizzes on programming language constructs or ad hoc conversations around project war stories. Then again, maybe I am still post rationalising and have simply constructed a process that favours people who appeal to me...