Testing and concurrency

Aug 24, 2010


Our team is currently working with a client on a medium-sized, medium-complexity Java application which has quite low test coverage. We are introducing characterisation tests to snapshot functionality. These will give us the confidence to refactor away technical debt and extend the application without regression. One of the problems we are experiencing is the concurrent nature of the application. I have worked on applications in the past which supported very high concurrency without issue but this application is different. I have not fully thought through why this application differs but there are some obvious points:

  • This application spawns threads in Java code a lot. In previous applications we have always avoided this complexity by utilising somebody else's thread pool code.
  • I am used to stateless service classes which operate on domain objects. The stateless service classes obviously have no concurrency issues and the domain objects can be protected using synchronisation blocks. This application seems to have a lot more stateful objects that interact (this is anecdotal, I have not analysed the code specifically for this attribute).

One of the first refactorings we are looking at is to remove all the Thread.sleep calls from test classes. The CI server reports a significant number of test failures which turn out to be false positives. In a significant number of cases the use of Thread.sleep is to blame. I have seen two slightly different uses of Thread.sleep in the test code.

  1. The test spawns a thread which is calling some method of the class under test whilst the main test thread interacts with the class under test in some other way. The test thread calls Thread.sleep to ensure that the second thread has time to complete its processing before the test verifies the post conditions.
  2. The class under test contains some internal thread spawning code. The test thread again needs to execute a Thread.sleep to remove the chances of a race condition before firing the asserts.

Both these approaches suffer from the same problems.

  • The Thread.sleep might be long enough to allow the second thread to complete processing on one machine (e.g. the developer's high-spec workstation) but it is not long enough to allow the thread to complete its processing on a heavily loaded, differently configured, usually more resource-constrained CI server. Under certain load situations the test fails. It works in others. The use of Thread.sleep has made the test non-deterministic.
  • Often the response to the above problem is to make the sleep longer. Yesterday I saw a very simple test which took over thirteen seconds to execute. Most of that test duration was sleeps. Refactoring to remove the sleeps resulted in a test that executed in 0.4 seconds. Still a slowish test but a vast improvement. The last application I worked on had 70% coverage with 2200 tests. If each one had taken thirteen seconds to execute then a test run would have taken almost eight hours. In reality that suite took just over a minute on my workstation to complete. You can legitimately ask a developer to run a test suite which takes one minute before every checkin and repeat that execution on the CI server after checkin. The same is not true of a test suite that takes eight hours. You are probably severely impacting the team's velocity and working practices if the build before checkin takes eight minutes. There are very few excuses for tests with arbitrary delays built into them.

To resolve both issues we introduce a count down latch.

Where the test spawns a thread, the latch is decremented inside the spawned thread and where the test code had a sleep a latch.await(timeout) is used. We always specify a timeout to prevent a test that hangs in some odd situation. The timeout can be very generous, e.g. ten seconds where before a one second sleep was used. The latch will only wait until the work is done in the other thread and the race condition has passed. On your high spec workstation it might well not wait at all. On the overloaded CI server it will take longer, but only as long as it needs. A truly massive delay is probably not a great idea as there is a point where you want the test to fail to indicate there is a serious resource issue somewhere.
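As a minimal sketch of the first case, assuming a hypothetical Counter as the class under test (the names here are illustrative, not taken from the client's code), the sleep is replaced by a java.util.concurrent.CountDownLatch with a generous timeout:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LatchInsteadOfSleep {

    // Hypothetical stand-in for the class under test.
    static class Counter {
        private final AtomicInteger value = new AtomicInteger();
        void increment() { value.incrementAndGet(); }
        int value() { return value.get(); }
    }

    // Spawns a worker, then waits on a latch rather than sleeping.
    // Returns true if the worker finished within the timeout.
    static boolean runWorkerAndWait(Counter counter, long timeoutSeconds) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                counter.increment();
            } finally {
                done.countDown(); // count down even if the work throws
            }
        });
        worker.start();
        try {
            // Returns as soon as the worker counts down, or false once the timeout expires.
            return done.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        if (!runWorkerAndWait(counter, 10)) {
            throw new AssertionError("worker did not finish within the timeout");
        }
        if (counter.value() != 1) {
            throw new AssertionError("expected 1 but was " + counter.value());
        }
        System.out.println("post conditions verified");
    }
}
```

Counting down in a finally block means a failure in the worker shows up as a failed assertion rather than as a ten second wait.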

Where the class under test spawns a thread (an anti-pattern I suspect) then we amend the code so it creates a latch which it then returns to callers. The only user of this latch is the test code. Intrusive as it is, it is often the only way to safely test the code without more significant refactoring. 
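A sketch of the second case might look like the following. Notifier and sendAsync are hypothetical names invented for illustration; the only point is that the method which spawns the thread hands a latch back to its caller:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ReturnedLatchExample {

    // Hypothetical class under test which spawns its own thread.
    static class Notifier {
        private final List<String> sent = new CopyOnWriteArrayList<>();

        // Returns a latch that reaches zero once the background work has completed.
        // Production callers can ignore it; the test waits on it.
        CountDownLatch sendAsync(String message) {
            CountDownLatch done = new CountDownLatch(1);
            new Thread(() -> {
                try {
                    sent.add(message);
                } finally {
                    done.countDown();
                }
            }).start();
            return done;
        }

        List<String> sent() { return sent; }
    }

    // Waits for the latch with a timeout; returns false on timeout or interruption.
    static boolean awaitQuietly(CountDownLatch latch, long timeoutSeconds) {
        try {
            return latch.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Notifier notifier = new Notifier();
        CountDownLatch done = notifier.sendAsync("hello");
        if (!awaitQuietly(done, 10)) {
            throw new AssertionError("background send did not complete in time");
        }
        if (!notifier.sent().contains("hello")) {
            throw new AssertionError("message was not recorded");
        }
        System.out.println("asserts fired after the background thread finished");
    }
}
```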

There are some larger issues here. Is the code fundamentally wrong in its use of threading? Should it be recoded to use a more consistent and simple concurrency model and rely more on third party thread pool support?

At risk of straying from my comfort zone of simple, pragmatic software delivery, deep down I have never been very happy about the implications of complicated multi-threaded code and automated testing. You can write a class augmented with a simple and straightforward test class which verifies the class's operation and illustrates its use. You can apply coverage tools such as Emma and Cobertura which can give a measure of the amount of code under test and even the amount of complexity that is not being tested. I am not convinced it is always possible to write simple tests that 'prove' that a class works as expected when multiple threads are involved (note I say always and simple).

I do not know of any tools that can give you an assurance that your code will always work no matter what threads are involved. Perhaps a paradigm shift such as that introduced by languages such as Scala and Erlang will remove this issue?

There is some good advice available regarding testing concurrent code and I am sure lots of very clever people have spent lots of time thinking this through but it's certainly not straight in my head yet.

tags:

Three Peaks

Jul 5, 2010

This weekend I did the Three Peaks challenge. At six o'clock on Thursday I dashed round the corner and jumped into a Ford Galaxy with three other dads from Maple School. It wasn't a clean exit as the kids came home with Joanne just as I ran out of the door. Jamie was really upset apparently as I didn't get the chance to say goodbye. Helen just missed me and she was also upset.

We flogged it up to Warrington where we stayed in a Travelodge with adjacent truck stop. We had our dinner in the self service food hall followed by a lager shandy in the truckers' bar. It was pretty grim. We were the only drinkers not wearing a corporate trucking uniform and in the small minority without massive beer bellies. One of our guys was wearing sandals and another ordered a G&T. I have probably watched too many films but I did worry that the truckers might abduct us at this point to use as their playthings.

The next day our driver got us up to Glasgow where we picked up our fourth team member and de facto leader. We then made our way to Ben Nevis, stopping briefly for a very tasty burger at Loch Lomond and again for some heavy traffic. We got to Ben Nevis just before half five. We spent a few minutes preparing and then set off dead on 17:30.

It had been a cracking day weatherwise and I had worn my sunglasses at Loch Lomond. Ben Nevis doesn't work like that so as soon as we started it began to rain. We made good progress on the ascent but by the halfway mark there was very poor visibility and stinging rain. We made it to the top in two hours by which point my fingers were so numb I couldn't operate my camera. I had failed to pack a hat or gloves (it had been an uncomfortably hot week). We rushed back down the mountain with only a minor pause when I slipped and banged my knee. The pain was intense and I really thought my race was over but after a few minutes it subsided and I was able to continue. We made it back down very quickly in an hour and a half, catching the other two teams who had a half hour head start on us!

At the bottom of the mountain I had to take all my clothes off (by the side of the road) as I was soaked and then we all piled in to rush down to Scafell Pike. It was damp and unpleasant and my knee was pretty painful. I applied ibuprofen gel, freeze spray and deep heat but it kept on waking me on the long drive to Cumbria.

We got to Scafell about four am and found no parking spaces. Our driver stuck the car at the side of the road and we had a brief (and for my part, unpleasant) breakfast. The other Maple teams started at least ten minutes ahead of us. We then rocketed up Scafell which was very busy and, compared to Nevis, very easy. It was still cold and unpleasant on top but I never even put my waterproof jacket on. We messed around for a few minutes on the peak taking photos and then were off.

We got to the bottom about ten past eight in the morning and headed off to Snowdon in Wales. Our driver did not get a lot of sleep (if any) during these breaks. He must be a machine as his driving was calm and accurate with great navigation throughout.

We got to Snowdon about 12:30 after a delay caused by an accident which forced us onto back roads. It was quickly up the Pyg Track to the summit. The last part of the ascent on the Zig Zag was pretty exhausting and then it was straight down again via the Miners Track. I found going downhill, clambering over stones, very hard going on my knees and my guts by this point. We got to the flattish section of the Miners Track and from there it was easy. We romped in at 23:13:44, beating the other two Maple teams who both came in within the twenty four hours.

It was an excellent and slightly disorientating experience that I am not sure I would rush to repeat but I am glad I did it.

I slept like a log Saturday night and then went up the Miners Track again with Helen and the kids (and a large group of others from Maple). This time the weather was foul. We were soaked to the skin and the winds were gusting at 80mph in the valley. We made it to the second lake but were forced back. Edith and Tom were both screaming and it was impossible even to see, the rain was so hard. I was still damp six hours later when we finally got home!

tags:

Importance of real time performance monitors

Mar 17, 2010

Decent log entries are essential. On our current project we aim to write enough data to allow off-line analysis of performance and usage plus errors. I am also an advocate of more immediate and accessible runtime information. Log analysis is great but sometimes you need empirical data right away. On our current project we use the Java MBean facility. These MBeans can be easily accessed in a graphically rich way using tools like JConsole or VisualVM.

We have a couple of different types of analyzer which we expose through MBeans. One simply records how many times an event has occurred in a short time period. Another calculates a real-time average, again across a short time period. For example, we have analyzers which record the length of time it takes to make a call to a particular downstream application. Each duration is recorded and an average over the last ten seconds is reported via the MBean. This calculation has been implemented to make it very efficient from a CPU perspective since 99.999% of the time the average is discarded before anybody bothers to look at it. Originally we were only using two or three of these average analyzers in the system. As developers found them useful they were placed around every single external interaction and we suddenly found ourselves with several thousand per application. These used about 25% of the heap and consumed significant CPU resource. The analyzer was then optimized and now consumes negligible resources.
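For illustration only, here is one way such an average analyzer could be sketched; it is not our production implementation. It assumes per-second buckets in a fixed-size array so that recording a duration is cheap and the average is only computed when somebody actually reads the attribute. The class names and the ObjectName are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MBeanAverageExample {

    // Standard MBean convention: the interface is the implementation name plus "MBean".
    public interface RollingAverageMBean {
        double getAverageMillis();
    }

    // Average of the durations recorded in the last windowSeconds seconds.
    static class RollingAverage implements RollingAverageMBean {
        private final int windowSeconds;
        private final long[] sums;
        private final long[] counts;
        private final long[] bucketSecond; // the second each bucket currently holds
        private final LongSupplier clockMillis;

        RollingAverage(int windowSeconds, LongSupplier clockMillis) {
            this.windowSeconds = windowSeconds;
            this.sums = new long[windowSeconds];
            this.counts = new long[windowSeconds];
            this.bucketSecond = new long[windowSeconds];
            this.clockMillis = clockMillis;
        }

        // Cheap write path: a few array operations, no allocation.
        synchronized void record(long durationMillis) {
            long second = clockMillis.getAsLong() / 1000;
            int i = (int) (second % windowSeconds);
            if (bucketSecond[i] != second) { // bucket holds stale data, so reuse it
                bucketSecond[i] = second;
                sums[i] = 0;
                counts[i] = 0;
            }
            sums[i] += durationMillis;
            counts[i]++;
        }

        // The average is only calculated when the attribute is read.
        @Override
        public synchronized double getAverageMillis() {
            long now = clockMillis.getAsLong() / 1000;
            long sum = 0;
            long count = 0;
            for (int i = 0; i < windowSeconds; i++) {
                if (now - bucketSecond[i] < windowSeconds) { // skip expired buckets
                    sum += sums[i];
                    count += counts[i];
                }
            }
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    public static void main(String[] args) throws Exception {
        RollingAverage average = new RollingAverage(10, System::currentTimeMillis);
        average.record(120);
        average.record(80);

        // Register with the platform MBean server so JConsole or VisualVM can see it.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example:type=RollingAverage,name=downstreamCall");
        server.registerMBean(average, name);

        // Read the attribute back the same way a JMX client would.
        System.out.println("AverageMillis = " + server.getAttribute(name, "AverageMillis"));
    }
}
```

Passing the clock in as a LongSupplier is purely a testing convenience: a test can advance time explicitly rather than sleeping for ten seconds.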

I have been personally a little disappointed that our operations team have not made as much of this facility as I expected. They are happy with their existing log analysis tools. As a team, we have questioned whether our investment in MBeans is worthwhile. We concluded that it was: even though the Ops team don't use it in Production, the development group rely on the data exposed through JMX for troubleshooting, especially in system test, for monitoring load tests and as a quick way to gauge the health of Production.

Last week I was reminded again how useful this immediately accessible data was. After a system restart Production was doing 'something funny'. We had ambiguous portents of doom and various excited people considering some fairly drastic remedial action, including switching off a production system which was serving several thousand users. The fear was that something in the affected system might be placing unbearable demands on downstream applications. This seemed unlikely as we have many layers of throttles and queues to prevent just such an occurrence but there was something odd going on. The first port of call for the developers was the log files. With several thousand transactions being performed a second there were a lot of log lines whizzing past. Panic began to creep in as it was impossible to discern what, if anything, was going on in the explosion of data. I was able to walk over to my workstation and bring up VisualVM. In about thirty seconds I could see that right at that very moment we were sending a great many messages but well within the tolerances we had load tested against. I was able to use VisualVM's graphing function to track various data and within a minute or so could see that there was an unexpected correlation between two sets of events. (The number of messages sent to mobile phones and the number of identification requests made to a network component were drawing the same shaped graph, with a slight lag between the first and second sets of data and an order of magnitude difference in volume). Again these events were both within tolerances. Yes something unexpected was occurring. No it was not going to kill the system right now. We went to lunch instead of pulling the plug.

The data we collected pointed us in the right direction and we were able to find, again using VisualVM, that a database connection pool had been incorrectly set to a tenth of its intended size. The Ops guys made some tuning changes to the configuration based on what we had discovered. The application stayed up through the peak period.

In summary, log files are essential but there is still a need for real time, pre-processed data available via an easy-to-access channel. MBeans hit the spot in the Java world. Developers should not be scared of calculating real time statistics, like average durations, on the fly. They do need to make sure that the system does not spend a disproportionate amount of resources monitoring itself rather than delivering its function.

tags:

Concrete problems when developers opt out of TDD

Feb 27, 2010

We have two major classifications of automated test in common use:

  • Acceptance tests which execute against the application in its fully deployed state.
  • Unit tests which typically target a single class and are executed without instantiating a Spring container.

The acceptance tests are written in language which should make them accessible outside of the development team. They are used to measure completeness, automatically test environments and provide regression tests. Their usefulness is widely accepted across the team and they tend to be very long-lived, i.e. tests that were written a year ago against a particular API are relevant today and will continue to be relevant as long as that API is supported in production. The unit tests are written by developers and will almost certainly never be read by anybody other than the developers or possibly the technical leads. I program using TDD as I find it a natural way to construct software. I personally find that the tests are most useful as I am writing the code, like scaffolding. Once the code is stabilized the tests still have a use but are no longer as critical. A refactoring of the application in some future sprint may see those tests be heavily amended or retired. They are not as long-lived as the acceptance tests.

I have been reflecting on the usefulness of and investment in test code for as long as I have been doing TDD. I had come to a conclusion that whilst acceptance tests are non-negotiable on projects where I have delivery responsibility, perhaps unit tests for TDD are not mandatory in certain situations. I have worked with several developers who are very, very good and simply do not see the value in TDD as it is contrary to their own, very effective, development practices. I know in my team right now a couple of the very best developers do not use TDD the way everybody else does. Education and peer pressure have had no effect. They are delivering high quality code as quickly as anybody else. It's hard to force them to do differently - especially when some of them pay lip service to TDD and do have a high test coverage count. I know that they write those tests after they write their code.

In the last few weeks I came across a couple of concrete examples where TDD could have helped those developers deliver better code. In the future I will try and use these examples to persuade others to modify their practice.

1. Too many calls to downstream service.

The application in question has a mechanism for determining the identity of a client through some network services. Those network services are quite expensive to call. The application endeavors to call them as infrequently as is safe and to cache the identity when it is resolved. We recently found a defect where one particular end point in the application was mistakenly making a call to the identity services. It was not that the developer had made a call in error, it was that the class inheritance structure effectively defaulted to making the call so did so without the developer realizing. The identity returned was never used. I suspect that this code was not built using TDD. If it had been then the developer would have mocked out the identity service (it was a dependency of the class under construction) but would not have set an expectation that the identity service would be called, so the unexpected call would have failed the test. The use of mocks not only to specify what your code should be calling but what it should not be calling is extremely useful. It encourages that top down (from the entry point into the system) approach where you build what you need when you need it.
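To make the idea concrete, here is a small sketch using a hand-rolled recording mock rather than any particular mocking library. Every class and method name is hypothetical; the inheritance structure is a guess at the general shape of the problem, not the real code.

```java
class NoUnexpectedCallsExample {

    interface IdentityService {
        String resolve(String clientId);
    }

    // Hand-rolled mock: records every call so a test can assert that none happened.
    static class RecordingIdentityService implements IdentityService {
        int calls;

        @Override
        public String resolve(String clientId) {
            calls++;
            return "identity-of-" + clientId;
        }
    }

    // Hypothetical base class whose default behaviour resolves identity.
    static class BaseEndpoint {
        protected final IdentityService identity;

        BaseEndpoint(IdentityService identity) {
            this.identity = identity;
        }

        String handle(String clientId) {
            identity.resolve(clientId); // inherited by every endpoint, wanted or not
            return respond(clientId);
        }

        protected String respond(String clientId) {
            return "ok";
        }
    }

    // Inherits the expensive call without ever using the result.
    static class StatusEndpoint extends BaseEndpoint {
        StatusEndpoint(IdentityService identity) {
            super(identity);
        }
    }

    // The corrected endpoint skips the identity lookup entirely.
    static class FixedStatusEndpoint extends BaseEndpoint {
        FixedStatusEndpoint(IdentityService identity) {
            super(identity);
        }

        @Override
        String handle(String clientId) {
            return respond(clientId);
        }
    }

    public static void main(String[] args) {
        RecordingIdentityService broken = new RecordingIdentityService();
        new StatusEndpoint(broken).handle("client-1");
        System.out.println("calls from inherited behaviour: " + broken.calls);

        RecordingIdentityService fixed = new RecordingIdentityService();
        new FixedStatusEndpoint(fixed).handle("client-1");
        if (fixed.calls != 0) {
            throw new AssertionError("identity service should not have been called");
        }
        System.out.println("calls after the fix: " + fixed.calls);
    }
}
```

A strict mock from a mocking library gives the same check with less code; the hand-rolled version just keeps the sketch free of dependencies.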

It's likely that the defect would never have been introduced had the developer been using TDD. As it is we have an application which is making a large number (and it is a large number) of irrelevant calls to a contentious resource. We now have to schedule a patch to production.

Coincidentally, there was an acceptance test for this service, which was passing. This highlights a deficiency in our acceptance tests we have to live with. They test the 'what' but not the 'how'. The tests were running against a fully deployed application which had downstream services running in stub mode. The test proved that functionally the correct result was returned but it had no way of detecting that an additional spurious call to another service had been made during the process.

2. Incorrect error handling

In a recent refactoring exercise we came across a piece of code which expected a class it was calling to throw an exception whenever it had an error processing a request. The error recovery in the code in question was quite elaborate and important. Unfortunately, the class being called never threw an exception in the scenarios in question. It had a status object it returned which indicated if corrective action needed to be taken. (It was designed to be used in conjunction with asynchronous message queues where throwing an exception would have introduced unnecessary complexity). The developer could have easily used mock objects and set an expectation that the exception would be thrown and the problem would have remained. But, if TDD was being used and the developer was working top down then the expected behavior of the mocks would have guided the implementation of downstream classes. Nothing is foolproof but I think this manner of working should have caught this quite serious error.
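The shape of the mismatch can be sketched as follows. All names are hypothetical; the point is only that recovery code in a catch block is dead code when the collaborator reports failure through a returned status.

```java
class StatusNotExceptionExample {

    enum Status { OK, RETRY_NEEDED }

    // Hypothetical collaborator: reports failure through a status, never by throwing.
    static class MessageSender {
        Status send(String message) {
            return message.isEmpty() ? Status.RETRY_NEEDED : Status.OK;
        }
    }

    // Mirrors the defect: recovery lives in a catch block that can never run.
    static class ExceptionExpectingCaller {
        boolean recovered;

        void process(MessageSender sender, String message) {
            try {
                sender.send(message); // the returned status is silently discarded
            } catch (RuntimeException e) {
                recovered = true;     // never reached in the failure scenario
            }
        }
    }

    // Honours the collaborator's real contract by inspecting the status.
    static class StatusCheckingCaller {
        boolean recovered;

        void process(MessageSender sender, String message) {
            if (sender.send(message) == Status.RETRY_NEEDED) {
                recovered = true;
            }
        }
    }

    public static void main(String[] args) {
        MessageSender sender = new MessageSender();

        ExceptionExpectingCaller broken = new ExceptionExpectingCaller();
        broken.process(sender, "");
        System.out.println("exception-based recovery ran: " + broken.recovered);

        StatusCheckingCaller fixed = new StatusCheckingCaller();
        fixed.process(sender, "");
        System.out.println("status-based recovery ran: " + fixed.recovered);
    }
}
```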

More subjective problems

I have also noted two other potential consequences of having some developers opt out of TDD. I do note that some developers on the team produce code that is more complex than others. It is fine from a cyclomatic complexity perspective but when you try and understand what it is doing you find yourself with a higher WTF count than you would expect. I think (again this is subjective, I have not gathered any empirical evidence) that a lot of the complexity comes from a lack of cohesion in the code. Logic is spread around in a way which made sense to the original developer as they had internalized all the classes concerned. That logic is not obvious to a new pair of eyes. If you are using TDD then this encourages cohesion in classes because it focuses the mind on what the class is responsible for before the developer has to worry about how it delivers those responsibilities.

This is a very subjective point and I would happily agree that several of the team members who do use TDD occasionally produce nasty code. My gut feeling however, is that it happens less often.

One final problem with some of the high flyers not using TDD is that bad practices tend to propagate through the team just as quickly as good ones. I have caught a couple of new joiners following a bad example or simply not using TDD because the developer they look to as a mentor is not evangelizing about the technique, as they themselves do not buy into the practice. This is a shame as those new joiners often have a greater need of the rigor that TDD imposes than the more experienced developers.

tags:

Anti-pattern: The release vehicle.

Feb 9, 2010

At my current client site you cannot get a piece of compiled code into production unless you can find an appropriate 'release vehicle', i.e. a planned high ceremony release of the component which has been officially prioritised, scheduled and funded. (Note: The same does not apply to non-compiled code such as JSPs or XML templates containing complex XPath expressions).

Somebody very clever, who probably had a beard (Grady Booch?), once said that "Regular releases into production are the lifeblood of the software development process.". I agree. My current client also seems to be in agreement but cannot extract themselves from the constraints of their existing processes.

The client in question has a successful agile adoption. Walking round the development teams you see task boards, burn downs and SCRUM meetings. Go to a management meeting and you'll hear them talk about two week iterations and the importance of continuous integration. At a strategic level, the organisation (which is very large) is still waterfall orientated. This has implications for the way in which work is financed. Funds for the development, testing and deployment of a certain application are released on waterfall inspired milestones. This, in conjunction with a legacy of long development cycles, has led to this 'release vehicle' anti-pattern.

The organisation is unwilling to deploy a component into production unless there is a named and funded change request which covers its release. Activities within development, possibly funded internally as 'business as usual', do not have such CRs. Therefore, a development activity such as refactoring for technical debt reduction or improving performance might get engineering buy in but will not get released into production until some CR happens to touch the same application.

It is common to see refactorings made which then sit in source control for literally months as they wait for an excuse to go live. Medium to low priority defects or useful CRs which lack very high prioritisation from marketing never get executed because the programme manager does not have a release identified for the change.

The application suite can appear inert to external parties as it takes a considerable period for changes to make it through the full release cycle. This erodes confidence. If I was a product owner and saw that a team was taking six months to execute my minor change I am not going to be inclined to believe that the same team can turn around my big important changes quickly. I am going to be looking for other mechanisms to get my changes into production and earning money quickly. Once I find a route that works I am going to keep using it.

Why do people like the release vehicle?

  • It is the way the whole software lifecycle as exposed to the rest of the organisation works. The QA team don't test a component unless they have funding from marketing. Marketing won't be paying for something that has no role in a prioritised proposition. The Operations team won't support the deployment activities for our component if they don't have the cash from the same marketing team.
  • It looks like it is easier to manage for PMs. Releases (because they are infrequent) are a big deal and involve lots of noise, planning and disruption to everyday working patterns.
  • It reduces the infrastructure costs. It costs resource to make a release unless every aspect including testing and operational deployment is fully automated (and even then there is potential cost, dealing with failures etc.). It costs resource to automate a manual build process. Engineers appreciate that fully automated build processes are a priority because in the end they reduce costs and increase agility. It is that age old problem of trying to convince not just the build team, but the build team's manager and the build team's manager's manager that it is worth diverting resource in the short term to fix a problem in order to make a saving in the long term.

This is a symptom of our strategic failure to get agile adopted beyond the development group. Until we do so, we will continue to hit these sorts of issues.

What we should do instead:

We should schedule frequent (bi-weekly, ideally more frequent) updates in production from the trunk of source control for every component. We should not need an excuse for a release. The release process should be as cheap as possible, i.e. automated build, regression test, deployment and smoke test. The code in the trunk is supposed to always be production ready and the automated tests should keep it that way.

If we achieve this we should:

  • Reduce complexity in branch management (no merging changes made months ago).
  • Avoid a massive delay between development and deployment which is not cost effective and makes support very hard.
  • Increase our perceived agility and responsiveness.
  • Enable refactoring to improve non-functionals (stability, latency, dependency versions, capacity).
  • Prevent a release from being a 'special occasion' which requires significant service management ceremony.

If you release all the time everybody knows how to release. If you release twice a year every release involves re-education of the teams involved on deployment, load testing, merging etc. This increases the cost and the risk of failure.

Note: Having frequent, regular, low ceremony releases is greatly eased by having a fully automated build and deploy process but you can have one without the other. As stated above, having such a build process makes regular deployments to production cost effective but is an enabler rather than the justification for this change to working practice.

tags:

Ten breathtaking miles in the snow

Dec 21, 2009

I went for a 10 miler today in the snow. It was one of those fantastic runs that you remember for years afterwards. It was cold (sub-zero) but crisp and dry, not damp at all. The snow was pretty thick still, about 10cm off pavement. I ran one of my usual routes down to Sandridge then over the hill to Ayers End then back over the hill again to Nomansland Common and then home.

The snow was very powdery, like icing sugar, and it had been windy so the drifts were deep. The sunken lane behind Pound farm had filled up to knee height but the farmer (or somebody) had cut a narrow path through it. There were loads of hardy St Albans folk out enjoying the weather and breathtaking views in the bright sunlight. On the top overlooking Sandridge there was even a family having a picnic on a rug.

The path at the top of the hill had drifted up and there was no easy way through. I jumped straight in. It only came up to just below my knee but the shock was incredible. It was like jumping into an ice bath. All the heat seemed to get sucked out of my calf muscles. It only lasted for about twenty metres but was a real trial to get through. I warmed pretty quickly as soon as I got out and brushed the snow off (bare legs).

I don't like to pause on runs, especially when it's cold, but the views and the silence were so spellbinding that several times I halted to drink it in for a few seconds.

Got back to St Albans feeling pretty good. Managed to do it at nine and a half minute mile rate which is nothing clever but the going was tough.

tags: running snow

Tale of two SCRUM stand ups

Dec 15, 2009

I walked past two teams doing their daily SCRUM standup today. Both teams claim to be agile. I didn't join in (even as a chicken) but just observed for a minute or so.

The first team was sitting down in a breakout area. Their body language spoke volumes. There was not one single participant maintaining eye contact with anybody else. Two people were playing on their phones. One developer had his head in his hands. Most had bored expressions. The team leader who is also the SCRUM master was the only person who spoke for the entire time I watched.

The second team was stood in a space near their desks. They were gathered round a task board which appeared to be up to date and the focus of several of the individuals' updates. One person spoke at a time. Almost everybody appeared to be paying attention to whoever was speaking. Most updates were short and concise. A couple rambled on.

Other than both teams calling their meeting a SCRUM I could see no similarities.

As our agile adoption has spread beyond the original teams I suppose it is inevitable that, as the experience gets spread a little thinner, people will simply label their existing activities with agile sounding names. Often we have no clear remit in those teams to supply a mentor and to try to offer advice would result in rebuttal as team leaders guard their territory. Does this matter? Is there a risk that these teams who are not practicing agile correctly will diminish and discredit agile in the eyes of our programme managers? This is sounding a bit like an excuse for an Agile Inquisition going round checking that no team is using Agile's name in vain. This cannot be a good thing either.

tags: adoption agile scrum

Another great day off

Nov 20, 2009

I have had quite a few days off recently and the common theme seems to be mostly how rubbish they are. Today is a good example. So far my day off has consisted of leaving home at 6:30am and cycling, in the rain, to work, with a slow puncture. I had to go into work as something important overran so I needed to go in to finish it off. Six hours later I left work and cycled home. As soon as I got in Helen went for a nap. There wasn't much in the house for lunch so I made a mess of my diet by having a jam sandwich followed by chocolate biscuits. Very healthy. Then I read some email. It's 14:48 now. Tom needs to be woken up and Jamie will be finished at school in twenty minutes. Sigh. Before you know it, it will be time for bed.

tags: family rant

Chef Edith

Nov 16, 2009

Edith loves helping in the kitchen and has helped with the Sunday roast for the last few weeks. She keeps on telling me what to do and likes me to ask her permission and say 'Yes, Chef Edith' lots.

This Sunday when we got the meal on the table Helen congratulated 'Chef Edith' on an excellent dinner only to have Edith correct her "I'm not Chef Edith now, mummy, I'm Eater Edith". She then proceeded to consume her own body weight in roast potatoes, chicken and sausages.

Jamie has a habit of prefixing every statement with 'Actually'. Edith has picked up on this and got a bit confused so now every time we have Yorkshire Puddings with the dinner she thinks they are called 'actual-puddings'. If you try and correct her then you get a quizzical look then she will keep on calling them 'actual-puddings'. It is sending Jamie mad "because ice cream and jelly is an actual pudding, not these". This might well explain why she keeps doing it.

tags: edith family funnies

Obi-Wan's time was up

Sep 27, 2009

Jamie got the original Star Wars trilogy for his birthday yesterday. He has watched all of the Clone Wars cartoons so Obi-Wan and R2D2 are familiar friends. Today we watched Star Wars. The stream of questions was unceasing. "Why are the goodies (Storm Troopers) on the bad team now?", "Why is Obi-Wan so old?". We got to the fight scene between Darth Vader and Obi-Wan and Jamie couldn't quite believe that Obi-Wan got killed. He was a little upset and sat there quietly for a few minutes until I thought he was just watching the film when suddenly he announced "Obi-Wan was quite old.". I asked why this was significant, was it that an old Obi-Wan being cut down by Darth Vader's light sabre was okay? "Yes, he was old so he would have died soon anyway.". He perked up after that.

tags: funnies jamie

About Me

I am a software developing, long distance running, husband and father of three who owns a badly maintained website.