Tuesday, July 28, 2009

Agile Mission Critical Software Development

A presentation that I gave to the Australian Institute of Engineers.

Friday, January 30, 2009

THE “HOLY GRAIL” OF I.T. IN DEFENCE

Defence processes information from many sources: much of it is Unclassified, but much of it is not. This latter set causes Defence the biggest I.T. headache, since it needs to be kept physically separate from all the other information that is being held.
For the different classification levels, Defence maintains separate information networks, separate phone networks, even separate messaging networks. However, most Defence personnel need access to many if not all of these networks, so their desktops are littered with different PCs, terminals and phones, all connected to the relevant network(s) they have access to.
Why not plug the different networks into the same PC? That would let them surf the Internet in one window, while typing up a highly-classified report in another window, and the computer would keep them separate. Wouldn’t it?
America's and Australia’s highest security agencies (the National Security Agency, or NSA, and the Defence Signals Directorate, or DSD) say “No”. In fact, most professionals involved in maintaining security agree with them: you cannot trust computers not to accidentally put data from one window into another. Actually, it’s not the computers you can’t trust: it’s the software that the computers are running.
Think about it. Today’s PCs run at around 3 billion instructions per second. That’s 86 trillion operations in one typical work day – and it does all of them the same way every time. It’s not the hardware that makes mistakes, it’s the software. Today’s software is very complex, and has lots of bugs in it. Every day Linux has new patches that you can download. Every week Microsoft releases new patches for Windows and its applications. Does that mean that after the latest round of patches all the bugs have been fixed? Hardly.
And that’s just the accidental problems. What about a deliberate attempt to extract information from your computer? You can visit a website that silently installs a malicious program on your computer, which then searches your entire hard disk for private information, such as credit card details or passport numbers, and sends it to… well, anyone. And it doesn’t matter how many firewalls, anti-virus scanners, anti-phishing filters and other anti-malware products you install: a clever cracker can find a weakness in any of them and steal your information.
No wonder the Security organisations said “No”! So, the current solution is to simply keep the separate classification levels compartmentalised on different machines, and put up with the extra hardware, the extra power usage, the extra maintenance, and the extra sheer physical room all of this takes up.
Until now.
The “Holy Grail” of Defence’s I.T. department has always been a software solution to this hardware problem, but every software attempt has always been met with the question “But are you sure?” In fact, the security agencies of many countries got together and defined a Standard that software would have to meet if it was to maintain the compartmentalisation described above. They set a seven-level scale known as the Common Criteria Evaluation Assurance Level (EAL) scale, and deemed that EAL-4 was barely sufficient to merely allow two adjacent classification levels to be separated against “casual or accidental threats”.
To achieve EAL-4 takes many months of laborious paperwork, essentially reasoning out where the risks might be and how they are mitigated. Of course, the more software there is to examine, the larger the paperwork effort. Did you know that Microsoft’s Windows XP is over 40 million lines of code? Or that a full Linux distribution runs to over 200 million? (http://en.wikipedia.org/wiki/Lines_of_code) It is no wonder that XP has never been evaluated to EAL-4, and that only a couple of severely cut-down versions of Linux have been!
What about higher levels? To achieve EAL-6 requires much more rigorous examination. In fact, the Standard says that every line of code in the final system has to be mathematically proven to be correct. Not only that, but there cannot be a way for one compartment to even work out what another compartment might be doing by simply noticing (for example) that its own calculations are taking longer than usual. (As an aside, have you ever wondered why many Defence car parks are underground? It’s so that observers can’t count the cars and wonder what could be happening in the world that means so many people are working at 3 A.M.!)
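That timing worry can be made concrete. Here is a deliberately simplified, deterministic sketch of a covert timing channel between two compartments sharing a CPU. Everything is simulated (invented "cycle" counts, not real hardware timing, and certainly not any real Defence system), but it shows the principle: the sender never writes a byte to the receiver, yet the receiver recovers the secret purely from contention.

```python
# Illustrative covert timing channel between two partitions on a
# shared CPU. All numbers are invented; timing is simulated so the
# example is deterministic.

SLOT_BUDGET = 100  # simulated CPU cycles per time slot

def sender_cycles(bit):
    """Sender burns most of the slot to signal a 1, almost none for a 0."""
    return 90 if bit else 10

def receiver_observed_latency(sender_load):
    """On naively shared hardware, the receiver's fixed workload
    finishes later when the sender has consumed more of the slot."""
    own_work = 20
    return own_work + sender_load  # contention delays the receiver

def leak_message(bits):
    """Recover the sender's bits purely from observed latencies."""
    received = []
    for bit in bits:
        latency = receiver_observed_latency(sender_cycles(bit))
        received.append(1 if latency > SLOT_BUDGET // 2 else 0)
    return received

secret = [1, 0, 1, 1, 0]
print(leak_message(secret))  # prints [1, 0, 1, 1, 0] - the secret leaks
```

Closing channels like this (for example, by giving every partition a fixed time slice regardless of load) is exactly the kind of property an EAL-6 evaluation demands be demonstrated.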
And trying to mathematically prove the correctness of 40,000,000 lines of code would take a horde of Ph.D. mathematicians centuries of time – and we already know it is not correct!
So, in 1997 Dan O’Dowd, the CEO (and, at the time, CTO) of Green Hills Software, Inc. in the U.S. (http://www.ghs.com) sat down and designed a new Operating System. One that could be proven to be correct. One that wouldn’t take forever to test. It was small. It was efficient. Above all, it was reliable AND secure. And the first time it was used was as the Flight Control System (FCS) of the B-1 bomber. An FCS needs to be ultra-reliable. (Qantas recently suffered an in-flight fault in a flight-control system, and many people were hurt.) An FCS needs to do many different things, but cannot let processing for one function dominate the processing of another. In other words, an FCS is a smaller version of what is needed by the ADF, and indeed all Defence organisations around the world.
In 2005, the software was given to the NSA for evaluation to EAL-6. It came with (literally) a truckload of documentation, as well as the work of a horde of Ph.D. mathematicians proving that its 10,000 lines of code were correct. But the claims of what the software did were more stringent than EAL-6 required, so it was evaluated to what was termed EAL-6+. And the NSA studied the documentation. It studied the software. It studied the proofs. It even gave the whole shebang to some tame code hackers to try to crack it. And after three years of trying to break it, the NSA recently agreed it could not be done.
At last, the world has a system that the finest security minds agree is able to maintain the separation of different compartments, even in the face of “a determined and well-funded attack”.
OK, so what now? Does this mean that everyone has to throw out their beloved Linux, or familiar Windows, and learn a new way of doing things? Actually, no. This new OS is so small that it can run on a PC with practically no overhead at all, and then compartmentalise not only different applications (like a Web browser and a word processor), but even different Operating Systems.
That’s right – it will run Windows or Linux as though it was just another program. This means that it can run both Windows and Linux at the same time, or two copies of Windows, or even two of one and three of another, all at the same time! Each OS runs at a different classification level, and you can surf the Internet using Windows, while typing that highly-classified report under your favourite distribution of Linux.
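The compartmentalisation that makes this safe is usually achieved with strict time partitioning. The sketch below is purely illustrative (the partition names and slice lengths are invented, and this is not the actual Green Hills scheduler): each guest OS gets a fixed slice of a repeating major frame, so its CPU windows are dictated by a static table and never depend on what the other guests are doing.

```python
# Illustrative fixed time-partitioning schedule, in the style of a
# separation kernel. Partition names and slice sizes are hypothetical.

MAJOR_FRAME = [              # repeated forever by the kernel
    ("SECRET_linux", 40),    # (partition, milliseconds)
    ("UNCLASS_windows", 40),
    ("kernel_housekeeping", 20),
]

def schedule(frames):
    """Yield (partition, start_ms, end_ms) for a number of major frames."""
    t = 0
    for _ in range(frames):
        for name, slice_ms in MAJOR_FRAME:
            yield (name, t, t + slice_ms)
            t += slice_ms  # the slice ends on time even if work remains

# A partition's windows depend only on the static table, never on
# what the other partitions are doing:
windows = [w for w in schedule(2) if w[0] == "SECRET_linux"]
print(windows)  # [('SECRET_linux', 0, 40), ('SECRET_linux', 100, 140)]
```

Because the table is fixed, a busy (or malicious) guest cannot stretch its slice, and an idle guest's slack is never visible to its neighbours, which is how the timing channel described earlier gets closed.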

Tuesday, December 16, 2008

DO-178B Level A "Compliant" Software

I get asked a lot about developing software for systems undergoing DO-178B certification. This entry attempts to dispel some myths and provide some information about building safety critical software under this regime, and what to look for in partners offering "DO-178B Compliant" components.

If you've not seen it before, DO-178B is a set of guidelines relating to the development of software for the avionics industry. In recent years its usage has been spreading beyond aircraft, primarily due to its success in maintaining safety in complex software systems. As far as I am aware, there's never been an aircraft fatality linked to a failure of DO-178B Level A software (more about levels in a second).

The DO-178B guidelines allow for software to be divided into levels of criticality. The lowest is Level E, which means it's not going to hurt anyone if it fails. The highest is Level A, which means there's a good chance the platform will crash and people will die if it fails. The DO-178B guidelines propose different levels of planning, requirements documentation, testing effort and validation for each level, obviously getting progressively harder until you hit 'A'.
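As a quick illustration, that level-to-consequence mapping can be written down as a simple lookup. The wording below is paraphrased (consult DO-178B itself for the precise failure-condition definitions), and the helper function is just a hypothetical convenience, not anything from the standard:

```python
# Paraphrased DO-178B software levels and the worst-case failure
# condition each is tied to. Descriptions are informal summaries.

DO178B_LEVELS = {
    "A": "Catastrophic - failure may cause a crash and loss of life",
    "B": "Hazardous - serious or fatal injuries to a small number of occupants",
    "C": "Major - significant reduction in safety margins",
    "D": "Minor - slight reduction in safety margins",
    "E": "No effect - failure does not affect safety",
}

def required_level(failure_condition):
    """Pick the software level from the worst credible failure condition."""
    for level, description in DO178B_LEVELS.items():
        if description.lower().startswith(failure_condition.lower()):
            return level
    raise ValueError(f"unknown failure condition: {failure_condition}")

print(required_level("Catastrophic"))  # prints A
```

The point of the scale is that the certification effort (planning, requirements traceability, structural coverage of testing) ratchets up with the level, with Level A demanding the most exhaustive evidence.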

There's no such thing as "reusable" DO-178B software. Each and every implementation has to be re-certified from scratch for the specific system undergoing review. This is because DO-178B is evidence based, and tests must be exercised on the final flight hardware (not some other piece of equipment with some other version of the software).

There is a lot of confusion about building software for DO-178B certified platforms. Unfortunately a lot of the marketing done by various companies (with phrases like DO-178B "ready" or "compliant") makes the situation very difficult for first-time entrants to the avionics software market. Some of these companies have caused whole avionics programs to be delayed or even fail outright due to their inability to generate satisfactory evidence for the flight certification authority.

Given the risks involved in not achieving certification, how does a company select a provider to work with?

  1. Confirm the provider has pedigree in delivery of software and certification evidence on DO-178B Level A systems. They should have done dozens of these before (you don't want to be number one or two).
  2. Get contact details for a reference customer who has achieved successful certification using the same version of software as you will be using. Don't accept a reference who used a different version of the software. Again, you don't want to be the first person certifying some new experimental software - you want proven experience.
  3. Confirm that the COTS supplier is providing the certification evidence using their own experts. Ask to talk with their DER (Designated Engineering Representative) about the solution and how previous certifications have gone with the FAA, EASA or whoever. If they don't have internal experts, including a DER, then they don't have the expertise to get your solution over the line. Never accept a COTS supplier who refers you to a 3rd party for generation of certification evidence. The 3rd party agent is not an expert in the software, can't make changes or fixes as necessary to assure success, and probably doesn't have the financial backing to stay in the market for the next 20-30 years, as required for maintaining support (in case a fielded defect is discovered and a fix needs to be made and re-certified quickly).
  4. Ensure the COTS supplier has a program for developing the certification evidence on your specific flight hardware. You cannot reuse evidence from different hardware. You especially can't reuse evidence from a different version of software.
  5. Request an onsite audit of the COTS supplier's software and processes prior to selection. Visit their facilities. Talk with their DER. Understand their process and how they will work with you to get your product successfully through certification. Discuss how your specific code-branch is handled, how you are notified in the event that a defect is found, who will provide your engineering support during development and what experience they've had with previous programs.
  6. Ask the supplier outright (in writing) if certification for their software has ever delayed a program or caused it to fail certification.


There's probably lots more. A great reference for first-time DO-178B software developers is "Avionics Certification: A Complete Guide to DO-178 (Software) and DO-254 (Hardware)" by Vance Hilderman and Tony Baghai. I believe it's now available on Amazon.

Monday, December 15, 2008

Welcome

Welcome to the Embedded Software Technology blog. Many thanks for dropping in.

I've been developing software and managing teams in the automotive, defence and commercial software fields for over ten years, but I have never spent time documenting my (often strong) beliefs across a range of issues relevant to the industry. I am privileged in my current role to spend time with many technology companies across a variety of markets, and I've consequently discovered a great deal more about "what works" (and what doesn't). I'm hoping to share that information here for the benefit of others, and also to learn from others who post comments.

To provide some insight into issues you might see here, I've picked some random topics for future expansion:
  • Software Reliability and Mission Critical Systems
  • Virtualisation for Embedded Systems
  • Test Driven Design - debrief from a successful project
  • Compiler Technology - the need for optimisation
  • Security for Embedded Systems
  • Tools versus People - where should we invest?
  • Embedded Software Technology Forecasting (what's around the corner?)
  • Toolchain Selection - choosing what's right for your development
  • Will Free Software Kill Commercial Software?
I consider embedded software systems to be among the most challenging places for a software engineer to work. The work is always interesting, often frustrating and hardly ever dull. This must be why so many talented (and good looking) people work in this industry...

Please feel free to leave me requests, comments and opinions.