I went to the RTECC Conference in Phoenix today. It was interesting, if mostly marketing-type presentations. But you got a free lunch out of it (they know how to get engineers). Here is the breakdown of the sessions I went to.

Debugging Drivers in User Space -- QNX

QNX is a microkernel architecture, so they think that everything should be done in user space. But the speaker was careful to mention that all of these techniques can be used, with slight modification, in OSes that don't traditionally use user-space drivers. One of the things he did really well was grouping:

Types of device drivers

  • User apps renamed. These are programs that people call device drivers because they talk to a device. But the device, and their control of it, is so high-level that they're really just user-space tasks.
  • R/W hardware on request. These drivers are pretty simple. They're not getting notification of anything from the hardware, just gathering information when a calling task asks for it. These are pretty easy to implement in user space.
  • Polling hardware. This is the type of driver that polls hardware for information. Not a good thing in anyone's book, but they do exist. They can also be easily written in user space.
  • Interrupt based. These drivers are the most difficult to do in user space, but it is possible. Some OSes (QNX in particular) allow user-space applications to receive interrupts. For others, a kernel interrupt proxy can be created, where a kernel driver gets the interrupt and unblocks a calling task (ick!). He also said they're seeing that about 80% of drivers today do no data processing in the ISR; they just signal back to a waiting task.
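That last pattern (the ISR does nothing but signal a waiting task) can be simulated entirely in user space. Here is a minimal sketch using a Python thread and an `Event` as stand-ins for the hardware interrupt and the blocked driver task; all names here are illustrative, not any vendor's driver API:

```python
import threading
import time

irq_event = threading.Event()
data_register = []  # stands in for a device's data register

def fake_isr():
    """Pretend interrupt handler: no data processing, just signal."""
    time.sleep(0.01)           # the hardware "fires" after a short delay
    data_register.append(42)   # the device has latched some data
    irq_event.set()            # unblock the waiting task -- that's all

def driver_task():
    """The waiting task does the real work outside interrupt context."""
    irq_event.wait(timeout=1.0)
    return data_register.pop() if data_register else None

threading.Thread(target=fake_isr, daemon=True).start()
value = driver_task()
print(value)  # -> 42
```

On a real QNX system the `wait` would be a kernel call that blocks until the attached interrupt fires, but the shape of the code is the same.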

One of the biggest advantages he saw for user-space drivers (and I agree) is that you can use source-level debug tools (although he didn't like GDB). It is important to note that even in user-space drivers, ISRs can't be debugged with source-level debuggers. Also, you get memory protection both while developing and while executing, increasing system stability.

Real-Time OS Support for Distributed Systems -- Enea

Enea has their own OS called OSE, which is used in some medical and telco products but doesn't seem to be that big overall. One of the strengths of OSE is its fault-tolerant, robust, multi-CPU messaging system. They have now ported that system to other OSes, including Linux and Windows, and are trying to get developers to buy into its functionality for more generic situations.

The system it is most similar to is probably CORBA. It allows for service discovery, then connecting to services on other processors or on the current one. Messages can be passed between all the devices in the system. One nice thing is that a service can be attached to, and if it goes down there will be notification of the loss of service. You can see why the telecom people love this.
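The attach-and-notify idea can be sketched as a toy registry: clients "attach" to a named service and get a callback when it disappears. This is my own illustration of the concept, not Enea's actual API:

```python
class Registry:
    """Toy service registry with attach/loss-notification semantics."""

    def __init__(self):
        self.services = {}   # name -> service object
        self.watchers = {}   # name -> callbacks to fire on loss

    def publish(self, name, service):
        self.services[name] = service

    def discover(self, name):
        return self.services.get(name)

    def attach(self, name, on_loss):
        """Register interest in a service's liveness."""
        self.watchers.setdefault(name, []).append(on_loss)

    def kill(self, name):
        """Service dies: notify everyone attached to it."""
        self.services.pop(name, None)
        for callback in self.watchers.pop(name, []):
            callback(name)

registry = Registry()
registry.publish("billing", object())
lost = []
registry.attach("billing", lost.append)
registry.kill("billing")
print(lost)  # -> ['billing']
```

In the real system the registry and the clients would be on different processors, which is exactly why the loss notification matters: the client has no other way to know the remote end went away.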

In the "enterprise" environments (not OSE), the communication is handled through a "gateway" process. A client library (which they give you the source to) talks to it, and the gateway handles all the network stuff (though "network" could mean serial, CAN, Ethernet, etc.). The gateway could be a daemon on the current platform, or there could be a single gateway on the network enabling communication.

To set up a connection, broadcast UDP packets are sent out, and different types of custom ACKs are sent back. Hunting for devices can be synchronous or asynchronous.

Going beyond White-box testing -- Polyspace

Unfortunately, the apps engineer who was supposed to give this talk was unable to make it, so it was given by a marketer from Polyspace. The material was still interesting, though. They are selling a product that does abstract semantic analysis of code, which is basically determining run-time errors without executing the code. They do this by building a model of the code, including ranges on all values. A list is then created of all the errors that could occur for a particular operation, and that list is run against the model.
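To make the range-tracking idea concrete, here is a minimal sketch (my own, far simpler than anything Polyspace ships): each variable is modeled as a `[lo, hi]` interval, and an operation is checked against the errors it could raise given those intervals.

```python
def interval_add(a, b):
    """Range of x + y when x is in a and y is in b."""
    return (a[0] + b[0], a[1] + b[1])

def check_div(num, den):
    """Diagnostic for num / den, judged from value ranges alone."""
    if den[0] <= 0 <= den[1]:
        return "possible division by zero"
    return "safe"

x = (0, 10)                  # x could be anywhere in 0..10
y = (-5, 5)                  # y straddles zero
z = interval_add(x, (1, 1))  # z is in 1..11, so it can never be zero

print(check_div(x, y))  # -> possible division by zero
print(check_div(x, z))  # -> safe
```

The "model" in the talk is essentially this, scaled up to every variable, path, and operation in the program, which is where the size problem below comes from.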

The real problem here is that the model is huge, and the set of possible combinations is even larger. So they have different algorithms to reduce the set while still maintaining some level of coverage. As a rough data point, he said that a complete analysis of 20K LOC on a modern machine (3 GHz P4) would take about 2 hours. There are much faster algorithms in their tool; that was just a number he knew.

Also interesting: their tool will stub out code that isn't available at analysis time. This could include vendor-supplied libraries, or other modules that won't be in this analysis run. This can increase computation time in some ways (since there are few restrictions on the return values) but significantly reduce it in others (fewer code paths).
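The trade-off is easy to see in interval terms. A sketch, continuing my toy range model from above (again, names and ranges are illustrative, not Polyspace's): a stubbed function's return value gets the widest possible range, so the analysis can't rule out errors downstream until the caller constrains it.

```python
INT_RANGE = (-2**31, 2**31 - 1)

def stub_return():
    """Stand-in for an unavailable vendor function: value unconstrained."""
    return INT_RANGE

def check_div(den):
    """Flag a divide if the denominator's range includes zero."""
    return "possible division by zero" if den[0] <= 0 <= den[1] else "safe"

raw = stub_return()                           # anything goes
flag_before = check_div(raw)                  # the stub forces a warning
clamped = (max(raw[0], 1), min(raw[1], 100))  # caller clamps to 1..100
flag_after = check_div(clamped)               # now provably safe

print(flag_before, flag_after)
```

The stub widens the ranges (more false alarms) but spares the analyzer from exploring the callee's internal paths, which is the computation-time trade-off mentioned above.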

Interesting tidbit: the FDA reports that between 1992 and 1998, 7.7% of all medical product recalls were software-related, and that number is increasing.


posted Feb 17, 2005 | permanent link