memory_management_design_ii

In memory management design I, I described IoTivity's deficiencies in memory management. Here I describe the architectural and design changes that can eliminate those deficiencies.

A wise mobile game architect once told me that his games allocate all memory at startup, never touching the heap thereafter. My experience says this is a worthy goal for IoTivity.

Here are the primary strategies for robust, long-term memory management:

  • Drastically minimize the number of heap allocations.
  • Reorganize critical functions.
  • Prioritize reducing the number of heap allocations over efficiency of memory usage.
  • Parameterize and fully characterize every heap allocation.
  • Report memory failures explicitly.

minimize

Most of the coding work will involve reducing the number of heap allocations. The following tactics allow drastic reductions:

  • Include structures in other structures. For example, the fixed-size CAEndpoint_t structure is allocated and carried along as a separate structure from CARequestInfo_t and CAResponseInfo_t. It could be included in those structures, eliminating a malloc, a separate copy, and the associated heap overhead.
  • Eliminate trivially different structures. For example, CARequestInfo_t and CAResponseInfo_t are nearly identical. Use one structure, even if some elements are only used for one or the other purpose.
  • Put fixed-size arrays in structures instead of pointers to allocated arrays. For example, the CAToken_t structure can be allocated inside the ResourceObserver structure instead of as a pointer to an allocated structure, as is done in AddObserver in ocobserver.c.
  • Put fixed-size string arrays in structures instead of pointers to variable-length string allocations. Most of these strings have maximum sizes. From a memory fragmentation point of view, it's better to allocate unused space than to make a separate allocation. In practice, the heap free list overhead often exceeds the ostensible savings. For example, the resourceUri element in CAInfo_t should be a char array of size MAX_URI_LENGTH. This may result in some extra string copying, but it also eliminates a significant source of memory leaks, since there is no question of buffer ownership.
  • Locally allocate small buffers. For example, two of the largest users of malloc are ocpayloadparse.c and ocpayload.c, which turn the many elements of a CBOR payload into separately allocated elements of a payload structure. By allocating enough extra space at the end of a payload structure, the allocations for those elements can become a very lightweight series of pointer increments within the extra space. Then, when the payload structure is destroyed, only the original structure needs to be freed, eliminating an equivalent number of frees. This technique is used for the Linux ifaddrs structure, which deals with a similar issue.
  • Eliminate temporary mallocs. For example, getQueryFromUri in ocstack.c allocates a string buffer that its caller copies from and frees. By changing the argument of getQueryFromUri to a pointer to a stack string array, the nearly instantaneous malloc/free sequence can be eliminated.

reorganize

After reducing the number of allocations to a critical few, it is possible to rethink the overall structure:

  • Separate unneeded functions. For example, IoTivity can only be built as a combined client-server. Constrained OIC devices usually run only as servers. Eliminating the need to compile client code in an IoTivity build can simplify some structures and eliminate others.
  • Preallocate critical structures. For example, instead of allocating every structure from a heap, most or all structures can be allocated from specialized memory pools that offer one buffer size each. When a buffer is returned to a pool, it is kept there until the same usage needs it again. The size of a pool can be chosen at build time. The pool will generally be filled to that size at startup, though dynamic expansion might be allowed. When a pool can't supply its buffer, the associated function doesn't happen. For example, if a buffer is needed for processing a received message, the number of those buffers pre-allocated determines how many simultaneous requests can be handled. If that buffer were allocated from the heap, its allocation might interfere with processing of another request.
  • If all buffers are pre-allocated to memory pools, then exception handling can be simplified and made more reliable. For instance, one buffer in a critical pool might be reserved for emergency use, such as reporting the failure of another critical memory pool.
  • Fail-soft is also possible. If one pool runs out of buffers, it can ask another pool to free one of its buffers. For instance, if IoTivity unexpectedly faces a surprising number of observe structures, the observe buffer pool could ask the receive buffer pool to free a receive buffer. This might limit the number of requests that can be handled simultaneously, but the observe requests can be satisfied, and a client can be notified that a configuration error happened.

prioritize

Many developers' reaction to a constrained memory environment is to write code that minimizes the total size of allocated structures. This leads to issues like the ones described in the previous section. I submit that reliability and repeatability are far more important in a constrained environment than efficiency.

To illustrate this issue, consider a design where a complex structure is needed for each transaction. A developer finds that by allocating each component element only as large as a specific transaction needs, he can usually use half the memory required by a worst-case allocation of all the fields. Theoretically, the design can then usually run twice as many transactions simultaneously. In practice, random variation will make one or the other of the two allocations larger than average, so in many cases there will still only be enough memory to run one transaction at a time. Worse, any other demands on the memory, especially ones with lasting allocations, will have an outsized effect on the memory available. Furthermore, if a second structure is allocated while the first is still being freed (freeing many separate pieces is not an atomic operation), the second allocation can easily fail, usually when it is almost complete. And while the second structure is being deallocated because of that failure, another request can arrive and hit the same problem, and this can repeat. Using most of the memory as a single, fully allocated buffer ensures that at least one transaction can take place and eliminates the allocation and freeing overhead, perhaps allowing faster transaction throughput.

I'm not asking you to believe the previous paragraph, but I wanted to illustrate some of the issues involved. The real situation is much more complex and the likelihood of unexpected behavior is high. I believe the certainty of static allocation, the reduced malloc/free time of static allocation, and the coding simplicity make static allocation preferable.

parameterize

When the number of allocated structures has been reduced to a manageable number, such as 5-10, the contents of the structures should be carefully analyzed to look for redundancies and overages. The structures of a constrained environment should be carefully constructed and analyzed, and their sizes should be easily adjusted with macros or build variables.

Once it is possible to build a server-only IoTivity, the sizes of some structures for a server might be different from the values built into a combined client-server. For example, the resource URI strings of a client must be big enough for any server it might talk to, including ones not designed yet, but the URI strings of a server need only be large enough to reference the server's own resources. URI strings can be a significant memory allocation, and the size a server needs can be determined with careful analysis.

The structure size parameters should be put in the same place and explained adequately for server application developers to manipulate.

report

Memory allocation failure is different from most errors that will occur. Allocation failure is a system-level failure, and it is almost always fatal to the system that sees it. It may be the result of processing a specific request, but it can also result simply from a request that happens to be the 1000th one processed. It is typically a resource failure rather than a protocol violation. And it is likely to be the last real failure the node sees.

As a special error, memory allocation failure should be treated carefully. When it occurs, it should invoke a carefully designed reporting path that has priority over everything else and can't fail. By “can't fail” I mean that there should not be any memory allocations in the path of reporting it, or, if a buffer is needed, the availability of that buffer should be guaranteed. One way to guarantee a buffer is to pre-allocate it and only use it to report resource allocation failures.

The response to an allocation failure should also make clear that it is an allocation failure (and likely the last sane message coming from that node). As I mentioned previously, IoTivity reports some memory allocation failures with a generic OC_STACK_ERROR, losing critical information.

summary

I hope I have made the case that fixing IoTivity's memory issues will require significant effort and the result will be a radically transformed IoTivity. It will require a major act of will on the part of the IoTivity development community to make the changes I present.

But there is a path forward. In memory management design III I describe an IoTivity fork that includes most of the changes presented here.

John Light
Intel OTC OIC development

memory_management_design_ii.txt · Last modified: 2015/11/20 17:40 by John Light