Next: 4. Model Information Up: Diploma Thesis: Utility Support Previous: 2. Related Work   Contents



3. Code Instrumentation

Inserting generated code into Java programs is the main subject of this paper. Such an automatic source code transformation is commonly referred to as code instrumentation. In this paper the term covers everything beyond code generation that is needed to make a Java program check its own constraints. For an idea of where code generation ends and instrumentation starts, see section [*].

This is followed by an analysis of the requirements for the code instrumentation and the resulting design decisions in section [*]. Section [*] describes the solution in detail.

Finally, sections [*] and [*] discuss the more fundamental issue of when and how often invariants have to be checked.


3.1 Results of Code Generation

The Java code generator developed in [FF00] produces a set of code fragments3.1. These code fragments have the following properties:



Property               Meaning
Constrained type       The class this constraint applies to.
Kind                   Specifies whether this fragment is an invariant, a precondition, a postcondition, or a transfer or preparation fragment for a postcondition.
Constrained operation  The operation this constraint applies to (not valid for invariants).
Code                   The actual Java code to be executed.
Result variable        The name of the boolean variable which contains the result of the OCL expression after code execution.


For each postcondition containing a @pre expression there are two additional code fragments called preparation and transfer. See below.

3.1.1 Preparation and Transfer Fragments

The meaning of preparation and transfer fragments is explained using a drastically simplified example.

Suppose a postcondition for operation employ() stating that the attribute age is left unchanged:

context Person::employ() 
post: age=age@pre
The following code fragments will be produced:



Kind           Code
Transfer       int node1;
Preparation    node1=this.age;
Postcondition  int node2=this.age;
               boolean result=(node1==node2);


Typically these fragments would be inserted as follows:

class Person
{
  void employ()
  {
    int node1;       // transfer fragment
    node1=this.age;  // preparation fragment
    // original code of employ()
    // post condition fragment
    int node2=this.age;
    boolean result=(node1==node2);
  }
}
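Assembled into one runnable class, the instrumented example could look like the following sketch. The exception thrown on violation is a hypothetical choice for illustration; the real tool may report violations differently.

```java
// Hypothetical runnable sketch of the instrumented Person class above.
class Person {
  int age;

  void employ() {
    int node1;             // transfer fragment
    node1 = this.age;      // preparation fragment
    // original code of employ(): must leave age unchanged
    int node2 = this.age;  // postcondition fragment
    boolean result = (node1 == node2);
    if (!result)
      throw new RuntimeException("postcondition violated");
  }

  public static void main(String[] args) {
    Person p = new Person();
    p.age = 30;
    p.employ();
    System.out.println("postcondition holds");
  }
}
```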
Note that the precise semantics of code fragments involving the @pre expression have been changed, so that the original meaning described in [FF00] section 7.1.2 is no longer fully correct. For a detailed comparison see section [*].
 


3.2 Requirements and Design Decisions

This section analyzes the requirements for the code instrumentation and derives some fundamental design decisions.


3.2.1 Reversible Modification

The most important feature is the reversibility of the code instrumentation. It must be possible to remove all inserted code from an instrumented source file again, restoring the original source code exactly.

This requirement makes things quite a bit more difficult, but there are serious reasons for it. Without reversibility there would be two versions of the source code: the original and the modified version. This raises some unpleasant problems:
  1. Configuration management must handle two source code trees.
  2. Developers must be careful to edit the original version only.
  3. Running the instrumentation is required after every change of the java source code, not only when the constraints have been changed.
  4. Stack traces of runtime exceptions point to the modified source code. Developers must look for the corresponding place in the original version.
The implementation of reversible modification requires a strategy of minimally invasive modification. This is realized by two design decisions:
  1. Method wrappers, explained in detail in section [*].
  2. Explicit package qualifiers for the OCL library in the generated code. Otherwise, an import statement for the OCL library would be necessary. This would be just another spot where the original source code would have to be touched. Additionally, it could introduce name conflicts between the OCL library and user code.


3.2.2 Embedding Constraints in Java Source Code.

It should be possible to embed constraints in javadoc comments. The placement of an embedded constraint implies (and replaces) the context of the constraint. See the example below.

/**
   @invariant ageGreaterZero: age>0
*/
class Person
{
  int age;
 
  /**
     @postcondition: age=age@pre
  */
  void employ();
}
Constraints should be immediately visible to anyone editing a Java file. This also generally promotes the single-source approach. The author is strongly convinced that constraints stored in a separate text file are too far removed from the developer's attention.

Invariants may also be placed on an attribute or method of their context class. This is for convenience, since most invariants are clearly related to one specific attribute or method.

3.2.3 Checking the Element Type.

The instrumented code must check that container attributes comply with the @element-type and @key-type javadoc tags. For collections the @element-type tag specifies the type of the objects allowed in the collection. For maps, @element-type specifies the type of the values, while @key-type defines the type of the key objects. A detailed description is provided in section [*].

Note that this feature may be used standalone, without any OCL expressions at all. It then provides a runtime check for typed collections.
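Such a standalone check could be sketched as follows. The class and method names (ElementTypeCheck, checkElementType) are hypothetical and not the tool's actual API; the sketch only illustrates the kind of runtime check the instrumented code performs.

```java
import java.util.*;

// Hypothetical sketch: a runtime check that every element of a
// collection is an instance of the declared @element-type.
class ElementTypeCheck {
  static void checkElementType(Collection c, Class type) {
    for (Iterator i = c.iterator(); i.hasNext(); ) {
      Object o = i.next();
      if (o != null && !type.isInstance(o))
        throw new RuntimeException(
            "element type violated by " + o.getClass().getName());
    }
  }

  public static void main(String[] args) {
    List names = new ArrayList();
    names.add("Alice");
    checkElementType(names, String.class); // passes silently
    names.add(Integer.valueOf(42));        // violates @element-type String
    try {
      checkElementType(names, String.class);
    } catch (RuntimeException e) {
      System.out.println("violation detected");
    }
  }
}
```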


3.3 Code Insertion

The main task of code instrumentation is to have some code executed immediately before and after all methods (and after all constructors, too). This section describes how this is done by the tool developed for this paper.

First, section [*] introduces a simple approach and explains why it would not work. Sections [*] and [*] explain the approach followed in this work. A somewhat tricky caveat and its solution are worked out in section [*]. Section [*] shows how all modifications of the instrumentation can be undone. The Java parser used to do all this is outlined in section [*].

Sections [*], [*] and [*] compare this solution to three other tools approaching a similar task quite differently. This is summarized in section [*].


3.3.1 A Simple Approach

A straight-forward solution would insert the code directly into the method. Consider the following method.

int someMethod(double x)
{
  // here comes the code.
  return result;
}
The generated code could be inserted like this:
int someMethod(double x)
{
  // some code checking invariants/preconditions.
  // here comes the code.
  // some code checking invariants/postconditions.
  return result;
}
But this raises some severe problems:
  1. The code to be executed after the method (postcondition and invariants) has to be inserted before every return statement.
  2. The postcondition code must have the return value available. Instead of the simple result variable in the example above, there could be a complex expression. Such a return expression has to be computed in advance if the postcondition code refers to the return value or if the return expression produces side effects.
  3. There may be name conflicts between the original and the generated code, since the generated code defines local variables.
  4. For methods with return type void it must be decided whether the postcondition code has to be inserted at the end of the method. This depends on whether the end of the method is a reachable point of code, which requires a complete control flow analysis of the method. Note that if the postcondition code is wrongly inserted at the end of the method, the Java compiler will fail due to unreachable statements.
An implementation would need a complete Java parser. The code instrumentation would have to modify the original code in many different places and in complicated ways. This runs contrary to the strategy of minimally invasive modification decided on in section [*].

Additionally, item 4 requires much of the semantic analysis performed by a Java compiler. This makes the simple approach very hard to implement.

Jass ([JASS]) and iContract ([RK98]) use this simple approach and encounter all the problems mentioned above. For a detailed comparison see sections [*] and [*].

Method wrappers solve all these problems in a nifty but simple way. This is introduced in the following section.


3.3.2 Wrapping Methods

Some code says more than a thousand words, so an example is used for the explanation. Consider the following method.

int someMethod(double x)
{
  // here comes the code.
}
This is transformed into two methods.
int someMethod_wrappedbyocl(double x)
{
  // here comes the code.
}
 
int someMethod(double x)
{
  // some code checking invariants/preconditions.
  int result=someMethod_wrappedbyocl(x);
  // some code checking invariants/postconditions.
  return result;
}
Now let's look back at the problems encountered with the simple approach: none of them exists anymore. The user code is modified in a simple way only: a suffix is appended to the method name. For an implementation a very fragmentary Java parser is sufficient, one which understands the ``java signature level'' only. This signature level covers everything outside of method bodies and attribute initializers. It is a very small part of the Java language and can easily be analyzed by a hand-crafted parser. See section [*] for how easy it is.

This kind of method wrapping still has a problem, which is called the ``wrapper loop'' in this paper. Section [*] shows how it is solved.


3.3.3 Wrapping Constructors

Another transformation is used for constructors, since they cannot be renamed. Consider the following example constructor.

SomeClass(String x)
{
  // here comes the code
}
Instead of renaming, the original constructor gets an additional dummy argument.
SomeClass(String x, Dummy oclwrapperdummy)
{
  // here comes the code
}
 
SomeClass(String x)
{
  this(x, (Dummy)null);
  // some code checking invariants.
}
A special case occurs if a class does not declare any constructors. Then the Java compiler generates a default constructor as specified in [GJS96] section 8.6.7. This default constructor cannot be wrapped; instead it is replaced by an explicit constructor with the same access modifier the generated default constructor would get.
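This special case can be sketched as follows. The class Account, the invariant, and the name checkOclInvariants are hypothetical illustrations; only the Dummy-parameter convention is taken from the transformation above.

```java
// Hypothetical sketch: wrapping a class that declared no constructor.
class Dummy { }

class Account {
  int balance;

  // replaces the implicit default constructor; its body is empty,
  // just as the generated default constructor's would be
  Account(Dummy oclwrapperdummy) {
  }

  // generated wrapper with the same (default) access modifier
  // the implicit default constructor would have had
  Account() {
    this((Dummy) null);
    checkOclInvariants();
  }

  // hypothetical generated invariant check
  void checkOclInvariants() {
    if (balance < 0)
      throw new RuntimeException("invariant violated");
  }

  public static void main(String[] args) {
    new Account();
    System.out.println("invariants hold");
  }
}
```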


3.3.4 Avoiding the Wrapper Loop

Wrapping methods as described in section [*] causes a problem in a special situation. This section describes the situation and provides a solution.

The critical situation is shown in the figure below:





The dotted arrow represents a method call: Sub.method() contains a statement super.method(); somewhere.

The code instrumentation changes the structure as shown below:





The arrows show the problem: there is an infinite loop of method calls. In detail the following happens:

  1. Method Sub.method() is called somewhere in the user program. This is a wrapper method replacing the original method, which is named method_wrappedbyocl() now.
  2. Sub.method() does some OCL-specific things before it executes the statement method_wrappedbyocl(). This calls the original method Sub.method_wrappedbyocl(), as intended.
  3. Sub.method_wrappedbyocl() contains the super.method(); statement and therefore calls Super.method().
  4. Super.method() is a wrapper method replacing the original method, which is named Super.method_wrappedbyocl() now. It does some OCL-specific things before it executes the statement method_wrappedbyocl(). But due to dynamic dispatch this statement does not call Super.method_wrappedbyocl() as intended, but Sub.method_wrappedbyocl(), which closes the infinite loop.
The principal solution is simple: method Super.method() must force Super.method_wrappedbyocl() to be executed, although this method is overridden in class Sub. The Java language does not provide a way to call a method that has been overridden in a subclass, so a small trick is used:





Wrapped methods get the class name appended. Thus, a wrapper method can call the wrapped method of its own class.
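The trick can be sketched as follows. The exact suffix format is an assumption here (class name appended after _wrappedbyocl); the point is only that each wrapper calls a wrapped method whose name is unique to its own class, so dynamic dispatch cannot redirect the call.

```java
// Hypothetical sketch of the wrapper-loop fix: the wrapped method's
// name includes its class, so each wrapper reaches its own class's
// wrapped method even though subclasses override method().
class Super {
  void method_wrappedbyocl_Super() {
    System.out.println("Super body");
  }
  void method() {
    // OCL checks would go here
    method_wrappedbyocl_Super(); // not overridden in Sub: no loop
  }
}

class Sub extends Super {
  void method_wrappedbyocl_Sub() {
    super.method(); // this call caused the infinite loop before the fix
  }
  void method() {
    // OCL checks would go here
    method_wrappedbyocl_Sub();
  }

  public static void main(String[] args) {
    new Sub().method(); // terminates
  }
}
```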


3.3.5 Cleaning the Code

Reversible modification means that the instrumented code can be cleaned of all modifications without leaving any traces. This section explains how this requirement is met.

The user code is modified in two different ways only:

  1. Renaming the wrapped methods/constructors.
  2. Adding new object features, e.g. wrapper methods, methods for checking invariants and observing attributes.
For each method to be wrapped, the suffix _wrappedbyocl is appended to the name. This transformation is performed on the unparsed method header, so all typographical extras (line breaks, comments etc.) are preserved, and it is easily reversed when the code has to be cleaned. For constructors this works similarly, with the dummy parameter appended to the parameter list.

Removing generated class features relies on the fact that all generated features carry a special tag, as shown below.

/**
   @author ocl_injector
*/
void checkOclInvariants();
When cleaning the code, all features carrying such an @author tag are removed. This is simple and effective.
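The reversal of the renaming step can be illustrated with a small sketch. The real tool operates on parsed method headers rather than raw text, so the regular expression here is purely illustrative; note that it also strips an appended class name (as used against the wrapper loop), since \w* matches the rest of the identifier.

```java
// Hypothetical sketch: stripping the _wrappedbyocl suffix (and any
// appended class name) restores the original method header verbatim.
class Cleaner {
  static String clean(String source) {
    return source.replaceAll("_wrappedbyocl\\w*", "");
  }

  public static void main(String[] args) {
    String instrumented = "int someMethod_wrappedbyocl(double x)";
    System.out.println(clean(instrumented));
  }
}
```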


3.3.6 Design of the Java Parser

Previous sections stated that a very simple parser is sufficient for implementing wrapper methods. This section demonstrates that by giving an overview of the parser's design. In fact, it is as simple as the parsers used for syntax highlighting and class browsing in a Java IDE.

First of all, the parser is actually a manipulator: the Java file is simultaneously read, parsed, modified on the fly and written to an output file. For a class diagram of the parse tree produced see figure [*].

Figure: Design of the Java Parser used by the OCL Instrumentation

The parser analyzes only what is relevant for the parse tree. In particular, method bodies and attribute initializers are ignored. These skipped parts need not even compile: as long as their parentheses and braces are balanced, the parser will process them correctly.
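The body-skipping idea can be sketched by brace counting alone. This is a simplified illustration, not the tool's actual parser: a real implementation must additionally ignore braces inside string literals, character literals and comments.

```java
// Hypothetical sketch of skipping a method body by brace counting,
// as a signature-level parser can do.
class BodySkipper {
  // returns the index just after the '}' matching the '{' at start
  static int skipBody(String src, int start) {
    int depth = 0;
    for (int i = start; i < src.length(); i++) {
      char c = src.charAt(i);
      if (c == '{') depth++;
      else if (c == '}') {
        depth--;
        if (depth == 0) return i + 1;
      }
    }
    throw new RuntimeException("unbalanced braces");
  }

  public static void main(String[] args) {
    String src = "void m() { if (x) { y(); } } int a;";
    int after = skipBody(src, src.indexOf('{'));
    System.out.println(src.substring(after).trim());
  }
}
```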


3.3.7 Comparison to Jass

Jass ([JASS]) is a precompiler for checking assertions in Java programs. It translates jass files into java files. Jass files are valid Java source code with assertions specified in comments; the generated java file contains additional code checking these assertions. Thus, Jass performs something similar to the code instrumentation presented in the sections above.

However, Jass directly inserts generated code into user code as described in section [*]. Thus the problems found there should occur in Jass too:

  1. The code to be executed after the method has to be inserted before every return statement.
    This is what Jass does. Thus it requires a full Java parser (built with JavaCC).
  2. The return expression has to be computed in advance.
    This is also done by Jass.
  3. There may be name conflicts between the original and the generated code.
    Jass simply reserves a number of names (e.g. jassResult), which must not be used in the user code.
  4. For methods with return type void it must be decided whether the end of the method is a reachable point of code.
    This is a known problem of Jass; it simply fails in such cases. There is a work-around: enclose the method body in an if(true){...} block.
Jass performs a complex modification of the Java source code. This modification cannot be reversed as described in section [*]. Thus, Jass must be run before compilation whenever the source code has changed. Additionally, the source code repository must hold jass files instead of java files, which causes administrative effort for existing projects.

Jass cannot use wrapper methods, since these would not allow loop invariants and check statements to be implemented. But for a constraint language without these features, such as OCL, method wrappers are much better than direct insertion of code.


3.3.8 Comparison to iContract

iContract ([RK98]) is a precompiler for checking OCL constraints in Java programs. It extracts constraints from javadoc comments and produces modified Java source files checking these constraints. This is exactly the functionality that the tool presented in this paper aims to provide.

Just like Jass it uses direct insertion of generated code into user code. Once again the problems found in section [*] are reviewed.

  1. The code to be executed after the method has to be inserted before every return statement.
    This has to be done by iContract. However, it fails here in most cases. According to its list of known problems, ``iContract generates wrong code or crashes, if there is more than one return statement in a method.'' This matches the author's experience.
  2. The return expression has to be computed in advance.
    This is also done by iContract.
  3. There may be name conflicts between the original and the generated code.
    iContract likewise reserves a number of names (e.g. __return_value_holder_) that must not be used in the user code, although this isn't documented.
  4. For methods with return type void it must be decided whether the end of the method is a reachable point of code.
    iContract doesn't perform the control flow analysis needed to decide this question, as demonstrated by an example in appendix [*].
Experience with iContract shows that correctly modifying Java source code isn't trivial at all. The current list of known problems suggests that iContract isn't usable for real-world tasks. Additionally, the problem of reachable ends of method bodies remains.


3.3.9 Comparison to Byte Code Instrumentation

This paper focuses on source code instrumentation. Another approach is to modify Java byte code. This section discusses whether it is possible and useful to apply the concept of method wrappers to byte code instrumentation.

For the third time the problems found in section [*] are reviewed.

  1. The code to be executed after the method has to be inserted before every return statement.
    This is also needed for byte code instrumentation, but it's much easier: the code just has to be scanned for return opcodes.
  2. The return expression has to be computed in advance.
    This is not needed. Whenever a return opcode is executed, the return value is ready to use on top of the execution stack.
  3. There may be name conflicts between the original and the generated code.
    This cannot happen. Name conflicts with local variables are not possible, since variable names no longer exist in byte code. Name conflicts with non-local variables are not possible either, since those are accessed via different opcodes.
  4. For methods with return type void it must be decided whether the end of the method is a reachable point of code.
    This is no problem either. The Java compiler takes care of it: just scan for return opcodes. If the end of the method body isn't a reachable point of code, then there is no return opcode there. Conversely, if the end of a method is reachable, there will be a return opcode, no matter whether there was a return statement in the source. This is shown by an example in appendix [*], since the JVM Specification [LY97] isn't entirely clear about it.
Thus, it makes no sense to apply the method wrapper approach to byte code instrumentation.

In contrast to the approach presented in this paper, byte code instrumentation has to be redone after every compilation of the source code. This may be done on the fly when loading the classes into the JVM. For instance, jContractor [KHB98] uses a class loader to instrument Java code. This may cause problems if the user code registers a class loader of its own. This is solved by Handshake [DH98], which uses a proxy system library to intercept the JVM when opening class files. Thus, Handshake buys total transparency to the JVM at the price of platform dependency.


3.3.10 Summary

Wrapping methods seems to be the best choice when instrumenting source code for checking constraints. Several problems encountered with the simple approach of direct insertion are solved.

Method wrappers cannot be used, if implementation constraints (assertions, loop invariants) are to be checked. Since OCL does not support implementation constraints, this does not affect the scope of this paper.

Method wrappers are not useful for instrumentation on the byte code level. The biggest advantage of source instrumentation (if it is reversible) is that it doesn't need to be redone on each recompilation.


3.4 Scope of Invariants

This section discusses when a constraint is required to be fulfilled. This is trivial for pre/postconditions, but for invariants it is not so easy.

[WK99] section 5.4.2 suggests checking invariants immediately after an object has changed3.4. This is not workable, even if runtime efficiency is ignored: modifications of the model often produce intermediate states which are not consistent with the constraints.

When using databases the answer is simple: invariants must hold outside of transactions. Since the Java runtime does not provide transactions, several strategies are offered for various user requirements.

Invariants may be required to be fulfilled on:

  1. All Methods: before and after every method call.
  2. Public Methods: before and after every call of a public method.
  3. Tagged Methods: before and after every call of a method explicitly tagged for invariant checking.

These strategies may be used in conjunction. Except for All Methods combined with Public Methods and/or Tagged Methods, all combinations make sense for special user requirements.

There won't be a single solution for this problem. Many applications will require their own individual scope of invariants. The scope may even differ between several classes of invariants.

The current implementation supports scopes needed during the ongoing diploma thesis only. Up to now, all invariants share the same scope. Also, tagged methods are not yet supported.


3.5 Caching Results of Invariants

The previous section discussed when we have to make sure that all invariants are fulfilled. But even then it is not necessary to evaluate all of them. The implementation developed along with this paper checks only those invariants whose result may have changed due to recent modifications of the model.

3.5.1 Design

Caching is realized with an observer design. Each invariant determines all object attributes it depends on and registers with these attributes as an observer. Figure [*] shows the meta model of the principal design.

Figure: Design for Observing Invariants

The classes in the UML chart have the following meaning:



Class              Meaning                               Example
Class              An arbitrary class of                 class Person
                   the user model
Invariant          An invariant in the                   context Person
                   context of a class                    inv: age>=0
Object             An instance of a class                Person Joe
InvariantInstance  An invariant in the                   Has Joe
                   context of an object                  a positive age?
Feature            A feature (attribute or               Joe's age
                   query method) of an object

For now, let's think of features as attributes only. How to deal with query methods is explained below.

The cycle of checking invariants contains two stages.

  1. Evaluating invariants. When an invariant instance is evaluated, it registers as an observer with all object attributes used during the evaluation. The attribute thereby promises to notify the invariant instance when the attribute's value changes.
  2. Running the model. When an attribute changes its value during execution of user code, it notifies all observing invariant instances. Then the attribute unregisters all observers, so they must register again on the next evaluation stage. See section [*] for how changed attributes are detected.
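The two stages can be sketched with a minimal observer implementation. All class and member names here are hypothetical; the sketch only illustrates the register-notify-unregister cycle described above.

```java
import java.util.*;

// Hypothetical sketch of the observer cycle for invariant caching.
class InvariantInstance {
  boolean mustEvaluate = true; // set when an observed attribute changed
}

class ObservedAttribute {
  private final List observers = new ArrayList();

  // stage 1: during evaluation the invariant registers as observer
  void register(InvariantInstance inv) {
    observers.add(inv);
  }

  // stage 2: on a value change all observers are notified and dropped,
  // so they must re-register on the next evaluation
  void changed() {
    for (Iterator i = observers.iterator(); i.hasNext(); )
      ((InvariantInstance) i.next()).mustEvaluate = true;
    observers.clear();
  }
}

class Demo {
  public static void main(String[] args) {
    ObservedAttribute age = new ObservedAttribute();
    InvariantInstance inv = new InvariantInstance();
    inv.mustEvaluate = false; // evaluated once ...
    age.register(inv);        // ... and registered as observer
    age.changed();            // user code modifies the attribute
    System.out.println(inv.mustEvaluate); // must be re-evaluated
  }
}
```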
This design can be extended to query3.5 methods. If a query has no parameters, it behaves exactly like an attribute. Things get a bit more complex if the queries are parameterized; see figure [*].
Figure: Design for Observing Methods

The point is that not methods but method invocations are the features observed by invariants. A method invocation is a method together with a parameter sequence suitable for invoking this method.

Up to now, the implementation observes attributes only.


3.5.2 Implementation

Not all of these classes exist explicitly in the implementation. Class and Object are provided by the user model already. Invariant exists only as an additional method checkOclInvariant_<name> of its context class. InvariantInstance is an explicit class3.6 of the instrumentation runtime library. Finally, Feature is provided by the user model, but cannot be referred to as a single Java object (java.lang.reflect.Field represents a field of a class, not of an object). Whenever a feature has to be referred to, it is represented by its observer collection object, which is sufficient for the needs of this implementation.

Changes of object features are detected by polling. For each feature a backup attribute is added to the class.

class Person
{
  int age;
  int age_oclbackup=age;
}
Additionally, a utility method is added which compares each attribute to its backup. If there is a difference, the observers of the attribute are notified.
private void checkForChangedFeatures()
{
  if(age!=age_oclbackup)
  {
    age_oclbackup=age;
    // notify observers of age
  }
  // ... further attributes
}

 

This method is called immediately before and after each method of the class. If the attribute contains an object reference, the comparison tests object identity, not object equality: the != operator is used, just as for primitive types, rather than the equals() method.

For collections the backup stores a hash code of the collection, to avoid the overhead of maintaining a complete backup collection. Since the comparison to the backups is done very often, the hash code computation must be lightweight. The following section discusses this in detail.


3.5.3 Detecting Collection Modification

Detecting changes of attributes is essential for the caching of invariants as described above. For atomic attributes this is trivial: age!=age_oclbackup does it all. For collection attributes it's not that easy. The backup of a collection shouldn't be a collection itself, since this would consume large amounts of memory. Also, comparison between two collections isn't that fast.

Thus, the backup for a collection attribute stores a single integer value only. This value is computed from the collection; when the collection changes, the value is expected to change as well. Such a value is commonly referred to as a hash value.

However, these hash values are not required to be uniformly distributed. Also, the hash value is required to be constant only for unmodified collections, not generally for equal collections. This means: if an element is added to the collection and removed immediately afterwards, the collection is not required to have the same hash code as before. This relaxation of the requirements will be used below. Admittedly, the term ``hash code'' is used here mainly for historical reasons.

A first attempt at implementing the hash functions follows the implementation of the hashCode methods in the Java Collections API. These methods cannot be used directly, since they call hashCode on each of their elements. This is not desired, since change detection is based on object identity, not object value. To achieve the intended behavior, the collection hash functions had to be rewritten with calls to System.identityHashCode. This has been implemented in class HashExact3.7.
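Such an identity-based hash function could be sketched as follows; this is modelled on the algorithm of List.hashCode() with System.identityHashCode substituted, and is not the actual HashExact source.

```java
import java.util.*;

// Hypothetical sketch of an identity-based collection hash, modelled
// on java.util.List.hashCode() but using System.identityHashCode so
// that change detection covers object identity, not object value.
class IdentityHash {
  static int hash(Collection c) {
    int h = 1;
    for (Iterator i = c.iterator(); i.hasNext(); )
      h = 31 * h + System.identityHashCode(i.next());
    return h;
  }

  public static void main(String[] args) {
    List l = new ArrayList();
    l.add("a");
    // the hash is stable as long as the collection is unmodified
    System.out.println(hash(l) == hash(l));
  }
}
```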

These hash functions are good at detecting changes. However, they are too slow: each invocation involves an iteration over the whole collection. For object populations of a few hundred instances with many relations between them, this virtually brings the system to a halt.

A quick but incomplete fix is achieved with the hash functions in HashSize, which simply return the size of the collection. This is very fast, of course, but it does not detect changes which don't affect the collection's size.

Finally, HashModCount provides a hash function which is (a) nearly as fast as HashSize and (b) detects changes even better than HashExact. It uses the fact that the standard collections already provide a change detection mechanism for implementing fail-fast iterators. Each collection maintains a modification counter, which is incremented whenever the collection is modified. Iterators copy this counter on creation and check the copy on every access. Unfortunately, the counter isn't publicly available, so HashModCount has to access the private counter attribute via reflection. This is highly dependent on internal details of the collection classes. The implementation has been tested successfully on JDK 1.2.2; with other versions it should not be expected to work.
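The reflective access can be sketched as follows. This is an illustration, not the actual HashModCount source; note that on modern JDKs with strong module encapsulation the setAccessible call is refused unless the JVM is started with --add-opens java.base/java.util=ALL-UNNAMED, which underlines the version dependency stated above.

```java
import java.lang.reflect.*;
import java.util.*;

// Hypothetical sketch of reading the private modCount field of a list
// via reflection, as HashModCount is described to do.
class ModCountHash {
  static int hash(AbstractList l) {
    try {
      Field f = AbstractList.class.getDeclaredField("modCount");
      f.setAccessible(true); // may be refused on encapsulated JDKs
      return f.getInt(l);
    } catch (Exception e) {
      throw new RuntimeException("modCount not accessible: " + e);
    }
  }

  public static void main(String[] args) {
    ArrayList l = new ArrayList();
    try {
      int before = hash(l);
      l.add("x"); // any structural modification increments modCount
      System.out.println(hash(l) != before
          ? "change detected" : "no change");
    } catch (RuntimeException e) {
      System.out.println("reflection not permitted");
    }
  }
}
```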

There is a work-around for this: the collection backup could be an iterator instead of an integer. To detect a modification, just access the iterator and wait for a ConcurrentModificationException. This has not been implemented yet, since it raises several problems. Method hasNext() does not check for modifications, so method next() has to be invoked. But this may also fail with a NoSuchElementException if there are no elements left. Thus, the backup iterator would have to be recreated whenever no element is left, and empty collections would require special treatment.
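The core of this unimplemented work-around can be demonstrated in a few lines; the class name is hypothetical, and the empty-collection and recreation problems mentioned above are deliberately left out.

```java
import java.util.*;

// Hypothetical sketch of the iterator-based work-around: keep an
// iterator as the "backup" and probe it; a structural modification
// makes the probe fail fast with a ConcurrentModificationException.
class IteratorBackup {
  public static void main(String[] args) {
    List l = new ArrayList();
    l.add("a");
    l.add("b");
    Iterator backup = l.iterator(); // taken at the last check point
    l.add("c");                     // user code modifies the collection
    try {
      backup.next(); // hasNext() alone would not check for modifications
      System.out.println("no modification");
    } catch (ConcurrentModificationException e) {
      System.out.println("modification detected");
    }
  }
}
```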

Using the fail-fast mechanism obviously does not work for arrays. However, a fall-back to HashExact or HashSize is easily provided. This could even be decided dynamically, depending on the size of the array.

The table below summarizes the different detection mechanisms:



                      Exact            Size             ModCount       Iterator
Modification          nearly perfect   insertion/       perfect        perfect
detection                              deletion only
Runtime               linear           constant/        constant/      constant/
complexity                             very low         low            intermediate
Implemented           yes              yes              yes            -
Works with arrays     yes              yes              fall back      fall back
                                                        to other       to other
Collections without   yes              yes              runtime        silent
fail-fast iterators                                     error          failure
Non-JDK collections   yes              yes              runtime        yes
with fail-fast                                          error
iterators
JDK standard          yes              yes              yes            yes
collections


Finally there is the question of which mechanism to choose. One should probably start with Exact (the default). If this is fast enough, everything is fine. If the application slows down more than one is willing to accept, try ModCount (invoked with option -modcount-hash). If it works, fine; otherwise it will fail fast by throwing a RuntimeException. Then one may try Size (-simple-hash) and hope it does not miss too many modifications.

If that is not acceptable, one may implement the Iterator method. It is important that all collections used in the application provide fail-fast iterators (the JDK standard collections do); otherwise modifications may be missed silently.


Ralf Wiebicke 2005-11-25