FFIGEN User's Manual


(Preliminary)
Lars Thomas Hansen
lth@cs.uoregon.edu
February 6, 1996

1. Introduction

FFIGEN is a program system which facilitates the writing of translators from C header files to foreign function interfaces for particular programming language implementations. This document describes its structure and use. The discussion is aimed at translator writers; everyone else should confine themselves to Section 3. A companion document, FFIGEN Manifesto and Overview, motivates the work, and other companion documents describe specific translator implementations. In particular, the document FFIGEN Back-end for Chez Scheme Version 5 describes one translator in detail.

FFIGEN is based on the lcc C compiler, which is copyrighted software. See Section 10 for a full copyright notice.

2. Writing Translators

To generate a translation of a header file you first run the ffigen command to produce an intermediate form of the C header files you want to translate, and then run the back-end on the resulting files to generate the foreign function interface for the library.

Your task, should you choose to accept it, is to implement the target-specific parts of the back-end for your particular target (which is to say, the combination of host language implementation, operating system, architecture, foreign language implementation, and translation policy). You should be able to use the FFIGEN front-end and the target-independent parts of the back-end pretty much as they are.

How to implement the target-specific parts of the back-end is discussed in Section 6. Use of the front end is described in Section 3. The intermediate format is described in Section 4, and the target-independent parts of the back-end and their interface to the target-dependent part are described in Section 5. Finally, Section 7 covers some issues which need to be tackled in the future.

3. Running FFIGEN

The command ffigen is run on a set of header files with preprocessor options and include-file options. Arguments are processed in order. For each header file (type .h) and all the files it includes, a single intermediate file (type .ffi) is produced.

The options are:

-Dname[=value]
    Define preprocessor macro.
-Uname
    Undefine preprocessor macro.
-Idirectory
    Add directory to the beginning of the list of directories searched for include files. Standard directories include the lcc include directory, /usr/include, and the current directory (in that order). See the release notes for information about how to change the defaults.

ffigen performs full syntax and type checks on its input.
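
For example, a hypothetical invocation which defines one macro and adds one include directory (the header and directory names are purely illustrative) might be:

    ffigen -D_POSIX_SOURCE -I/usr/local/include mylib.h

Following the convention above, this should leave a single intermediate file, presumably mylib.ffi, covering mylib.h and everything it includes, ready to be handed to the back-end.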

The back-end is run by starting your favorite Scheme system and loading first the target-independent file process.sch and then the target-dependent part of the translator; in the case of the Chez Scheme back-end that file is called chez.sch. You then call the procedure process with the name of the .ffi file to process, as discussed in Section 5.
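
A session with the Chez Scheme back-end might therefore look roughly like this (the .ffi file name is illustrative):

    > (load "process.sch")
    > (load "chez.sch")
    > (process "mylib.ffi")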

4. Intermediate Format

The intermediate format consists of s-expressions following this grammar:

  <file>      -> <record> ...
  <record>    -> (function <filename> <name> <type> <attrs>)
               | (var <filename> <name> <type> <attrs>)
               | (type <filename> <name> <type>)
               | (struct <filename> <name> ((<name> <type>) ...))
               | (union <filename> <name> ((<name> <type>) ...))
               | (enum <filename> <name> ((<name> <value>) ...))
               | (enum-ident <filename> <name> <value>)
               | (macro <filename> <name+args> <body>)
  <type>      -> (<primitive> <attrs>)
               | (struct-ref <tag>)
               | (union-ref <tag>)
               | (enum-ref <tag>)
               | (function (<type> ...) <type>)
               | (pointer <type>)
               | (array <value> <type>)
  <attrs>     -> (<attr> ...)
  <attr>      -> static | extern | const | volatile
  <primitive> -> char | signed-char | unsigned-char | short
               | unsigned-short | int | unsigned | long
               | unsigned-long | float | double | void
  <value>     -> <integer>
  <filename>  -> <string>
  <name>      -> <string>
  <body>      -> <string>
  <name+args> -> <string>
  <tag>       -> <string>
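
As an illustration only (an actual run may differ in details such as file names, tags, and attribute sets), consider a header point.h which declares struct point { int x; int y; }, the typedef point_t for that struct, and the function extern double dist(struct point *p, struct point *q). Records along the following lines would conform to the grammar above:

    (struct "point.h" "point" (("x" (int ())) ("y" (int ()))))
    (type "point.h" "point_t" (struct-ref "point"))
    (function "point.h" "dist"
              (function ((pointer (struct-ref "point"))
                         (pointer (struct-ref "point")))
                        (double ()))
              (extern))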

Notes relating to the grammar:

5. The Target-Independent Back-End

The target-independent back-end is a Scheme program, process.sch, which reads the intermediate form into memory and performs some initial processing. It exports some global variables and a number of procedures which are used to access the structures in the database of intermediate records, and imports two target-dependent functions from the target-dependent back-end. This section describes the interfaces.

The global variables which hold the database are:

    (define functions '())      ; list of function records
    (define vars '())           ; list of var records
    (define types '())          ; list of type records
    (define structs '())        ; list of struct records
    (define unions '())         ; list of union records
    (define macros '())         ; list of macro records
    (define enums '())          ; list of enum records
    (define enum-idents '())    ; list of enum-ident records

Each of these contains a list of all the records of the type indicated by its name. Note that records may look different internally from the intermediate form defined above, so the accessor functions (see below) should always be used.

In addition, there are two globals which are set but not used by the target-independent back-end:

    (define source-file #f)     ; name of the input file itself
    (define filenames '())      ; names of all files in the input

The main entry point to the back-end is the procedure process, which takes a single file name as an argument. Process initializes globals, reads the file, and processes the records.

    (define (process filename) ...)

Record processing consists of some general analysis and target-specific code generation. First, the target-specific procedure select-functions is called; it must set or reset the "referenced" bit in each function record depending on whether the function is interesting to the back-end or not. After computing reachability of structured types and setting the referenced bits of those types which are reachable, a translation is generated by a call to the back-end procedure generate-translation, which takes no arguments.

    (define (select-functions) ...)
    (define (generate-translation) ...)
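
As a rough sketch only (not a complete back-end; it relies on the globals above and on the accessors and referenced!/unreferenced! procedures described below), a minimal target-dependent part could be structured like this:

    ; Mark as referenced only those functions declared in the header the
    ; user asked to translate (assuming source-file holds that header's name).
    (define (select-functions)
      (for-each (lambda (f)
                  (if (string=? (file f) source-file)
                      (referenced! f)
                      (unreferenced! f)))
                functions))

    ; Emit a purely illustrative one-line comment per selected function.
    (define (generate-translation)
      (for-each (lambda (f)
                  (if (referenced? f)
                      (begin (display "; would translate ")
                             (display (name f))
                             (newline))))
                functions))

A real back-end would of course generate actual interface code here, according to its translation policy (see Section 6).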

A number of data structure accessors and mutators are also available. These are generic procedures which work on all of the record types.

    (define (file r) ...)          ; file name of record
    (define (name r) ...)          ; name in records which have one
    (define (type r) ...)          ; type in records which have one
    (define (attrs r) ...)         ; attrs in records which have one
    (define (fields r) ...)        ; fields in struct/union record
    (define (value r) ...)         ; value of enum-ident record
    (define (tag r) ...)           ; tag in struct/union/enum and -ref records

    (define (referenced? r) ...)   ; is record referenced?
    (define (referenced! r) ...)   ; set referenced bit
    (define (unreferenced! r) ...) ; reset referenced bit

Arguably the tag accessor should go away and name should simply be used in its place. As it is, name is not defined on struct-ref, union-ref, and enum-ref records.

The procedure record-tag returns the tag of a record, that is, the symbol which identifies what kind of record it is. It can also be applied to types.

    (define (record-tag r) ...)    ; get record tag

All records can have back-end-specific values attached to them. Usually these values are cached names for operations on structured values, so for now the procedures which manipulate the back-end-specific data are called cache-name, which remembers a value, and cached-names, which returns the list of remembered values:

    (define (cache-name r v) ...)  ; remember value in record
    (define (cached-names r) ...)  ; retrieve remembered values

We should probably replace this with a more general property-list-like mechanism.

In addition, two procedures extract parts of function types:

    (define (arglist r) ...)       ; function argument types
    (define (rett r) ...)          ; function return type

Some utilities to deal with file names are also provided:

    (define (strip-extension fn) ...)
    (define (strip-path fn) ...)
    (define (get-path fn) ...)
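
Their exact behavior is not pinned down here, but presumably it is roughly as follows (illustrative only):

    (strip-extension "lib/mylib.ffi")   ; => "lib/mylib"
    (strip-path "lib/mylib.ffi")        ; => "mylib.ffi"
    (get-path "lib/mylib.ffi")          ; => "lib/" (or similar)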

A string macro expander makes it easier to generate C code, for the back-ends that need it. The macro expander, instantiate, takes a string template and a vector of arguments (which are also strings). The template contains patterns of the form @n, where n is a single digit; when such a pattern is seen it is replaced with the corresponding value from the argument vector.

    (define (instantiate template arguments) ...)
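
For example, assuming the @ indices are zero-based (so that @0 refers to the first element of the vector), one might write:

    (instantiate "extern @0 @1( @2 );"
                 (vector "double" "dist" "struct point *p, struct point *q"))
    ; => "extern double dist( struct point *p, struct point *q );"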

Two procedures, struct-names and union-names, take a structure (or union) and return a list of all the typedef names which reference the structure directly.

    (define (struct-names struct) ...)
    (define (union-names union) ...)

An association function is also available; it searches one of the record lists for a record with a given name:

    (define (lookup key items) ...)
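
For example, a back-end might retrieve a particular function record by name (what lookup returns when nothing matches is not specified here; #f would be a natural choice):

    (lookup "dist" functions)      ; => the function record named "dist", if any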

The procedure user-defined-tag? determines whether a tag was defined by the user or generated by the system:

    (define (user-defined-tag? x) ...)

The procedure warn takes some arbitrary arguments and generates a warning message on standard output:

    (define (warn msg . rest) ...)

Some standard predicates take a type and test its kind: primitive-type? is true if the argument is of a primitive type as outlined in the grammar above; basic-type? is true if the argument is a primitive type or a pointer type; array-type? is true if the argument is an array type, and finally, structured-type? is true if the argument is a struct-ref or union-ref type:

    (define (primitive-type? t) ...)
    (define (basic-type? t) ...)
    (define (array-type? t) ...)
    (define (structured-type? t) ...)
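
For instance, a back-end might combine these predicates with arglist and rett to check whether a function type can be handled under a simple policy that accepts only primitive and pointer types. A rough sketch (the procedures all-basic? and simple-signature? are made up for illustration, and are applied to the function's type, not to the record itself):

    ; Does every type in the list satisfy basic-type?
    (define (all-basic? types)
      (or (null? types)
          (and (basic-type? (car types))
               (all-basic? (cdr types)))))

    ; True iff every argument type and the return type of the
    ; function type t are basic (primitive or pointer).
    (define (simple-signature? t)
      (and (all-basic? (arglist t))
           (basic-type? (rett t))))

A back-end could then call, say, (simple-signature? (type f)) for each function record f when deciding what it is willing to translate.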

6. Writing a Target-Dependent Back-End

To write the target-dependent back-end, you must decide on the policy for the translation and then implement the translation. The policy covers such issues as: which constructs in C are or are not handled; the translation for each handled construct; how non-handled constructs are dealt with (ignored, detected with warnings, detected with errors); how to deal with exceptional cases (consider the fgets example from the Manifesto).

For a concrete example, see the companion document FFIGEN Back-end for Chez Scheme Version 5, which addresses many of the choices to be made and their possible solutions.

7. Future Work

A number of features will be supported in the future:

A number of features will most likely be supported, but need to be investigated:

In addition, there are some issues to investigate in a larger perspective:

8. Please Contribute!

My goal is to support as many target languages as is reasonable, but I can't write all the translators myself (I lack the time and, in many cases, the knowledge). Targets that I will take care of include STk, and, if no-one beats me to it, Scsh, both Scheme systems. Someone has already volunteered to write the ILU back-end. Others are interested in back-ends for Modula-3 and Mercury.

If you would like to write a back-end for any target, you are welcome to e-mail me and volunteer your help. I will coach, coordinate, and help out as much as possible.

9. Credits

FFIGEN is based on the freely available lcc ANSI C compiler, implemented by Christopher Fraser (of AT&T Bell Labs) and David Hanson (of Princeton University).

I would like to thank Fraser and Hanson for producing such an excellent system; lcc has been a joy to work with, and their book, A Retargetable C Compiler: Design and Implementation, made it possible to implement the FFIGEN front end in roughly a single work day. Would that all software were this clean!

The development of FFIGEN was supported by ARPA under U.S. Army grant No. DABT63-94-C-0029, ``Programming Environments, Compiler Technology and Runtime Systems for Object Oriented Parallel Processing''.

10. Copyrights

lcc is covered by the following Copyright notice:

The authors of this software are Christopher W. Fraser and David R. Hanson.

Copyright (c) 1991,1992,1993,1994,1995 by AT&T, Christopher W. Fraser, and David R. Hanson. All Rights Reserved.

Permission to use, copy, modify, and distribute this software for any purpose, subject to the provisions described below, without fee is hereby granted, provided that this entire notice is included in all copies of any software that is or includes a copy or modification of this software and in all copies of the supporting documentation for such software.

THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. IN PARTICULAR, NEITHER THE AUTHORS NOR AT&T MAKE ANY REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.

lcc is not public-domain software, shareware, and it is not protected by a `copyleft' agreement, like the code from the Free Software Foundation.

lcc is available free for your personal research and instructional use under the `fair use' provisions of the copyright law. You may, however, redistribute the lcc in whole or in part provided you acknowledge its source and include this COPYRIGHT file.

You may not sell lcc or any product derived from it in which it is a significant part of the value of the product. Using the lcc front end to build a C syntax checker is an example of this kind of product.

You may use parts of lcc in products as long as you charge for only those components that are entirely your own and you acknowledge the use of lcc clearly in all product documentation and distribution media. You must state clearly that your product uses or is based on parts of lcc and that lcc is available free of charge. You must also request that bug reports on your product be reported to you. Using the lcc front end to build a C compiler for the Motorola 88000 chip and charging for and distributing only the 88000 code generator is an example of this kind of product.

Using parts of lcc in other products is more problematic. For example, using parts of lcc in a C++ compiler could save substantial time and effort and therefore contribute significantly to the profitability of the product. This kind of use, or any use where others stand to make a profit from what is primarily our work, is subject to negotiation.

Chris Fraser / cwf@research.att.com
David Hanson / drh@cs.princeton.edu
Fri Jun 17 11:57:07 EDT 1994


lth@acm.org
24 May 2000