## 4.2 Scalar Numeric Types

The Ptolemy kernel defines four scalar numeric data types: complex, fixed-point, double precision floating-point, and integer. All four types can be read from and written to portholes as described in "Reading inputs and writing outputs" on page 2-17. The floating-point and integer data types are based on the standard C++ `double` and `int` types, and need no further explanation. To support the other two types, the Ptolemy kernel contains a `Complex` class and a `Fix` class, which are described in the rest of this section.

### 4.2.1 The Complex data type

The `Complex` data type in Ptolemy contains real and imaginary components, each of which is specified as a double precision floating-point number. A complex number is written as a pair of numbers, (real, imaginary). For example, (1.3,-4.5) corresponds to the complex number 1.3 - 4.5j. `Complex` implements a subset of the functionality of the complex number classes in the cfront and libg++ libraries, including most of the standard arithmetic operators and a few transcendental functions.

#### Constructors:

`Complex() `

Create a complex number initialized to zero, that is, (0.0, 0.0). For example,
`Complex C;`

`Complex(double real, double imag) `

Create a complex number whose value is (real, imag). For example,
`Complex C(1.3,-4.5);`

`Complex(const Complex& arg) `

Create a complex number with the same value as the argument (the copy constructor). For example,
`Complex A(complexSourceNumber);`

#### Basic operators:

The following arithmetic operators modify the value of the complex number. Each returns a reference to the modified complex number (`*this`).

`Complex& operator = (const Complex& arg) `

`Complex& operator += (const Complex& arg) `

`Complex& operator -= (const Complex& arg) `

`Complex& operator *= (const Complex& arg) `

`Complex& operator /= (const Complex& arg) `

`Complex& operator *= (double arg) `

`Complex& operator /= (double arg) `

Two member functions return the real and imaginary parts of the complex number:

`double real() const `

`double imag() const `

#### Non-member functions and operators:

The following one- and two-argument operators return a new complex number:

`Complex operator + (const Complex& x, const Complex& y) `

`Complex operator - (const Complex& x, const Complex& y) `

`Complex operator * (const Complex& x, const Complex& y) `

`Complex operator * (double x, const Complex& y) `

`Complex operator * (const Complex& x, double y) `

`Complex operator / (const Complex& x, const Complex& y) `

`Complex operator / (const Complex& x, double y) `

`Complex operator - (const Complex& x) `

Return the negative of the complex number.

`Complex conj (const Complex& x) `

Return the complex conjugate of the number.

`Complex sin(const Complex& x) `

`Complex cos(const Complex& x) `

`Complex exp(const Complex& x) `

`Complex log(const Complex& x) `

`Complex sqrt(const Complex& x) `

`Complex pow(double base, const Complex& expon) `

`Complex pow(const Complex& base, const Complex& expon) `

Other general operators:

`double abs(const Complex& x) `

Return the absolute value, defined to be the square root of the norm.

`double arg(const Complex& x) `

Return the value arctan(x.imag()/x.real()).

`double norm(const Complex& x) `

Return the value x.real() * x.real() + x.imag() * x.imag().

`double real(const Complex& x) `

Return the real part of the complex number.

`double imag(const Complex& x) `

Return the imaginary part of the complex number.
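As a rough illustration of how `norm`, `abs`, and `arg` relate, the following sketch uses a hypothetical stand-in struct `Cx` rather than the Ptolemy `Complex` class; the helper names `cxNorm`, `cxAbs`, and `cxArg` are invented here. It computes the angle with `std::atan2`, which agrees with arctan(imag/real) when the real part is positive and also resolves the quadrant otherwise. Whether Ptolemy's `arg` handles the other quadrants the same way is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for illustration only; not the Ptolemy Complex class.
struct Cx {
    double re;  // real part
    double im;  // imaginary part
};

// Squared magnitude: re*re + im*im.
double cxNorm(const Cx& x) { return x.re * x.re + x.im * x.im; }

// Absolute value: the square root of the norm.
double cxAbs(const Cx& x) { return std::sqrt(cxNorm(x)); }

// Angle in radians; equals arctan(im/re) whenever re > 0.
double cxArg(const Cx& x) { return std::atan2(x.im, x.re); }

// Usage: a 3-4-5 triangle gives exact results in floating point.
void cxExample() {
    const Cx z = {3.0, -4.0};
    assert(cxNorm(z) == 25.0);
    assert(cxAbs(z) == 5.0);
}
```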

Comparison Operators:

`int operator != (const Complex& x, const Complex& y) `

`int operator == (const Complex& x, const Complex& y) `

### 4.2.2 The fixed-point data type

The fixed-point data type is implemented in Ptolemy by the `Fix` class. This class supports a two's complement representation of a finite precision number. In fixed-point notation, the partition between the integer part and the fractional part (the binary point) lies at a fixed position in the bit pattern. Its position represents a trade-off between precision and range. If the binary point lies to the right of all bits, then there is no fractional part.
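The trade-off between precision and range can be made concrete with a small calculation. The sketch below is not part of the `Fix` class; `FixRange` and `fixRange` are hypothetical names. It assumes a plain two's complement word in which the integer bits include the sign bit, and it restricts the integer bit count to at least one, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for illustration; not part of the Ptolemy Fix class.
struct FixRange {
    double min;   // most negative representable value
    double max;   // most positive representable value
    double step;  // spacing between adjacent representable values
};

// Range of a two's complement word with `length` total bits, of which
// `intbits` (sign bit included) lie to the left of the binary point.
FixRange fixRange(int length, int intbits) {
    assert(length >= 1 && intbits >= 1 && intbits <= length);
    const int fracbits = length - intbits;
    const double step = std::ldexp(1.0, -fracbits);   // 2^-fracbits
    const double top = std::ldexp(1.0, intbits - 1);  // 2^(intbits-1)
    return FixRange{-top, top - step, step};
}
```

For a 16-bit word with 2 integer bits, this gives a range from -2 to 2 - 2^-14 in steps of 2^-14. Moving the binary point to the right of all 16 bits gives a step of 1 and a range from -32768 to 32767.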

#### Constructing Fixed-point variables

Variables of type `Fix` are defined by specifying the word length and the position of the binary point. At the user-interface level, precision is specified either by setting a fixed-point parameter to a "(value, precision)" pair, or by setting a `precision` parameter. The former gives the value and precision of some fixed-point value, while the latter is typically used to specify the internal precision of computations in a star.

    Fix x = Fix(((const char *) precision));
    if (x.invalid())
        Error::abortRun(*this, "Invalid precision");

The "precision" parameter is cast to a string and passed as a constructor argument to the `Fix` class. The error check verifies that the precision was valid.

Each `Fix` variable also carries two settings that govern assignment:

a. whether rounding or truncation takes place when other `Fix` values are assigned to it; truncation is the default
b. the response to an overflow or underflow on assignment; the default is saturation (see page 4-6).

#### Warning

The `Fix` type is still experimental.

#### Fixed-point states

State variables can be declared as `Fix` or `FixArray`. The precision is specified by an associated precision state using either of two syntaxes:

#### Fixed-point inputs and outputs

`Fix` types are available in Ptolemy as a type of `Particle`. The conversion from an `int` or a `double` to a `Fix` uses the `Fix::Fix(double)` constructor, which makes a `Fix` object with the default word length of 24 bits and as many integer bits as the value requires. For instance, the `double` 10.3 will be converted to a `Fix` with precision 5.19, since 5 is the minimum number of bits needed to represent the integer part, 10, including its sign bit.
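The 5.19 figure can be checked with a short calculation. The following is a sketch under stated assumptions, not the `Fix` constructor itself: `minIntBits` is a hypothetical helper that works on the magnitude of the integer part, so it may be one bit conservative for some negative values.

```cpp
#include <cassert>
#include <cmath>

// Smallest number of integer bits (sign bit included) whose two's complement
// range covers the integer part of `value`. Hypothetical helper.
int minIntBits(double value) {
    assert(std::isfinite(value));
    const double mag = std::floor(std::fabs(value));  // magnitude of integer part
    int bits = 1;                                     // the sign bit alone
    // With `bits` integer bits the largest integer part is 2^(bits-1) - 1.
    while (std::ldexp(1.0, bits - 1) <= mag) {
        ++bits;
    }
    return bits;
}
```

For 10.3 the loop stops at 5 bits, because 2^4 = 16 is the first power of two greater than 10. With the default word length of 24 bits, that leaves 24 - 5 = 19 bits for the fraction.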

    defstar {
        domain {SDF}
        derivedFrom{ SDFFix }
        input {
            name { input1 }
            type { fix }
        }
        input {
            name { input2 }
            type { fix }
        }
        output {
            name { output }
            type { fix }
        }
        defstate {
            name { OutputPrecision }
            type { precision }
            default { 2.14 }
            desc {
    Precision of the output in bits and precision of the accumulation.
    When the value of the accumulation extends outside of the precision,
    the OverflowHandler will be called.
            }
        }

(Note that the real `AddFix` star supports any number of inputs.) By default, the precision used by this star during the addition has 2 bits to the left of the binary point and 14 bits to the right. Not shown here is the state `OverflowHandler`, which is inherited from the `SDFFix` star and which defaults to `saturate`: if the addition overflows, the result saturates, pegging it to either the largest positive or the largest negative representable number. The result value, `sum`, is initialized by the following code:

    protected {
        Fix sum;
    }
    begin {
        SDFFix::begin();

        sum = Fix( ((const char *) OutputPrecision) );
        if ( sum.invalid() )
            Error::abortRun(*this, "Invalid OutputPrecision");
        sum.set_ovflow( ((const char*) OverflowHandler) );
        if ( sum.invalid() )
            Error::abortRun(*this, "Invalid OverflowHandler");
    }

The `begin` method checks the specified precision and overflow handler for correctness. Then, in the `go` method, we use `sum` to calculate the result value, thus guaranteeing that the desired precision and overflow handling are enforced. For example,

    go {
        sum.setToZero();
        sum += Fix(input1%0);
        checkOverflow(sum);
        sum += Fix(input2%0);
        checkOverflow(sum);
        output%0 << sum;
    }

(The `checkOverflow` method is inherited from `SDFFix`.) The protected member `sum` is an uninitialized `Fix` object until the `begin` method runs. In the `begin` method, it is given the precision specified by `OutputPrecision`. The `go` method initializes it to zero. If the `go` method had instead assigned it a value specified by another `Fix` object, then it would acquire the precision of that other object and, at that point, be initialized.

#### Assignment and overflow handling

Once a `Fix` object has been initialized, its precision does not change as long as the object exists. The assignment operator is overloaded so that it checks whether the value of the object to the right of the assignment fits into the precision of the left object. If not, the appropriate overflow response is taken and the overflow error bit is set.
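The effect of assigning into a fixed precision can be modeled with ordinary doubles. The sketch below is only a model of the behavior described in this section, not the `Fix` implementation: the names `OverflowMode`, `Quantized`, and `quantize` are hypothetical. It assumes that truncation drops low-order bits (a floor) and that rounding is round-half-up, and it omits the warning response.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names for this sketch; the Fix class uses its own codes.
enum OverflowMode { Saturate, ZeroSaturate, Wrap };

struct Quantized {
    double value;     // value after quantization and overflow handling
    bool overflowed;  // true if an overflow response was applied
};

// Fit x into a two's complement format with `length` total bits and
// `intbits` integer bits (sign bit included).
Quantized quantize(double x, int length, int intbits, bool round,
                   OverflowMode mode) {
    assert(length >= 1 && intbits >= 1 && intbits <= length);
    const double step = std::ldexp(1.0, intbits - length);  // 2^-(fraction bits)
    const double top = std::ldexp(1.0, intbits - 1);        // 2^(intbits-1)

    // Assumption: truncation is a floor; rounding is round-half-up.
    const double units = x / step;
    double v = (round ? std::floor(units + 0.5) : std::floor(units)) * step;

    const bool overflowed = (v < -top) || (v > top - step);
    if (overflowed) {
        switch (mode) {
        case Saturate:      // peg to the nearest end of the range
            v = (v < 0.0) ? -top : top - step;
            break;
        case ZeroSaturate:  // replace the result with zero
            v = 0.0;
            break;
        case Wrap:          // discard the bits that do not fit
            v -= 2.0 * top * std::floor((v + top) / (2.0 * top));
            break;
        }
    }
    return Quantized{v, overflowed};
}
```

With a 2.14 format, assigning 2.5 saturates to 2 - 2^-14, zero-saturates to 0, or wraps to -1.5, and in each case the overflow flag is set.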

#### Explicitly casting inputs

In the above example, the `go` method accumulates each input into the protected member `sum`, which has the side effect of quantizing the input to the precision of `sum`. We could have alternatively written the `go` method as follows:

    go {
        sum = Fix(input1%0) + Fix(input2%0);
        output%0 << sum;
    }

The behavior here is significantly different: the inputs are added using their own native precision, and only the result is quantized to the precision of `sum`.
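A small numeric model shows how the two forms can give different answers. This is a sketch using plain doubles, not `Fix` objects: `truncateTo`, `accumulateEach`, and `quantizeResultOnly` are hypothetical helpers, truncation is assumed to be a floor, and overflow is ignored.

```cpp
#include <cassert>
#include <cmath>

// Truncate x to `fracbits` fractional bits (hypothetical helper).
double truncateTo(double x, int fracbits) {
    assert(fracbits >= 0);
    const double scale = std::ldexp(1.0, fracbits);  // 2^fracbits
    return std::floor(x * scale) / scale;
}

// Models the first form: every accumulation into the sum is quantized.
double accumulateEach(double a, double b, int fracbits) {
    double sum = 0.0;
    sum = truncateTo(sum + a, fracbits);
    sum = truncateTo(sum + b, fracbits);
    return sum;
}

// Models the second form: add at full precision, quantize the result once.
double quantizeResultOnly(double a, double b, int fracbits) {
    return truncateTo(a + b, fracbits);
}
```

With two fractional bits and both inputs equal to 0.375, the first form yields 0.5 (each 0.375 is cut to 0.25 on the way in), while the second yields 0.75.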

    defstar {
        name { GainFix }
        domain { SDF }
        derivedFrom { SDFFix }
        desc {
    This is an amplifier; the fixed-point output is the fixed-point input
    multiplied by the "gain" (default 1.0). The precision of "gain", the
    input, and the output can be specified in bits.
        }
        input {
            name { input }
            type { fix }
        }
        output {
            name { output }
            type { fix }
        }
        defstate {
            name { gain }
            type { fix }
            default { 1.0 }
            desc { Gain of the star. }
        }
        defstate {
            name { ArrivingPrecision }
            type {int}
            default {"YES"}
            desc {
    Flag indicating whether or not to use the arriving particles as they
    are: YES keeps the same precision, and NO casts them to the precision
    specified by the parameter "InputPrecision". }
        }
        defstate {
            name { InputPrecision }
            type { precision }
            default { 2.14 }
            desc {
    Precision of the input in bits. The input particles are only cast
    to this precision if the parameter "ArrivingPrecision" is set to NO.
            }
        }
        defstate {
            name { OutputPrecision }
            type { precision }
            default { 2.14 }
            desc {
    Precision of the output in bits.
    This is the precision that will hold the result of the arithmetic
    operation on the inputs.
    When the value of the product extends outside of the precision,
    the OverflowHandler will be called.
            }
        }
        protected {
            Fix fixIn, out;
        }
        begin {
            SDFFix::begin();

            if ( ! int(ArrivingPrecision) ) {
                fixIn = Fix( ((const char *) InputPrecision) );
                if(fixIn.invalid())
                    Error::abortRun( *this, "Invalid InputPrecision" );
            }

            out = Fix( ((const char *) OutputPrecision) );
            if ( out.invalid() )
                Error::abortRun( *this, "Invalid OutputPrecision" );
            out.set_ovflow( ((const char *) OverflowHandler) );
            if(out.invalid())
                Error::abortRun( *this,"Invalid OverflowHandler" );
        }
        go {
            // all computations should be performed with out since
            // that is the Fix variable with the desired overflow
            // handler
            out = Fix(gain);
            if ( int(ArrivingPrecision) ) {
                out *= Fix(input%0);
            }
            else {
                fixIn = Fix(input%0);
                out *= fixIn;
            }
            checkOverflow(out);
            output%0 << out;
        }
        // a wrap-up method is inherited from SDFFix
        // if you define your own, you should call SDFFix::wrapup()
    }

Note that the `SDFGainFix` star and many of the `Fix` stars are derived from the star `SDFFix`. `SDFFix` implements commonly used methods and defines two states: `OverflowHandler`, which selects one of four overflow handlers to be called each time an overflow occurs, and `ReportOverflow`, which, if true, causes the number and percentage of overflows that occurred for that star during a simulation run to be reported in the wrapup method.

#### Constructors:

`Fix() `

Create a `Fix` number with unspecified precision and value zero.

`Fix(int length, int intbits) `

Create a `Fix` number with total word length of `length` bits and `intbits` bits to the left of the binary point. The value is set to zero. If the precision parameters are not valid, then an error bit is internally set so that the `invalid` method will return `TRUE`.

`Fix(const char* precisionString) `

Create a `Fix` number whose precision is determined by `precisionString`, which has the syntax "leftbits.rightbits", where leftbits is the number of bits to the left of the binary point and rightbits is the number of bits to the right of the binary point, or "rightbits/totalbits", where totalbits is the total number of bits. The value is set to zero. If the `precisionString` is not in the proper format, an error bit is internally set so that the `invalid` method will return `TRUE`.
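The two precision string forms describe the same information in different ways, as the following parsing sketch illustrates. It is not the parser used by `Fix`: `Precision` and `parsePrecision` are hypothetical names, and the validity rule (a positive word length with an integer bit count between zero and that length) is an assumption.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical result type for this sketch.
struct Precision {
    int length;   // total word length in bits
    int intbits;  // bits to the left of the binary point
    bool valid;   // false if the string was not in a recognized form
};

// Parse "leftbits.rightbits" or "rightbits/totalbits".
Precision parsePrecision(const char* s) {
    assert(s != nullptr);
    Precision p = {0, 0, false};
    int a = 0, b = 0;
    char tail = 0;  // catches trailing garbage after the second number
    if (std::sscanf(s, "%d.%d%c", &a, &b, &tail) == 2) {
        p.intbits = a;       // leftbits
        p.length = a + b;    // leftbits + rightbits
    } else if (std::sscanf(s, "%d/%d%c", &a, &b, &tail) == 2) {
        p.intbits = b - a;   // totalbits - rightbits
        p.length = b;        // totalbits
    } else {
        return p;
    }
    // Assumed validity rule for this sketch.
    p.valid = p.length >= 1 && p.intbits >= 0 && p.intbits <= p.length;
    return p;
}
```

Under these assumptions, "2.14" and "14/16" both describe a 16-bit word with 2 bits to the left of the binary point.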

`Fix(double value) `

Create a `Fix` with the default precision of 24 total bits for the word length and set the number of integer bits to the minimum needed to represent the integer part of the number value. If the value given needs more than 24 bits to represent, the value will be clipped and the number stored will be the largest possible under the default precision (i.e. saturation occurs). In this case an internal error bit is set so that the `ovf_occurred` method will return `TRUE`.

`Fix(int length, int intbits, double value) `

Create a `Fix` with the specified precision and set its value to the given `value`. The number is rounded to the closest representable number given the precision. If the precision parameters are not valid, then an error bit is internally set so that the `invalid` method will return `TRUE`.

`Fix(const char* precisionString, double value) `

Same as the previous constructor except that the precision is specified by the given `precisionString` instead of as two integer arguments. If the precision parameters are not valid, then an error bit is internally set so that the `invalid()` method will return true when called on the object.

`Fix(const char* precisionString, uint16* bits) `

Create a `Fix` with the specified precision and set the bits precisely to the ones in the given `bits`. The first word pointed to by `bits` contains the most significant 16 bits of the representation. Only as many words as are necessary to fetch the bits will be referenced from the `bits` argument. For example: `Fix("2.14",bits)` will only reference `bits[0]`.

This constructor gets very close to the representation and is meant mainly for debugging. It may be removed in the future.
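To illustrate what a raw bit pattern means for a 16-bit format such as "2.14", the sketch below decodes a single word by hand. It does not use the `Fix` class; `wordToValue` is a hypothetical helper, and it assumes the word holds the entire two's complement representation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Interpret one 16-bit word as a two's complement fixed-point value with
// `intbits` bits (sign bit included) to the left of the binary point.
double wordToValue(std::uint16_t word, int intbits) {
    assert(intbits >= 1 && intbits <= 16);
    std::int32_t raw = word;
    if (raw & 0x8000) {
        raw -= 0x10000;  // sign bit set: subtract 2^16 for the negative value
    }
    // Scale by 2^-(fraction bits), where fraction bits = 16 - intbits.
    return std::ldexp(static_cast<double>(raw), intbits - 16);
}
```

In the 2.14 format, the word 0x4000 decodes to 1.0, 0xC000 to -1.0, and 0x8000 to -2.0, the most negative value.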

`Fix(const Fix& arg) `

Copy constructor. Produces an exact duplicate of `arg`.

`Fix(int length, int intbits, const Fix& arg) `

Read the value from the `Fix` argument and set to a new precision. If the precision parameters are not valid, then an error bit is internally set so that the `invalid` method will return true when called on the object. If the value from the source will not fit, an error bit is set so that the `ovf_occurred` method will return `TRUE`.

#### Functions to set or display information about the Fix number:

`int len() const `

Return the total word length of the Fix number.

`int intb() const `

Return the number of bits to the left of the binary point.

`int precision() const `

Return the number of bits to the right of the binary point.

`int overflow() const `

Return the code of the type of overflow response for the `Fix` number. The possible codes are:
`0` - `ovf_saturate`,
`1` - `ovf_zero_saturate`,
`2` - `ovf_wrapped`,
`3` - `ovf_warning`,
`4` - `ovf_n_types`.

`int roundMode() const `

Return the rounding mode: `1` for rounding, `0` for truncation.

`int signBit() const `

Return `TRUE` if the value of the `Fix` number is negative, `FALSE` if it is positive or zero.

`int is_zero() `

Return `TRUE` if the value of the `Fix` number is zero.

`double max() `

Return the maximum value representable using the current precision.

`double min() `

Return the minimum value representable using the current precision.

`double value() `

Return the value of the `Fix` number as a double.

`void setToZero() `

Set the value of the `Fix` number to zero.

`void set_overflow(int value) `

Set the overflow type.

`void set_rounding(int value) `

Set the rounding type: `TRUE` for rounding, `FALSE` for truncation.

`void initialize() `

Discard the current precision format and set the `Fix` number to zero.

There are a few functions for backward compatibility:

`void set_ovflow(const char*) `

Set the overflow using a name.

`void Set_MASK(int value) `

Set the rounding type. Same functionality as `set_rounding()`.

Comparison function:

`int compare (const Fix& a, const Fix& b) `

Compare two `Fix` numbers. Return -1 if a < b, 0 if a = b, 1 if a > b.

The following functions are for use with the error condition fields:

`int ovf_occurred() `

Return `TRUE` if an overflow has occurred as the result of some operation like addition or assignment.

`int invalid() `

Return `TRUE` if the current value of the `Fix` number is invalid due to it having an improper precision format, or if some operation caused a divide by zero.

`int dbz() `

Return `TRUE` if a divide by zero error occurred.

`void clear_errors() `

Reset all error bit fields to zero.

#### Operators:

`Fix& operator = (const Fix& arg) `

Assignment operator. If `*this` does not have its precision format set (i.e. it is uninitialized), the source `Fix` is copied. Otherwise, the source `Fix` value is converted to the existing precision. Either truncation or rounding takes place, based on the value of the rounding bit of the current object. Overflow results in saturation, "zero saturation" (replacing the result with zero), or a warning error message, depending on the overflow field of the object. In these cases, `ovf_occurred` will return `TRUE` on the result.

`Fix& operator = (double arg) `

Assignment operator. The double value is first converted to a default precision `Fix` number and then assigned to `*this`.

The function of these arithmetic operators should be self-explanatory:

`Fix& operator += (const Fix&) `

`Fix& operator -= (const Fix&) `

`Fix& operator *= (const Fix&) `

`Fix& operator *= (int) `

`Fix& operator /= (const Fix&) `

`Fix operator + (const Fix&, const Fix&) `

`Fix operator - (const Fix&, const Fix&) `

`Fix operator * (const Fix&, const Fix&) `

`Fix operator * (const Fix&, int) `

`Fix operator * (int, const Fix&) `

`Fix operator / (const Fix&, const Fix&) `

`Fix operator - (const Fix&) // unary minus `

`int operator == (const Fix& a, const Fix& b) `

`int operator != (const Fix& a, const Fix& b) `

`int operator >= (const Fix& a, const Fix& b) `

`int operator <= (const Fix& a, const Fix& b) `

`int operator > (const Fix& a, const Fix& b) `

`int operator < (const Fix& a, const Fix& b) `

Note:

#### Conversions:

`operator int() const `

Return the value of the `Fix` number as an integer, truncating towards zero.

`operator float() const `

`operator double() const `

Convert to a float or a double, creating an exact result when possible.

`void complement() `

Replace the current value by its complement.

#### Fix overflow, rounding, and errors.

The `Fix` class defines the following enumerated values for overflow handling:

`Fix::ovf_saturate `

`Fix::ovf_zero_saturate `

`Fix::ovf_wrapped `

`Fix::ovf_warning `

They may be used as arguments to the `set_overflow` method, as in the following example:

`out.set_overflow(Fix::ovf_saturate);`

The member function

`int overflow() const;`

returns the overflow type. This result can be compared against the enumerated values above. Overflow types may also be specified as strings, using the method

`void set_ovflow(const char* overflow_type);`

The `overflow_type` argument may be one of `saturate`, `zero_saturate`, `wrapped`, or `warning`.

The rounding behavior is set with the method

`void set_rounding(int value);`

If the argument is false, or has the value `Fix::mask_truncate`, truncation will occur. If the argument is nonzero (for example, if it has the value `Fix::mask_truncate_round`), rounding will occur. The older name `Set_MASK` is a synonym for `set_rounding`.

The following functions report error conditions:

`int ovf_occurred() const;`

`int invalid() const;`

`int dbz() const;`

The first function returns `TRUE` if there have been any overflows in computing the value. The second returns `TRUE` if the value is invalid, because of invalid precision parameters or a divide by zero. The third returns `TRUE` only for divide by zero.
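The string names accepted by `set_ovflow` and the numeric codes returned by `overflow()` can be related by a simple lookup. The sketch below mirrors the names and codes listed in this section but is not taken from the `Fix` source: the free-standing enum, the function `overflowCodeFromName`, and its -1 return value for an unknown name are all assumptions.

```cpp
#include <cassert>
#include <cstring>

// Codes as listed for overflow(); declared here outside any class
// purely for illustration.
enum OverflowCode {
    ovf_saturate = 0,
    ovf_zero_saturate = 1,
    ovf_wrapped = 2,
    ovf_warning = 3,
    ovf_n_types = 4  // number of overflow types, not a type itself
};

// Map an overflow name to its code. Returns -1 for an unrecognized name
// (a convention of this sketch only).
int overflowCodeFromName(const char* name) {
    assert(name != nullptr);
    static const char* const names[ovf_n_types] = {
        "saturate", "zero_saturate", "wrapped", "warning"
    };
    for (int i = 0; i < ovf_n_types; ++i) {
        if (std::strcmp(name, names[i]) == 0) {
            return i;
        }
    }
    return -1;
}
```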