Chapter 4 Type System

Table of Contents
Overview
Intrinsic Types (Future Direction)
Type Aliases (Future Direction)
Backend Preludes (Future Direction)
Literal Type Inference
References
Parameter Passing
Constructors
Closures (Lambdas)
Error Handling
Union Types
Generics
Lifetimes and Memory Regions

Overview

Scaly currently recognizes a fixed set of primitive types — integers, floats, booleans, and pointer types — which the LLVM-based emitter maps to the corresponding LLVM types. A possible future direction, described in the following sections, is to make the type system fully backend-agnostic, with even the primitive types supplied by a backend-specific prelude rather than built into the compiler.

Intrinsic Types (Future Direction)

Note: The mechanism described in this section and the two that follow (Type Aliases and Backend Preludes) is a planned design, not current behavior. Today the compiler knows its primitive types directly and targets LLVM only.

An intrinsic type is a type whose implementation is provided by the backend, not by Scaly code. Under this design, intrinsic types would be declared using the intrinsic keyword:


define i32 intrinsic
define i64 intrinsic
define f64 intrinsic
    

The compiler would not interpret these definitions — it would simply record that these types exist and are intrinsic. The backend (Emitter) would be responsible for mapping intrinsic types to the target platform's representation.

This design would enable Scaly to target any backend.

Type Aliases (Future Direction)

Under the planned design, type aliases would provide human-friendly names for intrinsic or compound types:


define bool i1
define char i32       ; Unicode scalar value
define int i64        ; Platform word size
define size_t u64
    

The alias and its target are interchangeable — bool and i1 refer to the same type.

Backend Preludes (Future Direction)

Under the planned design, each backend would provide a prelude file that defines the intrinsic types and their aliases for that target. The prelude would be loaded implicitly before any user code.

Example 4-1. LLVM Prelude (excerpt)


; Intrinsic types (LLVM native)
define i1 intrinsic
define i8 intrinsic
define i16 intrinsic
define i32 intrinsic
define i64 intrinsic
define f32 intrinsic
define f64 intrinsic
define ptr intrinsic

; Human-friendly aliases
define bool i1
define char i32
define int i64
define size_t u64
      

Example 4-2. JavaScript Prelude (excerpt)


; Intrinsic types (JavaScript native)
define number intrinsic
define string intrinsic
define boolean intrinsic

; Aliases for compatibility
define int number
define float number
define bool boolean
      

Literal Type Inference

Numeric and other literals do not have an inherent type. Their type is inferred from context:


function double(x: i32) returns i32 { x * 2 }

double(42)           ; 42 inferred as i32 from parameter type

let y: i64 100       ; 100 inferred as i64 from annotation
let z y + 50         ; 50 inferred as i64 to match y
    

If the type cannot be inferred, the compiler requires an explicit annotation:


let x 42             ; ERROR: cannot infer type for integer literal
let x: i32 42        ; OK: type explicitly annotated
    

Scaly does not support type suffixes on literals (such as 42i32). This keeps literals clean and encourages explicit type annotations where they matter.
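The inference rule in this section can be modelled in a few lines. The following is a Python sketch, not Scaly and not the compiler's actual code; the function name infer_integer_literal, the InferenceError exception, and the exact set of integer type names are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Integer type names used in this chapter; the exact set is an assumption.
INTEGER_TYPES = {"i8", "i16", "i32", "i64"}


class InferenceError(Exception):
    """Raised when a literal has no usable context to take its type from."""


def infer_integer_literal(text: str, expected: Optional[str]) -> Tuple[int, str]:
    """Give an integer literal the type its context expects.

    `expected` is the type demanded by a parameter, an annotation, or the
    other operand of an expression; None means the context supplies no type.
    """
    if expected is None:
        raise InferenceError(f"cannot infer type for integer literal {text}")
    if expected not in INTEGER_TYPES:
        raise InferenceError(f"integer literal {text} cannot have type {expected}")
    return int(text), expected
```

With this model, infer_integer_literal("42", "i32") corresponds to double(42), while passing None as the expected type corresponds to the rejected let x 42.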

References

By default, functions operate on values: a parameter gives the function read-only access to its argument, and the function returns a value.

To refer to an existing value without copying it, Scaly provides the reference type ref[T] — a non-owning, borrowed reference to a T. References are used for parameters and for nullable fields, written ref[T]?. Like pointer[T], a reference is represented as a machine pointer, but it carries borrow semantics rather than the unchecked, low-level meaning of a raw pointer.

Low-Level Pointers

For low-level implementation of data structures like Page, HashMap, or List, the type pointer[T] is available. This is an escape hatch for implementors, not for everyday code.


; Low-level list node (internal implementation)
define Node[T] {
    data: T
    next: pointer[Node[T]]    ; raw pointer for linked structure
}
      

Nullable Values

For nullable values, use Option[T] — a proper sum type with Some(value) and None variants:


function find(list: List[T], predicate: function(T) returns bool) returns Option[T] {
    ; returns Some(item) if found, None otherwise
}

let result find(items, \x: x > 10)
choose result
    when value: Some: process(value)
    else handle_not_found()
      

The compiler optimizes Option[T] to a simple nullable pointer — no space overhead for the tag.

Parameter Passing

All function parameters are borrowed — functions receive read-only access to their arguments. The caller retains ownership.


define Point { x: i32, y: i32 }

function distance(a: Point, b: Point) returns f64 {
    ; a and b are read-only views
    ; cannot modify them
    ...
}

let origin Point(0, 0)
let target Point(3, 4)
distance(origin, target)    ; origin and target unchanged
    

Implementation

The implementation of parameter passing depends on the execution context, but the semantics remain identical:

  • Same thread/stack: A pointer is passed. No copying occurs. The function reads through the pointer.

  • Different thread/GPU/remote: The entire data tree is copied to the target execution context. The function still has read-only access — same semantics, different mechanism.

This design means code doesn't change based on where it executes. A function that works locally works identically when distributed.
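The two mechanisms listed above can be imitated in Python to show why the callee cannot tell them apart. This is only a model under stated assumptions: pass_argument and its same_context flag are hypothetical names, a frozen dataclass stands in for read-only access, and sharing a Python object stands in for passing a pointer.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen models the read-only view a function gets
class Point:
    x: int
    y: int


def pass_argument(value, same_context: bool):
    """Return what the callee receives for a borrowed parameter."""
    if same_context:
        return value              # same thread/stack: share the data, no copy
    return copy.deepcopy(value)   # other thread/GPU/remote: copy the data tree


def distance(a: Point, b: Point) -> float:
    """Reads its arguments only; works the same on shared or copied data."""
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
```

Calling distance with a shared argument or with a copied one gives the same result, which is the property the paragraph above describes.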

Mutation via Procedures

To modify data, use a procedure instead of a function. Procedures can declare parameters as mutable:


procedure move(p: mutable Point, dx: i32, dy: i32) {
    set p.x: p.x + dx
    set p.y: p.y + dy
}

var position Point(0, 0)
move(position, 5, 3)    ; position is now (5, 3)
      

The distinction between functions (pure, read-only) and procedures (may mutate) is explicit in the code. Readers immediately know which calls might have side effects.

Constructors

Constructors create instances of types. Scaly provides both implicit and explicit constructors.

Implicit Constructors

If all members of a type are public, an implicit constructor is generated that takes all fields as parameters in declaration order:


define Point { x: i32, y: i32 }

let p Point(10, 20)    ; implicit constructor
      

Explicit Constructors

Use init for explicit constructors when you need:

  • Private members (implicit constructor unavailable)

  • Default values for some fields

  • Different construction signatures


define Point
(
    x: i32
    y: i32
)
{
    init(value: i32) {      ; convenience constructor
        set this.x: value
        set this.y: value
    }
}

let p1 Point(10, 20)    ; implicit constructor (all members are public)
let p2 Point(5)         ; explicit init, equivalent to Point(5, 5)
      

The this Prefix

The this. prefix is optional when unambiguous, but recommended for clarity:


init(x: i32, y: i32) {
    set this.x: x    ; clear: field x gets parameter x
    set this.y: y
}
      

When parameter names shadow field names, this. is required to disambiguate. Avoid set x: x — it's confusing even if technically resolvable.

Complete Initialization

All fields must be initialized by any constructor — implicit or explicit. The compiler enforces this. There are no implicit default values (no automatic 0, false, or null):


define Point { x: i32, y: i32 }

init(x: i32) {
    set this.x: x
    ; ERROR: field 'y' not initialized
}
      

Constructor Return

init implicitly returns this, enabling direct binding:


let p Point(10, 20)    ; init returns the new Point
      

Constructed Value Lifetime

The lifetime of a constructed value is inferred from context:

  • In a block: Local lifetime (current block)

  • Last statement / return position: Call lifetime (return page)

  • Explicit annotation: As specified


function example() returns Point {
    let temp Point(1, 2)     ; local - dies at block end
    return Point(3, 4)       ; call - inferred from return position
}
      

Page-Parameterized Constructors (init#)

Some types are value types (can live on the stack) but need to allocate internal data on a page. The init# syntax supports this pattern:


define String(data: pointer[char], length: size_t)
{
    ; init# takes an implicit page parameter as first argument
    init#(page, text: pointer[const_char])
    {
        let len strlen(text)
        set data: page.allocate(len + 1, 1) as pointer[char]
        memcpy(data, text, len)
        set length: len
    }
}
      

At call sites, the lifetime modifier determines which page is passed:

  • String#("hello") — passes caller's page (rp)

  • String$("world") — passes local page

  • String^hashMap("key") — passes the named page

The init# pattern is distinct from regular heap allocation. The struct itself can live on the stack — only its internal data needs a page. This is ideal for types like String, Array, or other containers where the wrapper is small but the content may be large.

Note: init$, init^, and init! are reserved for future use and will produce an error.

Closures (Lambdas)

Closures are anonymous functions defined with backslash syntax:


\x: x * 2                    ; single parameter
\x y: x + y                  ; multiple parameters
\: 42                        ; no parameters
    

Capturing Variables

Closures can capture variables from their enclosing scope. Captured variables are treated as implicit borrowed parameters — read-only access, same as explicit function parameters:


let multiplier 10
let scale \x: x * multiplier    ; captures multiplier (read-only)

scale(5)    ; returns 50
      

Closures Are Pure

Closures cannot mutate captured variables. They are pure like functions, not imperative like procedures:


var count 0
let bad \: { set count: count + 1 }   ; ERROR: cannot mutate capture
      

Closure Lifetime

Since captures are borrowed, a closure cannot outlive its captured variables. The compiler enforces this through lifetime checking:


function makeCounter() returns (function() returns i32) {
    var count 0
    return \: count    ; ERROR: closure outlives captured 'count'
}
      

Implementation

Each closure is an anonymous struct containing its captures, with a call method implementing the body. Closures are monomorphized like other generic types.
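As an illustration of this lowering, the scale closure from the capture example could be pictured as the following Python class. This is a sketch, not generated compiler output: the name ScaleClosure is hypothetical, and Python stores a copy of the integer here, whereas Scaly borrows the captured variable.

```python
from dataclasses import dataclass


# Stands in for the anonymous struct generated for:  \x: x * multiplier
@dataclass(frozen=True)  # frozen: captures are read-only inside the closure
class ScaleClosure:
    multiplier: int  # the single captured variable

    def call(self, x: int) -> int:
        """The closure body, with the capture read from the struct."""
        return x * self.multiplier


scale = ScaleClosure(multiplier=10)
```

Invoking scale.call(5) plays the role of scale(5), and an attempt to assign to scale.multiplier is rejected, mirroring the rule that closures cannot mutate their captures.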

Error Handling

Scaly uses explicit error handling. There are no exceptions or stack unwinding — errors are values returned from functions or procedures.

The throws Clause

Functions or procedures that can fail declare their error type with throws:


procedure parse(input: String) returns AST throws ParseError {
    if invalid(input) {
        throw InvalidSyntax(position, "expected expression")
    }
    ...
}
      

Under the hood, this is equivalent to returning Result[AST, ParseError], but with dedicated syntax for clarity.
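To make the equivalence concrete, here is a minimal Python model of a result value. The classes Ok and Err, the Result alias, and the emptiness check inside parse are all assumptions for illustration; they are not part of Scaly or its runtime.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: object   # the normally returned value


@dataclass(frozen=True)
class Err:
    error: object   # the thrown error value


Result = Union[Ok, Err]


def parse(source: str) -> Result:
    """Hypothetical stand-in for the parse procedure shown above."""
    if not source.strip():                 # invented validity check
        return Err("expected expression")  # corresponds to `throw`
    return Ok(("ast", source))             # corresponds to a normal return
```

The caller has to inspect which of the two cases came back, which is what the try/when syntax expresses directly.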

Single Error Type

A function can only throw one error type. To represent multiple error kinds, use a union:


define FileError union
(
    NotFound: String
    PermissionDenied: String
    IoError: String
)

function readFile(path: String) returns String throws FileError {
    ...
}
      

This ensures a clear error signature and enables the try/when pattern for handling specific variants.

The try/when Pattern

Handle errors with try and when clauses:


try let ast parse(input)
    when e: InvalidSyntax: reportError(e)
    when e: UnexpectedEof: reportError(e)
      

If not all error variants are covered, an else clause is required:


try let ast parse(input)
    when e: InvalidSyntax: reportError(e)
    else panic("unhandled error")
      

Error Propagation

Use else throw to re-throw errors to the caller:


function process(input: String) returns Data throws ParseError {
    try let ast parse(input)
        else throw    ; re-throws ParseError to caller

    transform(ast)
}
      

When the callee's error type exactly matches the error type declared by the enclosing function, else throw forwards the error unchanged. This simplifies deeply nested code like parsers and visitors.

Error Lifetime

Thrown values must have thrown lifetime (!). The compiler infers this from throw position. If inference fails, annotate explicitly:


throw ParseError.InvalidSyntax!(pos, msg)   ; explicit thrown lifetime

The caller provides the exception region where the error will be stored.

Union Types

A union type (also called sum type or tagged union) can hold one of several variants. Each variant optionally carries a payload of a single type (use a struct for several fields):


define Rectangle ( width: f64, height: f64 )

define Shape union
(
    Circle: f64
    Box: Rectangle
    Empty
)

let s Shape.Circle(5.0)
    

Memory Layout

A union is stored as a tag plus the largest variant:


Shape = { tag: u8, data: [size of largest variant] }
      

Unions can contain other unions. Nested unions contribute their full size (tag + data) when computing the parent union's size.
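The tag-plus-largest-variant layout can be inspected with Python's ctypes module, which follows C layout rules. This is an analogy under that assumption, not Scaly's actual layout code; the class names ShapeData and Shape are chosen to mirror the example above.

```python
import ctypes


class Rectangle(ctypes.Structure):
    _fields_ = [("width", ctypes.c_double), ("height", ctypes.c_double)]


class ShapeData(ctypes.Union):
    # One slot per payload-carrying variant; Empty carries no payload.
    _fields_ = [("circle", ctypes.c_double), ("box", Rectangle)]


class Shape(ctypes.Structure):
    # The tag says which variant is live; data is as large as the largest one.
    _fields_ = [("tag", ctypes.c_uint8), ("data", ShapeData)]


largest_variant = ctypes.sizeof(ShapeData)   # equals sizeof(Rectangle)
total_size = ctypes.sizeof(Shape)            # tag + padding + largest variant
```

Under C rules the total can exceed one byte plus the largest variant, because the data field is aligned and padding follows the tag. Whether Scaly pads in the same way is not stated in this chapter.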

Pattern Matching with choose

Use choose-when to match on variants. Each when clause names a binding, then the variant, then the result; the binding is bound to the variant's payload:


choose s
    when r: Circle: computeCircleArea(r)
    when b: Box: b.width * b.height
    else 0.0

If not all variants are covered, an else clause is required:


choose s: when r: Circle: computeCircleArea(r) else 0.0    ; handles Box and Empty
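The exhaustiveness rule amounts to a set comparison between the union's variants and the variants named in when clauses. The following Python sketch shows the idea; check_choose and the error text are hypothetical and do not reproduce the compiler's diagnostics.

```python
from typing import List, Optional, Sequence

SHAPE_VARIANTS = ["Circle", "Box", "Empty"]  # variants of Shape from this section


def check_choose(variants: Sequence[str], handled: Sequence[str],
                 has_else: bool) -> Optional[str]:
    """Return an error message if a choose is not exhaustive, else None."""
    missing: List[str] = [v for v in variants if v not in handled]
    if missing and not has_else:
        return "else clause required; unhandled variants: " + ", ".join(missing)
    return None
```

A choose that handles only Circle passes when it has an else clause, and a choose that handles Circle and Box without else is reported because Empty is left over. The same check applies to the error variants in a try/when.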

Option Type

Option[T] is a union for nullable values:


define Option[T] union
(
    Some: T
    None
)

function find(list: List[T], pred: function(T) returns bool) returns Option[T] {
    ...
}

choose find(items, \x: x > 10)
    when value: Some: process(value)
    else handle_not_found()
      

Option Optimization

For non-pointer types, Option[T] optimizes to a pointer:

  • None = null pointer

  • Some(value) = pointer to value

Note that if T is itself a pointer, this optimization yields a pointer to a pointer, and the inner pointer may itself be null.
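The representation can be tried out with ctypes pointers, where a null pointer tests as false. This is a model of the representation only; the helper names some, none, and is_some are assumptions, and ctypes stands in for the machine pointers the compiler would emit.

```python
import ctypes

I32Ptr = ctypes.POINTER(ctypes.c_int32)   # models Option[i32]


def some(value):
    """Some(value): a non-null pointer to the value."""
    return ctypes.pointer(value)


def none(pointer_type):
    """None: the null pointer of the given pointer type."""
    return pointer_type()


def is_some(option) -> bool:
    """A ctypes null pointer is falsy, so truthiness distinguishes the cases."""
    return bool(option)


present = some(ctypes.c_int32(7))
absent = none(I32Ptr)
nested = some(I32Ptr())   # T is a pointer: Some(null) is a non-null outer pointer
```

Here nested is a Some whose payload is a null pointer, so the outer pointer is non-null while the inner one is null. This is the pointer-to-pointer case described in the note above.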

Generics

Scaly supports generic types with type parameters in square brackets:


define List[T] {
    ...
}

define HashMap[K, V] {
    ...
}

let numbers List[i32]()
let names List[String]()
    

Monomorphization

Generics are implemented via monomorphization: each concrete instantiation becomes a completely separate type at compile time. List[i32] and List[String] share no code at runtime — each has its own specialized implementation.

Benefits:

  • No runtime overhead — no type descriptors or vtables

  • Full optimization — the compiler sees concrete types

  • No boxing — primitives stay primitives

Trade-offs:

  • Larger binaries — each instantiation duplicates code

  • Longer compile times — more code to generate

Name Mangling

The Planner generates unique mangled names for each instantiation, following Itanium ABI conventions:


List[i32]         → _ZN4ListIiE...
List[String]      → _ZN4ListI6StringE...
HashMap[String, i32] → _ZN7HashMapI6StringiE...
      

These names are compatible with c++filt for debugging.
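The prefixes shown above follow a simple pattern: each identifier is written as its length followed by its text, and type arguments are wrapped in I...E. The Python sketch below reproduces just those three prefixes. It is not the Planner's code; only the i32 to i mapping is taken from the examples, and nested generic arguments are not handled.

```python
from typing import Sequence

# Itanium builtin codes; only i32 -> "i" is confirmed by the examples above.
BUILTIN_CODES = {"i32": "i"}


def mangle_name(name: str) -> str:
    """Encode a type name as a builtin code or as <length><identifier>."""
    return BUILTIN_CODES.get(name, f"{len(name)}{name}")


def mangle_instantiation(generic: str, type_args: Sequence[str]) -> str:
    """Build the prefix _ZN <name> I <args> E for one generic instantiation."""
    args = "".join(mangle_name(arg) for arg in type_args)
    return f"_ZN{len(generic)}{generic}I{args}E"
```

For example, mangle_instantiation("HashMap", ["String", "i32"]) returns _ZN7HashMapI6StringiE, the prefix listed above.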

Lifetimes and Memory Regions

Scaly uses Region-Based Memory Management (RBMM). Instead of garbage collection or manual malloc/free, values are allocated in memory regions (pages) that are deallocated in bulk when their owning scope exits.

Stack vs Page Allocation

The presence or absence of a lifetime suffix on a constructor call determines whether the value is stack-allocated or page-allocated:

No suffix = Stack allocation

The value is allocated in the current stack frame. Allocation is fast and cleanup is automatic when the function returns, but the value cannot outlive the current function.

Lifetime suffix = Page allocation

The value is allocated on a memory page and, depending on which lifetime is used, can outlive the current block.


let stack_point Point(10, 20)     ; stack-allocated (by value)
let page_point Point$(10, 20)     ; page-allocated (local page)
      

Note: The lifetime suffix comes before the constructor parameters, not after. This follows the general pattern: Type + Generics + Lifetime + Parameters.

Lifetime Kinds

Local ($)

Value is allocated on a local page that lives until the end of the current block. The page is lazily allocated on first use and automatically deallocated when the block exits.

Caller (#)

Value is allocated on the caller's return page. It survives the function return and is managed by the caller. Used for returning heap-allocated values.

Thrown (!)

Value is allocated on the exception page. Used for error values that may be thrown and caught by the caller.

Reference (^name)

Value is allocated on the same page as the named variable. The named variable must itself be page-allocated (not stack-allocated). Used when adding values to collections.

Local Lifetime ($)

Local lifetime allocates on a page scoped to the current block. The page is lazily allocated (only when needed) and automatically freed when the block exits:


function process(input: String) {
    if condition {
        let parser Parser$(input)    ; allocated on local page
        parser.parse()
    }                                 ; page deallocated here

    ; parser and its page are gone
}
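The behaviour of a local page (created lazily, released in bulk at block exit) can be modelled with a bump allocator and a context manager. This is a Python sketch under several assumptions: the classes Page and Block, the 4096-byte capacity, and the offset-returning allocate are illustrative and are not Scaly's runtime API.

```python
from typing import Optional


class Page:
    """Bump allocator over one buffer; everything is released together."""

    def __init__(self, capacity: int = 4096) -> None:
        self.buffer = bytearray(capacity)
        self.used = 0

    def allocate(self, size: int) -> int:
        """Reserve `size` bytes and return their offset within the page."""
        if self.used + size > len(self.buffer):
            # How a full page grows is not covered in this sketch.
            raise MemoryError("page full")
        offset = self.used
        self.used += size
        return offset


class Block:
    """Models a block scope that owns a lazily created local page."""

    def __init__(self) -> None:
        self.page: Optional[Page] = None

    def local_page(self) -> Page:
        if self.page is None:    # lazy: created on the first $ allocation
            self.page = Page()
        return self.page

    def __enter__(self) -> "Block":
        return self

    def __exit__(self, *exc) -> None:
        self.page = None         # bulk deallocation when the block exits
```

A block that never allocates with $ never creates a page, and individual values are never freed one by one; dropping the page releases them all.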
      

Important: Local lifetime ($) is forbidden on return types. A function cannot return a value with local lifetime because the local page is deallocated before the caller receives the value.


; ERROR: local lifetime ($) not allowed on return types
function bad() returns Point$ { ... }

; OK: use caller lifetime (#) or by-value return
function good() returns Point# { ... }
function also_good() returns Point { ... }
      

Caller Lifetime (#)

Values that must survive a function return use caller lifetime. The value is allocated on the caller's page, not the function's local page:


function createParser(input: String) returns pointer[Parser] {
    Parser#(input)    ; allocated on caller's page
}
      

The compiler may infer caller lifetime for values in return position when the return type specifies it.

Thrown Lifetime (!)

Error values use thrown lifetime. The value is allocated on a special exception page provided by the caller:


function parse(input: String) returns AST throws ParseError {
    if invalid(input) {
        throw ParseError!("invalid syntax")   ; thrown lifetime
    }
    ...
}
      

The compiler infers thrown lifetime for values in throw position.

Reference Lifetime (^name)

When adding a value to a collection, the value must be allocated on the same page as the collection. Use reference lifetime with the collection's name:


let items Array$()                ; items is page-allocated
let item Car^items("red")         ; item allocated on same page as items
items.add(item)                   ; safe - same lifetime
      

Validation: The compiler verifies that the referenced variable (items in this example) is page-allocated. Referencing a stack-allocated variable is an error:


let stack_car Car("blue")         ; stack-allocated (no suffix)
let other Car^stack_car("red")    ; ERROR: cannot use ^stack_car
                                  ; stack_car is not page-allocated
      

This prevents dangling references: you cannot tie a value's lifetime to a stack variable that will be destroyed when the function returns.

By-Value Returns

Functions can return values by value (no lifetime annotation). This is the simplest approach for small types:


function createPoint(x: i32, y: i32) returns Point {
    Point(x, y)    ; constructed and returned by value
}
      

By-value return avoids page allocation entirely. The value is constructed directly in the caller's stack frame or register.

Lifetime Inference

The compiler infers lifetimes where possible; the remaining cases must be written explicitly:

  • No suffix on constructor = stack allocation (by value)

  • Throw position implies thrown lifetime (!)

  • Reference lifetime (^container) must always be explicit

  • $, # must be explicit on constructor calls
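The rules above, together with the validation of ^name, can be summarized as one decision function. The following Python sketch is a reading of this chapter, not the compiler's implementation; allocation_target, its parameters, and the returned labels are hypothetical.

```python
from typing import Dict, Optional


def allocation_target(suffix: Optional[str], in_throw: bool,
                      allocations: Dict[str, str]) -> str:
    """Decide where a constructor call allocates.

    `suffix` is None, "$", "#", "!" or "^name"; `allocations` maps each
    variable in scope to "stack" or "page".
    """
    if suffix is None:
        # Throw position implies thrown lifetime; otherwise by value.
        return "exception page" if in_throw else "stack"
    if suffix == "$":
        return "local page"
    if suffix == "#":
        return "caller page"
    if suffix == "!":
        return "exception page"
    if suffix.startswith("^"):
        name = suffix[1:]
        if allocations.get(name) != "page":
            raise ValueError(f"cannot use ^{name}: {name} is not page-allocated")
        return f"page of {name}"
    raise ValueError(f"unknown lifetime suffix {suffix!r}")
```

With items recorded as page-allocated, the suffix ^items resolves to the page of items, while ^stack_car is rejected because stack_car lives on the stack.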