Migen (Milkymist Generator)
a Python toolbox for building complex digital hardware
======================================================

Background
==========
Even though the Milkymist system-on-chip [1] is technically successful,
it suffers from several limitations stemming from its implementation in
manually written Verilog HDL:

(1) The "event-driven" paradigm of today's dominant hardware descriptions
languages (Verilog and VHDL, collectively referred to as "V*HDL" in the
rest of this document) is often too general. Today's FPGA architectures
are optimized for the implementation of fully synchronous circuits. This
means that the bulk of the code for an efficient FPGA design falls into
three categories:
  (a) Combinatorial statements
  (b) Synchronous statements
  (c) Initialization of registers at reset
V*HDL do not follow this organization. This means that a lot of
repetitive manual coding is needed, which brings sources of human errors,
petty issues, and confusion for beginners:
  - wire vs. reg in Verilog
  - forgetting to initialize a register at reset
  - deciding whether a combinatorial statement must go into a
    process/always block or not
  - simulation mismatches with combinatorial processes/always blocks
  - and more...
A little-known fact about FPGAs is that many of them have to ability to
initialize their registers from the bitstream contents. This can be done
in a portable and standard way using an "initial" block in Verilog, and
by affecting a value at the signal declaration in VHDL. This renders an
explicit reset signal unnecessary in practice in some cases, which opens
the way for further design optimization. However, this form of
initialization is entirely not synthesizable for ASIC targets, and it is
not easy to switch between the two forms of reset using V*HDL.

(2) V*HDL support for composite types is very limited. Signals having a
record type in VHDL are unidirectional, which makes them clumsy to use
e.g. in bus interfaces. There is no record type support in Verilog, which
means that a lot of copy-and-paste has to be done when forwarding grouped
signals.

(3) V*HDL support for procedurally generated logic is extremely limited.
The most advanced forms of procedural generation of synthesizable logic
that V*HDL offers are CPP-style directives in Verilog, combinatorial
functions, and generate statements. Nothing really fancy, and it shows.
To give a few examples:
  - Building highly flexible bus interconnect is not possible. Even
arbitrating any given number of bus masters for commonplace protocols
such as Wishbone cannot be done with the tools at V*HDL puts at our
disposal. This requires manual recoding of parts of the arbiter to add or
remove a master, which is tedious and often cause frustrating errors.
Each occurence of the latter can easily cause one or two hours of lost
productivity when combined with the long compilation times of moderately
complex system-on-chip designs.
  - Building a memory infrastructure (including bus interconnect, bridges
and caches) that can automatically adapt itself at compile-time to any
word size of the SDRAM is clumsy and tedious.
  - Building register banks for control, status and interrupt management
of cores can also largely benefit from automation.
  - Many hardware acceleration problems can fit into the dataflow
programming model. Manual dataflow implementation in V*HDL has, again, a
lot of redundancy and potential for human errors. See the Milkymist
texture mapping unit [3][4] for an example of this. The amount of detail
to deal with manually also makes the design space exploration difficult,
and therefore hinders the design of efficient architectures.
  - Pre-computation of values, such as filter coefficients for DSP or
even simply trigonometric tables, must often be done using external tools
whose results are copy-and-pasted (in the best cases, automatically) into
the V*HDL source.

Enter Migen, a Python toolbox for building complex digital hardware. We
could have designed a brand new programming language, but that would have
been reinventing the wheel instead of being able to benefit from Python's
rich features and immense library. The price to pay is a slightly
cluttered syntax at times when writing descriptions in FHDL, but we
believe this is totally acceptable, particularly when compared to VHDL
;-)

Migen is made up of several related components, which are briefly
described below.

Migen FHDL
==========
The Fragmented Hardware Description Language (FHDL) is the lowest layer
of Migen. It consists of a formal system to describe signals, and
combinatorial and synchronous statements operating on them. The formal
system itself is low level and close to the synthesizable subset of
Verilog, and we then rely on Python algorithms to build complex
structures by combining FHDL elements and encapsulating them in
"fragments".
The FHDL module also contains a back-end to produce synthesizable
Verilog, and some basic analysis functions. It would be possible to
develop a VHDL back-end as well, though more difficult than for Verilog -
we are "cheating" a bit now as Verilog provides most of the FHDL
semantics.

FHDL differs from MyHDL [2] in fundamental ways. MyHDL follows the
event-driven paradigm of traditional HDLs (see Background, #1) while FHDL
separates the code into combinatorial statements, synchronous statements,
and reset values. In MyHDL, the logic is described directly in the Python
AST. The converter to Verilog or VHDL then examines the Python AST and
recognizes a subset of Python that it translates into V*HDL statements.
This seriously impedes the capability of MyHDL to generate logic
procedurally. With FHDL, you manipulate a custom AST from Python, and you
can more easily design algorithms that operate on it.

FHDL is made of several elements, which are briefly explained below.

BV
--
The bit vector (BV) object defines if a constant or signal is signed or
unsigned, and how many bits it has. This is useful e.g. to:
 - determine when to perform sign extension (FHDL uses the same rules as
Verilog).
 - determine the size of registers.
 - determine how many bits should be used by each value in
concatenations.

Constant
--------
This object should be self-explanatory. All constant objects contain a BV
object and a value. If no BV object is specified, one will be made up
using the following rules:
  - If the value is positive, the BV is unsigned and has the minimum
number of bits needed to represent the constant's value in the canonical
base-2 system.
  - If the value is negative, the BV is signed, and has the minimum
number of bits needed to represent the constant's value in the canonical
two's complement, base-2 system.

Signal
------
The signal object represents a value that is expected to change in the
circuit. It does exactly what Verilog's "wire" and "reg" and VHDL's
"signal" and "variable" do.

The main point of the signal object is that it is identified by its
Python ID (as returned by the id() function), and nothing else. It is the
responsibility of the V*HDL back-end to establish an injective mapping
between Python IDs and the V*HDL namespace. It should perform name
mangling to ensure this. The consequence of this is that signal objects
can safely become members of arbitrary Python classes, or be passed as
parameters to functions or methods that generate logic involving them.

The properties of a signal object are:
  - a bit vector description
  - a name, used as a hint for the V*HDL back-end name mangler.
  - a boolean "variable". If true, the signal will behave like a VHDL
variable, or a Verilog reg that uses blocking assignment. This parameter
only has an effect when the signal's value is modified in a synchronous
statement.
  - the signal's reset value. It must be an integer, and defaults to 0.
When the signal's value is modified with a synchronous statement, the
reset value is the initialization value of the associated register.
When the signal is assigned to in a conditional combinatorial statement
(If or Case), the reset value is the value that the signal has when no
condition that causes the signal to be driven is verified. This enforces
the absence of latches in designs. If the signal is permanently driven
using a combinatorial statement, the reset value has no effect.
  
The sole purpose of the name property is to make the generated V*HDL code
easier to understand and debug. From a purely functional point of view,
it is perfectly OK to have several signals with the same name property.
The back-end will generate a unique name for each object. If no name
property is specified, Migen will analyze the code that created the
signal object, and try to extract the variable or member name from there.
It then uses the module name that created the signal, a underscore, and
the variable name. For example, if we are in module "foo", the following
statements will create one or several signal(s) named "foo_bar":
  bar = Signal()
  self.bar = Signal()
  self.baz.bar = Signal()
  bar = [Signal() for x in range(42)]

Operators
---------
Operators are represented by the _Operator object, which generally should
not be used directly. Instead, most FHDL objects overload the usual
Python logic and arithmetic operators, which allows a much lighter syntax
to be used. For example, the expression:
  a * b + c
is equivalent to:
  _Operator('+', [_Operator('*', [a, b]), c])
  
Slices
------
Likewise, slices are represented by the _Slice object, which often should
not be used in favor of the Python slice operation [x:y].
Implicit indices using the forms [x], [x:] and [:y] are supported.
Beware! Slices work like Python slices, not like VHDL or Verilog slices.
The first bound is the index of the LSB and is inclusive. The second
bound is the index of MSB and is exclusive. In V*HDL, bounds are MSB:LSB
and both are inclusive.

Concatenations
--------------
Concatenations are done using the Cat object. To make the syntax lighter,
its constructor takes a variable number of arguments, which are the
signals to be concatenated together (you can use the Python '*' operator
to pass a list instead).
To be consistent with slices, the first signal is connected to the bits
with the lowest indices in the result. This is the opposite of the way
the '{}' construct works in Verilog.

Replications
------------
The Replicate object represents the equivalent of {count{expression}} in
Verilog.

Assignments
-----------
Assignments are represented with the _Assign object. Since using it
directly would result in a cluttered syntax, the preferred technique for
assignments is to use the eq() method provided by objects that can have a
value assigned to them. They are signals, and their combinations with the
slice and concatenation operators.
As an example, the statement:
  a[0].eq(b)
is equivalent to:
  _Assign(_Slice(a, 0, 1), b)

If statement
------------
The If object takes a first parameter which must be an expression
(combination of the Constant, Signal, _Operator, _Slice, etc. objects)
representing the condition, then a variable number of parameters
representing the statements (_Assign, If, Case, etc. objects) to be
executed when the condition is verified.

The If object defines a Else() method, which when called defines the
statements to be executed when the condition is not true. Those
statements are passed as parameters to the variadic method.

For convenience, there is also a Elif() method.

Example:
If(tx_count16 == 0,
    tx_bitcount.eq(tx_bitcount + 1),
    If(tx_bitcount == 8,
        self.tx.eq(1)
    ).Elif(tx_bitcount == 9,
        self.tx.eq(1),
        tx_busy.eq(0)
    ).Else(
        self.tx.eq(tx_reg[0]),
        tx_reg.eq(Cat(tx_reg[1:], 0))
    )
)

Case statement
--------------
The Case object constructor takes as first parameter the expression to be
tested, then a variable number of lists describing the various cases.

Each list contains an expression (typically a constant) describing the
value to be matched, followed by the statements to be executed when there
is a match. The head of the list can be the an instance of the Default
object.

Instances
---------
Instance objects represent the parametrized instantiation of a V*HDL
module, and the connection of its ports to FHDL signals. They are useful
in a number of cases:
  - reusing legacy or third-party V*HDL code.
  - using special FPGA features (DCM, ICAP, ...).
  - implementing logic that cannot be expressed with FHDL (asynchronous
    circuits, ...).
  - breaking down a Migen system into multiple sub-systems, possibly
    using different clock domains.

The properties of the instance object are:
  - the type of the instance (i.e. name of the instantiated module).
  - a list of output ports of the instantiated module. Each element of
    the list is a pair containing a string, which is the name of the
    module's port, and either an existing signal (on which the port will
    be connected to) or a BV (which will cause the creation of a new
    signal).
  - a list of input ports (likewise).
  - a list of (name, value) pairs for the parameters ("generics" in VHDL)
    of the module.
  - the name of the clock port of the module (if any). If this is
    specified, the port will be connected to the system clock.
  - the name of the reset port of the module (likewise).
  - the name of the instance (can be mangled like signal names).

Memories
--------
Memories (on-chip SRAM) are supported using a mechanism similar to
instances.

A memory object has the following parameters:
  - the width, which is the number of bits in each word.
  - the depth, which represents the number of words in the memory.
  - an optional list of integers used to initialize the memory.
  - a list of port descriptions.

Each port description contains:
  - the address signal (mandatory).
  - the data read signal (mandatory).
  - the write enable signal (optional). If the port is using masked
    writes, the width of the write enable signal should match the number
    of sub-words.
  - the data write signal (iff there is a write enable signal).
  - whether reads are synchronous (default) or asynchronous.
  - the read enable port (optional, ignored for asynchronous ports).
  - the write granularity (default 0), which defines the number of bits
    in each sub-word. If it is set to 0, the port is using whole-word
    writes only and the width of the write enable signal must be 1. This
    parameter is ignored if there is no write enable signal.
  - the mode of the port (default READ_FIRST, ignored for asynchronous
    ports). It can be:
     * READ_FIRST: during a write, the previous value is read.
     * WRITE_FIRST: the written value is returned.
     * NO_CHANGE: the data read signal keeps its previous value on a
       write.

Migen generates behavioural V*HDL code that should be compatible with all
simulators and, if the number of ports is <= 2, most FPGA synthesizers.
If a specific code is needed, the memory generator function can be
overriden using the memory_handler parameter of the conversion function.

Fragments
---------
A "fragment" is a unit of logic, which is composed of:
  - a list of combinatorial statements.
  - a list of synchronous statements.
  - a list of instances.
  - a list of memories.
  - a set of pads, which are signals intended to be connected to
    off-chip devices.

Fragments can reference arbitrary signals, including signals that are
referenced in other fragments. Fragments can be combined using the "+"
operator, which returns a new fragment containing the concatenation of
each pair of lists.

Fragments can be passed to the back-end for conversion to Verilog.

By convention, classes that generate logic implement a method called
"get_fragment". When called, this method builds a new fragment
implementing the desired functionality of the class, and returns it. This
convention allows fragments to be built automatically by combining the
fragments from all relevant objects in the local scope, by using the
autofragment module.

Migen Core Logic
================
Migen Core Logic is a convenience library of common logic circuits
implemented using FHDL:
  - a multi-cycle integer divider.
  - a round-robin arbiter, useful to build bus arbiters.
  - a multiplexer bank (multimux), useful to multiplex composite
    (grouped) signals.
  - a condition-triggered static scheduler of FHDL synchronous statements
    (timeline).

Migen Bus
=========
Migen Bus contains classes providing a common structure for master and
slave interfaces of the following buses:
  - Wishbone [5], the general purpose bus recommended by Opencores.
  - CSR-2, a low-bandwidth, resource-sensitive bus designed for
    accessing the configuration and status registers of cores from
    software.
  - ASMIbus, a split-transaction bus optimized for use with a
    high-performance, out-of-order SDRAM controller.

It also provides interconnect components for these buses, such as
arbiters and address decoders. The strength of the Migen procedurally
generated logic can be illustrated by the following example:
  wbcon = wishbone.InterconnectShared(
      [cpu.ibus, cpu.dbus, ethernet.dma, audio.dma],
      [(0, norflash.bus), (1, wishbone2asmi.wishbone),
      (3, wishbone2csr.wishbone)])
In this example, the interconnect component generates a 4-way round-robin
arbiter, multiplexes the master bus signals into a shared bus, determines
that the address decoding must occur on 2 bits, and connects all slave
interfaces to the shared bus, inserting the address decoder logic in the
bus cycle qualification signals and multiplexing the data return path. It
can recognize the signals in each core's bus interface thanks to the
common structure mandated by Migen Bus. All this happens automatically,
using only that much user code. The resulting interconnect logic can be
retrieved using wbcon.get_fragment(), and combined with the fragments
from the rest of the system.

Migen Bank
==========
Migen Bank is a system comparable to wishbone-gen [6], which automates
the creation of configuration and status register banks and
(TODO) interrupt/event managers implemented in cores.

Bank takes a description made up of a list of registers and generates
logic implementing it with a slave interface compatible with Migen Bus.

A register can be "raw", which means that the core has direct access to
it. It also means that the register width must be less or equal to the
bus word width. In that case, the register object provides the following
signals:
  - dev_r, which contains the data written from the bus interface.
  - dev_re, which is the strobe signal for dev_r. It is active for one
    cycle, after or during a write from the bus. dev_r is only valid when
    dev_re is high.
  - dev_w, which must provide at all times the value to be read from the
    bus.

Registers that are not raw are managed by Bank and contain fields. If the
sum of the widths of all fields attached to a register exceeds the bus
word width, the register will automatically be sliced into words of the
maximum size and implemented at consecutive bus addresses, MSB first.
Field objects have two parameters, access_bus and access_dev, determining
respectively the access policies for the bus and core sides. They can
take the values READ_ONLY, WRITE_ONLY and READ_WRITE.
If the device can read, the field object provides the dev_r signal, which
contains at all times the current value of the field (kept by the logic
generated by Bank).
If the device can write, the field object provides the following signals:
  - dev_w, which provides the value to be written into the field.
  - dev_we, which strobes the value into the field.

Migen Flow (TODO)
==========
Many hardware acceleration problems can be expressed in the dataflow
paradigm, that is, using a directed graph representing the flow of data
between actors.

Actors in Migen are written directly in FHDL. This maximizes the
flexibility: for example, an actor can implement a DMA master to read
data from system memory. It is conceivable that a CAL [7] to FHDL
compiler be implemented at some point, to support higher level
descriptions of some actors and reuse of third-party RVC-CAL
applications. [8] [9] [10]

Actors communicate by exchanging tokens, whose flow is typically
controlled using handshake signals (strobe/ack).

Each actor has a "scheduling model". It can be:
  - N-sequential: the actor fires when tokens are available at all its
    inputs, and it produces one output token after N cycles. It cannot
    accept new input tokens until it has produced its output. A
    multicycle integer divider would use this model.
  - N-pipelined: similar to the sequential model, but the actor can
    always accept new input tokens. It produces an output token N cycles
    of latency after accepting input tokens. A pipelined multiplier would
    use this model.
  - Dynamic: the general case, when no simple hypothesis can be made on
    the token flow behaviour of the actor. An actor accessing system
    memory on a shared bus would use this model.

Migen Flow automatically generates handshake logic for the first two
scheduling models. In the third case, the FHDL descriptions for the logic
driving the handshake signals must be provided by the actor.

If sequential or pipelined actors are connected together, Migen Flow will
attempt to find a static schedule, remove the handshake signals, optimize
away the control logic in each actor and replace it with a centralized
FSM implementing the static schedule.

An actor can be a composition of other actors.

Actor graphs are managed using the NetworkX [11] library.


References:
[ 1] http://milkymist.org
[ 2] http://www.myhdl.org
[ 3] http://milkymist.org/thesis/thesis.pdf
[ 4] http://www.xilinx.com/publications/archives/xcell/Xcell77.pdf p30-35
[ 5] http://cdn.opencores.org/downloads/wbspec_b4.pdf
[ 6] http://www.ohwr.org/projects/wishbone-gen
[ 7] http://opendf.svn.sourceforge.net/viewvc/opendf/trunk/doc/
     GentleIntro/GentleIntro.pdf
[ 8] http://orcc.sourceforge.net/
[ 9] http://orc-apps.sourceforge.net/
[10] http://opendf.sourceforge.net/
[11] http://networkx.lanl.gov/