[Elphel-support] mcontr353 question

Wed Nov 16 23:49:39 PST 2011

Marc,

I will need to look at the code to answer your question; that part of the
code is rather messy. Below is a general comment that I can provide without
digging into the code right now.

The memory controller was part of the 313 camera and has survived several
generations of FPGA chips and sensor sizes. Each time I had to fit the
new bits somewhere while trying to minimize overall changes. There was one
major change when I was making the Theora encoder (it has an 8-channel
controller with more advanced memory access interleaving and >90% average
bandwidth utilization), but that branch is inactive now, and the current
code has been incrementally updated since 2002 (starting from my very first
experience with reconfigurable FPGAs).

The memory is accessible by the software (through channel 3) - either as a
single 64MB "file" or with the "fpcf -sr" and "fpcf -sw" commands. The access
for each channel is organized in one of 2 modes - either sequential (as
in channel 0) or in square 20x20 pixel tiles (used by channel 2, the
compressor). In each mode the full address of the tile (an "atomic" transfer)
is combined from the start address (split between two control words), which
has an increment of 512 (bytes, if I remember correctly), and then tileX and
tileY. The bit size of each depends on the mode: in mode 0 tileX has fewer
bits than in mode 1, and for tileY it is the opposite - in mode 0 it is
just the scan line number (14 bits total), in mode 1 it is the number of the
20x20 pixel (16 pixel period) tile, which has only 10 bits. TileX now covers
up to 8192 pixel long lines, so in mode 0 it needs 4 bits and in mode 1 - 8
bits.
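
The field widths above can be summarized in a small Python sketch. It
encodes only what is stated here (tileX 4 bits and tileY 14 bits in mode 0,
tileX 8 bits and tileY 10 bits in mode 1). The packing order (tileY above
tileX) and the names pack_tile and total_bits are assumptions made for
illustration; they are not taken from mcontr.v, and how the packed value is
combined with the start address is not shown.

```python
# Field widths per mode, as stated in the text above.
FIELD_BITS = {
    0: {"tile_x": 4, "tile_y": 14},  # sequential (scan line) mode
    1: {"tile_x": 8, "tile_y": 10},  # 20x20 pixel tile mode
}


def total_bits(mode):
    """Total number of tileX + tileY bits in the given mode."""
    bits = FIELD_BITS[mode]
    return bits["tile_x"] + bits["tile_y"]


def pack_tile(mode, tile_x, tile_y):
    """Pack tileX and tileY into one integer.

    ASSUMPTION: tileY sits directly above tileX. The real controller
    multiplexes these bits differently; this only illustrates the widths.
    """
    bits = FIELD_BITS[mode]
    if not 0 <= tile_x < (1 << bits["tile_x"]):
        raise ValueError("tile_x does not fit in %d bits" % bits["tile_x"])
    if not 0 <= tile_y < (1 << bits["tile_y"]):
        raise ValueError("tile_y does not fit in %d bits" % bits["tile_y"])
    return (tile_y << bits["tile_x"]) | tile_x


if __name__ == "__main__":
    for mode in (0, 1):
        print("mode", mode, "uses", total_bits(mode), "tile address bits")
```

With these widths both modes use the same total of 18 bits for tileX and
tileY together; only the split between the two fields changes.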

Memory addresses are processed in 4 two-cycle steps (every other clock
cycle) to ease timing requirements. Sequential processing requires the bits
that may influence other bits to be handled simultaneously or earlier.
Two distributed memories are involved in handling memory addresses. One is
writable and readable from the CPU and read-only from the memory controller
(it holds each channel's frame addresses/dimensions); the other is readable
and writable by the controller and read-only for the CPU (it holds the
current tileX and tileY for each channel). Reading/writing the registers
0x20..0x2f combines bits from both memories (multiplexed in a weird way just
to make room for extra bits for the ever-growing memories and sensors). The
driver include file x353.h has some comments on these registers, and I hope
they match the current state of the code - I have updated these comments so
many times.

BRAM buffers are now 4 times the SDRAM read/write block size, so compressor
channel 2 can read 4 blocks ahead of the one being processed, and channel 0
can wait for SDRAM access for that long too before the buffer is overrun.
All the memory access code is in the file mcontr.v. Channels 0 and 2 are
normally programmed to be dependent - the compressor makes sure that there
are just enough lines buffered so the latency is minimal. This buffering is
different in normal mode (channel 0 has to be 20=16+4 lines ahead) and in
"linescan mode", where there is no overlap between the individual sub-frames
of the composite frame being compressed as a whole.
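
The "20=16+4" figure follows from the 20x20 pixel tiles repeating with a 16
pixel period. A minimal Python sketch of that arithmetic, assuming tile rows
start at line 0 and are aligned to the period (the function names are
illustrative and not taken from the code):

```python
TILE_SIZE = 20    # tile height in lines (20x20 pixel tiles)
TILE_PERIOD = 16  # tile rows repeat every 16 lines
OVERLAP = TILE_SIZE - TILE_PERIOD  # lines shared with the next tile row


def lines_for_tile_row(row):
    """First and last sensor line covered by a given tile row.

    ASSUMPTION: tile row 0 starts at line 0.
    """
    first = row * TILE_PERIOD
    return first, first + TILE_SIZE - 1


def lines_needed(row):
    """Number of lines that must be in memory before this tile row can be read."""
    return lines_for_tile_row(row)[1] + 1


if __name__ == "__main__":
    for row in range(3):
        print(row, lines_for_tile_row(row), lines_needed(row))
```

In this model the sensor channel has to have written 20 lines before the
first tile row can be read, and each further tile row needs 16 more.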

Hope this helps somewhat; I'll look in the code when I get a chance. If
the explanations above help you navigate the code and you have additional
questions before then, I'll try to combine those searches.

Andrey

On Wed, Nov 16, 2011 at 11:58 PM, Marc Reichenbach <
marc.reichenbach at informatik.uni-erlangen.de> wrote:

> Dear Elphel Team,
>
> I've got a small question about the memory controller in x353. The
> memory controller has 4 channels, where channel 0 is for the
> sensor output. The memory controller is controlled by the writepage
> signal, which stores all temporary data (stored in BRAM) to the DDR
> RAM. The address to the memory controller is only 10 bits (2 MSBs for
> bank, 8 for local addressing). My question is, how are the global
> addresses (for the DDR RAM) calculated? For example, in the
> simulation the data for bank 0 can be stored at 0+x, 160+x, 320+x and
> so on, where x is the local (BRAM) address. But how is the "offset"
> (0,160,320, ..) calculated? My first idea was that it is increased on
> every cycle where writepage is 1. But now I think it is time
> dependent. Could you please tell me where this address is calculated
> (which module/Verilog file/line) and how it works?
>
> Thank you for your efforts,
>
> Marc
>
> PS: Sebastian Pichelhofer has asked me whether I could document my
> research work a little (for example: the simulation with VCS).
> Should I do this in the wiki under UserProjects?