Week 12: Weekly Report

This is the final week of GSoC! I need to finish everything ASAP. Instead of posting a report for this week, I will put up a complete final report which covers everything I did during these three months. Keep a lookout for that!

Cheers

 


Week 11: Weekly Report

I have been trying to fix the errors from last week and complete the Single Output Mixer pipeline. This covers week 11, from 29th July to 5th August.

Major tasks accomplished this week:

  • Added an extra DMA block to one HDMI_Out so that the output is synchronized and the alignment errors are corrected.
  • In the process, read a whole bunch of documentation on the implementation of the VGA core from the Migen docs; the HDMI_Out core is based on a similar kind of connection: https://migen.readthedocs.io/en/latest/casestudies.html
  • Changed the layout and instantiated several modules to get two DMA engine blocks per HDMI_OUT.
  • Block diagrams are currently being updated; the current ones are in the doc https://docs.google.com/document/d/1g1c2IwCVxVzSHWdXbZ746HP-fnM4y1WqFuBZNkLi5mw/edit
  • Initially did this for two outputs, but failed to make it work after several Xilinx specific errors in the MAP stage.
  • Fixed the errors: they were caused by an incorrect use of platform.request("hdmi_out", 1), because these pins aren’t used.
  • Current status: the DMA engine is supposed to take two base addresses for the two video streams to be mixed, set from firmware through

    hdmi_out0_fi_base0_write()
    hdmi_out0_fi_base1_write()

  • Of these, hdmi_out0_fi_base1_write() somehow doesn’t work.

 

Next Week’s task:

Finish everything ASAP!

Week 10: Weekly Report

As I write this, I just noticed that only two weeks (plus a few more days) are left in the official coding period of Google Summer of Code. So much time has already passed; I am hoping to complete the milestones by then. This week’s focus was to first implement static mixing of inputs, and once that was figured out in hardware, to change the multiplier value from firmware to do dynamic mixing.

This pertains to work done from 23rd July to 29th July.

Adding modules to Video Pipeline:

Last week I was adding my float arithmetic modules to the input video pipeline, that is, in the hdmi_in files. Though this was a good test to check that they work, the modules are supposed to be added to the output pipeline (gateware/hdmi_out/phy.py).

Major tasks done this week:

  • Fixed a bug that caused missing colors in gradients: This was a bug in the floating point multiplier unit which caused random colors in the gradient to go missing. It wasn’t spotted in simulation before because the float units hadn’t been dynamically tested with varied inputs. The bug was an error in the pipeline: stage 5 was copying a value from stage 3 instead of stage 4 (see the sketch after this list). It was a bit difficult to track down, because when a lot of data is flowing through the pipeline the output is not easily decipherable. What I noticed was that the frac of the current output depended on the frac of previous outputs, and after that it was really easy to fix.
  • Adding mixer block in hardware, with inputs to the adder and mult hardwired
    Once the modules seemed to work perfectly in simulation, all the modules were added together with the same layout (the other inputs of add and mult hardwired to zero and one respectively). In this case the layout at floatadd and floatmult is always rgb16f_layout. This is good because the same layout means we can easily connect using the Record.connect() method. This was tested to work perfectly. The next task was to check whether the mixing was done correctly, and for that I needed to connect modules with different layouts.
  • Figure out connecting blocks with different layouts:
    For connecting two PipelinedActor modules we generally use the Record.connect() method, which by default also connects the control signals (like ack, stb) apart from the payload signals. I first tried to define two sinks for the floatadd module, which would have been perfect, but I encountered an error. After a lot of asking around without getting an exact answer on how to proceed, I decided to dive into the libraries that define these classes and methods. I mainly focused on how Record.connect() was implemented and the classes the inputs were derived from.
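To make the pipeline bug above concrete, here is a minimal migen sketch (not the actual floatmult code; the module and signal names are mine) of how a value has to be re-registered at every stage:

    from migen import *

    class FracPipeline(Module):
        def __init__(self, width=10):
            self.i = Signal(width)
            self.o = Signal(width)
            stages = [Signal(width) for _ in range(5)]
            self.sync += stages[0].eq(self.i)
            for n in range(1, 5):
                # Each stage must copy from the immediately preceding stage.
                self.sync += stages[n].eq(stages[n - 1])
            # The bug was equivalent to writing stages[4].eq(stages[2]):
            # the value arrives a cycle early and lines up with the wrong pixel.
            self.comb += self.o.eq(stages[-1])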

General Documentation:
So apart from the usual payload signals, the source and sink of a pipelined module have four other signals defined, which control the flow of packets from the source of the master to the sink of the slave. These signals are namely stb, ack, sop, eop. Information about stb and ack can be found here. The other two, sop and eop, refer to start of packet and end of packet. So we basically want to know the equivalent connections for Record.connect().

# This is the Record.connect() method

Record.connect(ycbcr2rgb.source, rgb2rgb16f.sink),

def rgb_layout(dw):
     return [("r", dw), ("g", dw), ("b", dw)]

# This is an alternate way of doing the equivalent connections

rgb2rgb16f.sink.r.eq(ycbcr2rgb.source.r),
rgb2rgb16f.sink.g.eq(ycbcr2rgb.source.g),
rgb2rgb16f.sink.b.eq(ycbcr2rgb.source.b),

rgb2rgb16f.sink.stb.eq(ycbcr2rgb.source.stb),
ycbcr2rgb.source.ack.eq(rgb2rgb16f.sink.ack),
rgb2rgb16f.sink.sop.eq(ycbcr2rgb.source.sop),
rgb2rgb16f.sink.eop.eq(ycbcr2rgb.source.eop),


More information about the significance and implementation of these signals is in this document (page 23).

Bug Fixing:

  • The rgb16f2rgb unit didn’t have a mechanism to treat overflows of float values correctly. When loaded with float values greater than 1.0, the output came out as 0, whereas it is supposed to saturate to 255. This was fixed in the rgb16f2rgb file by adding a simple condition at the output (see the sketch after this list).
  • While adding the eq statements for the Record.connect() equivalence discussed above in the opsis_video.py file, they didn’t seem to be working. After about a day of random tries here and there, I found out that I hadn’t wrapped the .eq statements in a self.comb += []. This caused a lot of delay, because each compilation cycle for hardware took about 20 minutes, which made the error very difficult to spot.
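A minimal migen sketch of the two fixes above (illustrative, not the actual litevideo code): the output saturates instead of wrapping, and the statements only take effect once wrapped in self.comb:

    from migen import *

    class SaturatingOutput(Module):
        def __init__(self):
            self.raw = Signal(9)  # scaled result, can exceed 255 on overflow
            self.out = Signal(8)
            # Without self.comb += [...], these statements are silently dropped.
            self.comb += [
                If(self.raw > 255,
                    self.out.eq(255)       # saturate instead of wrapping to 0
                ).Else(
                    self.out.eq(self.raw)
                )
            ]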

Static Mixing:
Things seemed to be mixing well after this, though there is some alignment problem due to a timing inconsistency which I still need to figure out. For static testing I hardwired the mult with a 0.5 value and connected the outputs from HDMI_OUT0 and HDMI_OUT1 to the floatadd unit of HDMI_OUT0. Later I added a CSRStorage register to dynamically vary the multiplier value from firmware and create a fade like effect (a sketch follows below). Added supporting functions to ci.c and other functions for maintaining timing.
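A rough sketch of how such a CSRStorage register can drive the multiplier value; the module, register and signal names here are hypothetical, and the import path is the litex style one:

    from migen import *
    from litex.soc.interconnect.csr import CSRStorage, AutoCSR

    class MixerFactor(Module, AutoCSR):
        def __init__(self):
            # float16 bit pattern, reset to 0.5 (0x3800)
            self.factor = CSRStorage(16, reset=0x3800)
            self.mult_in = Signal(16)
            # Feed the firmware controlled value into the multiplier input.
            self.comb += self.mult_in.eq(self.factor.storage)

Firmware then steps the register each frame through the generated accessor (something like hdmi_out0_mixer_factor_write()) to produce the fade.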


Link to output pics: link
Link to fade video: video

 

Week 9: Weekly Report

Continuing with my regular weekly reports, I am now in the 9th week of Google Summer of Code. My regular college semester started this week and I am required to go to the lab for my thesis work, so there haven’t been a lot of updates here. I am hoping to complete a lot of things during the upcoming weekend. This report pertains to work done from 16th to 22nd July.

The first half of the week was spent fixing errors and issues in previous work. One issue I had been facing for a long time was regarding testing and porting to litex. As always, the errors were some silly mistakes on my part, and they took longer than necessary to debug.

Testing of float16 using CSR:

The idea was to use the pipelined hardware block to compute the floating point multiplication of two numbers in hardware. The numbers were given from firmware using CSR registers defined in hardware. After correctly initialising the floatmult datapath in target/opsis_base.py and adding the relevant CSR functions in ci.c, I expected the output to be correct, but somehow the input was not transferring across. I thought this was because of an incorrect implementation of the CSR registers as wires, but it turned out the way of connecting on the input side was incorrect.

   # Incorrect way: this drives the CSR storage from the sink,
   # instead of driving the sink from the CSR storage
        self.comb += [
            self._float_in1.storage.eq(sink.in1),
            self._float_in2.storage.eq(sink.in2),
            self._float_out.status.eq(source.out)
        ]

   # Correct way
        self.comb += [
            sink.in1.eq(self._float_in1.storage),
            sink.in2.eq(self._float_in2.storage),
            self._float_out.status.eq(source.out)
        ]

Porting migen designs to litex for review

I was facing an error while porting my floating point arithmetic migen code to litex for review. Command line output of that error: link. Since it only came up while converting to litex, I thought there was some error in litex functionality. I later found out that the int variables were being generated using numpy functions, and hence were of a numpy.uint type which was not correctly recognised by the Signal object.
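A minimal sketch of the kind of fix this needed (variable names are illustrative):

    import numpy as np
    from migen import *

    coef = np.uint16(42)  # numpy functions return numpy integer types

    # Signal(16, reset=coef) failed during the litex conversion because the
    # reset value is a numpy.uint16, not a plain Python int.
    sig = Signal(16, reset=int(coef))  # cast to a plain int first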

Although I had started working on run length encoding after this, my mentor mithro suggested that I should first look at implementing the mixer equation using a static mask for now.

Video Pipeline:

The first step was to add all the submodules to the video pipeline to check that they meet the timing constraints. I found that the existing color space conversion was done in gateware/hdmi_in/analysis.py, in the FrameExtraction class. For testing purposes I added the rgbtofloat16 module and its reverse module to this pipeline, connecting the output of one to the input of the other, in the hope of getting exactly the same input back. This also meant that the float16 color space conversion modules were tested in hardware for the first time. The next step was to subsequently test the floatmult and floatadd units by adding them to the pipeline.

There were some issues in uploading the bitstream, which I eventually circumvented, and here is a link to a picture of the output when all the modules are added to the video pipeline. The picture is not too sharp, and I feel that there is a missing color in the gradient. I am going to check this with a static image in simulation.

Issue: The generated bitstream (which took close to 20 minutes each time), once uploaded, didn’t work, and the USB connection was even lost after this. My mentor mithro suggested that I should use the lightweight videomixer target, ($echo TARGET=video), and that the problem might be because of the FPGA resetting the FX2 IC after it boots. This basically bypassed the problem.

Task for Upcoming Week:

Right now I am trying to figure out where exactly to add the video mixer equation in the bunch of gateware python files. I think I should add it on the output side, because that way I can operate on the pattern input as well. The idea is to set up the standard video pipeline with one input of floatadd left open for each hdmi_out, and in the target/opsis_video.py file connect the unconnected input of floatadd of hdmi_out0 to the output of floatmult of hdmi_out1 and vice versa (a sketch follows below). Once this is figured out, I will add a CSR register to set the multiplier value from firmware.
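A rough sketch of the planned cross-connection, in the same Record.connect() style used above; the sink2 input and the module attribute names are hypothetical:

    # In target/opsis_video.py: feed each output's scaled stream into the
    # other output's adder.
    self.comb += Record.connect(hdmi_out1.floatmult.source,
                                hdmi_out0.floatadd.sink2)
    self.comb += Record.connect(hdmi_out0.floatmult.source,
                                hdmi_out1.floatadd.sink2)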

Weekly Report: Week 8

This was a short week as I had to go out of town for three days. I have mostly worked on floating point conversions and arithmetic modules this week.

Float 16 Arithmetic: link

  • The float16 multiplier, which I had already completed in migen, was ported to litex so I could send out a pull request and get it reviewed. While porting I was getting an error while trying to run the testbench; I have added the pastebin link in the pull request comments as well. From my understanding, somewhere in the streamer something is not defined as a Signal object, which is causing this error. It was working fine in the migen environment, so I suppose there is some difference in the implementation of the streamer in litex.
  • Completed a five stage pipelined version of the 16 bit floating point adder. This was done in the migen environment of HDMI2USB-misoc-firmware. After some bug fixing, it has been tested to work in the migen simulation environment. Also ported this to litex for sending out a pull request and getting it reviewed; I encountered the same kind of error as while porting the floatmult module.
  • I sent the pull request later in the week and made the changes requested in the comments. This was mostly correcting variable names and adding module level and class docstrings.

Float 16 Color Space Conversion: link

  • I had already sent out a pull request for this last week, but since I was focussing on other things later last week, I couldn’t look at the pull request comments. Among many things, I added a generic submodule for a leading one detector. Earlier this was done using a number of if else conditions; it is now done using a simple for loop, though I feel this has made the simulation slower. Also fixed some variable names and added more for loops to clean up the code.
  • From the discussions I had with mithro and _florent_, it was suggested that it might be good to explore this implementation using a lookup table, at least for the int8 to float16 conversion, as the size will be only 256*16 bits (a sketch follows this list). I added the contents of the lookup table using the int8 to float16 test model functions I had already written for testing. A similar thing can be done for the reverse conversion, but the size of that lookup table (65536*8 bits) might be too big to be better than the earlier implementation.
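A rough sketch of what the lookup table variant could look like, using migen's Memory; the model function here is a stand-in for the test model functions mentioned above, so the exact mapping is an assumption:

    import numpy as np
    from migen import *

    def int8_to_float16(i):
        # Stand-in software model: map 0..255 onto [0.0, 1.0] and return
        # the raw float16 bit pattern.
        return int(np.float16(i / 255.0).view(np.uint16))

    class Int8ToFloat16LUT(Module):
        # A 256 x 16 bit ROM replaces the pipelined conversion datapath.
        def __init__(self):
            self.i = Signal(8)
            self.o = Signal(16)
            mem = Memory(16, 256, init=[int8_to_float16(i) for i in range(256)])
            port = mem.get_port()
            self.specials += mem, port
            self.comb += port.adr.eq(self.i)
            self.comb += self.o.eq(port.dat_r)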

Weekly Report: Week 7

After a bit of a break during week 6, as I was unwell, I am back to my standard GSoC schedule now. There were a lot of things left to be reviewed from the last two weeks, so most of the time this week went into getting code reviewed and making the required changes.

Float16 Arithmetic: code

This work continued from last week. I had defined a multiplier which worked for normal values but gave incorrect results when the output was in the subnormal range of a float16 variable. This was fixed by adding some extra conditions in the code. To test and debug the code efficiently, I added functions to convert the unsigned int16 data type (the format in which data is streamed) to standard float and binary representations. This also helped to effectively verify the code for various test cases. The code was tested to work fine in simulation; to test it on hardware, a few more things were added to the gateware and firmware.

In the gateware, I first added CSR storage and status registers. CSR is part of the bus support in migen, designed for accessing configuration and status registers of cores from software. Figured out a way to correctly initialize the FloatMult module in opsis_base, and checked the functions added to the csr.h file for accessing the operands and results of the float multiplier. These functions for writing the multiplier operands and reading the multiplier outputs were added to the ci.c file in the lm32 firmware.

While testing this I figured out that when the output is held to a constant value in the gateware code, it shows up correctly in the firmware output, but when a previous stage’s output is held to a constant value, the output remains zero, which is the reset value. I suspect that in some part of the initialization of the CSR registers, something is inferred as a wire and not holding its value. I assume this is a trivial mistake in initializing something in migen.

I have also started writing the adder code in the same way. It is still a work in progress; I hope to get back to it once all the existing things are reviewed.

Heartbeat Module: code

This pull request has been in review for a long time now, and is hopefully in its final stages. Among many other things, I got to learn about the usage of the volatile qualifier in C. I had based my heartbeat code on variables defined in the pattern.c file, which used a volatile integer for the framebuffer variable. Declaring a variable volatile is required when we want the memory accesses to happen exactly in the order defined in the code. But that was not the case here, nor in the pattern file, so I logged a github issue which describes this in detail.

This week’s work on heartbeat involved cleaning up parts of the code and adding documentation. After incorporating _florent_’s suggestions I no longer needed to make those changes to the hdmi_in and pattern C files, so I won’t be touching them in this pull request, which is good practice for keeping changes easy to track. Also added documentation for setting the desired heartbeat frequency. Link

Design Document on Float16: code

Continuing with the ongoing review, I made small changes here and there to improve the overall feel of the document and make it easier for beginners in the field to understand. Added a whole bunch of links, and a github link to the code for generating the synthesis report results. The documentation currently explains in detail the FPGA mapping of a float16 multiplier; I am also adding a similar description for the FPGA mapping of a float adder.

Float16 CSC Conversion:

Ported this design to litex for merging into _florent_’s litevideo repository. Code review is underway at this pull request. Following _florent_’s suggestion, I might implement this conversion with a lookup table instead of pipelined hardware. I had considered this earlier but wasn’t sure about the available memory space and bandwidth. The lookup table for uint8 to float16 would be 16 bits * 256 locations, and the reverse one 8 bits * 65536 locations; the latter can be reduced further if we cleverly consider the range of float16 numbers. The former is still okay, but the latter seems a bit expensive.

Run Length Encoding:

After several weeks of reviews and comments on my work, my mentor mithro added most of the code required for the run length encoding. I spent the better part of a day trying to understand it. Both the python and the C code use a lot of tricks here and there, and I couldn’t imagine myself writing code like that. I am supposed to work further on the code, adding documentation and functions for image transformation. Still a work in progress; will be adding the pull request soon.

Weekly Report: Week5 & Week6

Week 5 was mostly focussed on completing the midterm task. I had been so preoccupied with the midterm tasks, coupled with a network outage on the crucial Thursday night of the midterm week, that I was unable to present a weekly report last week. I continued completing the midterm tasks till Monday this week, and have been feeling unwell for a few days since then (maybe due to a severe lack of sleep chasing midterm deadlines). There is a slight delay in adding this report as I was waiting to get it reviewed by my mentor.

Heartbeat Module: link

I had been dealing with glitchy output in the heartbeat module last week (Video). After incorporating suggestions from _florent_’s comments and some more independent bug fixing, I was able to get a stable, glitch free output on the screen. Here is a link to the pull request, which contains incremental updates with each new commit.

One change I made was to modify the way the framebuffer is allotted its base address: instead of copying it from the input side (hdmi_in0_fi_base), I read the updated address from each of the three designated outputs (e.g. hdmi_out0_fi_base). This still didn’t fix the glitch, but it gave me the idea that the heartbeat glitch could be corrected by calling hb_fill() at a much faster rate, to match the HDMI input frame rate. I made some changes in the timing part of the code, and with some other changes the heartbeat module now works perfectly. The next step is to get it tested on someone else’s hardware and merged into the main repository.

Float16 CSC conversion: code

Building on my previous week’s work on float16 conversion, I was basically building upon the existing pipelined structure designed for YCbCr color space conversion. While going through the code I realized how easy it was to understand, with elaborate test modules built to test the color space conversion using a lena.png image. My conversion modules for rgb2rgb16f and the reverse were built upon these.

The pipeline latency is two for both conversions. I have basically described a single module for one conversion and reused it as a submodule for the complete rgb to rgb16f space conversion. The modules are designed specifically keeping in mind that the input pixel is an 8 bit number, and won’t necessarily work for a general integer. The input int8 numbers in the range 0 to 255 are mapped to 0-1 in the corresponding float conversion, for added precision in that range in the float case. The design is implemented as described in the float16 design document.

For testing, there is an already described test setup which uses a streamer and a logger to send and receive packed RGB data. To test the working of our hardware model, we need a standard model described in software; this serves two purposes: comparing the hardware design with the software model, and converting the float data back to RGB pixel data to be later saved as an image.

Float16 Design document: link

Added the FPGA mapping to the design document, which was missing until now. This was added for the basic arithmetic operations and currently covers multiplication. I have described a five stage pipeline for float16 multiplication and identified the components which are likely to use more hardware resources. These are pretty simple components like 6 bit adders or 11 bit multipliers; I wrote simple Verilog code for them and took a look at the Xilinx XST synthesis report. I included both the hardware resource usage and the combinational delay of each element, so that the pipeline can be optimized for highest throughput.

Float16 Arithmetic: code

Since I had already worked on designing hardware in migen and had completely described the pipelined hardware, this was pretty easy to write. It is important to take care of the edge cases here, as described in the IEEE standard document, for example subnormal numbers or the representation of NaN and Inf in float16 format.

The already existing image based test module wouldn’t have been sufficient for the task here, as it does not necessarily cover all the edge cases. It is important to note the format in which bits are streamed to the design under test. The streamer accepts an integer data type, and the 16 bit representation of that integer should correspond to the float16 value we intend to send; this made it really difficult to debug things with the existing system. Hence I described two functions which convert between the float data type and the int16 data type that is actually streamed and logged.
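A minimal sketch of what such a pair of helpers can look like, assuming numpy (the actual project functions may differ):

    import numpy as np

    def float_to_uint16(x):
        # Reinterpret the bits of a float16 value as the unsigned 16 bit
        # integer that the streamer actually sends.
        return int(np.float16(x).view(np.uint16))

    def uint16_to_float(bits):
        # Reverse conversion: interpret a streamed 16 bit word as float16.
        return float(np.uint16(bits).view(np.float16))

    assert float_to_uint16(0.5) == 0x3800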

RLE Encoding and Decoding: code

After several code reviews, I finally have an object oriented implementation for generating an object of the Repeat class. This class defines a generate() method to return a list of repeated elements. Also defined is a Pixel class, which is used to represent individual pixels and returns a list containing that pixel value when its gen() function is called. Also added the required documentation and python doctests to test the class methods and other functions.
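A rough sketch of how these classes fit together; the gen() and generate() names come from the description above, but the structure is my assumption rather than the reviewed code:

    class Pixel:
        # A single mask pixel; expands to a one element list.
        def __init__(self, value):
            self.value = value

        def gen(self):
            return [self.value]

    class Repeat:
        # A repeated run of children, which may themselves be Repeats,
        # giving multiple levels of run length encoding.
        def __init__(self, count, children):
            self.count = count
            self.children = children

        def generate(self):
            run = []
            for child in self.children:
                if isinstance(child, Repeat):
                    run.extend(child.generate())
                else:
                    run.extend(child.gen())
            return run * self.count

    # [AABBAABB] as repeat(2, (repeat(2,A), repeat(2,B)))
    print(Repeat(2, [Repeat(2, [Pixel("A")]), Repeat(2, [Pixel("B")])]).generate())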

Weekly Report: Week4

Only one week is left to complete the midterm tasks. Heartbeat is integrated with the firmware and ready to merge upstream. The documentation on 16 bit float is ready, with a few minor edits still needed.

Updated just the Opsis firmware

Continuing with last week’s work on Opsis, I focussed on understanding how to upload an updated firmware without recompiling the gateware (which takes 20-30 minutes). I encountered some issues with getting the FX2 USB IC into the correct modes. I will start with the steps necessary to successfully load an updated firmware onto the Opsis.

  1. Last week I discussed how to upload gateware to the Opsis using make load-gateware. It is important that we have a .bit bitstream file for uploading the gateware. Next do make load-gateware, which is what we had achieved last week; make sure that the FX2 USB IC is in the correct jtag mode for this. This step needs to be done before we hack around with the firmware code, because the Opsis might be shipped with a previous version of the gateware with which the current firmware might not be compatible.
    $ make load-gateware
    
  2. To test whether the firmware is correctly updated or not, I added some lines to pattern.c, in the pattern_fill_framebuffer() function. After this, use the HDMI2USB-mode-switch tool discussed last week to switch to serial mode.

    $ ./<path>/hdmi2usb-mode-switch --mode=serial
    

    Note that the make load-firmware command for uploading just the firmware works only when the FX2 USB IC is in serial mode. Toggle between jtag and serial mode once or twice to get the FX2 USB IC into the correct serial mode.

  3. Next type make load-firmware; this will compile any updated C files and load the updated firmware. If the compilation and build steps are successful, it opens the serial terminal and we get the HDMI2USB> prompt.

    $ make load-firmware
  4. At the HDMI2USB prompt, type reboot. It should say something like

    HDMI2USB> reboot
    Booting from serial…
    sL5DdSMmkekro
    [FLTERM] Received firmware download request from the device.
  5. If you get [FLTERM] Got unknown reply ‘T’ from the device, aborting, then there has been an error downloading via the serial port. To resolve this, we need to switch to jtag and then back to serial mode of the FX2 USB IC using the HDMI2USB-mode-switch tool. This is similar to what we did for uploading the gateware, which used the jtag usb mode.

    $ ./<path>/hdmi2usb-mode-switch --mode=jtag
    $ ./<path>/hdmi2usb-mode-switch --mode=serial

You should get the updated pattern on the specified output after this, which means the firmware update was successful. You can also check the version information.

Adding heartbeat functionality to firmware

I have been working on adding the heartbeat functionality (#234) to the firmware for several weeks now. The C code was almost ready, but initially I was unable to load firmware with a new file of functions. This was because the Makefile in firmware/lm32/ has a predefined list of objects (the *.o files), which required including the heartbeat.o file manually.

Once I had figured out how to load firmware, the first natural step was to understand how an individual pixel can be modified by making some changes in firmware. Tested this on pattern_fill_framebuffer() by making small changes to particular locations of the framebuffer. Once I had figured out how to generate the required square at the bottom right corner, I looked for ways to integrate it with the timing functions to implement the beating.

I enabled the pattern_service() function in the processor_service() function to understand the timing related parts. Once I had figured out the correct timing frequency, I added the timing code to my heartbeat function and tested it on the pattern input. This looked to be working fine. working video

Tested the same with hdmi_in0 as input; there were some glitches in the output of hdmi_in0, which may be due to a timing inconsistency or some kind of memory conflict. Added support for all three possible inputs, hdmi_in0, hdmi_in1 and pattern, by adding the three heartbeat functions in the processor_service() function. There is some glitch in the output when all three run together; there might be better ways to integrate the heartbeat function. glitchy video

Next, I made the required changes in the ci.c file to get an option for enabling or disabling heartbeat at the HDMI2USB> prompt.

Documentation of 16 bit float

The pixel data for RGB as well as YCbCr corresponds to 8 bits per channel. Since we will be doing a number of operations on this pixel data, it is important that the data type gives us sufficient precision. Hence we convert the 8 bit pixel data to 16 bit float for further operations like gamma correction and linear mixing. I have documented the 16 bit float format in detail in the document link. It covers in detail the actual 16 bit float format, the conversions, and the implementation of arithmetic operations.

Adding support for rgb16f to/from xxx in gateware/csc

I have already documented methods for this conversion in the design document linked in the previous section. I started with understanding the existing methods for the conversion of rgb to ycbcr and the other modules. This was the first time I was dealing with Migen code, so it was pretty alien to me at first.

The existing modules in csc implement the necessary conversions using pipelining, with some modules taking as many as 8 pipeline stages. The float conversion modules will follow very similar steps, and hence I started building upon the existing modules, making changes wherever necessary. Although I had started with the migen related things to learn something new and interesting, I went back to completing the other ongoing things in my project.

Some of the Migen documentation I found online:

RLE Code: Couldn’t proceed due to a lack of time this week. I will try to complete this first thing in the upcoming week.

Upcoming Week’s Task

  • Adding conversion modules for 16 bit float conversion in csc module.
  • Add migen code for multiplication
  • Cleanup of RLE codes
  • Update design documentation

Weekly Report: Week3

So three weeks have already gone by, and things seem to be picking up pace now. The midterm evaluation is two weeks from now, and one of the things to do was to have a clear set of goals for the midterm evaluation.

Opsis Board finally works!

After having spent a lot of time changing communication modes and toggling the power supply, I was finally able to get the Opsis working in its default mode. Last week I was stuck, unable to upload the bitstream; I was getting this error at make load-gateware.

$ make load-gateware
..
..
====== Building for: ======
Platform:  opsis
Target:    opsis_hdmi2usb
Subtarget: HDMI2USBSoC
CPU type:  lm32
===========================
Open On-Chip Debugger 0.10.0-dev-00248-gf3b38ff (2016-04-03-07:23)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
Warn : Adapter driver 'usb_blaster' did not declare which transports it allows; assuming legacy JTAG-only
Info : only one transport option; autoselect 'jtag'
Warn : incomplete ublast_vid_pid configuration
jtagspi_program
Info : usb blaster interface using libftdi
Error: unable to open ftdi device: device not found

The error says that the load-gateware script can’t find the Opsis device. This was happening because the Opsis USB was not in the correct mode required for uploading a bitstream. I used some of the scripts pointed out by CarlFK for changing the mode. Following this, I was able to change the mode to “jtag” using this, but I got a different error at make load-gateware (see the terminal output). When trying to read the DNA using the same script, I got an incorrect DNA, which meant that the JTAG connection wasn’t set up correctly.

DNA = 110000001100000011000000110000001100000011000000110000001 (0x181818181818181)

After talking to my mentor mithro, I tried again with the updated HDMI2USB-mode-switch repo. This also included a Makefile to install all the relevant tools and rules for a correct JTAG connection. This is the HDMI2USB-mode-switch tool for doing the mode switch. I had a hard time completing the installation of the required dependencies, due to having a <space> in the path to the directory. Lesson learnt: never create folder names with spaces in them. The installation steps generated a Python file in HDMI2USB-mode-switch/conda/bin/, which is run with:

$ ./<path>/hdmi2usb-mode-switch --mode=jtag

After this I encountered another error, shown below. This happens when the udev rules are not installed; these come along with the HDMI2USB-mode-switch installation makefile, but I had missed them while installing the other tools. This was quickly resolved.

$ make load-gateware
..
====== Building for: ======
Platform:  opsis
Target:    opsis_hdmi2usb
Subtarget: HDMI2USBSoC
CPU type:  lm32
===========================
Open On-Chip Debugger 0.10.0-dev-00248-gf3b38ff (2016-04-03-07:23)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
Warn : Adapter driver 'usb_blaster' did not declare which transports it allows; assuming legacy JTAG-only
Info : only one transport option; autoselect 'jtag'
Warn : incomplete ublast_vid_pid configuration
jtagspi_program
Info : usb blaster interface using libftdi
Error: unable to open ftdi device: inappropriate permissions on device!

One other common error I encountered was this:

$ make load-gateware
..
..
====== Building for: ======
Platform:  opsis
Target:    opsis_hdmi2usb
Subtarget: HDMI2USBSoC
CPU type:  lm32
===========================
Open On-Chip Debugger 0.10.0-dev-00248-gf3b38ff (2016-04-03-07:23)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
Warn : Adapter driver 'usb_blaster' did not declare which transports it allows; assuming legacy JTAG-only
Info : only one transport option; autoselect 'jtag'
Warn : incomplete ublast_vid_pid configuration
jtagspi_program
Info : usb blaster interface using libftdi
Info : This adapter doesn't support configurable speed
Info : TAP xc6s.tap does not have IDCODE
..
Info : TAP auto19.tap does not have IDCODE
Warn : Unexpected idcode after end of chain: 21 0x00100000
Warn : Unexpected idcode after end of chain: 53 0x14049800
Warn : Unexpected idcode after end of chain: 85 0xfffffa20
Error: double-check your JTAG setup (interface, speed, ...)
Error: Trying to use configured scan chain anyway...
Error: xc6s.tap: IR capture error; saw 0x03 not 0x01
Warn : Bypassing JTAG setup events due to errors
loaded file build/opsis_hdmi2usb-hdmi2usbsoc-opsis.bit to pld device 0 in 31s 968720us

The warning means that the JTAG firmware on the FX2 has gotten confused. Use the mode-switch tool to switch to serial mode and then back to jtag mode, like this:

hdmi2usb-mode-switch --mode=serial
hdmi2usb-mode-switch --mode=jtag

This has to be repeated several times (up to three) to get the USB into the correct jtag mode. So you basically have two terminal windows, one to change the USB mode and the other to do make load-gateware. One important observation was that if I followed certain steps, then make load-gateware works without many repetitions:

  1. mode=jtag, make load-gateware; this gives the above warning.
  2. mode=serial, make load-gateware while this is in progress (this won’t ideally work).
  3. mode=jtag; the make load-gateware will fail, as the usb mode was changed during the process.
  4. Now make load-gateware; this should work.

The important step is to break the make load-gateware process while in serial mode by changing the mode back to jtag. Not sure why this works, but now I don’t have to repeat the set of commands a lot of times.

After make load-gateware succeeded, I next connected to the lm32 softcore using make connect-lm32 and tried the bunch of available options to play around with. Added an image here of the working setup. Still need to figure out how to load the gateware and firmware independently.

Run Length Encoding (RLE)

Github Link

A very simple way to understand run length encoding is to follow this wikipedia link. We are trying to do even better than this, in the sense that we will have several levels of RLE applied over and over.

For example:

Simple Example
[AABBAABB]->[#2A#2B#2A#2B]->[repeat(2,A), repeat(2,B), repeat(2,A), repeat(2,B)]

Modified RLE for hdmi2usb-mask-gen

[AABBAABB]->[#2A#2B#2A#2B]->[repeat(2,A), repeat(2,B), repeat(2,A), repeat(2,B)] ->  
[repeat( 2, ( repeat(2,A), repeat(2,B) ) ) ]

Last week I had explored the python PIL library to convert a general RGB image to a list of 1s and 0s. This list was then used to generate the B/W image again. What remained was to write a function to encode a list in a run length encoded form and decode the encoded list back to generate the image again.

As given in the Hardware Fader design doc, we should have a Repeat class, which can then be used to encode multiple levels of run length encoding. In the case of multiple levels of RLE, decoding should be pretty easy, but encoding is not that straightforward.

I am sticking with single level RLE for now. The encoded values are stored in a 2D list, in which the first column corresponds to the pixel count and the second column to the pixel value. The rows are arranged sequentially, with the mapping going from top left to bottom right along horizontal lines.

Using this idea, I was able to do the complete python mask gen setup:

  1. RGB –> B/W –> matrix (1s and 0s) –> 1D list
  2. 1D list –(RLE encoded)–> [2xn] list with specific format
  3. [2xn] list –(RLE decoded)–> 1D list
  4. 1D list –> B/W image
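A minimal sketch of the two middle steps, assuming the [2xn] format described above (each row is [pixel count, pixel value]):

    def rle_encode(pixels):
        # Collapse a flat list of mask values into [count, value] rows.
        runs = []
        for p in pixels:
            if runs and runs[-1][1] == p:
                runs[-1][0] += 1
            else:
                runs.append([1, p])
        return runs

    def rle_decode(runs):
        # Expand each [count, value] row back into a flat list.
        pixels = []
        for count, value in runs:
            pixels.extend([value] * count)
        return pixels

    assert rle_decode(rle_encode([1, 1, 0, 0, 0, 1])) == [1, 1, 0, 0, 0, 1]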

So the basic idea behind using RLE is that the relevant masks, when copied to and from memory, would consume a lot of the memory bandwidth we are already short of; hence the mask sequences are generated in run length encoded form by the C firmware. These encoded values are then decoded by an equivalent hardware block written in migen. All this should be within the timing constraints of a given frame rate.

Once some python functions were done, the next step was to write the C firmware.

The idea is to have a function like:

struct wipe* generate_wipe(enum wipe_style type, int position, int length)

This should do two things:

  1. Generate a 2D array corresponding to the wipe style and the current position relative to the length.
  2. Encode the mapped 1D array according to the run length encoding (as implemented in Python) into a suitable struct.

I have done the two things independently; there are still some issues with integrating them together in one function. The problem is mostly regarding passing a 2D array or a struct array to a function.

Upcoming Week’s Task

  • Test heartbeat on hardware
  • Complete the C code for generating RLE mask from templates, cleanup hdmi2usb-mask-gen
  • Documentation on 16 bit binary representation
  • Conversion of RGB+YUV to 16bit float and reverse in Migen

Weekly Report: Week2

Timer configuration in lm32: The timer is one of the many peripherals of the lm32 cpu core, as seen in the block diagram figure. The C firmware can access the timer parameters using CSR, specifically the csr.h file. CSR (short for Control and Status Register) is a way for the lm32 firmware to connect with the existing hardware; using it we can read data values generated in hardware directly from the C firmware.

We find that several timer parameters are stored from location 0xe0002000 onwards. There are two timing functions defined in <time.c> (or its corresponding header <time.h>). One is time_init(), used in main.c to initialize the timer. The other is elapsed(), which checks whether the timer has elapsed a certain number of ticks.

While exploring the implementation of the timing functions, I noted that the timer counts down, not up. It is basically a counter which counts down on each system clock tick. The timer also has an interrupt which goes off when timer_value reaches 0.

Timer0 functions in csr.h

Function Name       | Function task                                                     | Used in
timer0_load         | Initial timer value when not enabled, loaded when just started    | time_init()
timer0_reload       | The value the timer reloads to when it reaches zero               | time_init(), elapsed()
timer0_en           | Enables the down counting of the timer                            | time_init()
timer0_update_value | Updates the timer value from the timer block to the CSR registers | elapsed()
timer0_value        | Gives the current value of the timer                              | elapsed()

LatticeMico32 Core

Block diagram of lm32 and its peripherals: link

Opsis Board: My Opsis board was supposed to be shipped on 28th May, but the Numato guys were able to ship it by 25th May, so I received my Opsis board this very week. Yay!

I figured I didn’t have the relevant cables to start testing, so I bought a bunch of cables (HDMI, USB B type, adapter) for the Opsis connections. Later found out that the power adapter is a bit shaky and sometimes disconnects due to nearby disturbances (need to replace!). Also arranged for an HDMI monitor through a friend, in exchange for my VGA monitor; I was told that the current firmware might not work with VGA to HDMI converters, so this had to be done.

The Opsis came preloaded with a color bar pattern on both HDMI outputs, which was easily tested. Next was to figure out the USB connection mode. The first thing was to plug in the USB and figure out what ID the board appears under; refer to this link for the list of USB IDs. Among the several USB modes, there are two relevant here:

Mode               | Vendor ID | Product ID | Device ID           | Serial No
HDMI2USB.tv        | 0x2A19    | 0x5442     | See Device ID table | Device MAC
Older ixo-usb-jtag | 0x16C0    | 0x06AD     | 0x4                 | hw_opsis

 

I was initially getting the older ixo-usb-jtag USB mode, which corresponds to Van Ooijen Technische Informatica in the list of vendor names. The dmesg output for this mode looked like this:

$ dmesg | grep -i USB 

[68853.645703] usb 1-4: new high-speed USB device number 35 using xhci_hcd
[68853.774157] usb 1-4: config 1 interface 0 altsetting 0 bulk endpoint 0x81 has invalid maxpacket 64
[68853.774164] usb 1-4: config 1 interface 0 altsetting 0 bulk endpoint 0x2 has invalid maxpacket 64
[68853.774518] usb 1-4: New USB device found, idVendor=16c0, idProduct=06ad
[68853.774523] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[68853.774527] usb 1-4: Product: USB-JTAG-IF
[68853.774530] usb 1-4: Manufacturer: ixo.de
[68853.774532] usb 1-4: SerialNumber: hw_opsis

I figured this happens when the USB is already connected during a power cycle. So I disconnected the USB during the next power cycle, waited for about 20 seconds in the OFF state and then turned it ON, followed by connecting the USB. This time dmesg showed the HDMI2USB.tv mode, something like this:

$ dmesg | grep -i USB 

[68680.927026] usb 1-4: new high-speed USB device number 28 using xhci_hcd
[68681.055627] usb 1-4: config 1 interface 2 altsetting 0 endpoint 0x81 has an invalid bInterval 64, changing to 10
[68681.056185] usb 1-4: New USB device found, idVendor=2a19, idProduct=5442
[68681.056191] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[68681.056195] usb 1-4: Product: HDMI2USB.tv - Numato Opsis Board
[68681.056198] usb 1-4: Manufacturer: TimVideos.us
[68681.056200] usb 1-4: SerialNumber: ffffd88039570494
[68681.056896] uvcvideo: Found UVC 1.00 device HDMI2USB.tv - Numato Opsis Board (2a19:5442)
[68681.057778] cdc_acm 1-4:1.2: ttyACM0: USB ACM device

Note the Vendor ID and Product ID, as mentioned in the table and found here.

This was the mode I was supposed to be in. I had already built the bitstream for the Opsis from the latest github repo, but make load-gateware was showing some errors. Tried the solution suggested by CarlFK; still the Opsis doesn’t seem to load the bitstream. This will be continued later in the week.

Python PIL Library: One of the other tasks scheduled for this week was to explore RLE encoding for 2D images using python. The idea is that we can store the mask data in memory in a run length encoded form. Initially we assume that each of the mask values is binary, i.e. takes only 1 or 0. I have used the Python PIL library for this. The initial task was to get a black and white (not greyscale) image which can represent the mask: first the image is converted to grayscale, then to black and white using a suitable threshold. This black and white image is equivalent to a mask. Using PIL library functions, I generate a list of 1’s and 0’s, which corresponds to the mask data derived from the image.

This list can now be run length encoded using a suitable function; the actual run length encoding part will be done after this. The opposite conversion should also be possible, hence we use the list of 1’s and 0’s to generate the image back. The current code inverts the colors; this was done to verify that the code works. The next task is to figure out the optimum run length encoding for the list of numbers. <Github Link>
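A minimal sketch of the conversion described above, assuming Pillow; the threshold value is illustrative:

    from PIL import Image

    def image_to_mask(path, threshold=128):
        # Convert an RGB image to grayscale, threshold it to black and
        # white, and flatten it to a list of 1s and 0s.
        gray = Image.open(path).convert("L")
        bw = gray.point(lambda v: 255 if v >= threshold else 0)
        return [1 if v else 0 for v in bw.getdata()], bw.size

    def mask_to_image(mask, size):
        # Reverse conversion: rebuild a black and white image from the mask.
        img = Image.new("1", size)
        img.putdata([255 if m else 0 for m in mask])
        return img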

Plans for next week:

  • Get the Opsis board working
  • Complete python-pil-mask generator
  • Resolve the issues in heartbeat code
  • C code to generate mask for wipes
  • 16 bit conversion in Migen