- To solve the problem of misalignment in the output, some changes need to be made in the HDMI_OUT framebuffer core
- As per suggestions, I will be adding another DMA engine to one HDMIOut, so that the pixel reads from both streams are synchronous and timing misalignments don't occur.
- Trying to understand the various cores at the video output, in order to add another DMA reader
- After much reading and understanding, added another DMA reader along with a bunch of changes
- Adding new modules wherever necessary and changing the layout for the rest of them
- Some block diagrams in the link
- Added a link to video here
- Need to find a solution to this, but it's already 7:20 in the morning.
As I write this, I just noticed that only two weeks (plus a few more days) are left in the official coding period of Google Summer of Code. So much time has already passed; I am hoping to complete the milestones by then. This week's focus was to first implement static mixing of inputs, and once that is figured out in hardware, change the multiplier value from firmware to do dynamic mixing.
This report pertains to work done from 23rd July to 29th July.
Adding modules to Video Pipeline:
Last week I was adding my float arithmetic modules to the input video pipeline, that is, in the hdmi_in files. Though this was a good test to check that they work, the modules are supposed to be added to the output pipeline (gateware/hdmi_out/phy.py).
Major tasks done this week:
- Fixed a bug that caused missing colors in gradients: This was a bug in the floating point multiplier unit which caused random colors in the gradient to go missing. It wasn't spotted in simulation before because dynamic testing with different inputs each cycle hadn't been done for the floating point units. The bug was an error in the pipeline: stage 5 was copying a value from stage 3 instead of stage 4. It was a bit difficult to figure out, because when a lot of data is going through the pipeline the output is not easily decipherable. What I noticed was that the frac of the current output depended on the frac of previous outputs, and after that it was really easy to fix.
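A toy software model (my own sketch, not the actual gateware) illustrates why simulation with repeated inputs missed this bug: with a constant stream, feeding stage 5 from stage 3 or from stage 4 settles to the same steady-state output, but with changing inputs the two diverge.

```python
# Toy model of a five stage pipeline (hypothetical, not the real gateware).
# Each clock, stage i should latch the value held by stage i-1; the buggy
# version feeds stage 5 from stage 3, skipping a stage.
def run(inputs, buggy=False):
    regs = [0] * 5                            # five pipeline stage registers
    outputs = []
    for x in inputs:
        src = regs[2] if buggy else regs[3]   # bug: stage 3 instead of stage 4
        regs = [x, regs[0], regs[1], regs[2], src]
        outputs.append(regs[4])
    return outputs

# With a constant stream both versions reach the same steady-state value,
# which is why static simulation looked fine; varying inputs expose the bug.
constant_ok = run([7] * 8)[-1] == run([7] * 8, buggy=True)[-1]
varying_bad = run(list(range(8))) != run(list(range(8)), buggy=True)
```

The symptom in the model matches the one described above: the buggy output at any cycle depends on a value from the wrong (earlier) cycle, which is only visible when the inputs actually change.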
- Adding mixer block in hardware, with inputs to the adder and multiplier hardwired
Once the modules seemed to work perfectly in simulation, all of them were added together with the same layout (the other inputs of add and mult hardwired to zero and one respectively). In this case the layout at floatadd and floatmult is always rgb16f_layout. This is good because the same layout means we can easily connect modules using the Record.connect() method. This was tested to work perfectly. The next task was to check whether mixing was done correctly; for that I needed to connect modules with different layouts.
- Figure out connecting blocks with different layouts:
For connecting two PipelinedActor modules we generally use the Record.connect() method, which by default connects all the other signals (like ack, stb) apart from the payload signals. I first tried to define two sinks for the floatadd module, which would have been perfect, but I encountered an error. After a lot of asking around without getting an exact answer on how to proceed, I decided to dive into the libraries that define these classes and methods. I mainly focused on how Record.connect() was implemented and on the classes the inputs were derived from.
So apart from the usual payload signals, the source and sink of a pipelined module have four other signals defined, which control the flow of packets from the source of the master to the sink of the slave. These signals are stb, ack, sop and eop. Information about stb and ack can be found here. The other two, sop and eop, refer to start of packet and end of packet. So we basically want to know the equivalent connections for Record.connect().
```python
# This is the Record.connect() method
Record.connect(ycbcr2rgb.source, rgb2rgb16f.sink),

def rgb_layout(dw):
    return [("r", dw), ("g", dw), ("b", dw)]

# This is the alternate way of doing the equivalent connections
rgb2rgb16f.sink.r.eq(ycbcr2rgb.source.r),
rgb2rgb16f.sink.g.eq(ycbcr2rgb.source.g),
rgb2rgb16f.sink.b.eq(ycbcr2rgb.source.b),
rgb2rgb16f.sink.stb.eq(ycbcr2rgb.source.stb),
ycbcr2rgb.source.ack.eq(rgb2rgb16f.sink.ack),
rgb2rgb16f.sink.sop.eq(ycbcr2rgb.source.sop),
rgb2rgb16f.sink.eop.eq(ycbcr2rgb.source.eop),
```
More information about significance and implementation of these signals is in this document (page 23).
- The rgb16f2rgb unit didn't have a mechanism to treat overflows of float values correctly. When loaded with float values greater than 1.0, the output came out as 0, but it is supposed to saturate to 255. This was fixed in the rgb16f2rgb file by adding a simple condition at the output.
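A minimal software model of the intended behaviour (a hypothetical helper using numpy's float16 decoding, not the gateware condition itself) shows the saturation: anything at or above 1.0 clamps to 255 rather than wrapping to 0.

```python
import numpy as np

def float16_bits_to_uint8(bits):
    # Decode the 16-bit pattern as an IEEE half-precision float
    value = float(np.array([bits], dtype=np.uint16).view(np.float16)[0])
    # Saturate: >= 1.0 clamps to 255, negative values clamp to 0
    return max(0, min(255, int(value * 255.0)))

print(float16_bits_to_uint8(0x3E00))  # 1.5 in float16 -> saturates to 255
print(float16_bits_to_uint8(0x3800))  # 0.5 in float16 -> 127
```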
- While adding the eq statements for Record.connect() equivalence, as discussed above, I was adding them in the opsis_video.py file, but it didn't seem to be working. After about a day of random tries here and there, I found out that I hadn't wrapped the .eq statements in a self.comb += . This caused a lot of delay, because each compilation cycle for hardware took about 20 minutes, which made the error very difficult to spot.
Things seemed to be mixing well after this, though there is some alignment problem due to a timing inconsistency, which I still need to figure out. For static testing I hardwired the multiplier with the value 0.5 and connected the outputs from HDMI_OUT0 and HDMI_OUT1 to the floatadd unit of HDMI_OUT0. Later I added a CSRStorage register to dynamically vary the multiplier value from firmware to create a fade-like effect, along with supporting functions in ci.c and other functions for maintaining timing.
- Updating my resume for the Institute Placement Season; the first line in my resume is about GSoC 🙂
Continuing with my regular weekly report, I am now in the 9th week of Google Summer of Code. My regular college semester started this week and I am required to go to the lab for my thesis work, so there haven't been a lot of updates here. I am hoping to complete a lot of things over the upcoming weekend. This report pertains to work done from 16th to 22nd July.
The first half of the week was spent fixing errors and issues in previous work. One issue I had been facing for a long time was regarding testing and porting to litex. As always, the errors were some silly mistakes on my part, and took longer than required to debug.
Testing of float16 using CSR:
The idea was to use the pipelined hardware block to compute the floating point multiplication of two numbers in hardware. The numbers were given through firmware by using CSR registers defined in hardware. After correctly initialising the floatmult datapath in target/opsis_base.py and adding the relevant CSR functions in ci.c, I expected the output to be correct, but somehow the input was not transferring across. I thought this was because of an incorrect implementation of the CSR registers as wires, but it turned out the way of connecting at the input side was incorrect.

```python
# Incorrect way
self.comb += [
    self._float_in2.storage.eq(sink.in1),
    self._float_in2.storage.eq(sink.in2),
    self._float_out.status.eq(source.out)
]

# Correct way
self.comb += [
    sink.in1.eq(self._float_in1.storage),
    sink.in2.eq(self._float_in2.storage),
    self._float_out.status.eq(source.out)
]
```
Porting migen designs to litex for review
I was facing an error while porting my floating point arithmetic migen code to litex for review; the command line output of that error is at this link. Since it came up only while converting to litex, I thought there was some error in litex functionality. I later found out that it was because the int variables were being generated using numpy functions, so they were of numpy.uint type, which was not correctly recognised by the Signal object.
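The root cause is easy to reproduce outside migen (a hypothetical snippet, not the actual failing code): numpy integer scalars are not instances of Python's int, so any strict type check on the value rejects them until they are explicitly converted.

```python
import numpy as np

width = np.uint8(16)               # a value produced by some numpy computation
print(type(width))                 # a numpy scalar type, not a Python int
print(isinstance(width, int))      # False: a strict type check rejects it

fixed = int(width)                 # explicit conversion restores a plain int
print(isinstance(fixed, int))      # True
```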
Although I had started working on run length encoding things after this, my mentor mithro suggested that I should first look at implementing mixer equation using a static mask for now.
The first step was to add all the submodules to the video pipeline to check that they meet the timing constraints. I found that the existing color space conversion was done in gateware/hdmi_in/analysis.py in the FrameExtraction class. For testing purposes I added the rgbtofloat16 module and the reverse module to this pipeline, connecting the output of one to the input of the other, in the hope of getting back exactly the same input. This also meant that the float16 color space conversion modules were tested in hardware for the first time. The next step was to subsequently test the floatmult and floatadd units by adding them to the pipeline.
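The expectation that chaining the conversion with its reverse reproduces the input exactly can be checked with a small software model (hypothetical helpers using numpy's float16, not the gateware modules): float16's 11-bit significand is precise enough that every 8-bit channel value survives the round trip.

```python
import numpy as np

def int8_to_f16(n):
    # Map 0..255 into [0.0, 1.0] and round to half precision
    return np.float16(n / 255.0)

def f16_to_int8(f):
    # Scale back and round to the nearest 8-bit value
    return max(0, min(255, int(round(float(f) * 255.0))))

# Every channel value survives the float16 round trip unchanged:
# the half-precision error is at most ~0.12 of a code, well under 0.5
print(all(f16_to_int8(int8_to_f16(n)) == n for n in range(256)))
```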
There were some issues in uploading the bitstream, which I eventually circumvented; here is a link to a picture of the output when all the modules are added to the video pipeline. The picture is not too sharp, and I feel there is a missing color in the gradient. I am going to check this for a static image in simulation.
Issue: The generated bitstream (which took close to 20 minutes each time) didn't work once uploaded, and the USB connection was even lost after this. My mentor mithro suggested that I use the lightweight videomixer target ($ echo TARGET=video), and that the problem might be because of the FPGA resetting the FX2 IC after it boots. This basically bypassed the problem.
Tasks for the Upcoming Week:
Right now I am trying to figure out where exactly to add the video mixer equation in the bunch of gateware python files. I think I should add it at the output side, because that way I can operate on pattern input as well. The idea is to set up the standard video pipeline with one input of floatadd left open for each hdmi_out. Then, in the target/opsis_video.py file, I will connect the unconnected input of the floatadd of hdmi_out0 to the output of the floatmult of hdmi_out1, and vice versa. Once this is figured out, I will add a CSR register to set the multiplier value from firmware.
- (Spotted and) Solved an overflow bug in the rgb16f to rgb conversion module: the module was not considering float inputs greater than 1 in the design, and was giving 0 as output for them.
- Spotted a crucial mistake in the connections of the mixer block: the .eq statements were not included inside a self.comb += .
- Added support in firmware to see a dynamic fade in and out between the two inputs.
- Some problem with sync and acks, as the position of the output screen is not correct.
- Trying to figure out the correct way to connect two PipelinedActor modules, without using Record.connect().
- Trying to get some help from #m-labs people
- Reading up on other things meanwhile: encoder hardware code and gamma correction
- Fixed a small bug in the pipelined implementation of the floating point multiplier. This bug was causing a color to be missing in gradient shades; in simulation it was found that this happened only when the pipeline is given different inputs in each clock cycle. The bug was caused by taking an input from the stage before the previous stage in the pipeline.
- Adding the mixer block in hardware. Facing some issues; documented the description of the errors neatly here,
- Started working on adding my predefined float16 conversion and floating point arithmetic modules to video pipeline.
- After a few errors, everything was added perfectly to the pipeline.
- This is defined for only one input; now figuring out how to connect values from the two hdmi_in inputs.
This was a short week as I had to go out of town for three days. I have mostly worked on floating point conversions and arithmetic modules this week.
Float 16 Arithmetic: link
- I ported the float 16 multiplier, which I had already completed in migen, to litex in order to send out a pull request and get it reviewed. While porting I was getting an error while trying to run the testbench; I have added this pastebin link in the pull request comments as well. From my understanding, somewhere in the streamer something is not defined as a Signal object, which is causing this error. This was working fine in the migen environment, and I suppose there is some difference in the implementation of the streamer in litex.
- Completed a five stage pipelined version of the 16 bit floating point adder. This was done in the migen environment of HDMI2USB-misoc-firmware. After some bug fixing, it has been tested to be working in the migen simulation environment. I also ported it to litex for sending out a pull request and getting it reviewed, and encountered an error similar to the one I hit while porting the floatmult module.
- I sent the pull request later in the week and made the changes requested in the comments. These were mostly correcting variable names and adding module level and class docstrings.
Float 16 Color Space Conversion: link
- I had already sent out a pull request for this last week, but since I was focussing on other things later last week, I couldn't look at the pull request comments. Among many things, I added a generic submodule for a leading one detector. Earlier this was done using a number of if-else conditions; it is now done using a simple for loop, though I feel this has made the simulation slower. I also fixed some variable names and added more for loops to clean up the code.
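A software equivalent of that for-loop construction (my sketch, not the actual submodule): iterate over the bits and let later matches override earlier ones, so the highest set bit wins, mirroring a priority chain of If statements generated in a loop.

```python
def leading_one(value, width=16):
    # Return the position of the most significant set bit, or -1 if none.
    pos = -1
    for i in range(width):
        if (value >> i) & 1:   # later (higher) bits override earlier ones
            pos = i
    return pos

print(leading_one(0b10100))   # highest set bit is bit 4
```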
- From the discussions I had with mithro and _florent_, it was suggested that it might be good to explore this implementation using a lookup table, at least for the int8 to float16 conversion, as the size will only be 256*16 bits. I filled the contents of the lookup table using the already defined int8 to float16 test model functions I wrote for testing. A similar thing could be done for the reverse conversion, but the size of that lookup table (65536*8 bits) might be too big for it to be better than the earlier implementation.
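A sketch of how such a 256-entry table could be filled (hypothetical, using numpy's float16 encoding rather than my test model functions, and assuming each 8-bit value n maps to the half-precision bit pattern of n/255):

```python
import numpy as np

# Build the int8 -> float16 lookup table: 256 entries of 16 bits each,
# where entry n holds the IEEE half-precision encoding of n/255
lut = [int(np.array([n / 255.0], dtype=np.float16).view(np.uint16)[0])
       for n in range(256)]

print(hex(lut[0]))     # encoding of 0.0
print(hex(lut[255]))   # encoding of 1.0
```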
Fixed the CSR issue for hardware testing of float arithmetic module. Code