One of the first tasks for the project is to integrate a linear mixer block into the existing hdmi2usb firmware or gateware. After discussions with mithro, it was decided that we should always try to implement things in C firmware first rather than in gateware, because C firmware is easier to understand and maintain. Here, firmware refers to the high-level C code that runs on the lm32 soft core, and gateware refers to everything written in a hardware description language and loaded directly onto the FPGA.
I started by reviewing the existing C firmware code. Initially I did not go about this in a very structured way, and reading someone else’s code turned out to be a lot more difficult than I had imagined. My approach was to find out where in the firmware I could change the existing pixel values. To keep things simple, I focused on implementing the heartbeat pixel <add link for issue> for now. The natural approach was to follow the pixels from the point where they enter the system at the HDMI input, so I started with the hdmi_in file and built up from there.
C firmware code: The firmware supports 3 possible inputs (HDMI_IN0, HDMI_IN1, PATTERN) and 3 possible outputs (HDMI_OUT0, HDMI_OUT1, ENCODER). PATTERN is a built-in generator of predefined patterns, which generally cover all the colors and are used for testing. ENCODER produces JPEG-encoded values for transmission over USB; the encoding is used to save bandwidth.
Here is the basic memory architecture. As seen in the attached image, each input has a predefined memory location: 0x01, 0x02 and 0x03 for in0, in1 and pattern respectively. At each of these locations, memory is allocated for 4 frames of 1920×1080 pixels. Each pixel takes 16 bits of memory (not 24 bits, because of 4:2:2 chroma subsampling), and each memory location stores 32 bits of data. Hence, each memory location stores 2 pixels.
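To make the numbers concrete, here is a small sketch of the arithmetic. It only uses the figures from the paragraph above (1920×1080, 16 bits per pixel, 32-bit locations, 4 frames per input); the function names are mine and are not taken from the firmware, and the row-by-row pixel order is an assumption.

```c
#include <stdint.h>

/* Figures taken from the description above. */
enum {
    FRAME_WIDTH      = 1920,
    FRAME_HEIGHT     = 1080,
    BYTES_PER_PIXEL  = 2,  /* 16 bits per pixel (YCbCr 4:2:2) */
    BYTES_PER_WORD   = 4,  /* each memory location holds 32 bits */
    FRAMES_PER_INPUT = 4
};

/* Bytes needed to store one frame. */
static inline uint32_t frame_size_bytes(void)
{
    return (uint32_t)FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL;
}

/* Number of 32-bit memory locations in one frame. */
static inline uint32_t frame_size_words(void)
{
    return frame_size_bytes() / BYTES_PER_WORD;
}

/* Bytes reserved for one input (4 frames). */
static inline uint32_t input_region_bytes(void)
{
    return frame_size_bytes() * FRAMES_PER_INPUT;
}

/* Index (within one frame) of the 32-bit location that holds pixel (x, y).
 * Two horizontally adjacent pixels share one location.
 * Assumes pixels are stored row by row, left to right. */
static inline uint32_t pixel_word_index(uint32_t x, uint32_t y)
{
    return (y * FRAME_WIDTH + x) / 2;
}
```

With these figures one frame takes 4,147,200 bytes (1,036,800 locations), and the 4 frames of one input take 16,588,800 bytes.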
I figured out that the lm32 core is not involved in the actual copying of pixels; the pixels are copied by the existing DMA engine in the gateware. A DMA engine is a hardware block that can copy data from memory location X to memory location Y, optionally auto-incrementing the pointer at one or both locations. A generic DMA engine operates without the CPU being involved, so CPU overhead is minimal.
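To illustrate what "auto-incrementing pointers at one or both locations" means, here is a purely software model of a generic DMA transfer. This is not the hdmi2usb DMA engine and not how the gateware is driven; the struct and function names are hypothetical, and a real engine does this work in hardware without the CPU running any loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one transfer, for illustration only. */
struct dma_transfer {
    const volatile uint32_t *src; /* location X */
    volatile uint32_t *dst;       /* location Y */
    size_t count;                 /* number of 32-bit words to copy */
    bool inc_src;                 /* advance the source pointer after each word */
    bool inc_dst;                 /* advance the destination pointer after each word */
};

/* Software model of what the hardware block does on its own. */
void dma_model_run(const struct dma_transfer *t)
{
    const volatile uint32_t *src = t->src;
    volatile uint32_t *dst = t->dst;

    for (size_t i = 0; i < t->count; i++) {
        *dst = *src;
        if (t->inc_src)
            src++;
        if (t->inc_dst)
            dst++;
    }
}
```

With both flags set this behaves like a memory-to-memory copy. With only `inc_dst` set, the same source address is read repeatedly, which is the pattern for draining a fixed data port into a buffer.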
This meant that I can’t modify pixels while they are being transferred from the input to memory: the CPU is not involved in the transfer, so the pixels are not accessible from the C code at that point. To modify the pixels, we need to access them in memory after the transfer. Since we have figured out how the inputs map to memory locations, we can set the pixel values at the appropriate locations (4×4 pixels at the bottom right) to red.
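A minimal sketch of that idea follows. It assumes a frame stored row by row with two pixels per 32-bit location, and it assumes a byte order of Y0, Cb, Y1, Cr from the lowest byte upwards; the real layout has to be checked against the gateware. The red values assume a full-range BT.601 conversion. `draw_heartbeat_box` and the constants are hypothetical names, not existing firmware functions.

```c
#include <stddef.h>
#include <stdint.h>

#define BOX_SIZE 4u

/* Approximate red in YCbCr, assuming full-range BT.601 (an assumption). */
#define RED_Y  76u
#define RED_CB 85u
#define RED_CR 255u

/* Two red pixels packed into one 32-bit location.
 * Assumed byte order, lowest byte first: Y0, Cb, Y1, Cr. */
static inline uint32_t packed_red_pair(void)
{
    return RED_Y | (RED_CB << 8) | (RED_Y << 16) | (RED_CR << 24);
}

/* Paint a BOX_SIZE x BOX_SIZE red box in the bottom-right corner of a frame.
 * fb points at the first 32-bit location of the frame.
 * width must be even, and width and height must be at least BOX_SIZE. */
void draw_heartbeat_box(volatile uint32_t *fb, uint32_t width, uint32_t height)
{
    const uint32_t words_per_line = width / 2;
    const uint32_t red = packed_red_pair();

    for (uint32_t y = height - BOX_SIZE; y < height; y++) {
        /* Step two pixels at a time: one write covers a pixel pair. */
        for (uint32_t x = width - BOX_SIZE; x < width; x += 2)
            fb[(size_t)y * words_per_line + x / 2] = red;
    }
}
```

For a 1920×1080 frame this touches 8 locations in total: 2 per line on the last 4 lines.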
Color space conversion: The input color space format is the standard 8-bit RGB format. The human visual system is more sensitive to overall intensity than to color differences, so to save bandwidth and memory, the input pixels are stored in YCbCr format. In YCbCr, Y is the luma component and Cb and Cr are the blue-difference and red-difference chroma components. There is a linear relationship between the RGB and YCbCr color spaces, and the conversion is done by one of the gateware conversion blocks, the csc block.
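To show what this linear relationship looks like, here is a fixed-point sketch of an RGB to YCbCr conversion. It uses the full-range BT.601 coefficients (the ones JPEG/JFIF uses), which is my assumption; the gateware csc block may use different coefficients or a different range, so treat this as an illustration and not as a description of that block.

```c
#include <stdint.h>

struct ycbcr_pixel {
    uint8_t y;
    uint8_t cb;
    uint8_t cr;
};

static inline uint8_t clamp_u8(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Full-range BT.601 conversion (assumed), coefficients scaled by 65536:
 *   Y  =       0.299    R + 0.587    G + 0.114    B
 *   Cb = 128 - 0.168736 R - 0.331264 G + 0.5      B
 *   Cr = 128 + 0.5      R - 0.418688 G - 0.081312 B
 */
static inline struct ycbcr_pixel rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b)
{
    const int32_t half   = 1 << 15;     /* rounding term */
    const int32_t offset = 128 * 65536; /* the +128 chroma offset, pre-scaled */
    struct ycbcr_pixel out;

    /* The chroma offset is added before shifting, so the shifted
     * values are never negative. */
    int32_t y  = ( 19595 * r + 38470 * g +  7471 * b + half) >> 16;
    int32_t cb = (-11059 * r - 21709 * g + 32768 * b + offset + half) >> 16;
    int32_t cr = ( 32768 * r - 27439 * g -  5329 * b + offset + half) >> 16;

    out.y  = clamp_u8(y);
    out.cb = clamp_u8(cb);
    out.cr = clamp_u8(cr);
    return out;
}
```

Under these assumptions, pure red (255, 0, 0) comes out as Y = 76, Cb = 85, Cr = 255, while black and white both land on the neutral chroma value Cb = Cr = 128.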
Chroma subsampling is the practice of encoding images with less resolution for chroma information than for luma information, taking advantage of the human visual system’s lower acuity for color differences than for luminance. In plain YCbCr (4:4:4), all channels are sampled at the same rate; with chroma subsampling, the Cb and Cr channels are stored at a lower resolution. The current firmware uses the YCbCr 4:2:2 format for storing the input pixel data. This means that for every 4 pixels, we store Y (luma) for all four pixels, and Cb and Cr (chroma) for only 2 pixels. With this information we can understand how space is allocated for storing pixel values. Each location in memory can store 4 bytes, and we currently use 8 bits of resolution for each channel, so each memory location can store data for two pixels: Y (luma) for both pixels, plus one Cb and one Cr value, each the average of the Cb and Cr of the two pixels. Detailed explanation
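The packing of two pixels into one 32-bit location can be sketched as follows. The byte order (Y0, Cb, Y1, Cr from the lowest byte upwards) is an assumption on my part and has to be checked against the gateware; the function names are hypothetical.

```c
#include <stdint.h>

/* Pack two neighbouring YCbCr pixels into one 32-bit 4:2:2 word.
 * Both Y values are kept; Cb and Cr are averaged over the pair.
 * Assumed byte order, lowest byte first: Y0, Cb, Y1, Cr. */
static inline uint32_t pack_422_pair(uint8_t y0, uint8_t cb0, uint8_t cr0,
                                     uint8_t y1, uint8_t cb1, uint8_t cr1)
{
    /* Rounded average of the two chroma samples. */
    uint32_t cb = ((uint32_t)cb0 + cb1 + 1u) / 2u;
    uint32_t cr = ((uint32_t)cr0 + cr1 + 1u) / 2u;

    return (uint32_t)y0 | (cb << 8) | ((uint32_t)y1 << 16) | (cr << 24);
}

/* Accessors for the individual fields of a packed word. */
static inline uint8_t word_y0(uint32_t w) { return (uint8_t)(w & 0xFFu); }
static inline uint8_t word_cb(uint32_t w) { return (uint8_t)((w >> 8) & 0xFFu); }
static inline uint8_t word_y1(uint32_t w) { return (uint8_t)((w >> 16) & 0xFFu); }
static inline uint8_t word_cr(uint32_t w) { return (uint8_t)((w >> 24) & 0xFFu); }
```

Two pixels at 24 bits each would need 48 bits; sharing the chroma brings the pair down to 32 bits, which is where the 16 bits per pixel figure comes from.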
Once this is understood, we can simply change the pixels at the appropriate locations, following the color format, to get a red box at the bottom right. To make the box beat, we now need to understand the timing-related functions of the lm32. This is the natural next step in the process.