April 2, 2017
This article leaves some practical notes for the students who will take the CPU Experiment next year.
In the Department of Information Science at the University of Tokyo, third-year undergraduates have to create their own CPU and compiler as an assignment. The goal of the class is to run a ray-tracing program on their architecture as fast as possible. The following sections describe what I wish I had known when the experiment started. Good luck.
These books are also useful in other courses.
specs and user guides
Links may be outdated. If one is dead, search for the document yourself by its guide number. The user guides and product guides are Xilinx's. If the vendor changes, look for the new vendor's equivalents. In particular, be aware that different vendors support different subsets of HDL functionality.
In the CPU Experiment, each team has about four members (this may change next year). Each team must have one member in charge of the core architecture and one in charge of the compiler. These two roles are filled first, by volunteering, and the teams are then formed at random. The roles of the other two members are arbitrary, and each team can decide them for itself. Commonly, the other two members took charge of a simulator and a library.
Although assigning one member to the simulator has been common for many years, I think it doesn't work well. There are two main reasons.
Though neither of these happened in my team, they seemed common in many other teams. As a result, it is not rare for the member in charge of the core to end up writing their own simulator. Given that, I think it's better to make the core member responsible for the simulator as well and to create another position in place of the simulator role. Some candidates for that position are described in the next section.
In this section, I'll describe three points to keep in mind when you plan the development schedule.
First, start by building a minimal architecture that satisfies the project requirements. Though it may seem too easy for you, it is beneficial.
Second, spend plenty of time understanding the specifications of the FPGA, the HDL (SystemVerilog), the IP cores, and so on. Though it is tedious and takes a long time, making full use of them matters a great deal for working efficiently. A month of learning is worth years of blind work. I recommend not rushing to complete your CPU. I think even creating a position for someone who figures out how to use the tools and reads through the specifications is a good choice.
Third, testing takes much more time than you think. Simulation with the simulator built into the IDE is slow. Writing good test cases also takes time. In an integration test, it is hard to pinpoint where and when a bug first appeared. Assigning members to help write test cases and run simulations will help a lot.
In designing a CPU, it is crucial to find the bottleneck and improve that part. Though I created a CPU with features such as out-of-order execution, superscalar issue, and speculative execution, not all of them contributed much to speed, despite how hard they were to implement. In particular, I felt that superscalar execution, which was the most difficult to implement, did not contribute much to speed. Before you try to build a complex CPU, think again and again about whether it really improves speed, and look for cheaper methods that could replace the complex one.
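One way to make "does this feature pay off?" concrete is a back-of-the-envelope estimate with Amdahl's law. The sketch below is only an illustration: the function name `overall_speedup` and all the numbers are hypothetical, not measurements from my CPU.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: speedup of the whole program when `fraction` of the
    original run time is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical numbers, for illustration only.
# A hard feature that doubles the speed of 20% of the run time:
hard_feature = overall_speedup(0.20, 2.0)
# A cheap fix that makes the real bottleneck (60% of run time) 1.5x faster:
cheap_fix = overall_speedup(0.60, 1.5)

print(f"hard feature: {hard_feature:.3f}x")
print(f"cheap fix:    {cheap_fix:.3f}x")
```

With these made-up figures the cheap fix wins, because it touches a larger share of the run time. The real fractions have to come from profiling your own program on your own architecture.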
In implementing a CPU, unifying the way modules communicate with each other helps. The basic pattern I used for that purpose passes data only when both a valid flag and a ready flag are high. Of course, you don't have to follow this, but I recommend thinking through a convention of your own.
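The valid/ready rule can be sketched as a small cycle-by-cycle software model. This is written in Python rather than SystemVerilog, and it is not the code from my CPU; the function name `run_handshake` and the assumption that the producer keeps valid high until its data is accepted are mine.

```python
def run_handshake(items, ready_pattern):
    """Simulate one valid/ready channel, one loop iteration per clock cycle.

    items:         data the producer wants to send, in order
    ready_pattern: the consumer's ready flag for each cycle
    Returns the data received and a per-cycle trace of
    (cycle, valid, ready, data, fire).
    """
    pending = list(items)
    received = []
    trace = []
    for cycle, ready in enumerate(ready_pattern):
        # Assumption: the producer asserts valid whenever it has data
        # and holds the same data until it is accepted.
        valid = bool(pending)
        data = pending[0] if valid else None
        # Data moves only in a cycle where both flags are high.
        fire = valid and ready
        if fire:
            received.append(pending.pop(0))
        trace.append((cycle, valid, ready, data, fire))
    return received, trace


received, trace = run_handshake(
    [10, 20, 30],
    [False, True, True, False, True, True],
)
for row in trace:
    print(row)
print("received:", received)
```

In cycles 0 and 3 the consumer is not ready, so the producer simply waits with the same data. In cycle 5 the consumer is ready but nothing is valid, so nothing is transferred.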
A queue was a component I used repeatedly. For all of them, I adopted an implementation that moves only the pointers rather than the elements, like a ‘commit ring’. I found that this makes the waveform easier to follow in the IDE’s simulator. (I planned to include a diagram of it, but I won’t, out of laziness.)
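In place of the diagram, here is a minimal software sketch of a pointer-based queue, written in Python rather than SystemVerilog. The class name `RingQueue`, the `count` field used to tell full from empty, and the method names are assumptions for illustration, not my actual hardware design.

```python
class RingQueue:
    """Fixed-depth queue in which elements never move; only the
    head and tail pointers advance and wrap around."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = [None] * depth
        self.head = 0    # index of the oldest element
        self.tail = 0    # index of the next free slot
        self.count = 0   # number of stored elements

    def push(self, value):
        """Store value at the tail; return False if the queue is full."""
        if self.count == self.depth:
            return False
        self.slots[self.tail] = value
        self.tail = (self.tail + 1) % self.depth
        self.count += 1
        return True

    def pop(self):
        """Remove and return the oldest element, or None if empty."""
        if self.count == 0:
            return None
        value = self.slots[self.head]
        self.head = (self.head + 1) % self.depth
        self.count -= 1
        return value


q = RingQueue(4)
for v in (1, 2, 3):
    q.push(v)
first = q.pop()        # oldest element leaves; head moves to slot 1
q.push(4)
q.push(5)              # tail wraps around and reuses slot 0
print(first, q.slots, q.head, q.tail)
```

Each element stays in the slot it was written to until it is popped, which is why a given entry is easy to track in a waveform: you watch one slot plus the two pointers instead of values shifting every cycle.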