New Optimization Chip Tackles Machine Learning, 5G Routing
“If you want to solve a large problem with a lot of data—say one million data points with one million variables—ADMM allows you to break it up into smaller subproblems,” he says. “You can cut it down into 1,000 variables with 1,000 data points.” Each subproblem is solved and the results incorporated in a “consensus” step with the other subproblems to reach an interim solution. With that interim solution now incorporated in the subproblems, the process is repeated over and over until the algorithm arrives at the optimal solution.
In a typical CPU or GPU, ADMM is limited because it requires the movement of a lot of data. So instead the Georgia Tech group developed a system with a “near-memory” architecture.
“The ADMM framework as a method of solving optimization problems maps nicely to a many-core architecture where you have memory and logic in close proximity with some communications channels in between these cores,” says Raychowdhury.
The test chip was made up of a grid of 49 “optimization processing units,” cores designed to perform ADMM and containing their own high-bandwidth memory. The units were connected to each other in a way that speeds ADMM. Portions of data are distributed to each unit, and they set about solving their individual subproblems. Their results are then gathered, and the data is adjusted and resent to the optimization units to perform the next iteration. The network that connects the 49 units is specifically designed to speed this gather and scatter process.
The chip could be scaled up to do its work in the cloud—adding more cores—or shrunk down to solve problems closer to the edge of the Internet, Raychowdhury says. The main constraint in optimizing the number of cores in the prototype, he jokes, was his graduate students’ time.