Monday, February 16, 2015

Proposed Proof-of-Principle Board Design

The final design goal I have in mind is to reproduce the asynchronous processor array structure of the GA144 chip at PCB scale, using FPGA parts as the individual cell elements. The most natural structure for such a design would be an 8 x 8 array of processors. But a project like this could easily cost up to $2,000, more money than I can afford to spend on just an experiment. So before I walk down that path, I want to do a small proof-of-principle prototype to confirm that my hardware design ideas will work at the larger scale.

A proof-of-principle prototype needs to be quick, simple, and, while reduced in size, still retain enough functionality to be able to verify the design ideas for the full-sized end design. At this level, a number of design constraints come into play.

First is the allowed board size. I work through Sunstone Circuits for my PC board fabrication needs. Their pricing structure for four-layer PCBs works out so that boards of 10 in² or less cost half the price of the next size up. Since cost is a hard constraint for this project, this sets my board size to 10 in² or less. Sunstone also has a minimum order of two boards. So if I am clever and lay out a single board in a symmetric fashion, I can take the second board and attach it to the first, end-to-end, and effectively have a 20-in² PCB. I’ve already tried this trick in my PCB layout tools, and it works great.

Second, the choice for the FPGA part is pretty straightforward for me. The last few years I’ve been working exclusively with FPGA parts from Lattice Semiconductor. Since I have all the software and programming tools in place for Lattice parts, this is the natural choice for me.

Since I’m going to hand-solder these parts, any kind of ball grid package is off the table as a design choice. This leaves the leaded TQFP parts as my only option.

In terms of logical complexity, my goal is to create individual cells in my asynchronous array which are complete stack-based processors with sufficient memory to be able to run more than simple routines. Also, having DSP functionality in my FPGA part would allow it to work as a neural network element.

Third, since an 8 x 8 array will be a total of 64 processors, cost will be a critical consideration. Adding all of these constraints together and going through Lattice Semiconductor’s product offerings, one part stands out: the ECP2 family of FPGAs, with the particular part choice being the LFE2-6E-5T144C.

Fourth is the design of a ring oscillator. This turned out not to be as straightforward as I expected going into this project. It’s a subject that I think deserves a separate blog post. But for now, the choice seems to have come down to an active delay line as the timing source for my oscillator.

And lastly, there is the question of how to get data and updated program code into and out of the array. While I do have a UART written in Verilog that I can port to one of the FPGA parts, I’ve chosen to leave that out of the array elements and tack on an eight-bit PIC processor from Microchip. The whole idea is to do this proof-of-principle project as quickly as possible. So even though adding an extra processor to the array design might seem like extra work, it isn’t. The reason why is that I’ve been doing design work for the PIC processors for years, and I have a lot of source code as well as schematics and test software written in LabVIEW already that I can immediately make use of.

So this is where the design is: a single board of approximately 9.5 in² that will contain two FPGA parts and one PIC18 part, along with RS-232 drivers and a 1.2-volt power supply for the FPGAs. One board laid out in a symmetric fashion like this can be rotated and soldered to a second copy of itself to form a 2 x 2 array of FPGAs with PIC processors on diagonal corners. As work progresses, I will post the schematics, parts lists and Gerber artwork files on my regular website, http://www.WildIrisDiscovery.com/

Building a Useful Asynchronous Array, Design Project or Mythical Quest?

Over the years I have not been shy about expressing my opinions on the GA144 and the earlier SEAForth parts. But now that I’m embarking on creating my own asynchronous array processor design, I find myself facing probably the same questions that the designers of the GA144 must have had to face as well.

In years gone by, at various SVFIG meetings, I had a chance to encounter most, if not all, of the original designers of the GA144 chip. All of these individuals struck me as top-tier design engineers. So it’s always been a mystery to me how such talent could have come up with such a deficient chip design. As I hope I’ve made clear already, I believe the core architecture of the GA144 is a brilliant piece of engineering. But it never seemed that the people who made it stopped to give a single thought to how, or for what, you would use a chip like this. I’ve been doing embedded systems design for a decade and a half now, and there are a lot of standard design challenges that always have to be addressed, no matter what the final intended use of your embedded processor. But when I look at the GA144’s I/O pinout, it gives absolutely no indication that the designers gave any thought at all to the kinds of circuits and applications such a chip might be embedded into.

The assumption I’ve worked under all of these years was that the designers of the GA144 must have faced some very severe engineering design compromises when creating this part, and that its design came down to the question of trying to make it a general-purpose processor or an application-specific processor array. Not being able to make the hard choice between one or the other of these design directions, they chose to give the GA144 a little bit of both and, as a result, ended up with a part that was good for neither of these application options.

What the GA144 wanted to be was a SOC, a system-on-a-chip processor. If you go on the web and look at the various projects that have been done using the GA144 (for example, the video in the last blog post), you’ll see that they all use this part in the same manner you might use a SOC processor part. That being the case, the folks at GreenArrays would have been better served by placing fewer cores on the die, giving each core a little more program memory and functionality. With fewer cores on the die, more of the individual cells could have had pin-out access to the edge of the chip. Unfortunately for the commercial success of the GA144, there is no end of SOC processors on the market already. If the GA144 were to compete in that arena, its design would have to have been quite a bit different than it is.

But now that I’m starting on my own design, I am struck by the possibility that the reason the GA144 seems to be an ASIC chip with no specific application in mind is that there was, in fact, no application for such a processor architecture in the first place.

The reason I started this project to build a board-size equivalent of the GA144 is that I, too, am starstruck by the potential of a processor architecture based on an asynchronous array of processors. It just seems there has to be something out there that such an architecture would be uniquely good for, but the GA144’s lack of pinouts doesn’t let you imagine any designs that would take advantage of its array structure to begin with. I thought, naïvely, that if I could create an asynchronous array structure with all the pins brought out to the edges, the use for such a part might then become evident. But thinking about this over the last couple of weeks, I’ve run into nothing but dead ends.

Areas where you might use an array of processors like this are either in neural networks or in parallel processing applications. But neural networks are best done in software running on high-end general-purpose computer systems. And parallel processing applications, if implemented in hardware, are best done using synchronous arrays. The only advantage to an asynchronous array over a synchronous one is that individual cells only draw power when they are active, and this advantage only comes into play if most of the cells are inactive at any one time.

What this all points to is that any application an asynchronous array might be useful for doesn’t fall into any of the standard categories you’ll find in the literature on processor architectures. I’ve been taking advantage of my alumni privileges at UC Santa Cruz to visit the library and search the literature. So far, I have not run across any paper or proceedings that describes a process best implemented in an asynchronous array of processors.

In other words, after all of these years of criticizing the GA144 and SEAForth parts, I’m now coming face-to-face with the reality that there is, in fact, no application for which these parts are uniquely suited. So I’m starting to think that the GreenArrays design team must have fallen under the same spell that I have: starstruck by the potential of such an array architecture, but without any idea of what such a processor architecture might be good for. Rather than let go of the array processor concept and focus on creating a SOC-targeted design, they chose instead to adopt the “Field of Dreams” business model: if you build it, they will come. They did the best design they could with the resources they had and then hoped that once the chip was out there, someone more inspired than they were would find an application for it. But sadly, that outcome never materialized.

It appears that this endeavor to build an asynchronous processor array is becoming less a design project and more of a quest for that fabled application for which such an array would be uniquely useful.

The project is thus breaking down into several aspects. The first aspect is the creation of some kind of working hardware. I’m one of those people who thinks best with their hands. It’s a lot easier for me to express my creative ideas in building a piece of working hardware than it is to just work things out in my head, and then write them down on paper.

The second aspect will be a thorough search of current and past engineering literature on the subject of processor arrays to see if anyone has done anything like this before. If this search comes up empty, then a third aspect will be to look for inspiration in any direction I can find it.

Saturday, February 7, 2015

The GA144

The inspiration for the direction I’m heading in design-wise is the GA144 processor: an asynchronous array of 144 separate processors arranged in an 8 x 18 matrix, all fabricated together on a single die. So to understand where my design ideas are headed, it might be best to start with a description of the GA144, both its strengths and its deficiencies.

For those not familiar with the GA144 from GreenArrays, here is the best short exposition of how the GA144 chip works and how it’s programmed that I’ve seen. "FD 2014 Daniel Kalny".

More information can be found at the GreenArrays web site, www.greenarraychips.com/

The GA144 is without a doubt one of the most frustratingly disappointing pieces of silicon ever made. So much so that I’ve personally taken to calling it the Stephen Hawking of computer chips: a brilliant mind stuck in a useless body.

There is no end of projects that can be built around the GA144. The above YouTube video from Daniel Kalny is a great example of this. But the sad truth is that, from a commercial point of view, the same functionality can be fit into any number of standard processor parts from companies like Microchip, Silicon Labs, TI, etc., that are both cheaper and easier to program using industry-standard compiler tools. So, despite its potential, the GA144 has remained a silicon oddity without attaining any commercial success.

An integrated chip this intriguing in potential begs to be used for something. But the big question is, what? This is where the problems start for the GA144. Every commercial application you might think of ends up requiring more I/O pins than the designers of this part gave it. In other words, the GA144 seems to be an ASIC part designed with no specific application in mind.

The only way to get signals or data into or out of this part is through a handful of cells along the edges of the array, which means getting signals to any of the internal cells requires your data stream to pass through all the cells between the edge and the one you’re targeting for data transfer. So a lot of the cells in the array end up functioning simply as connections between adjacent cells. This would be okay if each cell had enough program memory to store more functionality than just acting as one member of a bucket brigade transferring data across the array.

Another frustrating aspect of the GA144 is that it was not laid out in a symmetric fashion. That is, you can’t tile single GA144s together to form a much larger array, because the top/bottom and right/left edges of the chip don’t match up pin to pin. This means that if you did try to use the GA144 as a tile element in a much larger array, neighboring GA144s would be forced to talk to each other through a single SERDES link between one chip and the other. Even this might not be a stopping point, except for the fact that the GA144 only has two SERDES links, and both are located on the same side of the part. As I ran into design details like this last observation, it just made me want to pull my hair out and scream, “What were you idiots thinking when you made this part?!”

On the other hand, there are some amazing things about the GA144. First is the asynchronous operation of the individual cells. Each cell has its own ring oscillator for its internal clock. This ring oscillator only turns on when the cell is accessed by one of its neighbors, and it only stays on until the cell completes its current program call. The cell then goes to sleep and waits until it is accessed again. The result is an array whose current draw can be as little as one percent or less of that of a comparable FPGA part.

This by itself might not seem like a big deal, but from a hardware design point of view, this is huge. It’s not uncommon for processors and FPGA parts to draw currents on the order of amps. For a single processor on a board, such current draws are not an issue. But if you want to start creating large arrays of processors, you’re very quickly looking at thousands of amps to power your processor array. This becomes a huge wall to designing large processor arrays. The low current draw of the asynchronous array concept means that such chips can be tiled by the thousands and still run on a power supply of a few tens of amps.
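To put rough numbers on that, here’s a quick back-of-envelope sketch in Python. The per-cell current and duty cycle are assumed figures for illustration only, not measured values for any real part.

```python
# Back-of-envelope power estimate for a large tiled processor array.
# All numbers here are illustrative assumptions, not measured values.

cells = 1000                 # hypothetical array size
active_current_a = 1.0       # assumed draw per cell while running, in amps
duty_cycle = 0.01            # fraction of time an async cell is awake

# A synchronous array clocks every cell all the time.
synchronous_total = cells * active_current_a

# An asynchronous array only powers cells while they're actually working.
asynchronous_total = cells * active_current_a * duty_cycle

print(f"Synchronous array:  {synchronous_total:.0f} A")   # 1000 A
print(f"Asynchronous array: {asynchronous_total:.0f} A")  # 10 A
```

Even with made-up numbers, the gap between a kiloamp-class supply and a tens-of-amps supply is the whole argument.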

Another positive about the asynchronous array concept: most new processor parts get their speed into the gigahertz range by making use of pipelining in their internal structures, and the asynchronous array is naturally pipelined just by its construction. Each cell in the array does its little thing and then passes the result on to the next neighboring cells. In this way, a process passes like a wave through the asynchronous array, starting from one edge and flowing through until the result comes out the opposite edge. One can take advantage of this by having multiple waves of processing going on simultaneously. Another trick for matrix operations is to have the matrix elements come in one side of the array while constants for the matrix operation flow in from a different edge, with the results flowing out yet another edge of the array.
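That wave-through-the-array behavior can be modeled in a few lines of Python. The sketch below is a toy: a linear chain of cells, each holding one constant (here a filter coefficient), with data samples shifting one cell per step and each cell contributing its product to the result forming at the far edge. None of this is GA144 code; the chain length and values are made up purely for illustration.

```python
# Toy model of pipelined "wave" processing through a linear chain of
# cells. Each cell holds one fixed coefficient plus the sample currently
# passing through it; every step, the sample wave advances one cell.

def fir_wave(coeffs, samples):
    """Stream samples through a chain of len(coeffs) cells, collecting
    one output per step as the wave of data moves across the chain."""
    registers = [0] * len(coeffs)         # one sample register per cell
    outputs = []
    for s in samples:
        registers = [s] + registers[:-1]  # wave advances one cell
        outputs.append(sum(c * r for c, r in zip(coeffs, registers)))
    return outputs

print(fir_wave([1, 1], [1, 2, 3]))        # → [1, 3, 5]
```

Note that once the chain is full, a new result emerges every step, which is exactly the pipelining-for-free property described above.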

But for the GA144, trying to use this trick for matrix multiplications just doesn’t work. Again, it comes back to the fact that there are not enough I/O pins around the edges of the array to get data in and out of the processor at a pace that can keep up with how fast the GA144 can go.

And yet one more aspect of the GA144 that ends up just teasing you with its potential is that each of the cells is a stack-based processor element. For those not familiar with programming a stack-based engine, the best example might be the old HP calculators, which used what is called Reverse Polish Notation. Rather than storing data to be processed in registers, everything is pushed onto or popped off of a stack, with the ALU working on the top of the stack.

For example, for a register-based processor, you would write {4 + 5 =}, but for a stack-based engine you would just write {4 5 +}. Programs written for stack-based engines can be very compact and run very fast. But they can also be frustratingly impossible for most programmers to work with because they force them to pay absolute attention to the order that operations are done in. In other words, when programming for a stack-based processor, you can’t just give variables names and then let your compiler tools worry about the exact machine level code that your program generates.
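To make the {4 5 +} example concrete, here is a minimal RPN evaluator in Python. This is purely a software illustration of the evaluation order; an actual stack processor does the same thing in hardware, with the ALU working on the top elements of a dedicated data stack.

```python
# Minimal stack-based (RPN) expression evaluator: literals are pushed,
# operators pop the top two items, operate, and push the result back.

def rpn_eval(tokens):
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()              # top of stack
            a = stack.pop()              # next item down
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))       # push literal
    return stack.pop()

print(rpn_eval("4 5 +".split()))         # → 9
print(rpn_eval("2 3 4 * +".split()))     # → 14
```

Note how the operand order is entirely implicit in the token order, which is exactly the property that makes such code compact in silicon and cryptic on the page.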

The linked YouTube video above contains a number of examples of program coding for a stack-based processor. The reason such code examples look so cryptic to those not familiar with programming in such languages is that the visual clues most programmers look for when reading a piece of source code aren’t there; those visual clues are hidden, so to speak, in the order that the operations are performed.

This latter observation is why stack-based programming languages like Forth never made it commercially. Writing in assembly for a register-based engine is already beyond the ability and patience of most programmers. Then, adding the extra frustration of also having to keep track of the order you do things in becomes “a bridge too far”.

On the other hand, when you’re working at the level of a tiny processor core that’s trying to make maximum use of the silicon resources available, stack-based engines come into their own and are probably the most efficient processing structure to use at this cellular level.

My goal for the next few months is to see if I can re-create the GA144 asynchronous array structure using discrete FPGA parts; in other words, recreate on a printed-circuit-board scale the structure that’s found in the GA144 part at the silicon level. By going this design route, I can give myself access to all the input/output pathways that the pin-out of the GA144 part doesn’t give you. I will thus be able to explore the full range of functional possibilities that such an asynchronous array structure can bring to the table. The other advantage is that, by using discrete FPGAs for the cells, I give myself many more options in terms of programming at the cellular level of the array. (More on this last point in future posts.)

New Directions

After letting this blog sit idle for several years, it’s time to revive it and send it off in a new direction. Between work and my kids, I haven’t had any time for personal projects for quite some time now. But since turning 63, I have been able to sign up for Social Security. That’s taken some of the financial pressure off my work schedule, so I now have a window of time again in my life to pursue some personal interests.

In the decade and a half since I started working on robotics, the field has matured significantly. Some of the ideas that I was playing with, like distributed processing, which was still unimplemented in any commercially available products 10-15 years ago, are now appearing as off-the-shelf products in the industrial robotics market. So I don’t feel I have anything more to offer in that direction.

As far as homeschool robotics and physics lab support, companies like LEGO, Pasco, VEX and Pitsco/TETRIX have fully taken over and now completely dominate the educational market in the public school, private school and homeschool arenas. So again, I don’t think I have anything as an engineer/designer to offer.

And while it is still something I’m strongly interested in, advancement in agricultural robotics has to wait on machine vision getting a couple of orders of magnitude better than it is right now.

Even though I haven’t posted anything new here over the last few years, I’ve still been watching each of those areas in industry listed in my header. And in my wanderings in the fields of robotics and AI, there is still one area that remains open and undeveloped, and that’s where I’m going to focus my energies next and see what I might be able to produce.

This unexplored area falls under the heading of neural networks. This gap in development that I sense can be best expressed by noting that neural networks have, for all practical purposes, become a software/programming field. There is essentially no one working to develop the underlying hardware necessary to implement a practical neural network. The reason I feel I have an advantage is that I’m essentially a hardware designer first and foremost. And when I look at the software side, I see things that could be implemented in hardware much more effectively. But as I search through the literature on Google, I don’t see anyone developing these kinds of ideas at the hardware level.

In all of the hardware examples of neural networks I come across, the individual cells are hardwired together right from the start. No one seems to have developed an underlying hardware platform that can spawn new connections between individual cells, grow them, and prune them when they prove no longer necessary. It would seem that if one could come up with a hardware system that could evolve new neural connections the same way the brain can, this would be the ideal platform for neural networks. I have some ideas on implementing precisely this kind of structure in hardware. So for the next few months, that is the direction I’m going to head in. And I will use this blog as a sort of journal to post my progress (if there is any, that is).
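As a purely illustrative software toy of that grow-and-prune idea (the goal being, of course, to eventually do this in hardware rather than software), one could model the connection fabric like this. Every number and name here is a made-up placeholder, not part of any real design.

```python
# Toy model of a cell fabric that can spawn new connections between
# cells and prune those that never prove useful. Illustration only.
import random

class CellFabric:
    def __init__(self, n_cells, seed=0):
        self.n = n_cells
        self.rng = random.Random(seed)
        self.links = {}                  # (src, dst) -> connection weight

    def grow(self):
        """Spawn one new random connection with a small starting weight."""
        pair = (self.rng.randrange(self.n), self.rng.randrange(self.n))
        self.links.setdefault(pair, 0.01)

    def prune(self, threshold=0.05):
        """Drop connections whose weight never grew past the threshold."""
        self.links = {k: w for k, w in self.links.items()
                      if abs(w) >= threshold}

fabric = CellFabric(16)
for _ in range(10):
    fabric.grow()                        # sprout some speculative links
fabric.links[(0, 1)] = 0.5               # pretend one link proved useful
fabric.prune()                           # everything else withers away
print(len(fabric.links))                 # → 1
```

The interesting engineering question, of course, is what the hardware analog of `grow` and `prune` would be; that is exactly the open problem described above.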

I may not have any luck in this endeavor, but it will be fun nonetheless.