Thursday, February 23, 2017

Langton's Ant, Part 1

The good news is that a full simulation routine, written in C and running under Linux, is now working. As a validation test of my coding I successfully ran Langton’s Ant; the results were animated, made into a video, and posted to YouTube.

The first pass of my simulation routine used Ncurses for text graphics display.

After I had debugged my code using a simple text graphics display, I redid the code to generate binary output data-files. Those files were then converted into bitmap graphic files using a program I’ve written in LabVIEW. Lastly, those bitmap files were animated using the software package Frames.
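For anyone curious, the ant’s update rule itself is only a few lines of C. Here is a minimal sketch, not my actual code; the grid size and the wrap-around edge handling are illustrative assumptions:

```c
#define N 100   /* illustrative grid size */

/* Langton's Ant: on a white cell turn right, on a black cell turn left;
   in either case flip the cell's color and step forward one cell.
   dir encodes the heading: 0 = north, 1 = east, 2 = south, 3 = west. */
void ant_step(unsigned char grid[N][N], int *x, int *y, int *dir)
{
    static const int dx[4] = { 0, 1, 0, -1 };
    static const int dy[4] = { -1, 0, 1, 0 };

    *dir = grid[*y][*x] ? (*dir + 3) % 4    /* black: turn left  */
                        : (*dir + 1) % 4;   /* white: turn right */
    grid[*y][*x] ^= 1;                      /* flip the cell     */
    *x = (*x + dx[*dir] + N) % N;           /* wrap at the edges */
    *y = (*y + dy[*dir] + N) % N;
}
```

Run from an all-white grid, the ant behaves chaotically for roughly 10,000 steps before settling into the “highway” pattern.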

Frames might not be the best way to generate animations, but it was left over from our family’s home-schooling days when my son was working with stop motion animation. It’s a frustrating package to use, but I have it and it works… if you’re patient enough. After the animation was done, the actual video was put together using Adobe Premiere Elements, version 9. Again, a software package left over from my son’s home-schooling days.

The actual simulation program was running 2 weeks ago, but it took me a week of messing around to get the animation sequence into a form I was satisfied with. I haven’t used these animation tools in five years. It took several days just for me to get back up to speed with using my old software.

The next step for me and this project is to create a Langton’s Ant program that starts out already in its stable “highway” configuration. Since the “highway” is a 104-step repeating pattern, one would expect that there are countless ways to seed a Langton’s Ant program in this way. The question, though, is whether there is a minimal set. Or is that even a meaningful question? The next step in this direction is to recast Langton’s Ant as a two-dimensional Turing machine and analyze the system from that point of view.

For anyone finding this blog via the YouTube video post, I think I should emphasize again that Langton’s Ant is not my goal. My interests are in the underlying hardware that it will take to create a large asynchronous array of simple processors. I’m just using Langton’s Ant at this point as a simple and accessible test example.

I have recently come across the work of Dave Ackley. I’ve watched all of his posted YouTube videos and have downloaded a couple of his published papers. This paper in particular is what I’m working through right now: “A Movable Architecture for Robust Spatial Computing,” by David H. Ackley, Daniel C. Cannon, and Lance R. Williams. If this is any indication, I have a lot of work ahead to catch up to the current state of the art in this field.

One way to characterize this project of mine is to see it as trying to develop the underlying silicon hardware, FPGA or ASIC, that you would need to re-create in hardware what Professor Ackley is trying to do in software. But I can’t speak for him about that. Maybe someday I’ll have a chance to talk to him and find out. 

On the more academic side, I’m steadily working my way through Hamilton’s book. I’m struggling a little more with the video lecture series, mostly because I find it very hard to sit still and watch them on the computer screen. It’s clear my brain has gotten pretty rusty over the 25 years since I left grad school. My goal at this point is, over the next six months, to come up to speed on Turing machines and all the mathematics behind them.

Sunday, January 29, 2017

Array Processor Simulation: Current Progress

The first pass of a simulation routine, written in LabVIEW, is working. I probably spent more time debugging the handshaking routine than anything else, but it’s working now. I can generate a seed code-worm that starts at cell (0, 0) then travels across to a destination cell.

In this debug stage, I need to display all the registers so I can watch what is happening within each of the individual cells. This verbose display mode means that I can only fit a 4x4 display array on my monitor. But a 4x4 array, at this first pass stage, was good enough to get things working.

I have an old Windows-XP legacy system that I kept available for periodic maintenance work on projects I did for clients years earlier. Now that I’m retired and don’t need it anymore, it’s time to turn it into a Linux system. I’ve ordered a new hard drive and will upgrade this old XP system to Debian Linux. The hard drive should arrive next week, at which point I’ll start the second phase of the array processor simulation project.

On the cellular automata front, it only took a little bit of research to find out that Stephen Wolfram, of Mathematica fame, has already done a significant amount of work in this area. I checked his book “A New Kind of Science” out from the library and have started reading it.

My intent is to start with duplicating some of the classic cellular automata featured on YouTube, like the Game of Life and Langton’s Ant. The challenge is going to be how to visualize the time evolution of such systems; especially in the case of arrays on the order of 100x100 and bigger. Needing to display the progress of such a large array on a standard monitor means I can’t use much more than a 10x10 pixel area per display cell. Outputting numbers in this case is out of the question. So, what I have to explore next is ways to use colors to encode what each cell is doing.

For the simple examples I will be starting with, just displaying either a light or dark shaded area will be sufficient. But as I progress to more complicated relationships between the cells I’ll need to try something different. One obvious trick is to use the assembly language “org” instruction to partition various sections of each cell’s program-code into different segments of program-memory. Then assign different colors to each of these code segments; that way encoding the program-counter as a color will indicate which subroutine is active at any one time. What other cell-variables will be relevant to visualize, and how to visualize them with color-encoding is something I need to work on in the weeks ahead.
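As a sketch of that color-encoding idea, here is roughly what a program-counter-to-color lookup might look like in C. The segment boundaries and color assignments are entirely hypothetical placeholders for whatever the “org” directives would actually lay out:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical code segments, as laid out by the assembler's "org"
   directive; both the addresses and the colors are illustrative. */
typedef struct { uint16_t start; uint32_t rgb; } segment_t;

static const segment_t segments[] = {
    { 0x000, 0x404040 },   /* idle loop        -> dark gray */
    { 0x040, 0x0000FF },   /* I/O handshaking  -> blue      */
    { 0x080, 0x00FF00 },   /* worm forwarding  -> green     */
    { 0x0C0, 0xFF0000 },   /* local compute    -> red       */
};

/* Map a cell's program counter to the color of the code segment it is
   executing in: the last segment whose start address is <= pc. */
uint32_t pc_to_color(uint16_t pc)
{
    uint32_t rgb = segments[0].rgb;
    for (size_t i = 0; i < sizeof segments / sizeof segments[0]; i++)
        if (pc >= segments[i].start)
            rgb = segments[i].rgb;
    return rgb;
}
```

Painting each cell’s 10x10 pixel patch with `pc_to_color` would then show, at a glance, which subroutine every cell in the array is running.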

Next is to start making “evolution” videos of these two-dimensional cellular automata and watching to see if any patterns emerge. Something worth noting in this regard: when it comes to pattern recognition, the human brain is the best visual processing system we have access to. Google, Facebook, Stanford, MIT and the rest, even with their unique access to massive amounts of computational horsepower and advanced techniques like deep learning, are still struggling to visually recognize a cat in a picture; something humans do quite easily.

I need to remember that my goal is not just to recreate work others have already done, but to explore the possibilities of using code-worms as a way of stabilizing the evolution of cellular automata so that they will stay stationary and not move off screen. If I can get that far, then the next question to explore is whether these stationary automata can be used as computational engines to process an input stream into an output stream. Specifically, I’m curious whether such an automaton could function as a Szilard engine or a Maxwell’s demon.

On a second note: it has been over 20 years since I left physics and went into engineering, and I’m afraid my math skills have gotten quite rusty. So, in an effort to get my skills back up to par, I’ll be working through my old textbook on mathematical logic, “Logic for Mathematicians” by A. G. Hamilton, and starting to work through the YouTube video series on the “Theory of Computation.”

Sunday, January 22, 2017

Array Processor Simulation: First Pass

My first pass at a simulation program is mostly done now. All I need to do next is to start running it, debug any problems with the coding, and see what it does. This first “proof-of-principle” simulation program I’ve done in LabVIEW. Yes I know, LabVIEW? The reasons are that it’s a software tool I have, I’m very accustomed to using it, and it’s a great tool to put together a Windows program very quickly. This first pass is only intended to get a feel for the direction I’m trying to take this programming.

There are a number of initial choices about the construction of an asynchronous-array-of-simple-processors that I don’t have any good answers for. I’ll just have to try different things out first and see if they work or not. And that sort of effort is best done with the simplest programming environment you have to work with.

One of those choices is the machine instruction set I’ll need to give each of the simple processor cells. There is also the choice of how the I/O handshaking works between each of the neighboring cells.

For my case, I’m opting for a stack-based architecture using a dual stack arrangement with the program memory and data memory mapped to the same physical block of RAM. Mapping data space and program space together will allow the program code to modify itself on the fly.
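A rough sketch in C of what one such cell might look like. The word size, RAM size, and stack depths here are placeholder guesses, not a finished design:

```c
#include <stdint.h>

/* One simple processor cell: dual stacks plus a single unified RAM
   block that holds both program and data, so code can rewrite itself.
   All of the sizes below are placeholders. */
#define RAM_WORDS   256
#define STACK_DEPTH 16

typedef struct {
    uint16_t ram[RAM_WORDS];      /* program and data share this space */
    uint16_t dstack[STACK_DEPTH]; /* data (operand) stack */
    uint16_t rstack[STACK_DEPTH]; /* return stack */
    uint8_t  dsp, rsp;            /* stack pointers */
    uint16_t pc;                  /* program counter into ram[] */
} cell_t;

/* Because program space and data space are the same RAM, an ordinary
   store instruction is all it takes for code to modify itself. */
void store(cell_t *c, uint16_t addr, uint16_t value)
{
    c->ram[addr % RAM_WORDS] = value;  /* may overwrite an instruction */
}
```

The dual-stack arrangement is the same one Forth-style machines use: one stack for operands, one for return addresses, which keeps the instruction encoding very small.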

If it all works, then the next step is to go to a simulation routine written in C running on a Linux desktop system. At that point, I’ll be able to take some screenshots and make source code available on my webpage.

The one thing that became obvious very quickly, once I started coding, was that whatever machine instruction set I give each simple processor cell, it will have to be both rotationally and translationally symmetric with respect to right/left and up/down directions. This forces some constraints on the design. One of the interesting implications of this requirement for symmetry is that the individual cells in such a processor array can’t have absolute array-location addresses. And any stable programming that can exist in this array has to be relative and not absolute in its addressing of neighboring cells. Anyway, fun stuff to think about.
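To illustrate the relative-addressing point: a cell’s program can refer to its neighbors only by relative direction, and composing turns is just addition mod 4, which is what makes rotationally symmetric code possible. A tiny sketch (the names are my own invention, not part of any real instruction set):

```c
/* Cells address neighbors by relative direction, never by absolute
   (x, y) array coordinates, so the same program works unchanged
   anywhere in the array and in any orientation. */
typedef enum { FWD = 0, RIGHT = 1, BACK = 2, LEFT = 3 } reldir_t;

/* Compose a relative turn onto a current heading: arithmetic mod 4. */
reldir_t turn(reldir_t heading, reldir_t t)
{
    return (reldir_t)((heading + t) % 4);
}
```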

Writing the simulation routine is only half of the effort. The second part is writing the programs that will run in the individual processor cells. But in this regard, most of the work has already been done. I already have a generic assembler that I created years ago that takes a file written in assembly code format, along with a list of mnemonic/machine-code associations, then outputs a machine code file that I can import into each processor cell’s program memory.

Over the years, I’ve written in Verilog a number of small processor cores that were embedded into the FPGA designs I've worked on. In these design situations, I needed such a generic assembler to create the code files to run in these minimal cores.

Just as an aside, it’s fun to work at the HDL-Verilog level of programming. You can create custom processor cores that have just the machine instructions that you want; no more, no less. This really lets you dial-in the performance of your FPGA part!

Slightly off-topic now… Why my interest in asynchronous array processors as a likely candidate for a learning system? There have been a number of attempts to create chips that mimic neural networks at the silicon level. But these attempts to mimic the human brain fall short in one important aspect: the nerve cells that form your brain can grow and/or prune the neural connections between them. A neural network built on an IC chip can’t do this. Whatever connections exist between individual cells are permanent and cannot be modified later via software.

But there is a trick in asynchronous arrays that can get around this hardware level limitation. While the individual cells within an asynchronous array only connect to their nearest neighbors, there is a concept, what you might call a “code-worm”, where a data packet can originate in one cell and travel across the array to a destination cell that is not a nearest neighbor. These code-worms can be spawned or pruned on a real-time basis and can be used to reproduce the functionality of the axons forming the neural connections in the brain.
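Here is one way the code-worm forwarding step might be sketched in C. The packet layout is purely illustrative; in a real array the hop list and payload would live in the cells’ program memory:

```c
#include <string.h>

/* A code-worm packet: a payload plus the list of relative hops it
   still has to make.  Each cell that receives the packet consumes one
   hop and hands the remainder to the indicated neighbor; when no hops
   remain, the packet has arrived at its destination cell. */
#define MAX_HOPS 16

typedef struct {
    int hops[MAX_HOPS];   /* directions: 0 = N, 1 = E, 2 = S, 3 = W */
    int nhops;            /* hops remaining */
    int payload;
} worm_t;

/* One forwarding step: returns the direction to send the packet,
   or -1 if this cell is the destination. */
int worm_forward(worm_t *w)
{
    if (w->nhops == 0)
        return -1;                    /* destination reached */
    int dir = w->hops[0];
    memmove(w->hops, w->hops + 1,
            (size_t)(w->nhops - 1) * sizeof w->hops[0]);
    w->nhops--;
    return dir;
}
```

Spawning a worm is then just writing a new hop list; pruning one is letting the list run out, which is what makes the connections dynamic in a way hard-wired silicon is not.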

So that was my first realization, that you can use an asynchronous array like you would a neural network with code-worms acting like the axon connections between the individual neurons. So not only can you train such a neural network using the standard methods, but a neural network, constructed as an asynchronous array, would also be able to change connections between the cells as well. It would seem that a neural network built this way would be much closer to how our brains function than the techniques currently used.

So this is the goal of the first proof-of-principle simulation program; to see if I can get this code-worm concept to go.

The idea of using an asynchronous array as an ecosystem for cellular automata came later to me. But it’s also an extremely fascinating one which I’d like to pursue for its own sake.

Thursday, January 19, 2017

Getting ready to start blogging again

Now that I have retired from a full-time commitment to engineering design work, I have the time to return to thinking about agricultural robotics. Much of the current field of agricultural robotics is focused on enhancing, with robots, those aspects of the farming process that have already been automated to some extent; for example, things like taking a tractor and replacing the operator with an AI system.

In other words, the commercial potential in agricultural robotics currently lies mostly with leveraging existing technology with the addition of robotic components.

But the real challenge in agricultural robotics, and the aspect of this field that intrigues me the most, is the challenge of replacing the individual field worker responsible for picking our food crops. This challenge is not unique to the farm industry. Any industrial work that takes place out-of-doors, like logging, fishing, mining, or construction, presents the same challenge: there is no fixed working situation that is repeatable or predictable over time. Whatever tasks robots engaged in such work are required to do, they will have to constantly modify their programming, on a real-time basis, and adjust their activities accordingly.

So the key challenge in replacing the individual field worker is one of creating a system that can learn and modify its programming while at the same time performing the tasks it’s expected to do. With this in mind, in the coming months I want to tackle issues related to machine intelligence and learning systems.

This goes back to a project I started several years ago but had to put on hold; that was the very large asynchronous array of simple processors.

Since I don’t have a job anymore with an income stream, building hardware costs too much money to keep going in that direction. So I’m renewing the project, but in simulation. My goal in the coming weeks is to start putting together a simulation program that creates a very large array of simple processors.

Such an array becomes an ecosystem, so to speak, for cellular automata to exist in; much like John Conway’s Game of Life. But the twist I want to explore is the possibility that the kinds of stable structures that form in such an environment might actually be able to act as computing agents. I am not sure if anyone has explored this option. So one of my to-do’s is to get up to the library at UC Santa Cruz, start reviewing the literature, and see if anyone has worked on this possibility.

This is the project for the hardware engineer in me. But for the physicist in me, I want to pursue the deeper question of whether or not such a computational structure could then qualify, in some abstract sense, as a life form. To start down that abstract mathematical path, I will be spending time looking at Landauer’s principle versus Maxwell’s demon as a life form.

Tuesday, January 17, 2017

There will never be such a thing as a truly autonomous self-driving car.

The idea of autonomous self-driving cars has got to be one of the biggest Pandora’s Boxes of unintended consequences that you can imagine for society. I think this video makes for a good teaching moment on why this is true.

The public’s fascination with autonomous self-driving cars is in large part driven by a mistaken assumption about how they work. A truly autonomous vehicle will never be able to go much faster than 25 to 30 mph, and the reason has nothing to do with technology and everything to do with the physics of how these vehicles navigate through traffic. The only way an autonomous vehicle can navigate is through a combination of Google Maps data stored in its memory and sensors that look at the traffic, landmarks, and other objects around it. In other words, the car basically drives by looking at the bumper of the car in front of it. The YouTube video link is a good example of this phenomenon.

But we humans don’t drive that way. We have a “look-ahead” ability that can see many cars ahead, and under the right road conditions we can see traffic half a mile or more ahead. This gives a human driver a much larger time margin in which to make critical decisions, a margin that self-driving cars are not currently endowed with, and it is what allows commuter traffic flow to hit 75 or 80 miles an hour on the freeway.

But if a human were to drive using only the information that self-driving cars currently have to work from, that is, just watching the bumper of the car in front, there would be no end of car wrecks before that human had gone even a mile. It is this lack of a “look-ahead” ability that will forever limit self-driving cars to 25 to 30 mph or less in traffic.

The simple and obvious way to get around this limitation is to endow self-driving cars with their own “look-ahead” ability. There are a couple of complementary proposals floating around on how to do this. The first is to implement a vehicle-to-vehicle communications protocol that lets the cars ahead tell the cars behind about upcoming traffic conditions. The other is to have a regional traffic control computer that monitors the traffic for a whole road system and then controls traffic flow locally through a series of radio cell towers communicating directly with the individual cars. If things like these were implemented, then commuter traffic speeds for self-driving cars could easily and safely approach those of the German Autobahn.

But the “gotcha” in all of this is that once these proposals become implemented your car is no longer autonomous!!! That is, before the popular vision of autonomous self-driving vehicles, that everyone seems to envision, can come into being, individual cars will have to fall under some kind of outside control. And as soon as this happens, the government will know where you are, where you’re going, and where you have been. This is the world of 1984 in spades.

The scary thing is, given the society we live in today, there are in fact no end of people who would gladly, and without a second thought, give up their privacy for the chance to ride in a self-driving car.

Saturday, January 14, 2017

About this Blog

I come to the field of robotics with a somewhat unique background. During the years it took me to graduate with my bachelor’s degrees from Humboldt State University, I supported myself by working in the woods as a logger. All those years working as a rigging man on cable logging operations left me with a deep appreciation of the complexity of the tasks that individuals engaged in skilled manual labor must perform. Working with, around, and on heavy equipment has also given me an appreciation for what it takes to run machinery.

Very few, if any, of the individuals working at the academic level of robotics engineering have ever spent time in their lives actually doing, for a living, the jobs they are attempting to reproduce robotically. As a result, their notions about how easily certain jobs can be replaced with robots are drawn from only the most trivial knowledge of what those tasks actually entail.

Because of this, two things happen. First, because their understanding of the engineering problem is based on incomplete or even false knowledge, their efforts will fail to produce a working solution. And second, publicly expressing opinions on how easily certain occupations will be replaced with robots only comes across as condescending and patronizing to the working people whose careers are on the receiving end of such engineering efforts.

The truth is, there is no robotic system in existence that can perform with the speed and dexterity of a human being engaged in the hand-picking of a food crop. And given current technology, it’s not even on the horizon that such a robot could be built.

While, to an outside observer, what a field worker does may appear as mindlessly repetitive, it is in fact anything but. Every plant is different. Every row is different. Even the same field on different days, is different. Once a robot leaves the factory floor and moves out into the farm field, the mine shaft, the logging site, or the construction site, everything changes.

A factory-floor robot can be programmed with a fixed tool-path, since everything the robot needs to work on can be placed with the necessary precision ahead of time. But once you move outdoors, controlling the placement of tasks is no longer an option; so any robot intended to run in an industrial situation outdoors has to be able to continually adjust its programming on a moment-by-moment basis. In other words, any agricultural robot intended to replace the field worker picking food crops will have to be able to run in a continuously learning mode.

This is where my interest in agricultural robotics turns into an interest in machine learning. And then by extension, to the question of what underlying computational hardware architectures lend themselves to machine learning. So, much of my interest in agricultural robotics overlaps with an interest in machine learning and hardware design.

While these will be the main topics of discussion on my blog, my interest doesn’t stop there. As a physicist, I’m deeply excited about the Theory of Computation, so I will, from time to time, post on more abstract elements of machine learning as well.

Glen Reuschling.

Sunday, March 8, 2015

Very Large Asynchronous Arrays of Simple Processors

I’ve spent the last month searching the web for applications that a large asynchronous array might be useful for. Having tried a number of different combinations of search terms, the title of this blog post represents the most promising of them.

And as I spent time thinking about the idea of a large asynchronous array, it seems the question breaks into two aspects, the answer to one depending on the answer to the other. One question is: what would such an array look like at the hardware level? And its mirror question: what would you use such an array for?

When you approach the problem from the hardware level, a number of very intriguing conclusions come to the forefront.

First, if you are going to create a large asynchronous array, what kind of part packages at the IC level would you want to tile with? On one hand, you want to be able to have sufficient signal pins so that each simple processor cell in your array has full access to all its neighbors across any IC pin-to-pin transition. But even a single simple processor cell will have on the order of 100 signal pins. This would imply that your individual IC chips would be limited to no more than four simple cells per die. And if you used ball grid parts, even a small asynchronous array would start to take up quite a large PC board area. There’s also the challenge that one loses a lot of speed crossing a chip-to-chip transition. A quick order-of-magnitude estimate suggests that crossing a chip-to-chip barrier reduces your potential clock speeds by a factor of 10.

One way to deal with this clock speed loss crossing a chip-to-chip transition is to maximize the number of individual simple processor cells you place on a single die. But then you run into the problem that the GA144 has; that is, the lack of sufficient I/O to tile directly one GA144 to another; which again causes you severe loss in speed when crossing a chip-to-chip boundary.

What this conundrum is trying to tell us is that our very large asynchronous array wants to be a wafer-scale construct. That is, you deal with the loss of speed between chips by just not cutting your wafer up into chips in the first place. This is where things get intriguing: subtracting out the area taken up by bonding pads, the GA144 is approximately 0.25 cm² in area. This means you could tile 400 GA144’s onto a 10-cm x 10-cm wafer-scale chip. This in turn translates into 57,600 individual F18A processor cells on an area about the size of two postcards. Wow! Letting our imagination run a little further, imagine stacking 20 such layers together in a 3-D arrangement. We now have something on the order of 1 million F18A processor cells in a volume about the size of a small book.

The one thing that invariably kills the design of any large processor array is heat. But this is the area where asynchronous arrays come to the forefront. So again using the GA144 as a worked example, its typical quiescent current draw is 7-µA. That means that for our 10-cm x 10-cm wafer scale chip, its typical quiescent current draw will be on the order of 3-mA; a ridiculously small number.

But what about the case when the GA144 is running? The typical full-on current draw for a single F18A cell is 3.75-mA. Multiplying this current draw by the number of individual F18A cells that could be tiled on our 10-cm x 10-cm chip gives a total current draw of about 200-A. That, of course, would probably melt our array. But if only 2 to 4 percent of our individual cells were running at any one time, the total power dissipation at 1.2-V would still be less than 10-watts. This is a very doable amount of heat to dissipate.
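For the record, here is the arithmetic behind those numbers written out as a check (the 3 percent duty cycle used below is just one point inside the 2-to-4 percent range):

```c
/* Worst-case current if every F18A cell on the wafer ran at once,
   and the realistic power when only a small fraction are active.
   Figures: 400 GA144s x 144 cells, 3.75 mA per active cell, 1.2 V. */
double all_on_current_a(double n_cells, double i_cell_a)
{
    return n_cells * i_cell_a;
}

double sparse_power_w(double n_cells, double active_frac,
                      double i_cell_a, double volts)
{
    return n_cells * active_frac * i_cell_a * volts;
}
```

Plugging in 57,600 cells gives about 216 A with everything on, but only about 7.8 W at a 3 percent duty cycle.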

What the asynchronous array hardware wants to be is a wafer-sized chip upon which tens of thousands of individual simple processor cells have been placed, with the qualification that only a few percent of these individual processor cells are active at any one moment. What this, in turn, tells us about any application that we might run on our asynchronous array, is that it must be “sparse” in its operation.

The one downside of a wafer-sized chip is the inevitable presence of defects in the fabrication process. This implies that for any large array, there will be at least some dead processor cells, so the neighbors to such a dead cell will have to have, as part of their programming, a way to route around it. This is something that will have to be included in the basic design and programming of an individual simple processor cell.