Cognitive Regularities 2: Perceptual explanations
Rajesh Kasturirangan, Jun 19, 2011
The last fifty years have seen a great expansion of our knowledge of the mind/brain and its relationship to the external world. Significant advances have been made on the experimental front, in areas as diverse as Psychophysics, Neurophysiology and Cognitive Psychology. However, the corresponding advances in theory have not materialized. Some of the disciplines within the mind/brain sciences have developed deep theories of the corresponding mental faculty, linguistics and vision being two notable examples. However, these theories are specific to the particular mental faculty and do not generalize well. Ideally, a complete theory of the mind/brain should provide a unified account of the mind/brain within an explanatory level and also across explanatory levels1.
The main claim being advanced here is that mental systems as well as the world are driven by “strong” regularities. Consequently, there is no principled reason to think of mental processes as being different from world processes, i.e., there is no special mind-world barrier. It seems quite possible that the underlying dynamics of mental and world processes are constrained by the same regularities. If so, one of the tasks of the cognitive theorist is to explicate these common principles2.
That is to say, we should have a theory that leads to horizontal as well as vertical unification. By horizontal unification I mean a set of principles that capture the inherent structures that are common to the various subsystems of the mind/brain of an organism as well as its environment at a given level. The problem of vertical unification is that of relating principles across levels. In this piece, I restrict myself to the question of horizontal unification at one particular level, an abstract level that I call natural structure.
I prefer to call this level “natural structure” instead of the often used terms “computation” or “information processing” for a reason. Although we speak of computational constraints, these are often driven by the underlying generative processes and the structures that support these processes. Computation as it is usually conceived does not always capture the inherent form of these processes. For example, many natural constraints are geometric in nature, such as those involved in finding part boundaries in images or describing the shape of a smooth object (Koenderink, Hoffman-Richards). Another example is the relationship between the size of an animal and the speed at which it walks or runs (d’Arcy Thompson). All of these constraints are modal regularities (Richards). There is a common underlying architecture behind all of these structures and processes. Some of these processes may well be computational, but others may not be. Nevertheless, at an abstract level, they have the same form. The emphasis on computational schemes or on representational forms is mistaken. Mental structures should be regarded as explanatory structures that are not tied specifically to logic, language, pictures etc., but rather to models that are constrained by the intrinsic regularities of the mind and the world.
2. The Main Assumptions. The arguments here rest on six assumptions that I think are common to all good models, whether they be perceptual subsystems or a scientist’s model of some aspect of the world.
The world is regular: These world regularities require an explanation.
Generative Processes: The regularities are a result of generative processes.
Closed World: In a given model, there are a set of variables, quite likely a small number, that capture all the relevant regularities.
Universal Principles: There are universal principles that determine the set of generative processes.
Symmetry within the closed world: The universal principles apply uniformly to all objects in the closed world.
Horizontal levels: Every model has a particular scale3 of operation, i.e., it is valid only at a certain scale or set of scales. The scales are part of the definition of the closed world. The universal principles are valid only for those scales.
3. An Outline of the Common Architecture. How do these assumptions translate into a proposal for the study of the mind-world relationship? In the approach taken here the basic architectural unit of the perceiver/world is a quasi-modular (QM) process. Each QM process consists of a frame, a strong dynamical procedure and a feature lattice (of non-accidental features). The main difference between world processes and mental processes lies in the relationship between the feature lattice and the dynamical procedure. The dynamical procedures in world processes causally produce regularities that are then organized into a structure lattice, while mental processes are “explanations” of features in a preference lattice. Since mental processes have to perform in real time in a rapidly changing environment, they are selective in what they choose to explain and are also highly context sensitive. In any given situation, there are many processes that are interacting simultaneously. The interactions take place at interfaces. Each interface is a map from the feature lattice of one QM process to the feature lattice of the other QM process. In particular, the structure lattice/preference lattice correspondence is the analog of the representation of world regularities by mental regularities. Note, however, that the dynamics of the system is not driven by the feature lattice correspondence.
The dynamical account of mental function has many advantages over the representational account. It takes for granted that the mind as well as the world are highly structured entities with striking regularities of their own. If that is the case, it becomes impossible to hold on to the notion that mental regularities mirror the world regularities, as the representational account demands. It seems to me that in order for the structure of the organism to be isomorphic to the structure of the world, a very impoverished version of organismic and world structure would have to be true4. Linking the mind and world by means of the correspondence between the structure and the preference lattices is a way to understand and quantify the mind-world relationship without imposing the extra burden of representation. The structure/preference lattice correspondence keeps perception/cognition robust since the explanatory processes are tied to non-accidental features in the preference lattice. Non-accidental features in the preference lattice correspond to non-accidental features in the structure lattice, which in turn are tied to processes in the world. Therefore, the processes postulated by the new theory do not change the survival value of perception/cognition for the organism.
Remark: Postulating the existence of a computational level seems like a restatement of the hardware-software distinction, or of the existence of different levels of analysis. In both of these cases, the reason for abstraction is epistemological and practical, i.e., it is assumed that we abstract away from reality because information processing is hard to discover amidst all the biological and physical details. However, not all abstractions are epistemological in nature. For example, Newton’s laws of motion are also abstractions, but everybody thought that they described the world directly. Let us call the two kinds of abstraction described above weak abstraction and strong abstraction respectively.
The centrality of computational structure/design implies that it is a strong abstraction and that it describes an aspect of nature5.
Every process takes place in a spatio-temporal framework6. The range of possible spatio-temporal frameworks is enormous, from concrete objects, e.g., a canvas for painting pictures, to abstract mathematical objects like vector spaces. A frame is a spatio-temporal framework that is an explicit embodiment of certain aspects of the spatio-temporal structure of the world/perceiver. It serves two purposes, the first of which is to provide a setting in which processes, actions and events can take place. Second, because it makes some computational objects explicit, it filters out unnecessary elements of the world/perceiver. In other words, only those objects that are made explicit can participate in a process that is enabled by a given frame. To use a statistical metaphor, a frame is a meta-prior that constrains and determines the space of all acceptable hypotheses. For taxonomical purposes7, we can divide frames into two kinds: world frames and mental frames.
A prototypical example of a world frame is a sand dune in a desert. Many dynamical processes take place on the surface and in the interior of a dune. So, what is the explicit structure of the dune? It is a three-dimensional co-ordinate frame along with the relative location of the particles of sand in the dune. In order for a dynamic process to leave its mark on a dune, it needs to move some particles of sand. For example, a lizard walking on the surface of the dune leaves a trace while sunlight does not. Note that this frame is inherently computational. The actual physical makeup of sand is irrelevant. A computer simulation of small, three-dimensional particles affords the same processes as a real dune. Good examples of mental frames are image-centered and object-centered reference frames for representing objects. A more interesting example is the canonical vertical axis imposed by humans prior to the formation of the percept (see figure 1).
Most frames support both static and dynamic procedures. For example, three generic points in two-dimensional space form a triangle, an example of a relationship that is satisfied by the three points. A relationship is a static procedure since the process by which the relationship came into being is not specified in the relationship itself. At the same time, the three points could be a product of a transformation applied to some other three points, in which case the relationship is an outcome of a process. We can roughly classify procedures as follows:
Type 1 procedures: These consist of nothing more than a frame F and individual objects supported by the frame, for example, three points in two-dimensional space, where the relationship between the three points is not explicit.
Type 2 procedures: Here the procedure consists of some objects, O, and a set of relations, R, where each relation is a predicate over the set of objects, for example, three points in two-dimensional space that (explicitly) form the vertices of a triangle.
Type 3 procedures: In a Type 3 procedure, P consists of a generative procedure that generates the given relations between objects. Assume that the triangle in a Type 2 procedure is produced by stretching and dilating a standard equilateral triangle, whose sides are all of length 1. In this case, the generative procedure consists of the “stretch” and “dilate” transformations.
Type 4 procedures: A Type 4 procedure is the strongest possible procedure. Not only do we have a generative procedure but the underlying dynamics is also known, i.e., we know the mechanism that leads to the generative procedure. In the triangle example, this is equivalent to saying that the equations that generate the stretch and dilate transforms are known, along with the order in which they are applied to produce the final triangle. The dynamical mechanism is a meta-procedure that generates the generative procedure of a Type 3 procedure in the context of the frame F. From now on, Type 1 and Type 2 procedures will be called static procedures while Type 3 and Type 4 procedures will be called dynamic procedures.
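The four procedure types can be made concrete with the triangle example. The following is only a minimal sketch in Python, not a formalization taken from the text: the names (forms_triangle, dilate, stretch, generate), the choice of a vertical stretch and the particular parameter values are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = List[Point]

# Type 1: a frame (the 2D plane) plus three bare objects; no relation is explicit.
type1_objects: Triangle = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]


# Type 2: objects plus an explicit relation, written as a predicate over them.
def forms_triangle(p: Point, q: Point, r: Point, tol: float = 1e-12) -> bool:
    """True when the three points are not collinear (non-zero signed area)."""
    twice_area = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(twice_area) > tol


# Type 3: the relation is attributed to a generative procedure, identified
# here only by the names of the transformations involved.
type3_generators = ("dilate", "stretch")


# Type 4: the dynamics are known too: explicit equations for each transform
# and the order in which they are applied (assumed forms, for illustration).
def dilate(tri: Triangle, s: float) -> Triangle:
    """Uniform scaling by a factor s about the origin."""
    return [(s * x, s * y) for x, y in tri]


def stretch(tri: Triangle, k: float) -> Triangle:
    """Scaling by a factor k along the vertical axis only."""
    return [(x, k * y) for x, y in tri]


# The standard equilateral triangle with sides of length 1.
UNIT_EQUILATERAL: Triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]


def generate(s: float, k: float) -> Triangle:
    """Apply dilate first, then stretch, to the unit equilateral triangle."""
    return stretch(dilate(UNIT_EQUILATERAL, s), k)


# Dilating by 2 and stretching by sqrt(3) reproduces the Type 1 points
# up to floating-point rounding.
triangle = generate(2.0, math.sqrt(3.0))
```

In this sketch the same three points appear at every level; what changes is how much of their origin is made explicit, which is the distinction between static (Type 1, Type 2) and dynamic (Type 3, Type 4) procedures.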
Within a given frame, relationships between objects or events can be divided into two classes: generic and non-accidental. Generic relationships are those that have a high probability of occurring, given the constraints imposed by the frame. Non-accidental relationships are those that have a low probability of occurring in the context of the frame. Consequently, non-accidental features are strong indicators of further constraints in the form of static or dynamic procedures. For example, three points in 2D space are generically not collinear. Therefore, collinearity is a non-accidental feature of three points in 2D space. Some features are more non-accidental than others. For example, if we take four points in 2D space, the generic relationship is that of a quadrilateral. A rectangle is a non-accidental configuration, since the opposite sides are parallel and the angles are right angles. However, a square is even more non-accidental since all the sides are also of the same length. The features in a given frame 8 can be arranged into a lattice where the features lower in the lattice are more non-accidental than the ones above.
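The quadrilateral, rectangle and square example can be written down as a small chain of increasingly restrictive predicates. This is a hedged sketch only: a chain is the simplest special case of a lattice, the function names are hypothetical, and the code assumes four distinct vertices listed in order around the shape.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Features ordered from generic (top) to most non-accidental (bottom).
LATTICE = ["quadrilateral", "rectangle", "square"]


def collinear(p: Point, q: Point, r: Point, tol: float = 1e-9) -> bool:
    """Three points are collinear when the signed area they span is zero."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) <= tol


def _edges(pts: List[Point]) -> List[Point]:
    """Edge vectors between consecutive vertices, wrapping around at the end."""
    n = len(pts)
    return [(pts[(i + 1) % n][0] - pts[i][0], pts[(i + 1) % n][1] - pts[i][1])
            for i in range(n)]


def is_rectangle(pts: List[Point], tol: float = 1e-9) -> bool:
    """All four corners are right angles (adjacent edges are perpendicular)."""
    e = _edges(pts)
    return all(abs(e[i][0] * e[(i + 1) % 4][0] + e[i][1] * e[(i + 1) % 4][1]) <= tol
               for i in range(4))


def is_square(pts: List[Point]) -> bool:
    """A rectangle whose four sides all have the same length."""
    lengths = [math.hypot(dx, dy) for dx, dy in _edges(pts)]
    return is_rectangle(pts) and all(math.isclose(l, lengths[0]) for l in lengths)


def most_specific_feature(pts: List[Point]) -> str:
    """Return the lowest (most non-accidental) lattice entry that holds."""
    if is_square(pts):
        return "square"
    if is_rectangle(pts):
        return "rectangle"
    return "quadrilateral"
```

Each step down the chain adds a constraint that a randomly chosen configuration would almost never satisfy, which is the sense in which lower features are more non-accidental.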
Definition. A dynamic process is a triple (F, P, L), where F is a collection of frames, P is a collection of dynamic procedures and L is the lattice of generic and non-accidental features in the frames of F.
So far, all the concepts have dealt with an individual QM process. However, in a typical situation in the real world, there are many different interacting processes. The architecture of QM processes should somehow reflect the fact that multiple interactions are the norm. This is where quasi-modularity comes into the picture. A modular process is one where the procedure is independent of any input coming from interactions with other processes, i.e., it is an automatic process. Modular processes are not very context sensitive and do not have the flexibility that is demanded in a rapidly changing environment. However, they are very robust since their input-output mapping is very well defined. As a consequence they can be used as components in a variety of complex tasks. A quasi-modular process is a generalization of a modular process that retains the robustness of modular processes while being flexible. In a quasi-modular process, the triple (F,P,L) comes with a finite (usually quite small) number of switches. Each switch opens a frame or starts a dynamic procedure. For a given process, Q, at a given time, the processes with which Q is currently interacting determine the set of “on” switches.
In particular, if the number of switches is zero, we end up with a modular process. Quasi-modular processes are highly robust since the dynamic procedures themselves are immune to change from external interaction. However, they are highly flexible since a small number of quasi-modular processes are enough to meet the combinatorial demands imposed by a changing environment 9.
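One way to picture the triple (F, P, L) together with its switches is as a small data structure. The sketch below is an illustration under stated assumptions, not the author's formalism: the class QMProcess, its field names and the idea of a triggers table that records which partner process turns on which switch are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class QMProcess:
    """A quasi-modular process: the triple (F, P, L) plus a few switches."""
    name: str
    frames: List[str]                             # F: the frames
    procedures: Dict[str, Callable[..., object]]  # P: the dynamic procedures
    lattice: List[str]                            # L: features, generic first
    # Switch name -> the frame or procedure that the switch opens or starts.
    switches: Dict[str, str] = field(default_factory=dict)
    # Partner process name -> switches that the partner turns on (assumed).
    triggers: Dict[str, Set[str]] = field(default_factory=dict)

    def is_modular(self) -> bool:
        """With zero switches the process reduces to a modular one."""
        return len(self.switches) == 0

    def on_switches(self, partners: List[str]) -> Set[str]:
        """Switches currently on, given the processes interacting with this one."""
        on: Set[str] = set()
        for partner in partners:
            on |= self.triggers.get(partner, set())
        return on & set(self.switches)


# A hypothetical shape process with two switches.
shape = QMProcess(
    name="shape",
    frames=["image-centered", "object-centered"],
    procedures={"stretch": lambda y, k: k * y},
    lattice=["quadrilateral", "rectangle", "square"],
    switches={"use_object_frame": "object-centered", "run_stretch": "stretch"},
    triggers={"language": {"use_object_frame"}, "motion": {"run_stretch"}},
)
```

Note that the partners only select which switches are on; the procedures themselves are never modified from outside, which is where the robustness described above comes from.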
Finally, we come to the relationship between the dynamic procedure, P, and the feature lattice, L. Here, we see the difference between world processes and mental processes. In the case of world processes, the relationship between non-accidental features and the dynamic procedures is very simple: each non-accidental feature is caused by a dynamical procedure, i.e., the dynamical procedure leaves the non-accidental feature as a trace. In the case of mental processes, the relationship is quite a bit more complicated. The complication stems from the fact that the mental process is designed for use by the organism. Therefore, the set of non-accidental features made explicit by a mental frame is much smaller, and the relationship between the non-accidental feature and the mental procedure is that of explanation, not causation. Explanation is a concept that is relatively hard to pin down. However, a couple of things can be said about any explanatory process:
(1) An explanatory process is typically very selective in what it chooses to explain. However, if a feature is explained, it is likely to be a highly non-accidental feature. That is to say, the more non-accidental a feature is, the greater the probability that it is explained. This makes good sense since at any given time, there are many world processes that are producing regularities. However, only a few are of importance to the organism and they are the ones that are worth explaining. Another way of stating the selectivity of explanatory processes is that they index into the right mental frame and therefore, into the right non-accidental features. Indexing is the act of choosing the right mental representation or mental routine for a given stimulus. For example, if we are looking at an object, what is the feature worth noticing: the overall shape, some distinctive part or feature, or some other attribute like color or location? Each one of these features is non-accidental, so we have to make a choice between equals. It might well be possible to explain indexing using quasi-modular processes. In a large, interacting network of quasi-modular processes, it is still computationally feasible to access the “correct” frame because of the way the architecture is constrained 10.
(2) A good explanation is not the best inference from the set of non-accidental features. In particular, a good explanation is not always causally related to the non-accidental features it explains. There is a history of thinking about perception as a process of unconscious inference, starting from Helmholtz. In recent times, the role of inference has been stressed by many authors, e.g., Irvin Rock. Inferential processes are chains of counterfactual reasoning driven by non-accidental features. There is a causal relationship between an unconscious inference and the non-accidental feature, with the non-accidental feature acting as a cause. However, explanatory mechanisms do not have to be in any kind of causal relationship with their key features. For example, an explanatory mechanism could be a generative procedure driven by internal constraints that is triggered/indexed by a non-accidental feature. Consequently, the generative procedure could be largely independent of any non-accidental features. As a result of the relative independence of the non-accidental features and the explanatory procedure, a good explanation is more robust than a good inference. A good explanation is not causally related to any particular set of non-accidental features and it does not break down when there are non-accidental features that give rise to contradictory inferences. To give an example, let us take two explanations of planetary motion.
Every planet traces an ellipse as it revolves around the sun in such a way that the area swept out is the same for a given interval of time, independent of the position of the planet in its orbit (Kepler’s first and second laws).
The motion of the planets around the sun (among other things), is governed by the law of universal gravitation.
The first is an example of inference from observational data while the second is not11. In fact, the second is a classic case of indexing into the right frame for explaining a problem by ignoring a lot of conflicting data. To go back to the notion of indexing, if the architecture of quasi-modular processes solves the problem of indexing into the right frame, there is no need to rely heavily on inference any more. However, in order to index into a dynamic procedure, it is important that the environment of the perceiver not be impoverished. In the absence of robust non-accidental features in the environment, the mental system may well index into an inferential mode, since that is the safest strategy. When robust non-accidental features are available, the mental system switches on a dynamic procedure because explanation is always better than inference in the presence of robust data12.
4. Evidence for the dynamical approach. The dynamical approach makes four strong claims about the structure of the perceiver and the world. They are:
The world as well as the mental system consist of a collection of interacting quasi-modular processes.
The architecture of world processes is quite similar to the architecture of mental processes. Both are triples, (F, P, L), of frame, dynamical procedure and feature lattice respectively. Consequently, there is no principled distinction between mental and world processes.
The main difference between mental processes and world processes is the relationship between the dynamic procedure and the feature lattice. While world processes leave non-accidental features as causal traces, mental processes “explain” non-accidental features.
At any given time, there are many processes, both world and mental, that are active. Pairs of processes can interact only at an interface, which is a map between the feature lattices of the two processes.
Claim 2 is the strongest claim of the four and the one that I cannot do much justice to in this piece. Definitive evidence for this claim can only come from showing the power and elegance of QM processes as an explanatory framework. At a descriptive level, claim 2 holds for every mental sub-system. Every mental process that we know is highly robust and tied to a small set of non-accidental features, while remaining context sensitive and interacting with a host of other mental processes. This is true of systems studied under the labels of depth perception, motion detection, object recognition and shape representation. Similarly, all world processes that impinge upon the perceiver are quite local, with a well-defined generative procedure. Statistical, geometric and logical principles have proved useful in all aspects of computational modeling of mental systems. Furthermore, they share the same structure13. The big question is whether the claim is true at a deeper level, i.e., are there common computational principles that apply to many if not all quasi-modular processes? This is a question that cannot be answered in a book, let alone one paper14. Evidence for the other three claims is easier to come by and I have gathered it into three subsections, one for each claim.
(1) It is pretty clear that every object in the world is the outcome of at least one process and participates in many other processes. In itself, this fact may not mean much, but the crucial observation is that the regularities (of objects etc.) in the world are largely an emergent property of the processes that shape them. Whether the objects be rocks in the middle of a stream, clouds in the sky or trees in a forest, each object has a characteristic shape that is entirely due to the process that caused it to come into being. One can come up with an infinite number of other examples all indicating that regularities in the world bear a strong imprint of the processes that cause them. Furthermore, for our purposes, it is equally important that these processes are all abstract, computational processes. The details of fluid mechanics are not important in determining the shape of a rock in a stream. A computer simulation of a fluid that preserves only a few of the physical properties of water produces rocks of the same shape. Fractal modeling produces shapes that are remarkably like real world clouds and mountains. This strongly suggests that the mental environment, and not just the perceiver, consists of processes that can be modeled at an abstract level.
Similarly, even the simplest percepts are part of a dynamical explanation, not a representation based explanation. One might think that perception is an explanatory process at the higher levels, when it overlaps with higher cognitive processes in general. Yet, it seems to me that even the simplest acts of perception involve some kind of dynamical explanation. The examples below illustrate my point.
In each pair, upon inspection, it is clear that the two shapes are the same. Yet, mentally they seem quite dissimilar. The most parsimonious explanation for the difference is that our visual system imposes a co-ordinate frame on the stimuli. The co-ordinate frame switches on a different explanatory process each time and that leads to a different perception of form in each case. If indeed that is the case, two conclusions follow:
(i) The vertical co-ordinate frame is not a representation of some property of the stimulus or a simple inference from the stimulus. After all, the stimuli above vary drastically in their features and properties, so any process that leads to the inference of a vertical orientation has to be a complex process.
(ii) Our percept is an outcome of a process that involves the coordinate frame and the non-accidental features of the stimuli.
Similarly, consider the example below. Triangle A is nested inside triangle B. Figures 2b, 2c and 2d provide three examples of transforming triangle A into triangle B. Most observers would think of the transformation in 2b as being more natural than the ones in 2c or 2d. In 2e-2g, the same transformations are now applied to a set of nested curves. In this case, there seems to be no clear choice of a natural transformation.
Examples like the one illustrated in figure 2 show a couple of things. First, we have a repertoire of transformations in our visual system. Second, in the presence of key features (in this case the vertices of a triangle), some transformations are more natural, i.e., some transformations are better explanations than others.
(3) It is quite clear that regularities in the world are traces of world processes. Whether it be a tree, a cloud, a chair or a building, every world object is an end point of a causal process. What is more important is that these processes can be modeled computationally using relatively simple universal rules. There is no need to get into the details of design, but it is true that the design of any object in the real world can be replicated on a computer screen. The graphics industry depends on this fact. Of course, that is not to say that the physical process itself was replicated. What concerns us are the constraints at the level of design, not the actual process that was used to construct the object itself. In some cases the two may be the same, e.g., the physics of sand dunes can be faithfully modeled in a computer simulation, but an isomorphism between the design level and the physics is not necessary. For example, consider automobile construction. Constructing a car is a process, but so is designing a car. The design process is not isomorphic to the construction process as the two have different constraints and causal trajectories. As it so happens, both processes end up producing the same item at the end, but that is not to discount the fact that it is the process constraints that largely regulate the end product and not the other way around 15. The study of QM processes at a computational level is the study of the principles of biological design, albeit at an abstract level.
Similarly, there are numerous examples that show that perception is an explanation of a few non-accidental features in the stimulus and is not directly caused by the stimulus. Indeed, our intuition as perceivers seems to indicate that we represent the world in all its richness. It comes as a surprise that our representations are actually quite poor. It has been shown time and again that we neglect massive changes in the world, even if they happen in front of our eyes (Rensink). The best explanation for the poverty of our representation is that perception consists of dynamic procedures that explain a few key features while filtering out everything else. Quick, process-driven explanation of non-accidental features seems to be the norm rather than the exception. Furthermore, computational capacity seems to be irrelevant. Consider a situation where images are created using the LOGO program:
Most people classify the pictures into several different categories. It comes as a surprise that all of the pictures were generated using the same rule: take a line and rotate it at a fixed angle n times, where n is an arbitrary integer. Human observers are not able to use the underlying regularity to decode the generative process or to classify the various pictures as belonging to one class even though the task is not computationally intensive. Why is that so? After all, the probability that n angles chosen at random are equal is much smaller than that of any other regularity that is present in the pictures. Nevertheless, we neglect this highly non-accidental feature in favor of others. In this case, perception is guided by internal processes tied to non-accidental features that are not necessarily the most important “world statistics”.
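The rule behind the pictures is simple enough to state in a few lines. The sketch below is one possible reading of it, written in Python rather than LOGO: it assumes turtle-style drawing in which a segment of fixed length is drawn and the heading is then turned by the same angle, repeated n times. The function name logo_figure and the step parameter are illustrative assumptions, and the code only computes vertices instead of drawing them.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def logo_figure(angle_deg: float, n: int, step: float = 1.0) -> List[Point]:
    """Vertices visited by drawing a segment and turning by angle_deg, n times."""
    x, y, heading = 0.0, 0.0, 0.0
    points: List[Point] = [(x, y)]
    for _ in range(n):
        # Draw one segment of length `step` along the current heading.
        x += step * math.cos(math.radians(heading))
        y += step * math.sin(math.radians(heading))
        points.append((x, y))
        # Turn by the same fixed angle every time: the single underlying rule.
        heading += angle_deg
    return points


# Very different-looking figures from the same rule (parameter choices are examples).
square = logo_figure(90.0, 4)      # a closed square
star = logo_figure(144.0, 5)       # a closed five-pointed star
open_arc = logo_figure(20.0, 6)    # an open, curved polyline
```

A square, a star and an open arc fall into different perceptual categories even though they differ only in the values of the angle and n, which is the point of the example.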
(4) The first part of the claim is trivial since it is obviously the case that in a natural environment there are many different processes going on in the world as well as in our heads. As for the second part, it can be divided into two halves: interaction takes place only along interfaces, and the interaction is a map between feature lattices. The first half is the usual argument for modularity. Since this topic has been discussed quite intensively in the literature, there is no point in discussing it further.
All the novelty in claim 4 is in the second half. It is well known that non-accidental features in the image map onto non-accidental features in the distal stimulus. T-junctions map onto occlusions, minima of negative curvature map onto part boundaries and so on. However, the really interesting examples are maps between two mental processes. For example, consider the interface between the linguistic system and the visual system that is involved in reading written directions and then looking at a map to find the way, something most of us have done at some point in our lives. A typical written set of directions might say: “Go straight on road X for about 2 miles, take a right at the fourth traffic light and then take a left at the next traffic light.” Most of us, when reading an instruction like this, try to find road X on the map, immediately start counting the intersections until we reach the fourth one and then jump to the next light on the road that is to the right. On the map itself intersections are always sharp discontinuities, usually right angles. In both cases, we are looking for non-accidental features. In the linguistic world they correspond to actions (turn left, turn right) and in the visual world they are sharp discontinuities in a map. We do not bother with the stretch of road in the middle, whether we are reading directions or looking at straight pieces of a road in the map, because it does not give us any useful information, i.e., it is a generic feature. This is even more striking when you realize that written directions rarely have extraneous information like “The 2 mile stretch of road X has this beautiful house on the right. Do not forget to look at it.” Directions are designed so that the linguistic non-accidental features directly map onto visual non-accidental features.
An obvious question is: “How do we know if a feature is non-accidental or not? For all you know, the notion of non-accidental is true a posteriori, i.e., a linguistic term is non-accidental if it is mapped onto a visual feature.” What is really impressive is that actions are non-accidental in a frame that is intrinsic to language, while right angles are non-accidental in a frame intrinsic to vision. Therefore, the domain and the range of the linguistic-visual map are both well defined. No chicken and egg problem arises here.
5. Consequences of the dynamical approach. In this section, I use the dynamical approach to address two well known debates in cognitive science, namely, the contribution of innate knowledge and the role of representation. In the first case, the results are interesting but not surprising, while in the second, I believe it leads to a wholesale reevaluation of the importance of representation. Therefore, much more space is devoted to a discussion of the second topic.
(1) The role of innate knowledge. If the four claims at the beginning of section 4 are true, then the common architecture of QM processes imposes severe constraints on individual mental processes as well as the perceiver-world system as a whole. In this sense, most of the structure is built into the system. However, the role of environmental input and learning during the lifetime of the organism is not to be minimized. Learning enters into the picture in two different ways. First, the correspondence between the feature lattices of two distinct QM processes is not determined beforehand. This has to be learnt by the perceiver and is clearly tied to the intricacies of the mental environment. Since the environment may be very different for two individuals selected at random, the mapping between the same QM processes in the two individuals can be quite different. Second, each individual has to solve the indexing problem to his or her satisfaction. A quasi-module comes with a set of switches and an individual perceiver has to decide which frame is important in a given task, which in turn determines the switches that are turned on. The individual organism also has to order the different non-accidental features in a preference lattice. Furthermore, learning can result in a qualitative leap in performance. This is because quasi-modules that are connected by a robust interface can then be chained together to perform more complicated tasks, while weak interfaces will fail on these tasks. In this sense, in the dynamical approach, innate structure and learning are at two different levels. The overall structure at the design level is largely dictated by common architectural constraints while real time performance is molded by experience, sometimes strikingly so.
(2) The importance of representation. The term “representation” is ubiquitous in cognitive science. There is no generally accepted definition of this term, but all representational theories make the following four assumptions about the relationship between a perceiver and his environment.
The world, W, (the “distal stimulus”) is a collection of objects. Objects have properties and are related to each other both spatially and causally.
The perceiver has access only to a projection of the world16, called the image or proximal stimulus, P.
The mental system of the perceiver consists of an internal representation, R. R is related to W in an explicit manner by means of a correspondence F: W → R, that allows the perceiver to make explicit certain properties of the world W. The goal of the mental scientist, apart from any questions about the biological substrate of R, is to answer the questions: “What is being represented by R?” and “What is the structure of R?”
Representation is the primary goal of perception, i.e., the goal of the mental system is to use the proximal stimulus, P, to achieve a veridical representation, R, of W. Furthermore, the science of perception is the study of R and its relationship to W.
CR theories can be broadly classified into two types:
(a) Inverse optics (Marr, 1980). The goal of the perceiver is to invert the process that led to the creation of the proximal stimulus from the distal stimulus. Since this is an underdetermined problem, the perceiver imposes additional constraints in order to invert the image. In this scenario, inversion is unavoidably a process, and its end goal is to recreate the distal stimulus. Nevertheless, it is possible to separate the goals of the perceiver from the process that leads to the goal. For example, Marr separated the two by calling the study of the goals the “computational level” while the process was part of the “representational-algorithmic” level.
(b) Similarity-based methods (Edelman, 1998). Inverse optics turned out to be harder than anyone could have imagined in the late 70’s. As a consequence, mental scientists turned towards simpler methods of representation. The goal of the perceiver was no longer inversion of the image but rather to represent the relations/similarities between objects in the world in the form of distances between points in an abstract similarity space. The representation was to be effected in such a way that there was an isomorphism between the distal relations that were represented and the geometry of the similarity space. Note that the similarity-based method is inherently weaker than inverse optics, in the sense that the process by which the similarities are represented is relegated to a neural mechanism. Of course, being “weaker” in this sense made these theories computationally tractable and, along with powerful new algorithms, they have led to advances in computer vision, psychophysics and neurobiology.
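The similarity-based idea in (b) can be illustrated with a small sketch. It is only a toy under stated assumptions: the object names, the feature vectors, and the internal coordinates are all hypothetical, and "isomorphism" is read in a deliberately weak sense, namely that the internal space ranks every pair of objects by distance in the same order as the distal world does. The literature cited above may intend a stronger notion.

```python
from itertools import combinations
from math import dist

# Hypothetical distal objects described by made-up feature vectors
# (illustrative values only; not data from the essay).
distal = {
    "leaf":  (1.0, 0.2, 0.1),
    "fern":  (0.9, 0.4, 0.2),
    "stone": (0.1, 0.9, 0.8),
    "brick": (0.3, 1.0, 1.0),
}

# Hypothetical internal similarity space: each object is a 2-D point.
internal = {
    "leaf":  (0.0, 0.0),
    "fern":  (0.3, 0.0),
    "stone": (2.0, 0.0),
    "brick": (2.05, 0.35),
}

# A distorted internal space in which "fern" and "stone" swap places.
distorted = dict(internal, fern=internal["stone"], stone=internal["fern"])


def ranked_pairs(points):
    """Return all object pairs, sorted from most similar (closest) to least."""
    pairs = combinations(sorted(points), 2)
    return sorted(pairs, key=lambda p: dist(points[p[0]], points[p[1]]))


def preserves_similarity_order(world, space):
    """True if both spaces rank every pair of objects in the same order."""
    return ranked_pairs(world) == ranked_pairs(space)


print(preserves_similarity_order(distal, internal))   # True
print(preserves_similarity_order(distal, distorted))  # False
```

Note that the check says nothing about how the internal points are obtained, which mirrors the remark above that the process is relegated to a mechanism outside the theory.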
Is there a good reason to be so sanguine about the prospects of representation? I believe not. Given the richness of our sensory experience and the numerous ways in which the senses confirm each other, it seems quite obvious that the goal of perception is to uncover the structure of the world. Nevertheless, this is a mistaken view of perception. The dynamical approach shows that representation of the external world is only a secondary aspect of perception. The primary purpose of perception is not representation but explanation.
The argument against representation consists of two main points, as follows:
(a) The world is process driven. Objects in the world are secondary at best and are sometimes very different from what we think they are. The first claim has already been addressed in section 3. As for the second, the existence of natural processes that are multi-scale (like fractals) shows that the form/shape of world objects is often nothing like the piece-wise smooth bounded surfaces that we experience mentally. The fact that many (if not most) objects in the natural world are multi-scale is quite illuminating17. If representation is primary, it is surprising that we do not represent a highly robust statistic: that the “true” shape of real world objects is a multi-scale distribution. A consequence of multi-scale representation of shape is that the assumption of piece-wise smooth surfaces is completely wrong. However, we see multi-scale objects as piece-wise smooth objects plus some texture (Gilden).
Interestingly, there seems to be a clean break between human constructions (buildings, cars, chairs, tables, etc.) and natural world objects. Human constructions are invariably piece-wise smooth bounded surfaces in shape while natural world objects are often multi-scale. If indeed we shape our environment so that it conforms to our modes of perception, the smooth geometry of human constructions is a consequence of the structure of our minds and not a reflection of natural world statistics.
(b) Explanation always gets precedence over representation. As a result, representation can be surprisingly hard, even when it should be easy. This follows from the argument in section 3 for the non-inferential nature of mental processes. After all, why should the mental system ignore a robust statistic in favor of other, less non-accidental features? The only plausible reason is that mental processes are highly selective in the features they explain, and the features that end up being explained are the ones selected by an indexed frame. As soon as primacy is ceded to an internal frame, the mental process is driven by the constraints of the frame, not of the world. This is also borne out by the examples in figure 1. The “world” non-accidental features lead to the inference that the objects in each pair are the same, yet the vertical frame imposed from within prevents the “correct” percept from being formed.
To summarize, representation is secondary, and only acts as an interface between mental processes and world processes. Consequently, an exact match between the world and the perceiver depends on the mental process selecting non-accidental features in a way that reflects the statistics of the world, i.e., the structure lattice. However, the structure and preference lattices are often not isomorphic. Nevertheless, perception is a highly robust process because it is always an explanation of non-accidental features even if the importance given to a non-accidental feature does not reflect the statistics of the world itself. We can capture the relationship between the perceiver and the world by the following diagram.
5. Conclusion. In this essay, I have argued that the basic unit of the perceiver-world system is not a static representation, but a dynamic, quasi-modular process. At the heart of a quasi-modular process lies a strong dynamical procedure that is tightly tied to the non-accidental features of the accompanying spatio-temporal frame. Furthermore, the architecture of world processes is the same as that of mental processes. As a result, the perceiver-world distinction becomes a taxonomic one without any metaphysical implications. The dynamical approach allows us to understand the mental systems and the world for what they are: richly endowed structures that are inherently computational. One consequence of the richness of the perceiver and the world is that true representation is no longer a necessity, making representation a secondary goal for the mental system as well as decreasing its importance as an object of study for the natural scientist. I believe this opens the way for a theoretical psychology that is truly biological and that illustrates the central role that computation plays in biology.
1 A well-known example of different explanatory levels is David Marr’s three levels of analysis, namely, computation, algorithms and implementation. It is not clear that the correct levels are the ones that Marr talked about. In fact, in this paper, we argue for a collapsing of the computational and algorithmic levels into one level, which we call “natural structure”. However, there has to be some division of the various structures into levels.
2 Traditionally, this question has been posed in the context of the nature-nurture debate. Interestingly enough, nativists as well as empiricists agree that the answer to the question is “No”, though they have different reasons. The modern empiricist (Churchland-Sejnowski, Crick) thinks that the problem of horizontal unification is a subset of the problem of vertical unification and that all answers will be couched in biological terms. The nativist answer (Fodor) is based on the modularity of the mind, which assumes that each module has its own proprietary computations. In our opinion, the nativist as well as the empiricist positions are mistaken. Instead, we argue for a strong form of horizontal unification. In this approach, the mind and the world are highly structured entities and information/computation is central to the study of the mind-world relationship.
3 The term scale is not to be confused with actual physical scale; it is to be understood as an abstract variable. For example, in the cognitive domain, information processing is a scale. That is to say, there is a set of models for which all the variables, generative processes and laws are computational.
4 There are many people who believe that at the computational level, the world and the organism are pretty simple, e.g., minimum description length (MDL) principles are implicitly based on this assumption.
5 There is a clear link between the strength of the computational processes and their metaphysical status. A convincing argument can be made that weak computational processes are by-products of more fundamental biological processes. On the other hand, if the computational principles are quite strong, they cannot be explained away so easily, especially when there is no clear correspondence between current neural principles and strong computational principles. The existence of strong computational processes is a vindication of non-reductionism, either environmental or neural.
6 Spatio-temporal does not mean space-time as studied by physicists but rather any structure that supports spatial and causal processes. In particular, space-time is a spatio-temporal framework that is useful in the study of physics.
7 That is to say, no genuine distinction is implied in making this separation.
8 Only those features that are made explicit in a given frame are allowed to enter into relationships, whether they be generic or non-accidental.
9 Note that we need only 10 QM processes with 2 switches each to take care of 2^10 = 1024 alternatives. However, having multiple switches per quasi-module works only if there is an efficient way of turning on the right switch at the right time. Some recent work shows that this is feasible in an interacting network that is only slightly non-modular; see, for example, Kasturirangan, R., Multiple Scales in Small World Networks, MIT AI Memo.
10 Kasturirangan, R., Multiple Scales in Small World Networks, MIT AI Memo.
11 Historically, for this reason, the explanatory leap on Newton’s part led to all sorts of controversies over action at a distance.
12 This is a situation that is normal in scientific research. There is no point in making strong theoretical claims in the absence of replicable data, because the data could turn out to be completely wrong. On the other hand, if the empirical data is replicable, it is better to use research time incorporating the available data in a strong theoretical framework than to spend it gathering more data.
13 See, for example, General Pattern Theory by U. Grenander for one attempt at unification of statistical and geometric ideas.
14 A major hurdle has been the reluctance on the part of theorists to believe that the mind-world problem lies squarely within the scope of the natural sciences. Quite possibly, there are new principles to be discovered here that are as counter-intuitive as principles in any other science. The mind has always struck people as an object that they have direct access to as sentient beings. We need to drop that assumption and treat the computational/mental domain as an aspect of the natural world that needs to be studied with the same level of skepticism and rationality as the other sciences. When these criteria are applied, notions like the Turing test, general intelligence or neural implementation become very questionable as benchmarks for the study of the mind. However, there is no reason why we should not see deep regularities in computational systems as we have seen in all other aspects of the natural world.
15 There is an obvious objection to this claim, i.e., that it is consumer demand for the end product (cars) that drives design and production. That is true; however, it is also a misrepresentation of my argument. First of all, both the design and construction processes are largely independent of consumer demand. Secondly, consumers get to choose between an array of finished products. They do not dictate how cars are designed, which is mostly a function of the laws of physics and other constraints coming from their use. This is precisely a case where a key feature (consumer demand) triggers but does not cause a process to be set into motion.
16 I am using the word projection metaphorically. All that is meant is that the information that is available to the perceiver is not the same as the world itself.
17 This fact has been borne out repeatedly and is reflected both in the statistics of the world and the statistics of images. See Mandelbrot, Mumford, Gilden, etc. Note that the lack of multi-scale object representation is not a question of mental acuity. Natural objects are multi-scale even when a high frequency cut-off is imposed. Furthermore, texture is represented at multiple scales. I wonder if texture is exactly that part of the world that is represented at many scales.
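The count given in footnote 9 can be checked with a short enumeration. This is a minimal sketch that assumes one particular reading of the footnote: each of the 10 quasi-modules has exactly one of its 2 switches turned on at any time, so a joint configuration is one switch choice per module. The variable names are illustrative and do not come from the essay.

```python
from itertools import product

N_MODULES = 10   # number of quasi-modular (QM) processes, from footnote 9
N_SWITCHES = 2   # switches per quasi-module, from footnote 9

# Assumption: exactly one switch is active in each module at a time,
# so a joint configuration is a tuple with one switch index per module.
configurations = list(product(range(N_SWITCHES), repeat=N_MODULES))

print(len(configurations))  # 1024
```

Under this reading the number of alternatives grows exponentially with the number of modules, which is why the footnote stresses the need for an efficient way of selecting the right switch.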