Modeling the interaction of learning systems in a reward-based virtual navigation task

The existence of allocentric and egocentric systems for human navigation, mediating spatial and response learning respectively, has been discussed previously. It remains controversial whether navigational strategies, their underlying learning systems, and the brain areas associated with them operate independently (in parallel) or interact functionally/causally, in a competitive or a cooperative manner, to solve navigational tasks. This study surveys what the neural networks involved in reward-based navigation reveal about the individual involvement and the interactions of these learning systems. It characterizes the interactions of neural networks by constructing generative neural models and investigating their functional and effective connectivity patterns. A single-subject computer-based virtual reality environment was constructed to simulate a navigation task within a naturalistic large-scale space wherein participants were rewarded for using either a place, response, or mixed strategy at different navigational stages. First, functional analyses were undertaken to evaluate neural activity by mapping brain activation and performing statistical inference. The effects of interest were spatial and response learning/retrieval and the competition and cooperation between them. The optimal generative model was then estimated using dynamic causal modeling to quantify effective connectivity within the network. This analysis revealed how the experimental conditions supported competition and cooperation between strategies and how they modulated the underlying network. Results suggest that when navigational strategies cooperated, there was statistically significant functional and effective connectivity between the hippocampus and striatum. When the strategies competed, however, no effective connections were established between these regions; instead, connections between the hippocampus/striatum and the prefrontal cortex were strengthened. 
These results suggest that the network responsible for navigation undergoes a form of dynamic reconfiguration when strategies interact, either cooperatively or competitively. This supports an adaptive causal organization of the brain during goal-directed behavior.

Keywords

Navigational strategies

spatial and response learning

radial maze

functional interactions

effective connectivities

dynamic causal modeling

1. Introduction

Studies have shown that allocentric and egocentric navigation systems interact, and that humans and animals spontaneously and flexibly use either of them while finding their way in an environment [1]. The allocentric navigation system uses a world-centered frame of reference and supports a “spatial strategy”. This strategy uses environmental landmarks to navigate by forming relationships between different landmarks and orienting relative to them. Alternatively, the egocentric navigation system mediates a “response strategy” by referencing spatial locations in the external world to the individual's own body. This strategy involves executing a series of movements triggered by specific stimuli and supports navigation based on repeated stimulus-response (S-R) associations that form habits [2].

Convergent evidence from electrophysiological, lesion, and imaging studies shows that navigation is mediated by a network of brain structures. Different navigational strategies are also known to be subserved by distinct neural networks, with the hippocampus and caudate nucleus as their main nodes [3]. The hippocampus is believed to mediate spatial learning, whereas the caudate nucleus serves as the neural substrate of response learning [4]. Furthermore, in reward-based learning, activation of mesolimbic areas has been associated with receipt of reward [5, 6]. The prefrontal cortex (PFC) is a region within this circuitry that receives projections carrying emotional and motivational inputs, as well as reward-dependent modulation from the ventral tegmental area. This latter brain region plays a role in the reward evaluation of stimuli through its strong anatomical connections with reward circuitry, including the amygdala and ventral striatum¹(A distinct region in the ventromedial prefrontal cortex, which includes the orbitofrontal cortex, a critical structure for rewarded or goal-directed behaviors.) [7].

It is still controversial whether these navigational strategies and their underlying learning systems (or their associated brain regions) are independent/parallel, or whether they interact functionally, in either a competitive or a cooperative manner, to solve navigational tasks. Emerging evidence supports the view that the hippocampus and caudate may function cooperatively during the processing attributed to each structure. Animal studies have verified this by showing that learning in one system compensates for the limitations of the other [8, 9]. On the other hand, competition between navigational strategies [1, 10, 11] has also been reported, based on experiments showing that inactivation of one system enhances learning by the remaining system. Furthermore, successful navigation has been reported to require flexible switching between navigation strategies as environmental circumstances and specific tasks demand [12]. Examples include immediate, spontaneous switching after relevant sensory cues appear or disappear, and a subject progressively learning across trials to prefer one type of cue over another.

Experimental evidence for interaction among navigational strategies has come largely from behavioral and lesion studies of navigation. A common neurobehavioural paradigm is to induce selective brain lesions, which cause selective learning or memory deficits, and then to motivate animals to perform prescribed tasks on purpose-designed experimental platforms. In humans, however, spatial learning and memory can be investigated with neuroimaging (e.g. functional magnetic resonance imaging (fMRI)) to uncover human-specific neurocognitive facts [13, 14]. fMRI maps brain function non-invasively through the blood oxygen level dependent (BOLD) effect. It can more readily assess both the differential and the simultaneous involvement of brain regions, as well as their functional interactions. fMRI studies can also help disentangle the role of reward in reward-based learning. They are also readily used to study de novo learning without the prior training that animal studies require [15].

Functional imaging has previously been used to investigate the neural basis of spatial and response learning strategies. Such studies have considered aspects including sex differences, age effects, strategy preference, neural substrate engagement, and gray matter content. Some general findings on human navigation are as follows. Iaria [1] reported that in a task where both strategies could be used, 50% of young adults spontaneously used a spatial strategy and the remainder used a response strategy. Furthermore, individuals who used a spatial strategy showed significant fMRI activity and increased gray matter in the hippocampus [8]. Those who used a response strategy showed significant fMRI activity in the caudate nucleus, and use of the response strategy was positively correlated with caudate nucleus gray matter. Another study tracked the correlation between navigational strategies and gray matter during cognitive decline due to normal aging [2]. Further, an assessment of strategy switching while subjects navigated a virtual plus maze demonstrated a specific age-related deficit [16]. Lawton [17] reported sex differences in the adoption of navigational strategies: men used an orientation strategy, whereas women preferred a route strategy. The male preference for an orientation strategy gave men an advantage in pointing accuracy and better results in a task involving spatial perception [17, 18]. Furthermore, Ohnishi [19] divided normal volunteers into four groups: good and poor navigators of each sex. Poor navigators were not good at allocentric orientation, particularly the use of cardinal directions, and relied instead on the egocentric route strategy. Good navigators, in contrast, were good at the orientation strategy and obtained good scores on a maze task.

There is evidence that hippocampal and striatal systems are differentially involved in different tasks [20]. There are also reports of their differential involvement at different stages of the same task. For example, activation initially dominates in the hippocampus but shifts to the striatum with practice. Some reports also address the type of interaction. Brown [21] described cooperative interactions in the form of increased functional connectivity for overlapping mazes compared with non-overlapping mazes. Further, an fMRI study of navigation in a virtual maze that could be solved with either spatial or response strategies found decreased hippocampal activity in favor of the caudate in older adults [22]. Interactions during the navigational detour problem have also been reviewed. The authors concluded that the hippocampus and the PFC interact reciprocally, with the extent of the interaction depending on the complexity of the new route to be considered [23].

However, the within- and between-subject coactivation of brain regions when both navigational strategies are recruited, cooperatively or competitively, has been less studied. Consequently, less is known about the neural networks in reward-based navigation that are attributable to both learning systems. The functional relations between the neural substrates within such a network are almost unknown, and effective connectivity within these networks has not yet been addressed. For this report, a first-person computer-based virtual reality environment was constructed to simulate a navigation task within a naturalistic, large-scale space. Participants were rewarded for using a place, response, or mixed strategy across different stages of the task. The environment is used to characterize how functional interregional interactions differ when a navigator adopts a single strategy or employs strategies cooperatively or competitively. A hypothesis-driven analysis is performed, and dynamic causal modeling (DCM) is used to estimate the coupling strength of causal interactions within this network. It is hypothesized that strategy-dependent differences in brain network activity can be attributed to dynamic patterns of brain connectivity, including causal interactions.
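The neuronal part of the standard deterministic bilinear DCM for fMRI can be written as dz/dt = (A + Σ_j u_j B^(j)) z + C u. Here z holds the regional neuronal states and A is the fixed (endogenous) coupling. B^(j) is the change in coupling induced by experimental input u_j, and C holds the weights of the driving inputs.

The sketch below Euler-integrates this equation for three regions standing in for the hippocampus, striatum and PFC, with one driving input and one modulatory input. All coupling values and the function names `matvec` and `simulate_bilinear` are hypothetical. The sketch omits the hemodynamic forward model and the Bayesian estimation that DCM performs. It therefore illustrates only how a modulatory input can reshape effective connectivity, not the model estimated in this study.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def simulate_bilinear(A: Matrix, B: Matrix, C: Sequence[float],
                      drive: Sequence[float], modulator: Sequence[float],
                      dt: float = 0.1) -> List[List[float]]:
    """Euler-integrate dz/dt = (A + u_mod * B) z + C * u_drive from z = 0.

    A[i][j] and B[i][j] follow the DCM convention: influence of region j on region i.
    Returns the state vector after every step, including the initial zeros.
    """
    n = len(A)
    z = [0.0] * n
    history = [z[:]]
    for u_drive, u_mod in zip(drive, modulator):
        # Effective coupling at this step: fixed part plus input-dependent change.
        coupling = [[A[i][j] + u_mod * B[i][j] for j in range(n)] for i in range(n)]
        dz = [flow + c * u_drive for flow, c in zip(matvec(coupling, z), C)]
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
        history.append(z[:])
    return history


# Hypothetical 3-region model: 0 = hippocampus, 1 = striatum, 2 = PFC.
A = [[-1.0, 0.0, 0.1],
     [0.2, -1.0, 0.1],
     [0.3, 0.3, -1.0]]
B = [[0.0, 0.0, 0.0],
     [0.4, 0.0, 0.0],   # modulatory input strengthens hippocampus -> striatum
     [0.0, 0.0, 0.0]]
C = [1.0, 0.0, 0.0]     # driving input enters the hippocampus

steps = 50
drive = [1.0] * steps
baseline = simulate_bilinear(A, B, C, drive, [0.0] * steps)
modulated = simulate_bilinear(A, B, C, drive, [1.0] * steps)
print(f"striatum, unmodulated: {baseline[-1][1]:.3f}")
print(f"striatum, modulated:   {modulated[-1][1]:.3f}")
```

In this toy setting, the modulated run ends with higher striatal activity because the modulatory input adds to the hippocampus-to-striatum coupling. In an actual DCM analysis, A, B and C are estimated from the BOLD data rather than chosen by hand.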

2. Materials and methods

2.1. Subjects

Eight healthy volunteers (3 male: mean age 21.6 years, SD 3.1, range 19-25; 5 female: mean age 22.1 years, SD 2.3, range 20-25) participated in this study²(Prior to this experiment, ten other subjects were recorded behaviorally and electrophysiologically to stabilize conditions and resolve possible bugs.). All had normal or corrected-to-normal vision and no history of neurological or psychiatric disorders. All were biomedical engineering students at QIAU University. Subjects were asked to rate their own experience with computers, and specifically with computer games, on a 10-point scale from very inexperienced to very experienced. All subjects gave written informed consent prior to the study, which was approved by the local ethics committee. Since the imaging time was relatively long, subjects were instructed not to move during scanning and to read the instructions very carefully, to improve scan quality. All subjects understood the instructions without difficulty, and none were aware of the hypotheses at the time of testing.

2.2. Experimental procedure

The eight-arm radial maze, with arms extending from a central starting location, was selected because of the variety of memory and learning paradigms it supports. A computer-simulated first-person virtual reality radial maze was designed to simulate navigational tasks in a large-scale space. Three 3-D virtual mazes were developed using the MazeSuite application [24]: one for testing spatial learning, one for response learning, and one for the mixed strategy task. The experimental procedure is described below.

Prior to the main experiment, subjects spent 60 seconds navigating freely within a simple virtual environment. It comprised a few rooms and halls that differed from those used in the main task. Subjects manoeuvred by pressing the four arrow keys on a keyboard, which was an MR-compatible touch pad interfaced with the software platform. This phase let subjects practice the motor aspects of the task and learn to use the keyboard. Pretraining was not included in the analysis.

Subjects were first informed that they would find themselves at the center of a virtual maze with eight identical runways (arms) extending outwards. Hidden rewards were accessible at the end of some arms. The rewards were amounts of money, intended to motivate subjects to search carefully using spatial memory and/or habit. Subjects were to follow each task according to the given instructions in order to obtain all rewards. Subjects performed two 'training-probe-control' runs, one supporting the spatial navigation strategy and one supporting the response strategy. There was also a mixed strategy task with one training stage and two test-trial (probe) stages. During training, subjects were instructed to learn the rewarded arms of the maze by finding and remembering the rewards hidden at the end of each of four arms. In probe trials, subjects had to collect rewards by remembering the signs and signals from the training stage and following instructions. Revisiting a rewarded arm or choosing a non-rewarded arm in a probe trial caused subjects to lose the acquired reward. Control conditions were interleaved between experimental conditions to provide contrasts with them for statistical analysis. Each run is described in detail below.

Session 1: A radial maze with short walls surrounded by natural scenery (including sky, mountains, trees, etc.) was constructed (Fig.1a and 1b). The landscape provided visual cues from which participants could infer their orientation and movement path without any intramaze beacons. After the pretraining phase, the next phase began with within-experiment instructions. In addition to an overall explanation given to each subject at the start of his/her scan, a black screen with yellow text appeared for 9 seconds at the start of each scanning session with instructions for the subject. The spatial layout of the landmarks, including reliable extramaze cues, was learned in the training stage. At the probe stage, the relationship between the rewarded arms and the landmarks had to be remembered to solve the maze and find the rewards. Rewards were hidden at the end of arms 1, 4, 6 and 7 and became visible when subjects touched the terminal wall of an arm. After contact, a written message informed the subject whether or not a reward had been earned (Fig.1d). Once the end of an arm was reached, subjects were automatically transported to the middle of the central platform to initiate a new trial. Each new trial started with a random orientation of the subject's viewpoint. This randomization shuffled the initial viewing perspective and removed any information that could be derived from the starting position. Hence, subjects were compelled to use the extramaze cues to select the rewarded arms in this session. The hypothesis for this experimental task was that subjects would use a single spatial strategy.

Fig.1.

(a)-(f) are sample images selected from indoor mazes of the virtual environment constructed to test navigational tasks; (a) and (b) are sample views of extramaze landmarks. (c) shows the intramaze entry. (d) visualizes a successful trial ending in reward. (e) and (f) are views of spatial and response controls respectively. (g) and (h) are outdoor views of the maze for spatial and response learning respectively.

Once a subject reached the training criterion of 7/8 rewards in two consecutive runs (each of eight trials), training was considered complete and the subject moved to the probe stage. The maximum number of trials in the training stage was 32. At the probe stage, the layout was presented from a slightly different perspective than in the learning stage; e.g., the flying eagle was perched on a transmission tower, or the sun had moved across one or more arms. However, the spatial relationship between the salient landmarks and the target pathways remained the same. Participants could therefore still find the rewarded arms despite the altered perspective. Subjects were asked to collect rewards from the arms that had been unrewarded during the learning period. They could keep these rewards only if they neither chose a wrong arm nor reentered a previously visited arm. A single-entry 120-second probe trial was administered. An audio message informed subjects of each correct selection after they had traversed a rewarded arm to its end. If all rewards were collected before the time limit, this phase terminated early.
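The session 1 training criterion can be expressed as a small function. This is a sketch under an assumed reading of the criterion: at least 7 rewarded choices in each of two consecutive 8-trial runs, evaluated at run boundaries, with training capped at 32 trials. The function name and its parameters are hypothetical and are not part of the MazeSuite software.

```python
from typing import Optional, Sequence


def trials_to_criterion(outcomes: Sequence[bool], run_length: int = 8,
                        min_correct: int = 7, consecutive_runs: int = 2,
                        max_trials: int = 32) -> Optional[int]:
    """Return the trial count at which the training criterion is first met,
    or None if it is not met within max_trials.

    Assumed criterion: at least `min_correct` rewarded choices in each of
    `consecutive_runs` consecutive runs of `run_length` trials. Runs are scored
    only when they are complete.
    """
    streak = 0  # consecutive complete runs that met the threshold
    limit = min(len(outcomes), max_trials)
    for end in range(run_length, limit + 1, run_length):
        correct = sum(1 for hit in outcomes[end - run_length:end] if hit)
        if correct >= min_correct:
            streak += 1
            if streak == consecutive_runs:
                return end
        else:
            streak = 0
    return None


# Hypothetical subject: a weak first run (5/8), then 7/8 and 8/8.
outcomes = ([True, False, True, False, True, True, False, True]
            + [True] * 7 + [False]
            + [True] * 8)
print(trials_to_criterion(outcomes))  # 24
```

Changing `run_length` and `max_trials` adapts the same check to session 2, whose runs are shorter and whose training is capped at 16 trials.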

Following the probe, the black screen with yellow text reappeared and informed participants that they were entering a control condition. The text stated that, in this case, there was no need to learn either the layout of the environment or the spatial relationship between the landmarks and the target paths. Subjects did not need to make any effort to anticipate the rewarded arms, as these were visible from the center of the maze. Participants could simply follow the four arms with boards mounted on their end walls. This control condition was named "spatial control" because it shared the same features as the experiment designed to support the spatial strategy. A period of 90 seconds was pre-set for this control condition, which was the average time our test group needed to obtain all 4 visible rewards.

Session 2: In this phase, subjects were led to use the response strategy, as no extramaze cues or landmarks were available to orient navigation. According to Jacobs [25], in human navigation within virtual reality, spatial (place) learning, which is based on distal cues, is not induced by the presence of intramaze cues. For this reason, placing a sign, such as a stimulus, inside the maze supported navigation based on a response strategy. In this experiment, the ambient background, including the sky, was smoothed. The walls around the radial maze were raised to conceal the landscape, and the landmarks were removed. In each trial of the training stage, a rotating butterfly appeared in an arm to signal that this arm was rewarded, and subjects needed to follow the cue to obtain the reward (Fig.1c). The butterfly cued each rewarded arm equally often, in random order. As in the previous experiment, four of the eight arms contained rewards, although the set of rewarded arms differed from that of session 1 (arms 2, 4, 6 and 7). Up to sixteen trials were administered to train subjects in this session. Subjects had been informed that the initial view was identical for every entrance. They were therefore expected to use a counting or patterning system to remember the rewarded arms and their sequence. The hypothesis was that rewarded S-R behavior would occur as each subject gradually learned particular body orientations in response to stimuli.

As in the first experiment, reaching the training criterion (7 correct choices out of 8, i.e. an accuracy of 87.5%) in two consecutive (4-trial) runs led subjects to the probe stage. Otherwise, training was terminated after a maximum of 16 trials due to time limitations. After completing the learning stage of this session, subjects proceeded to a test stage. To solve the maze in this cueless test, subjects needed to recall the habitually learned pattern in the absence of the moving butterfly. Subjects were given 120 seconds to obtain rewards from the arms that had not been rewarded during the training phase (1-3-5-8). An audio message informed them of each reward once they reached the end of a goal arm. A reward was kept only if the subject neither reentered a previously visited arm nor made any mistake in recalling the prescribed pattern; otherwise, the reward obtained was lost. After the probe stage, a control condition was started. In this visuo-motor condition, named "response control", subjects were instructed to follow yellow arrows on the ground at the entrances of the rewarded arms. The four arrowed arms were to be entered in a maze tour lasting 90 seconds. As subjects were explicitly asked to follow a prescribed route, it was assumed that they did not try to remember which target arms to select.
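The probe rules for sessions 1 and 2 can be sketched as a scoring function. An error is entering a non-goal arm or re-entering any visited arm. The sketch assumes that an error forfeits all rewards collected so far, which is one reading of "the reward obtained would be lost". The 120-second time limit is not modeled. `score_probe` and `ProbeScore` are hypothetical names.

```python
from typing import Iterable, NamedTuple, Set


class ProbeScore(NamedTuple):
    rewards_found: int  # goal arms reached for the first time
    rewards_kept: int   # rewards still held when the probe ends
    errors: int         # non-goal entries plus re-entries of visited arms


def score_probe(choices: Iterable[int], goal_arms: Set[int]) -> ProbeScore:
    """Score a single-entry probe from the ordered list of arms entered (1-8)."""
    visited: Set[int] = set()
    found = kept = errors = 0
    for arm in choices:
        if arm in visited or arm not in goal_arms:
            errors += 1
            kept = 0  # assumption: an error forfeits everything collected so far
        else:
            found += 1
            kept += 1
        visited.add(arm)
        if found == len(goal_arms):
            break  # the probe ends once every goal arm has been reached
    return ProbeScore(found, kept, errors)


# Session 2 probe: goal arms are those unrewarded during training (1-3-5-8).
print(score_probe([1, 3, 2, 5, 8], {1, 3, 5, 8}))
# ProbeScore(rewards_found=4, rewards_kept=2, errors=1)
```

Keeping `rewards_found` separate from `rewards_kept` allows both readings of the forfeit rule to be reported from the same pass.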

Session 3: By this stage of the experiment, both navigational strategies had been tested separately. Subjects were therefore aware that they might be rewarded either for going to the correct place (cued by landmarks) or for making the correct response (left or right turns). In the third session, a "mixed strategy" task recruiting both navigational systems was implemented. This paradigm aimed to create a condition in which both types of strategy interactively directed a subject to select a rewarded path. The initial instructions told subjects that the third experiment was distinct from, and more difficult than, the two previous experiments. They were told that, to navigate successfully, they had to rely on their memory and use what they had previously learned. Subjects were not told the test procedure in advance. They were trained with trials from both previous experiments. These trials were interleaved so that there were no more than three consecutive trials of either strategy. Each subject was given an equal number (fewer than 12) of spatial and response trials. As shown in Fig.1a and 1b, subjects were cued by salient landmarks outside the maze during spatial trials. In response trials, by contrast, they had to rely on a butterfly that appeared inside a maze arm. A new pattern of rewarded arms was programmed so that participants could not habitually retrieve previously learned patterns. Because this task is more difficult than the previous two, the learning criterion was reduced to 62.5%, i.e. 5 correct selections out of 8 across two consecutive runs.
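The mixed-strategy training trials follow an interleaving constraint: equal numbers of spatial and response trials, and never more than three in a row of either. Rejection sampling is one way to meet it. This is a sketch of one possible generator, not the procedure used to program the maze. The function names and the choice of 10 trials per strategy in the example are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def max_run_length(seq: Sequence[str]) -> int:
    """Length of the longest block of identical consecutive items."""
    longest = current = 0
    previous = None
    for item in seq:
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
        previous = item
    return longest


def interleaved_schedule(n_per_strategy: int, max_run: int = 3,
                         seed: Optional[int] = None,
                         max_attempts: int = 10_000) -> List[str]:
    """Shuffle equal numbers of 'spatial' and 'response' trials until no
    strategy appears more than `max_run` times in a row."""
    rng = random.Random(seed)
    trials = ["spatial"] * n_per_strategy + ["response"] * n_per_strategy
    for _ in range(max_attempts):
        rng.shuffle(trials)
        if max_run_length(trials) <= max_run:
            return list(trials)
    raise RuntimeError("no valid schedule found; relax the constraints")


print(interleaved_schedule(10, seed=1))
```

Rejection sampling keeps every valid order equally likely. A greedy constructive approach would be faster but can bias the sequence, or stall near the end when one strategy's trials run out.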

Two probe stages were designed for this test, one supporting cooperative behavior and the other competitive behavior. The probe trials in this phase aimed to measure how well participants could solve the eight-arm maze by adopting both navigational strategies interactively. Cooperation occurs when both strategies lead to entry into the same pattern of arms; during competition, each strategy leads to a different sequence of rewarded arms. Participants had to find rewards without any further instruction. To create a situation in which competition dominates in the first probe, individuals entered the maze with the same viewing angle as in the learning stage. However, the layout of the maze had been rotated by three arms. As a result, executing a learned pattern of turns no longer led to the arms that had been cued by landmarks during the learning trials. For example, suppose that in the learning trials, turning left from the start position led to a rewarded arm that was also cued by a tree. After the environment was rotated, that cue was no longer seen after a left turn. This probe was designed so that subjects relying only on the guidance of landmarks would be led to choose four arms different from those reached with the response strategy. Making more than one error while collecting all rewards with the appropriate strategy indicated that competition had occurred between the learning systems. In the cooperation-related probe trial, the initial view was randomized and the number of extramaze landmarks was reduced. In this situation, using the remaining landmarks together with counting the rewarded arms compensated for the missing information. Finding more than 2 rewards in this test indicated that a subject could employ both navigational systems cooperatively. 
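Whether a layout rotation puts the two strategies in full conflict depends on the rewarded pattern. The sketch assumes that arms are numbered 1-8 clockwise in the subject's unrotated egocentric frame, and that the landmarks rotate clockwise by three arms. The session 3 rewarded pattern is not reported, so the patterns used here, like the function names, are hypothetical.

```python
from typing import Iterable, Set

N_ARMS = 8


def rotate_arms(arms: Iterable[int], shift: int, n_arms: int = N_ARMS) -> Set[int]:
    """Arms (1-based) that face the same landmarks after the layout is rotated
    by `shift` arms (clockwise by the convention assumed here)."""
    return {(arm - 1 + shift) % n_arms + 1 for arm in arms}


def strategies_conflict(learned_arms: Iterable[int], shift: int,
                        n_arms: int = N_ARMS) -> bool:
    """True if landmark-guided and response-guided choices share no arm."""
    learned = set(learned_arms)
    landmark_guided = rotate_arms(learned, shift, n_arms)
    return landmark_guided.isdisjoint(learned)


print(rotate_arms({1, 3, 5, 7}, 3))          # {2, 4, 6, 8}
print(strategies_conflict({1, 3, 5, 7}, 3))  # True
print(strategies_conflict({1, 2, 3, 4}, 3))  # False: both strategies point to arm 4
```

With a 3-arm rotation, an alternating pattern gives four landmark-guided arms that all differ from the response-guided ones, whereas a block of adjacent arms does not.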
Notably, participants were not informed of their performance during the probes of the last session. Whether a subject had behaved competitively or cooperatively was confirmed using the information the subject gave during an inquiry session after the scan. A summary of the experimental procedure is given in Table 1.

Table 1 A summary of the experimental tasks described in the text

2.3. Image acquisition

Subjects were scanned at Tehran Emam-Khomeini hospital with a functional magnetic resonance imaging scanner while they navigated the virtual maze. Each scanning session lasted approximately 25 minutes, during which participants performed experimental and control trials sequentially. Images were acquired using a 3T Siemens Magnetom TrioTim scanner (Siemens AG, Medical Solutions, Erlangen, Germany) with a 32-channel head coil. Subjects' heads were immobilized in the coil with an air cushion. The large high-resolution coil prevented subjects from wearing goggles to view the display directly. To overcome this, two connected mirrors at an angle of 60° were mounted above the head coil. They allowed subjects to see a screen in front of them, onto which a video projector displayed the virtual environment. First, T1-weighted sagittal localizing scans were acquired for 10 minutes and used for structural analysis. High-resolution anatomical images were acquired using a three-dimensional (3D) magnetization-prepared rapid acquisition with gradient echo (MPRAGE) sequence. Its parameters were: repetition time (TR) = 1800 ms, echo time (TE) = 3.44 ms, inversion time (TI) = 1100 ms, flip angle (FA) = 7°, in-plane resolution (IPR) = 1 × 1 mm, in-plane matrix (IPM) = 256 × 256, field of view (FOV) = 256 mm, and 176 sagittal slices of 1 mm thickness. Functional T2*-weighted BOLD images were then acquired with a multi-band slice-accelerated gradient-recalled echo planar imaging (EPI) sequence. Its parameters were: TR = 3000 ms, TE = 30 ms, FA = 90°, FOV = 220 mm, 64 axial slices of 3 mm thickness, IPR = 3.4 × 3.4 mm, and IPM = 64 × 64.
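A few derived quantities follow directly from the reported parameters: the nominal EPI in-plane voxel size (FOV divided by matrix), the slab coverage, and the number of volumes in a 25-minute session at TR = 3 s. The sketch assumes contiguous slices, since no slice gap is reported, and ignores any dummy scans. The function names are illustrative.

```python
def in_plane_voxel_mm(fov_mm: float, matrix: int) -> float:
    """Nominal in-plane voxel size: field of view divided by matrix size."""
    return fov_mm / matrix


def slab_coverage_mm(n_slices: int, thickness_mm: float, gap_mm: float = 0.0) -> float:
    """Through-plane coverage; assumes contiguous slices unless a gap is given."""
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm


def volumes_in_session(duration_min: float, tr_ms: float) -> int:
    """Number of whole volumes acquired in a session of the given length."""
    return int(duration_min * 60_000 // tr_ms)


print(in_plane_voxel_mm(220, 64))      # 3.4375 (reported as 3.4 mm)
print(slab_coverage_mm(64, 3))         # 192.0
print(volumes_in_session(25, 3000))    # 500
print(in_plane_voxel_mm(256, 256))     # 1.0 (MPRAGE)
```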

The number of trials needed to reach the learning criterion differed across subjects, so the number of scans also varied from subject to subject. Custom software was therefore used to record the scanner frame times as well as keystrokes made by the experimenter. Using these records, the images belonging to each stage of each session were reclassified, and a few scans were discarded.
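The reclassification of scans by stage can be illustrated by assigning each volume to the most recent stage onset logged by the experimenter's keystrokes. The function name, the log format ((time, stage) pairs in seconds) and the example timings are hypothetical, because the custom software used in the study is not described further.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def label_volumes(frame_times: Sequence[float],
                  stage_onsets: Sequence[Tuple[float, str]]) -> List[str]:
    """Assign each fMRI volume to the stage whose onset most recently preceded it.

    frame_times: acquisition time (s) of each volume, from scanner triggers.
    stage_onsets: (time_s, stage_name) pairs from keystrokes, sorted by time.
    Volumes acquired before the first onset are labelled 'discard'.
    """
    onset_times = [t for t, _ in stage_onsets]
    labels = []
    for t in frame_times:
        i = bisect_right(onset_times, t) - 1  # index of last onset at or before t
        labels.append(stage_onsets[i][1] if i >= 0 else "discard")
    return labels


# Hypothetical log: TR = 3 s, three stage onsets.
frames = [3.0 * k for k in range(8)]
onsets = [(2.0, "instructions"), (9.0, "training"), (18.0, "probe")]
print(label_volumes(frames, onsets))
# ['discard', 'instructions', 'instructions', 'training', 'training',
#  'training', 'probe', 'probe']
```

A volume acquired exactly at a logged onset is assigned to the new stage. Shifting onsets by the hemodynamic delay, if desired, is a separate modeling decision.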

2.4. Image preprocessing

The signal changes due to the BOLD effect are small, noisy, and susceptible to artefacts, so a number of signal/image processing steps are required. Data were preprocessed using Statistical Parametric Mapping (SPM12), software developed by the Wellcome Trust Centre for Neuroimaging at UCL (http://www.fil.ion.ucl.ac.uk/spm/). On average, 1800 fMRI volumes were acquired for each subject, of which the first five were discarded to allow for T1 equilibration. A few scans affected by abrupt movements were also discarded. The T1 datasets were transformed to the standard Talairach stereotactic space using an EPI template. BOLD images were reoriented to set the origin at the anterior commissure. Preprocessing was performed in the following order. (1) Images underwent slice-timing correction relative to the first slice acquired. (2) Realignment and unwarping were performed to correct for head movement, using the first volume of each run as the reference. (3) Each subject's data were then co-registered to their own high-resolution structural image: a T2*-weighted mean of the unsmoothed images was co-registered with the same individual's anatomical T1-weighted image. (4) The individual T1 image was segmented to derive the transformation parameters to stereotaxic space using the SPM template³(Montreal Neurological Institute (MNI) Template. Segmentation using SPM requires spatially-aligned prior tissue probability maps.), and these parameters were then applied to the individual co-registered EPI images. (5) Images were then normalized to the standard (MNI) template. (6) Spatial smoothing was performed using a Gaussian kernel with a full width at half maximum (FWHM) of 8 mm.
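Smoothing kernels are specified by FWHM, whereas a Gaussian is parameterized by its standard deviation σ. The two are related by FWHM = 2√(2 ln 2) σ ≈ 2.355 σ. The short check below converts the 8 mm FWHM used here to σ and confirms that the kernel falls to half its peak at ±FWHM/2. It is a worked illustration only, not SPM code.

```python
import math

# sigma = FWHM / (2 * sqrt(2 * ln 2)), i.e. sigma ~= 0.4247 * FWHM
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Standard deviation (mm) of a Gaussian with the given FWHM (mm)."""
    return fwhm_mm * FWHM_TO_SIGMA


def gaussian(x_mm: float, sigma_mm: float) -> float:
    """Gaussian profile scaled to a peak value of 1."""
    return math.exp(-0.5 * (x_mm / sigma_mm) ** 2)


sigma = fwhm_to_sigma(8.0)
print(f"sigma = {sigma:.3f} mm")                         # sigma = 3.397 mm
print(f"value at FWHM/2 = {gaussian(4.0, sigma):.3f}")   # value at FWHM/2 = 0.500
```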

3. Analysis

3.1. Behavioral data analysis

All subjects solved the maze task at above-chance levels, although their success rates differed. After scanning, they were debriefed about their experience, the approach they used to select the rewarded arms accurately, and their preference for using extramaze landmarks and/or learned patterns. Specific questions were then asked to identify the kind of interaction that had occurred between the navigational systems. The scores on these questions, together with the measured performance, were used to classify each completed task as spatial, response, or mixed strategy. An overview of the subjects' behavior during testing follows.

In the first experiment, where the spatial strategy was exploited, subjects reported locating rewards by referring to the landmarks and their relationships. For example, a reward was to the right of the tree, or at the end of the arm running toward the rock. Those who relied fully on spatial memory did not mention, or make use of, a common starting position at the center. In the second experiment, subjects learned to use the fixed starting position and to link rewards to the initial view. In this task, subjects reported remembering rules such as 'turn right by three arms from the starting position' or 'count the rewarded arms clockwise in the sequence 1-4-5-7'. After the scan, only two subjects could verbalize the sequence of left and right turns they had used.

Five subjects reported that in the second experiment it was easy for them to memorize the location of the egocentric cue (the butterfly). The others remarked that they preferred to remember the representation of the environment and the relationships between landmarks. The performance of each subject in the first and second experiments was assessed by quantifying how well he/she could learn and retrieve each strategy. Several measures were recorded: the number of trials taken to reach the training criterion, the number of correct arms visited in the retrieval phase, the average time to finish a trial in each phase, and the path length of each probe. In addition, the learning slope was computed by dividing the sum of the scores obtained at the end of each run by the total number of trials required to reach the pre-set learning criterion. The subjects' average values on these measures are reported in Table 2. During the encoding (training) stage of each experiment, subjects were able to learn the reward contingencies of the arms up to the predetermined criteria. On average, participants required 12.87 ± 2.35 and 11.37 ± 2.1 trials to reach the learning criteria for the spatial and response strategies, respectively. Moreover, average latencies were shorter in the response learning experiment than in the spatial learning experiment (p < 0.05). A paired t-test was used to determine whether spatial and response learning differed in terms of time and/or the number of trials. This test was significant (T = -2.5, p = 0.05). Behavioral performance in adopting a navigational strategy was then evaluated in the retrieval (probe) stages. Average retrieval performance was 62.5% for spatial learning and 71.8% for response learning. 
However, statistical comparison of the retrieval tasks showed no significant difference between allocentric and egocentric recall accuracy (T = 3.2, p = 0.1). Likewise, the times taken to finish spatial and response probes did not differ significantly. Subjects were significantly faster in retrieval than in the corresponding encoding condition (paired t-test, p < 0.01).
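A minimal sketch of the two computations described above. It assumes that the learning-slope measure is the sum of end-of-run scores divided by the number of trials to criterion, and it computes the paired t statistic as a one-sample t-test on the per-subject differences (the same one-sample test underlies a second-level random-effects analysis). Function names and subject values are hypothetical and are not data from this study.

```python
import math
import statistics


def learning_slope(run_scores, trials_to_criterion):
    """Summed end-of-run scores divided by the number of trials needed to reach criterion."""
    return sum(run_scores) / trials_to_criterion


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return (mean - mu) / (sd / math.sqrt(n)), n - 1


def paired_t(a, b):
    """Paired t-test = one-sample t-test on the within-subject differences a - b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    return one_sample_t([x - y for x, y in zip(a, b)])


# Hypothetical per-subject trials-to-criterion for eight subjects.
spatial = [12, 15, 10, 14, 13, 11, 16, 12]
response = [10, 12, 9, 11, 12, 10, 13, 8]
t, df = paired_t(spatial, response)
print(f"t({df}) = {t:.2f}")
```

Pairing matters here because each subject performed both strategies; subtracting within subjects removes between-subject differences in overall navigation ability.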

Table 2 Behavioural performance in the two single-strategy tasks

Performance factor	Spatial strategy	Response strategy
Number of trials to reach learning criteria	12.87 ± 2.35	10.37 ± 2.1
Number of correctly visited arms	3.125 ± 0.8	3.375 ± 0.7
Average time to finish trial	13.46 ± 4.31	11.37 ± 3.65
Path length of each probe trial	19.44 ± 6.81	17.21 ± 4.83
Slope of change in learning	0.65 ± 0.14	0.70 ± 0.07
Time taken to complete each test	96.2 ± 11.3	88.87 ± 14.8

For the mixed-strategy task, the learning criterion was reached within 15 $ \pm $ 4.15 trials on average, more than in either single-strategy task; this is plausible, as the mixed-strategy task was more complex and confusing than either single-strategy task. Learning of mixed strategies was assessed by whether performance improved or deteriorated. When both strategies are employed cooperatively, this is expected to be reflected in improved accuracy; for subjects who were not helped by the alternative strategy, the accuracy rate decreased significantly. Competition trials, on the other hand, were characterized by decreased accuracy. In the competition probe, accuracy is defined as the ratio of the number of correct selections from one category (either spatial or response, whichever is greater) to the total number of rewards. Behavioral measurements were combined with each subject's own statements after the scan to determine the strategy that each participant chose to solve the maze.
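A minimal sketch of the competition-probe accuracy defined above. Besides the ratio, it returns which category dominated, as one possible way to flag the strategy a participant relied on; that labeling is an assumption, since the text determines strategy by combining behavior with self-reports. The function name and the four-reward example are hypothetical.

```python
def competition_accuracy(spatial_correct, response_correct, total_rewards):
    """Correct selections from the dominant category (spatial or response,
    whichever is greater) divided by the total number of rewards."""
    if total_rewards <= 0:
        raise ValueError("total_rewards must be positive")
    if spatial_correct > response_correct:
        label = "spatial"
    elif response_correct > spatial_correct:
        label = "response"
    else:
        label = "tie"
    return max(spatial_correct, response_correct) / total_rewards, label


# Hypothetical probe: 3 choices consistent with the spatial rule, 1 with the
# response rule, out of 4 rewarded arms.
print(competition_accuracy(3, 1, 4))  # (0.75, 'spatial')
```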

Cooperative trials led to greater pattern completion: participants had to turn at the center to find a salient landmark, use the environmental landmark(s) to orient themselves, and then count or pattern the arms to reach a reward. Hence, more time elapsed. This effect was significant in four of the eight subjects, i.e. only four subjects could use either strategy interchangeably. Latency during retrieval also increased, and significantly fewer successful trials were observed than with the single spatial strategy ($p<$ 0.05) and the single response strategy ($p<$ 0.01). Competition trials, on the other hand, were characterized by increased latency and decreased accuracy compared with the single spatial strategy (in 6 subjects, $T = 3.6$, $p<$ 0.01) and the single response strategy (in 5 subjects, $T = 2.8$, $p<$ 0.03). In summary, when both strategies competed to drive behavior, participants noticed a mismatch between habitual learning and landmark-related learning; hence, they committed more errors and needed more time to solve the virtual maze than when using a single strategy. In the competition probe, subjects more frequently reported memorizing the egocentric cue (88%) than remembering the layout of the environment (65%).

3.2. NeuroImage analysis

3.2.1. Maps of brain activity

fMRI images were analyzed using statistical parametric mapping (SPM). The functional MRI time series were modeled with a general linear model (GLM) in an individual first-level analysis for each subject. The resulting contrast images were then entered into a second-level random-effects analysis to provide group-level statistical inferences. The following statistical models were developed to detect and evaluate functional activations:

(1) An analysis of performance-independent effects was conducted by contrasting the two navigational strategies within the learning stage. This analysis aimed to identify patterns of functional neural activity related to the separate use of spatial or response strategies during learning. A GLM was designed with conditions formed from trials of both types of learning. Constructing the design matrix and its conditions required trial-by-trial scans; images belonging to each single trial (initiated at the center of the maze) were readily separated using the “Maze analyzer” module of Mazesuite. As the number of training trials varied between sessions and participants, the last eight trials of each task were analysed. The GLM included two regressors, each formed from the trials of the last two consecutive training runs that a participant had performed during the first two sessions. The time series were high-pass filtered (cutoff period 128 s) to remove slow signal drifts and modeled as the weighted sum of regressors corresponding to the effects of interest and potentially confounding factors. These regressors were entered into the design matrix together with regressors based on estimates of head movement (the six motion parameters) obtained from the realignment procedure. Including the movement regressors reduces the influence of motion-related confounds on the parameters estimated for the effects of interest.
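The design described above can be sketched with NumPy as follows. This is a simplified illustration, not SPM's implementation: it assumes an SPM-like double-gamma HRF (peak shape 6, undershoot shape 16, ratio 1/6), represents the 128 s high-pass filter as discrete cosine drift regressors (which, ignoring SPM's serial-correlation whitening, yields the same betas for the other regressors as filtering both data and design), and fits by ordinary least squares. All function names, onsets, the TR and the simulated data are hypothetical.

```python
import math

import numpy as np


def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` s, scaled to unit sum."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    h = peak - undershoot / 6.0
    return h / h.sum()


def boxcar(onsets, durations, n_scans, tr):
    """Stimulus function: 1 at scans acquired during an event, 0 elsewhere."""
    times = np.arange(n_scans) * tr
    x = np.zeros(n_scans)
    for onset, dur in zip(onsets, durations):
        x[(times >= onset) & (times < onset + dur)] = 1.0
    return x


def dct_drift_basis(n_scans, tr, cutoff=128.0):
    """Cosine regressors for periods longer than `cutoff` s (constant excluded)."""
    k = int(2 * n_scans * tr / cutoff + 1)
    n = np.arange(n_scans)
    cols = [np.sqrt(2.0 / n_scans) * np.cos(np.pi * (2 * n + 1) * j / (2 * n_scans))
            for j in range(1, k)]
    return np.column_stack(cols) if cols else np.empty((n_scans, 0))


def design_matrix(conditions, motion, n_scans, tr):
    """conditions: dict name -> (onsets, durations) in s; motion: (n_scans, 6)."""
    hrf = canonical_hrf(tr)
    task = [np.convolve(boxcar(on, du, n_scans, tr), hrf)[:n_scans]
            for on, du in conditions.values()]
    return np.column_stack(task + [motion, dct_drift_basis(n_scans, tr), np.ones(n_scans)])


def fit_glm(X, y):
    """Ordinary least-squares estimate of the GLM parameters (betas)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical run: 200 scans, TR = 2 s, alternating 10-s trials of each type.
rng = np.random.default_rng(0)
n_scans, tr = 200, 2.0
conditions = {
    "spatial": (list(range(0, 400, 40)), [10.0] * 10),
    "response": (list(range(20, 400, 40)), [10.0] * 10),
}
motion = rng.normal(scale=0.01, size=(n_scans, 6))
X = design_matrix(conditions, motion, n_scans, tr)
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 100.0 + rng.normal(scale=0.1, size=n_scans)
beta = fit_glm(X, y)
```

The two task regressors come first, followed by the six motion parameters, the drift basis and a constant, so `beta[0]` and `beta[1]` are the estimates for spatial and response learning.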

After model fitting, subject-specific parameters for each regressor ($\beta$ s) were estimated at each voxel under the GLM. Main effects of spatial and response learning were tested by applying a $t$ contrast vector to the parameters to determine whether the estimated contrast differs significantly from zero. The statistical maps were thresholded at p < 0.05 (uncorrected). This test produced two contrast images for each person (Fig.2), depicting the neural activity maps associated with spatial $>$ response (c) and response $>$ spatial (d). Table 3 presents the Talairach coordinates of the peak voxels, the cluster sizes, and the t values. Significant clusters were resolved into their local maxima, and local maxima significant at the uncorrected level $p<$ 0.05 are reported. As can be seen, learning with the spatial strategy is associated with activation of regions including the hippocampus, orbitofrontal cortex, parahippocampal gyrus, middle temporal gyrus, cuneus and left precuneus. Increased activity tended to be more pronounced in the right than in the left hippocampus (paired $t$-test using mean beta-values; $t(18)=3.71$, $p<$ 0.07). Significant activation in the bilateral parahippocampal gyri and orbitofrontal cortex was also observed. On the other hand, response learning is associated with greater BOLD activity in the prefrontal cortex (PFC), most pronounced in dmPFC. Activity in the caudate nucleus was also observed ($t$=3.46, $ p<$ 0.06; Fig.2a). Increased activation in this region was expected, as it has consistently been shown to be involved in tasks that require response learning. Further areas with increased activity included the left postcentral gyrus, the left anterior insula, and the bilateral parietal lobule. 
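For reference, the $t$ contrast described above can be written as follows, assuming ordinary least squares with $N$ scans and a design matrix $X$ of rank $p$; SPM additionally models serial correlations, which this simplified form omits. For the spatial $>$ response contrast the weight vector would be $c = (1, -1, 0, \dots, 0)^\top$, assuming the two learning regressors are the first two columns of $X$; response $>$ spatial uses $-c$.

$$
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad
\hat{\sigma}^2 = \frac{(y - X\hat{\beta})^\top (y - X\hat{\beta})}{N - p}, \qquad
t = \frac{c^\top \hat{\beta}}{\sqrt{\hat{\sigma}^2 \, c^\top (X^\top X)^{-1} c}} \sim t_{N-p}
$$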
For the specific analysis of each cluster, percent signal change (PSC) values were computed and are shown as bar graphs below each rendered image in Fig.2e and 2f. The error bars indicate the variability of the BOLD signal change across subjects in all identified regions of each activity map.
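PSC has several definitions in use and the text does not state which one was applied. The sketch below assumes the simplest one, the change of the mean ROI signal during task scans relative to the mean of baseline scans, plus a mean and standard error across subjects of the kind shown as error bars. Other definitions (e.g. based on scaled GLM betas) are also common. Function names and values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def percent_signal_change(roi_series, task_scans, baseline_scans):
    """100 * (mean task signal - mean baseline signal) / mean baseline signal."""
    task = fmean(roi_series[i] for i in task_scans)
    base = fmean(roi_series[i] for i in baseline_scans)
    if base == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100.0 * (task - base) / base


def mean_and_sem(values):
    """Group mean and standard error of the mean (e.g. for error bars)."""
    return fmean(values), stdev(values) / sqrt(len(values))


# Hypothetical ROI signal: baseline near 1000, task scans near 1012.
signal = [1000, 1001, 999, 1012, 1013, 1011, 1000, 1000]
print(percent_signal_change(signal, task_scans=[3, 4, 5], baseline_scans=[0, 1, 2, 6, 7]))  # 1.2
```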

Fig.2.

Activated regions for the two main effects of encoding, spatial $>$ response (c) and response > spatial (d), are displayed via whole-brain three-dimensional rendering on the cortical surface. Highlighted voxels are significant at uncorrected p $<$ 0.05. (a) and (b) are sub-views showing activated voxels selected from two sample clusters: (a) displays the first eigenvariate of the extracted BOLD signal in PFC and (b) plots the adjusted data and fitted response for precuneus. Bar graphs in (e) and (f) present percent signal change (PSC) values for each cluster in spatial learning and response learning, respectively. Abbreviations: CA: caudate, MPC: medial prefrontal cortex, SPL: superior parietal lobule, IPL: inferior parietal lobule, PoCG: postcentral gyrus, PUT: putamen, AC/MFG: anterior cingulate/medial frontal gyrus, HIP: hippocampus, OFC: orbitofrontal cortex, PHG: parahippocampal gyrus, MTG: middle temporal gyrus, PCUN: precuneus, CUN: cuneus.

Table 3 Regions with increased activity during learning with the spatial and response strategies. The table shows Talairach coordinates and t values of the peak voxel of each cluster, and the cluster extents given as numbers of functional voxels

(2) The next analysis examined brain activity in the recall (probe) trials of each individual strategy, i.e. performance-related activation. Here, activations were modeled within a number of selected areas to increase the sensitivity of the analyses. Regions of interest (ROIs) were selected for this test from the regions that showed increased functional activation in the previous analysis (Fig.3a), i.e. regions involved in successful learning of each navigational strategy, in the left and right hemispheres. The following ROIs (with the number of functional voxels) were selected: hippocampus left (50) and right (40), orbitofrontal cortex left (55) and right (30), parahippocampal gyrus left (40) and right (40), caudate left (40) and right (35), putamen left (35) and right (35), and dmPFC left (100) and right (80). The average BOLD time series was calculated for each selected cluster of voxels when each of the strategies was administered. The Pearson correlation coefficient between the two series (spatial and response probes) was then computed for each brain region, and its statistical significance was assessed. Fig.3b depicts the results of the correlation analysis and the corresponding significance levels. Post hoc statistical analysis indicated significant correlations in the activation of the right hippocampus, bilateral OFC, mPFC and caudate when both strategies were individually engaged. This test is one step toward determining the regions that might be activated when the strategies are used interactively.
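The ROI correlation step can be sketched as follows. The sketch assumes the spatial and response probe series have been brought to equal length (the text does not say how), and its r-to-t conversion treats scans as independent samples, which autocorrelated BOLD data violate, so p-values derived from it would be optimistic. Function names and voxel values are hypothetical.

```python
import math


def roi_mean_series(voxel_series):
    """Average the BOLD time series over all voxels of an ROI.
    voxel_series: list of per-voxel time series of equal length."""
    n_vox = len(voxel_series)
    return [sum(col) / n_vox for col in zip(*voxel_series)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two series of equal length (>= 3)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (df = n - 2) for H0: rho = 0, assuming independent samples."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical three-voxel ROI sampled during spatial and response probes.
spatial_probe = roi_mean_series([[1.0, 2.0, 3.0, 2.0, 1.0],
                                 [1.2, 2.1, 2.9, 2.2, 0.9],
                                 [0.8, 1.9, 3.1, 1.8, 1.1]])
response_probe = roi_mean_series([[1.0, 1.5, 3.0, 2.5, 1.0],
                                  [1.2, 1.7, 2.8, 2.3, 1.2],
                                  [0.8, 1.6, 3.2, 2.4, 0.8]])
r = pearson_r(spatial_probe, response_probe)
print(round(r, 3), round(r_to_t(r, len(spatial_probe)), 2))
```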

Fig.3.

(a) Axial, coronal and sagittal views of the selected ROIs (regions with strong activity). (b) Bar plot of the Pearson correlation coefficients between the time series of the two single-strategy probes for each ROI.

(3) A further functional analysis, based on a univariate SPM analysis, laid the groundwork for the subsequent DCM analysis. Six experimental conditions were included in this SPM. Two were learning conditions derived from trials of the encoding phase of the last session; event times for these conditions were selected from the first repeated trial of each learning type until the end, and spatial and response trials were separated to create strategy-dependent learning conditions. A reward condition was also defined, comprising images acquired from the delivery of a reward at the end of a trial until a few seconds before the following trial. The last two task-dependent conditions, referred to as competition and cooperation, were then defined for probe trials. To increase accuracy, only images acquired while the subject was in a region centered on the middle of the radial maze and covering 30% of each arm were entered into the GLM. The rationale is that interactions among strategies arise when a subject is making decisions at the center of the maze or shortly after entering an arm. Estimates of head movement from the realignment procedure were also included in the design matrix. Regressors corresponded to stimulus functions convolved with a canonical hemodynamic response function, and the GLM was then estimated. In this analysis, the main effects of competition and cooperation were tested to identify significant activations, and the size of the effect for both kinds of interaction was estimated with this GLM for each subject. As can be inferred from the table in Fig.4, the three regions of hippocampus, striatum, and mPFC show significant activation at the uncorrected cluster level (p < 0.05). Further, to determine whether the contrast in activation is seen on average, a second-level analysis was undertaken. 
Random-effects group analysis (SPM second-level statistical analysis) was performed by entering the first-level contrast image of each subject into a GLM implementing a one-sample t-test. Statistical significance for the second-level group analyses was defined as a family-wise error-corrected cluster probability (p) less than 0.05 (two-tailed). The group effect size in this analysis, i.e. the mean across all subjects, indicated significant activation in all the aforementioned regions due to the interaction of strategies ($p <$ 0.05). It is worth mentioning that the individual statistical parametric maps of this test, computed as within-subject contrasts, are used for subject-specific ROI identification in the subsequent effective connectivity analyses.
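The region-based selection of probe images described above can be sketched as follows: entry and exit times for the central region (the kind of table shown in Fig.4b) are paired into events, and the scans acquired inside those events are identified. The export format of Mazesuite is not given in the text, so the input lists, the TR and the function names here are hypothetical; the resulting (onset, duration) pairs could serve as competition or cooperation condition timings.

```python
import math


def intervals_to_events(entries, exits):
    """Pair region-entry and region-exit times (s) into (onset, duration) events."""
    if len(entries) != len(exits):
        raise ValueError("each entry needs a matching exit")
    events = []
    for t_in, t_out in zip(entries, exits):
        if t_out <= t_in:
            raise ValueError(f"exit at {t_out} s does not follow entry at {t_in} s")
        events.append((t_in, t_out - t_in))
    return events


def scans_in_events(events, tr, n_scans):
    """Indices k of scans acquired at k * tr with onset <= k * tr < onset + duration."""
    selected = set()
    for onset, dur in events:
        first = math.ceil(onset / tr)
        last = min(math.ceil((onset + dur) / tr) - 1, n_scans - 1)
        selected.update(range(max(first, 0), last + 1))
    return sorted(selected)


# Hypothetical entry/exit times (s) for the central region during one probe, TR = 2 s.
events = intervals_to_events([4.0, 30.5], [9.0, 33.0])
print(events)                                     # [(4.0, 5.0), (30.5, 2.5)]
print(scans_in_events(events, 2.0, n_scans=100))  # [2, 3, 4, 16]
```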

Fig.4.

The procedure of design matrix construction for the third functional test. (a) Two-dimensional view of the maze analyzer software; a region was defined around the center to select the most representative images. (b) Table extracted from Mazesuite software indicating the entrance and exit times for the defined region. Information in table (b) has been turned into a graph (c) to visualize events during the whole probe. (d) and (e) are SPM graphics representing the brain maps and the design matrix, respectively. The lower table describes significant clusters found during the interactive task for one subject.

3.2.2. DCM of navigational cooperation and competition

To characterize and compare context-dependent inter-regional interactions, this paper goes beyond the assessment of functional connections among neuronal regions by elucidating effective connectivity (i.e. the directional influences that regions have on each other). To determine how the hippocampus, striatum, and mPFC interact to control navigation, it is necessary to understand how information propagates through these regions; whether they compete via mutual inhibition or interact in such a way that activity in one region excites neuronal activity in another.

DCM, a hypothesis-driven technique, is a general framework for inferring processes and mechanisms at the neuronal level from measurements of brain activity[26]. DCM does not model the measured time series directly; rather, the hidden neuronal dynamics are modeled and combined with a forward model that translates neuronal states into predicted measurements. In fMRI studies, this modeling is generally used to estimate which model of neuronal regions and interregional interactions best explains the observed hemodynamic responses. In particular, DCM can measure effective connectivity specific to certain experimental conditions. In summary, as generative models of brain responses, DCMs provide posterior estimates of the effective strength of connections and their activity-dependent modulations[26].

In DCM, a mathematical model of the underlying neuronal connectivity among an a priori selected set of brain regions (called DCM nodes) is defined by a system of bilinear differential state equations whose coefficients are specified by three sets of parameters: (i) input or extrinsic parameters that quantify how brain regions respond to external stimuli, (ii) endogenous or latent parameters that characterize context-independent inter-regional interactions, and (iii) modulatory parameters that measure changes in effective connectivity induced by the experimental conditions. The connectivity parameters can be estimated by fitting a “generative model” to the measurement data. For the key concepts and methodological issues associated with DCM, see [27,28]; for a mathematical explanation of DCM and its Bayesian statistical foundations, see [29,30].
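In the standard bilinear form of DCM for fMRI, the three parameter sets above enter the neuronal state equation as shown below, where $z$ holds the neuronal states of the nodes, $u_j$ is the $j$-th experimental input, $A$ contains the endogenous connections, $B^{(j)}$ the modulation of those connections by input $j$, and $C$ the driving (input) parameters. The hemodynamic forward model $g$, with parameters $\theta_h$, maps neuronal states to the predicted BOLD signal $y$ with observation noise $\varepsilon$.

$$
\dot{z} = \Big(A + \sum_{j=1}^{m} u_j B^{(j)}\Big) z + C u, \qquad y = g(z, \theta_h) + \varepsilon
$$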

Here, DCM was employed to model a network that mediates a mixed-strategy navigation task, to characterize how functionally related brain regions influence each other (directionally or reciprocally), and how neurophysiological mechanisms of the system's learning interactions are encoded by specific parameters (connectivity strengths). More importantly, the aim was to determine whether there are any differences in the effective connectivity of the network underlying navigational strategy interaction when the interaction is cooperative versus competitive, i.e. whether the modulatory inputs related to strategy cooperation or competition result in different coupling structures. The hypothesis is that, despite similarities in structural and functional connectivity, differences might be found in the causal (effective) connectivity within a network of hippocampus, striatum and mPFC, from which inferences could be made about parametric modulations induced by the experimental inputs/stimuli related to navigational strategies. Notably, solving mazes by a spatial, response or mixed strategy involves different stimuli: extramaze landmarks or intramaze beacons.

The nodes for DCM were selected based on prior neurocognitive knowledge about the regions reported to be active in navigation-related tasks, together with the activation clusters defined via group analysis in the previous section. The SPM described in the previous section was used to extract ROIs. The threshold for a brain activation to be selected as a DCM node was set to uncorrected cluster p < 0.05, with a cluster-defining threshold of t = 3.5. Spheres of 8 mm radius around the individual peak activation of each region were inspected, and the functional BOLD time series was then extracted from each ROI in a subject-specific manner. Extracting the principal eigenvariate of each ROI summarized its functional activity as a single time series. The same set of regions was used for every subject.
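The ROI extraction step can be sketched with NumPy as below: voxels within 8 mm of a subject's peak are selected, and their (time x voxel) data are summarized by the first eigenvariate. The scaling and sign convention loosely follow SPM's `spm_regions` (scaled by the square root of the largest eigenvalue over the number of voxels, sign chosen so the voxel weights sum to a positive value), but SPM first adjusts the data for confounds, which this sketch replaces with simple mean-centering. Function names and the simulated ROI are hypothetical.

```python
import numpy as np


def sphere_mask(coords_mm, center_mm, radius_mm=8.0):
    """True for voxels whose (x, y, z) mm coordinates lie within `radius_mm` of the peak."""
    offsets = np.asarray(coords_mm, dtype=float) - np.asarray(center_mm, dtype=float)
    return np.linalg.norm(offsets, axis=1) <= radius_mm


def principal_eigenvariate(Y):
    """First eigenvariate of a (time x voxel) data matrix via SVD."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)                      # remove each voxel's mean
    U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    sign = 1.0 if Vt[0].sum() >= 0 else -1.0     # make voxel weights sum positive
    return sign * U[:, 0] * S[0] / np.sqrt(Y.shape[1])


# Hypothetical ROI: 120 scans x 20 voxels sharing one signal plus noise.
rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 6 * np.pi, 120))
weights = rng.uniform(0.5, 1.5, size=20)
Y = np.outer(signal, weights) + 0.05 * rng.normal(size=(120, 20))
ev = principal_eigenvariate(Y)
```

Compared with a plain voxel average, the eigenvariate weights voxels by how strongly they share the dominant temporal pattern, which reduces the influence of voxels dominated by noise.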

Following node selection, the structure of the DCM was designed. A fully connected network model was constructed with bidirectional connections between all regions. Inferences about model structure were not considered here; instead, emphasis was placed on characterizing context-sensitive modulations of model parameters. The model space is shown schematically in Fig.5, which indicates bidirectional effective connectivity amongst the hippocampus, striatum, and mPFC. The model is structured by a direct input, reward, to the mPFC; learning-related inputs that act on endogenous connections (spatial learning affects HIPP $ \leftrightarrow $ mPFC connections, whereas response learning affects STR $ \leftrightarrow $ mPFC connections); and modulatory effects of competition or cooperation targeting the connections between hippocampus and striatum. Four alternative models (two modulatory effects, each on one of the two directed connections), sharing identical intrinsic connectivity, were specified. DCM uses an expectation maximisation (EM) algorithm to produce probabilistic estimates of the expected value of each parameter. Bayesian model selection, based on the free-energy approximation to the log-evidence, converts relative differences in log-evidence into a posterior probability for each competing model[24]. All four models were estimated and compared to find the one with the highest posterior probability.
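The four-model space can be sketched as follows. This is an illustration under stated assumptions, not the estimated DCM: it reads the learning-related inputs as modulating the HIPP/STR to mPFC connections (B matrices), uses the DCM convention that entry [i, j] is the connection from node j to node i, and simulates only the bilinear neuronal equation with Euler steps. SPM's DCM additionally parameterizes self-connections on a log scale, adds the hemodynamic forward model and estimates parameters by Bayesian inversion. Node ordering, input names and all parameter values are hypothetical.

```python
import numpy as np

HIPP, STR, MPFC = 0, 1, 2
INPUTS = ("reward", "spatial", "response", "cooperation", "competition")


def model_masks(mod_input, source, target):
    """0/1 masks (A, B, C) for one candidate model; [i, j] = connection j -> i."""
    n_in = len(INPUTS)
    A = np.ones((3, 3))                      # fully connected, incl. self-connections
    B = np.zeros((n_in, 3, 3))
    C = np.zeros((3, n_in))
    C[MPFC, INPUTS.index("reward")] = 1.0    # reward drives mPFC
    for name, region in (("spatial", HIPP), ("response", STR)):
        k = INPUTS.index(name)               # learning input acts on region <-> mPFC
        B[k, MPFC, region] = B[k, region, MPFC] = 1.0
    B[INPUTS.index(mod_input), target, source] = 1.0
    return A, B, C


MODELS = {
    1: model_masks("cooperation", HIPP, STR),
    2: model_masks("cooperation", STR, HIPP),
    3: model_masks("competition", HIPP, STR),
    4: model_masks("competition", STR, HIPP),
}


def simulate(A, B, C, u, dt=0.1):
    """Euler integration of dz/dt = (A + sum_j u_j B_j) z + C u; u is (T, n_inputs)."""
    z = np.zeros(A.shape[0])
    states = np.empty((len(u), A.shape[0]))
    for t, ut in enumerate(u):
        J = A + np.tensordot(ut, B, axes=1)  # effective coupling at this time step
        z = z + dt * (J @ z + C @ ut)
        states[t] = z
    return states


# Hypothetical parameter values: weak coupling, self-decay, modulation 0.3.
A_mask, B_mask, C_mask = MODELS[1]
A = 0.1 * (A_mask - np.eye(3)) - 0.5 * np.eye(3)
B = 0.3 * B_mask
C = 1.0 * C_mask
u = np.zeros((300, len(INPUTS)))
u[::50, INPUTS.index("reward")] = 1.0          # brief reward pulses
u[100:200, INPUTS.index("cooperation")] = 1.0  # cooperative probe epoch
states = simulate(A, B, C, u)
```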

Fig.5.

Dynamic causal modeling: (a) Structure of the model overlaid on the normalized brain template, indicating bidirectional effective connectivity amongst the hippocampus, striatum, and mPFC. (b) Inputs to the model, including a reward input to the mPFC, two learning-related inputs acting on endogenous connections, and two modulatory effects of competition and cooperation targeting connections between hippocampus and striatum. (c) Responses of the nodes after model estimation.

The results of the DCM comparison are illustrated in Fig.6. Four models, two with a cooperation modulatory effect (models 1 and 2) and two with a competition effect (models 3 and 4), were compared. Model 1, in which the cooperative modulatory effect acts on the direct connection from hippocampus to caudate, was favored by Bayesian model selection in all participants who could employ the cooperative interactive strategy (Fig.6c and 6d). This means that when the model was modulated by cooperation as an experimental manipulation, the connection from hippocampus to striatum was strengthened (connection strengths significant at 95% confidence were found), supporting the idea of co-activated systems. Parameters of the selected model were averaged over subjects using a Bayesian model averaging procedure [27] and are given in the table in Fig.6a. In contrast, the competition effect did not drive any of the connections between the two regions (the connectivity parameters were not significantly greater than zero), supporting the idea that the navigational systems operate independently in parallel. However, connections from hippocampus and striatum to mPFC were strengthened by competition (Fig.6b). Hence, it can be proposed that during the competitive task the hippocampus and striatum operate in parallel, with weaker effective connectivity between them.
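The model comparison and averaging steps can be sketched as follows. This assumes fixed-effects Bayesian model selection with a uniform prior over the four models: each model's log-evidence (free energy) is summed over subjects and converted into posterior probabilities with a softmax. Random-effects BMS is a common alternative, and the text does not state which variant was used. The averaging step here is a simplified posterior-probability-weighted mean of parameter estimates, whereas SPM's implementation averages full posterior densities. All numbers and function names are hypothetical.

```python
import math


def posterior_model_probs(log_evidence):
    """Softmax of log-evidences under a uniform prior over models."""
    m = max(log_evidence)
    w = [math.exp(f - m) for f in log_evidence]  # subtract max for numerical stability
    total = sum(w)
    return [x / total for x in w]


def group_fixed_effects(log_evidence_by_subject):
    """Sum each model's log-evidence over subjects (assumes one shared model)."""
    n_models = len(log_evidence_by_subject[0])
    return [sum(subj[k] for subj in log_evidence_by_subject) for k in range(n_models)]


def bma_mean(param_means, probs):
    """Posterior-probability-weighted average of one parameter across models."""
    return sum(p * mu for p, mu in zip(probs, param_means))


# Hypothetical free-energy values (models 1 to 4) for two subjects.
F = [[-1200.0, -1203.0, -1206.0, -1205.0],
     [-980.0, -982.0, -985.0, -984.5]]
probs = posterior_model_probs(group_fixed_effects(F))
print([round(p, 3) for p in probs])
```

Because only differences in log-evidence matter, subtracting the maximum before exponentiating leaves the probabilities unchanged while avoiding underflow for large negative free energies.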

Fig.6.

DCM results: models for cooperation and competition with almost identical architecture were compared, with the modulatory effect of each interaction assigned to the HIPP $ \leftrightarrow $ STR connections. Model 1 includes a cooperation effect on HIPP $\rightarrow$ STR, Model 2 a cooperation effect on STR $\rightarrow$ HIPP, and Models 3 and 4 a competition effect on HIPP $\rightarrow$ STR and STR $\rightarrow$ HIPP, respectively. The BMS results are shown as bar plots on the right ((c) and (d)), and the table at the top left (a) reports the parameters of the winning model as mean strengths in Hz. The table in (b) presents the connections of a competition model (Model 3) for comparison.

4. Discussion

In the last two decades, little multidisciplinary functional neuroimaging research has aimed to identify, from different perspectives, the neural substrates that subserve navigational learning and their interactions. This study first examined how brain regions are functionally coupled to mediate reward-based spatial and response learning, either independently or interactively. Further, effective connectivity analysis was used to complement the functional connectivity analyses by considering directional influences within a plausible network of brain regions. Assessing how experimental manipulations modulate effective connectivity in such a network is relatively new in this field.

In goal-directed navigational tasks, the causal efficacy of the path taken in producing the outcome (i.e. reaching the goal), given the current state or context, is perceivable. The navigator selects the best choice and regulates behavior using goal representations. The internally represented goal motivates the navigator to solve hidden-goal tasks in different situations, using either or both navigational systems as required. Despite the relative clarity of knowledge about goal-directed navigation behavior, the brain functions and neuronal underpinnings of such behavior remain largely unknown.

The functional analysis reported here pursued three hypotheses under different environmental conditions and across subsequent visits to the environment. It was tested whether spatial/response learning is hippocampus/striatum dependent, and which other brain regions might be involved when each type of learning was applied individually. The pattern of brain activation during spatial learning showed activity in the hippocampus, parahippocampal gyrus and OFC, which is consistent with previously reported fMRI studies of spatial learning[31,32,33]. On the other hand, response learning, which was the most successful strategy across participants, was associated with increased activity in the striatum and mPFC, confirming previous findings on habitual stimulus-response learning[34,35,36].

The subsequent functional analysis, based on the first test, explored whether the selected regions showed increased activation and significant correlation between the two single strategies. A number of ROIs, depicted in Fig.3a, were candidates for an a priori model of functional connections. The correlation between the averaged BOLD time series for spatial and response probes was computed for these ROIs; the higher the correlation, the more likely that a region was involved in interactive navigational tasks. The third functional hypothesis concerned the interaction among navigational strategies. A random-effects analysis was undertaken at the group level on successful trials in which the two strategies were employed interactively. This analysis showed that the hippocampus, striatum, and mPFC are functionally connected when strategies either compete or cooperate. However, the characteristics of these connections and their variability remained unclear.

Consequently, the DCM technique was employed to quantify dynamic, context-dependent modulations of connectivity. Such analysis may indicate the direction of information transmission between the selected regions. This analysis confirmed effective connectivity between hippocampus and striatum when navigational strategies cooperate. However, similar patterns were not found for the alternative interaction (i.e. competition). This observation suggests that connectivity within a dynamic causal model of human navigation differs fundamentally between cooperation, where regional cooperation emerges, and competition, which produces antagonistic function. Therefore, effective connectivity between hippocampus and striatum, together with functional connectivity, might reveal aspects of the neural basis of navigation, even though neither is attributable to structural connectivity. In [36], the authors argued that the functional connectivity between hippocampus and caudate may stem from their common engagement with the frontal cortex, supporting the hypothesis that the frontal cortex serves as a common link between hippocampus and caudate. This corresponds to what was concluded here for competitive interaction among strategies: hippocampal and striatal connectivity with mPFC increased reciprocally during successful competitive trials, with information flowing toward/from the mPFC. Nevertheless, it is not yet possible to conclude that the hippocampus and striatum are homologously involved in navigation. Common engagement of brain regions, especially in cooperative tasks, whether functional and/or causal, does not mean that they play similar or equivalent roles in the interactions subserving navigation.

5. Conclusion

The comprehensive experimental paradigm of this study allowed a variety of hypotheses to be tested about the activation of brain regions associated with reward when spatial or response learning was exploited independently or interactively. The inclusion of multiple training, probe and control conditions for adopting different navigational strategies, in isolation and/or combination, is a strength of the results reported here. The research highlights new findings about the brain connectivity subserving human navigation.

Several points should be considered when assessing and discussing the results of such multi-stage testing; a few are noted here. Although every effort was made to ensure that confusion or additional cognitive activity did not produce activation during the experiments, it is likely that not all of the extra efforts subjects made to solve the maze under controlled conditions, including the use of alternative strategies, were eliminated. Therefore, further qualitative and quantitative evaluation and validation are required. This kind of experiment should be performed with a larger group, as subjects are highly variable in their behavior, including their selections and actions and their adoption of a variety of cognitive strategies. More importantly, to ensure that the random-effects analysis adequately represents how the model parameters are distributed in the population, such research should be tested on more data; this was not feasible here due to limited fMRI access. Furthermore, the natural variability in the strategies taken by navigators needs to be considered. In addition to experimental limitations, some conceptual issues are likely present. There are many alternatives for the generative model of effective connectivity underlying the recorded data. Specifically, for a task as complex as the one reported here, the balance between model complexity and accuracy is debatable. Although Bayesian model selection aims to select the best among a set of plausible alternatives, concerns remain about the model space, couplings, and modulatory effects in DCM. This therefore remains an open question that requires further systematic, hypothesis-driven research.

Acknowledgments

This paper was funded by Qazvin Islamic Azad University. I thank Dr. Esmaeilzadehha very much for his regular cooperation and assistance. Many thanks also to Dr. Oghabian, who put the imaging-related equipment at our disposal. I am very thankful to the "Wellcome Trust Centre for Neuroimaging" for providing educational videos and texts on fMRI data processing, which extended my understanding of this broad and complex subject. The data for this research were obtained with great difficulty: conducting the experiments and obtaining images of each subject required paying the hospital as much as for scanning a patient without insurance.

Conflict of Interest

All authors declare no conflicts of interest.

References

[1] Iaria, Petrides, Dagher, Pike, Bohbot (2003) Cognitive strategies dependent on the hippocampus and caudate nucleus in human navigation: variability and change with practice. Journal of Neuroscience 23(13), 5945-5952. PMID 12843299. http://www.jneurosci.org/lookup/doi/10.1523/JNEUROSCI.23-13-05945.2003