Concurrency in electrical neuroinformatics: parallel computation for studying the volume conduction of brain electrical fields in human head tissues

Advances in human brain neuroimaging for high‐temporal and high‐spatial resolutions will depend on localization of electroencephalography (EEG) signals to their cortex sources. The source localization inverse problem is inherently ill‐posed and depends critically on the modeling of human head electromagnetics. We present a systematic methodology to analyze the main factors and parameters that affect the EEG source‐mapping accuracy. These factors are not independent, and their effect must be evaluated in a unified way. To do so requires significant computational capabilities to explore the problem landscape, quantify uncertainty effects, and evaluate alternative algorithms. Bringing high‐performance computing to this domain is necessary to open new avenues for neuroinformatics research. The head electromagnetics forward problem is the heart of the source localization inverse problem. We present two parallel algorithms to address tissue inhomogeneity and impedance anisotropy. Highly accurate head modeling environments will enable new research and clinical neuroimaging applications. Cortex‐localized dense‐array EEG analysis is the next step in neuroimaging domains such as early childhood reading, understanding of resting‐state brain networks, and models of full brain function. Therapeutic treatments based on neurostimulation will also depend significantly on high‐performance computing integration. Copyright © 2015 John Wiley & Sons, Ltd.


INTRODUCTION
Advances in human brain science have been closely linked with new developments in neuroimaging technology. Indeed, the integration of psychological behavior with neural evidence in cognitive neuroscience research has led to fundamental insights into how the brain functions and manifests our physical and mental reality. However, in any empirical science, it is the resolution and precision of measurement instruments that inexorably define the leading edge of scientific discovery. Human neuroscience is no exception. Brain activity takes place at millisecond temporal and millimeter spatial scales through the reentrant, bidirectional interactions of functional neural networks distributed throughout the cortex and interconnected by a complex network of white matter fibers. Unfortunately, current noninvasive neuroimaging instruments are unable to observe dynamic brain activity at both of these scales simultaneously. The remainder of the paper examines the key geometry and conductivity factors that affect model accuracy and compares the numerical methods. Section 7 presents our general approach to studying the source localization problem and discusses two parallel head modeling algorithms we have developed, along with their verification, parallel performance, and reliability. Results are given in Section 8 for these algorithms and their application. Our objective is to demonstrate current steps toward a future viable realization of the general solution framework using parallel HPC technology. Conclusions and future directions are provided in Section 9.

SOURCE LOCALIZATION
Modern dense-array EEG (dEEG) technology, such as the geodesic sensor net [23] from Electrical Geodesics, Inc. shown in Figure 1, can measure microvolt potentials on the human scalp at up to 256 sensors every 1 ms or less.
EEG signals are the consequence of postsynaptic activities of neuronal cells. As seen in Figure 1 (right), cortical neurons are arranged parallel to each other and point perpendicular to the cortical surface. It is this structural arrangement that allows currents from groups of thousands of neurons to accumulate and generate an equivalent current dipole. Therefore, scalp potentials measured by dEEG can be modeled by the combined electrical potentials (called lead fields) produced by 10,000 or more cortex patches. Unfortunately, the scalp potentials are a linear superposition of all the distributed source lead fields, and the individual EEG contributors (i.e., the distributed source dipoles) must be disentangled to determine the dynamics of each brain region by solving a source localization inverse problem. Two general approaches are used to solve the problem: the parametric approach [1,24] and the imaging approach [7,25-27].

The parametric approach
The parametric approach is based on the assumption that the scalp EEG signal $\Phi_{EEG}$ is generated by one or a few current dipoles (fewer than 10) whose locations $r_{q_i}$ and moments $d_{q_i}$ (six parameters for each dipole) are unknown [1,24]. These parameters are estimated by minimizing the residual energy $E(r_{q_i}, d_{q_i})$,

$$E(r_{q_i}, d_{q_i}) = \left\| \Phi_{EEG}(r) - \Phi_{model}(r; r_{q_i}, d_{q_i}) \right\|^2, \qquad (1)$$

using a nonlinear search algorithm [28] (e.g., simplex or simulated annealing). It is also possible to minimize the residual energy using a linear approach [29]. Here, $\Phi_{model}(r; r_{q_i}, d_{q_i})$ is the lead field at sensor location $r$ corresponding to a current dipole $d_{q_i}$ located at $r_{q_i}$. The search starts with a specified set of parameters and proceeds iteratively, solving the forward problem at each step. Various strategies can be applied based on the number of dipoles, which parameters are fixed, and whether to consider the time series of the EEG data [30-34].
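To make the procedure concrete, here is a minimal, hypothetical sketch: a Nelder-Mead (simplex) search over the six parameters of a single dipole, minimizing the residual energy of Equation (1). The forward model is an infinite-homogeneous-medium approximation used only as a stand-in for the real head model solver.

```python
# Hypothetical sketch of the parametric approach for one dipole.
import numpy as np
from scipy.optimize import minimize

sensors = np.random.default_rng(0).normal(size=(64, 3))       # stand-in sensor positions
sensors /= np.linalg.norm(sensors, axis=1, keepdims=True)     # project onto a unit "scalp"

def lead_field(r_q, d_q, sensor_pos, sigma=0.33):
    """Potential of a current dipole in an infinite homogeneous medium (toy model)."""
    rel = sensor_pos - r_q                                    # vectors dipole -> sensors
    dist = np.linalg.norm(rel, axis=1)
    return (rel @ d_q) / (4 * np.pi * sigma * dist**3)

# Synthetic "measured" EEG from a known dipole.
true_r, true_d = np.array([0.2, 0.1, 0.3]), np.array([0.0, 0.0, 1.0])
phi_eeg = lead_field(true_r, true_d, sensors)

def residual_energy(params):
    r_q, d_q = params[:3], params[3:]
    return np.sum((phi_eeg - lead_field(r_q, d_q, sensors)) ** 2)

x0 = np.zeros(6); x0[5] = 0.5                                 # initial guess
fit = minimize(residual_energy, x0, method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-12})
print("recovered position:", fit.x[:3])
```

In practice, each evaluation of `lead_field` would be a full forward solve (or a lookup into a precomputed gLFM, as described later), which is what makes the search expensive.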
The imaging approach

The imaging (distributed source) approach models the measured scalp potentials as

$$\Phi = K J + n, \qquad (2)$$

where $\Phi \in \mathbb{R}^{N \times 1}$ is a column vector gathering the electric potential differences at $N$ scalp electrodes measured with respect to a reference electrode, $J \in \mathbb{R}^{3M \times 1}$ gathers the current density vectors at $M$ cortical dipole locations, $n$ is a perturbation noise vector, and $K \in \mathbb{R}^{N \times 3M}$ is the lead field matrix (LFM). Every row in $K$ is a lead field corresponding to a current dipole obtained by solving the forward problem. Given the $N$-vector of scalp EEG measurements $\Phi_{EEG}$ at $N$ electrodes and the LFM $K$, the goal of the inverse problem is to invert Equation 2 by finding a linear inverse operator $W$ such that

$$\hat{J} = W \Phi_{EEG}, \qquad (3)$$

where $\hat{J}$ is an estimate of the current densities. Because $J$ and $\Phi$ are linearly related, the inverse problem reduces to a linear inverse problem for the unknown magnitudes (vector $J$). This is a well-known formulation for numerous image reconstruction problems. The problem is (i) underdetermined, which results in the existence of infinitely many solutions, and (ii) ill-conditioned, which results in unstable solutions in the presence of noise. To overcome the first issue, methods impose a priori constraints on the solution to select the most likely one. To overcome the second issue, methods employ regularization schemes. Mathematically, the distributed method obtains the inverse solution by minimizing the data fitting term with an added regularization term in the least-squares sense,

$$\hat{J} = \arg\min_{J} \left( \| \Phi_{EEG} - K J \|^2 + \alpha \| J \|^2 \right), \qquad (4)$$

where $\alpha \| J \|^2$ represents the constraints and regularization term, and $\| \Phi_{EEG} - K J \|^2$ is the data fitting term. The advantage of the L2-norm, in contrast to the L0-norm and L1-norm, is that the L2-norm solution is analytic and can be obtained directly by matrix multiplication, whereas the other norms require an iterative numerical process. Different distributed inverse methods differ in the choice and application of the constraints and the regularization scheme. From this formulation, solving the inverse problem is achieved in four steps: (1) defining the solution space by deciding a priori the number and locations of the distributed dipoles inside the brain; (2) specifying the number and locations of electrodes where EEG signals are sampled; (3) computing the lead field of the distributed dipoles at the electrode locations, that is, mapping the solution space to the scalp space; and (4) applying an inverse algorithm to find estimates of the dipole moments that best describe a given EEG signal.
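For illustration, a minimal sketch of the L2 (minimum-norm) solution of Equation (4), computed in the cheap $N \times N$ sensor space; the lead field matrix here is a random stand-in rather than one sampled from a gLFM.

```python
# Minimal sketch of the L2-regularized (minimum-norm) distributed inverse:
# J_hat = K^T (K K^T + alpha I)^{-1} Phi, analytic as noted in the text.
import numpy as np

rng = np.random.default_rng(1)
N, M = 128, 2400                       # electrodes, dipole locations (3M unknowns)
K = rng.normal(size=(N, 3 * M))        # stand-in lead field matrix
J_true = np.zeros(3 * M)
J_true[300:303] = [5.0, -2.0, 1.0]     # one active dipole (x, y, z components)
phi = K @ J_true + 0.01 * rng.normal(size=N)   # noisy synthetic EEG sample

alpha = 1e-2                           # regularization weight
# Solve in the N x N sensor space (cheap because N << 3M).
W = K.T @ np.linalg.solve(K @ K.T + alpha * np.eye(N), np.eye(N))
J_hat = W @ phi
print("strongest estimated dipole index:", np.argmax(np.abs(J_hat)) // 3)
```

The choice of $\alpha$ and of additional weighting constraints is exactly where the different distributed inverse methods diverge, as noted above.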

Reciprocity
The idea of the reciprocity theorem [35,36] is that the electric field throughout a volume conductor caused by injecting a unit current between two electrodes on the scalp can be used to determine the potential difference between these two electrodes caused by current dipole sources in the volume. This theorem reduces the calculation of the potential difference between two electrodes on the scalp caused by any dipole at any location and with any orientation to one forward calculation, which in turn reduces the required number of forward calculations to the number of scalp sensors. Mathematically, the potential difference between a recording electrode $A$ and the reference electrode $R$ on the scalp due to a dipole source at location $r$ can be written as

$$\Phi_A - \Phi_R = \frac{d \cdot E_{AR}(r)}{I_{AR}}, \qquad (5)$$

where $d$ is the dipole moment, $E_{AR}(r) = -\nabla\Phi(r)$ is the electric field at location $r$ caused by $I_{AR}$, and $I_{AR}$ is the current flowing between the source and sink electrodes.
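As a tiny numerical illustration of Equation (5), with assumed values for the injected current, the local field obtained from one forward solve, and the dipole moment:

```python
# Reciprocity: Phi_A - Phi_R = d . E_AR(r) / I_AR. One forward solve per
# electrode pair yields E_AR everywhere, covering all dipole positions and
# orientations at once. All numerical values below are illustrative.
import numpy as np

I_AR = 1.0e-3                               # injected current between A and R (A)
E_AR = np.array([0.02, -0.01, 0.05])        # field at r from that forward solve (V/m)
d = np.array([0.0, 0.0, 1.0e-8])            # dipole moment (A*m)

v_AR = np.dot(d, E_AR) / I_AR               # potential difference Phi_A - Phi_R (V)
print(f"Phi_A - Phi_R = {v_AR:.3e} V")
```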

Neurostimulation
In the emerging field of transcranial electrical stimulation of the human brain [37], and in particular in transcranial direct current stimulation (tDCS) [38], it is important to predict and optimize the current densities delivered externally to the cortical region of interest (ROI). One can see from Equation 5 that the normalized electric field is independent of the external current amplitude ($\Phi(r)$ is proportional to $I_{AR}$) and is determined only by the head model geometry and tissue conductivities,

$$T_{AR}(r) = \frac{E_{AR}(r)}{I_{AR}}. \qquad (6)$$

This vectorial transfer matrix element, for electrode $A$, defines the local cortical field directionality and strength induced by external current injection between electrode $A$ and reference electrode $R$. It can be used to optimize the active electrode pattern in tDCS for the optimal delivery of current to the ROI, because, by the linear superposition principle, the local field produced by injection between electrode $A$ and any electrode $B$ is the difference between $T_{AR}$ and $T_{BR}$. As a result, the neurostimulation problem of delivering a target current density (or electric field $E$) to the ROI can be formulated as a linear problem,

$$E = T^{\mathsf{T}} I, \qquad (7)$$

where $I$ is a vector representing the current injection intensity pattern and $T$ is a transfer matrix of dimension $N_e \times N_d$, where $N_e$ is the number of electrodes and $N_d$ is the number of cortical dipoles. Typically, for dense-array EEG, $N_e = 128$ or 256 and $N_d = 2400$. For the desired electric field at the ROI, $E_{ROI}$, one should solve an L2-norm minimization problem similar to the source localization optimization problem,

$$\hat{I} = \arg\min_{I} \left\| E_{ROI} - T^{\mathsf{T}} I \right\|^2. \qquad (8)$$

For the oriented-dipoles case, when dipoles representing cortex patches are normal to the cortex and their location and orientation are fixed and known from the MRI (Figure 2), an immediate consequence of the dot product in Equation 5 can be derived: when optimal normal stimulation is required for a given dipole patch, the optimal stimulating current injection pattern on the scalp is grossly defined by the maxima and minima of this dipole's forward field projection onto the scalp. Indeed, the maximal recorded scalp potential for a given dipole will correspond to the stimulating electrode producing the maximal local field (and current density) delivered externally to this oriented dipole location. The same is true for the negative topography pole, with the opposite sign. In practice, there are safety constraints imposed on the level of current injected in one electrode pair [39,40], and a typical safe current pattern is distributed over several electrodes, found by solving the optimization problem in Equation 8 with the safety constraints.
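A hedged numerical sketch of Equation (8) with box safety bounds might use a bounded least-squares solver; the transfer matrix `T`, the per-electrode limit `i_max`, and the ROI target below are all stand-ins, and the relation $E = T^{\mathsf{T}} I$ follows the reconstruction above.

```python
# Sketch: choose an injection pattern I over N_e electrodes that best produces
# a desired field E_ROI at N_d cortical sites, under per-electrode bounds.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(2)
N_e, N_d = 128, 2400
T = rng.normal(size=(N_e, N_d))            # stand-in transfer matrix
E_target = np.zeros(N_d)
E_target[100:110] = 1.0                    # desired field magnitude at the ROI sites

i_max = 2.0e-3                             # assumed per-electrode safety limit (A)
res = lsq_linear(T.T, E_target, bounds=(-i_max, i_max))
I_pattern = res.x
print("electrodes near the safety limit:", np.sum(np.abs(I_pattern) > 0.5 * i_max))
```

A full formulation would also enforce a zero net injected current and the pairwise safety limits of [39,40]; the simple box bounds here only illustrate how constraints enter the optimization.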
What is important to emphasize in the context of this paper is the fact that the transfer matrix T is computed in the reciprocity mode of the generic LFM calculations as an intermediate product and can be outputted as a resulting file concurrently with LFM for source localization.

EEG DATA SAMPLING
Traditionally, the international 10-20 system, with only 19 recording electrodes, was the standard for EEG recording. However, it became clear that this system was insufficient to capture the full complexity of the scalp potentials and that a larger number of electrodes would be required, particularly to improve source localization accuracy. Today, dEEG systems with up to 256 electrodes are commercially available, and systems with larger electrode counts are feasible. Several studies have evaluated the influence of increasing the number of electrodes on source localization accuracy. It has been shown that an increase from 31 to 63 electrodes improves source localization accuracy significantly, and that increasing the number from 63 to 123 can deliver further benefits [41]. Another study indicated that interelectrode distances of 2-3 cm are required to avoid distortion of the EEG signal [42,43]. However, the optimal number and locations of the electrodes depend on other factors as well, including the accurate characterization of the volume conduction (geometry and conductivity) and the specification of the solution space. Further, increasing the number of electrodes will likely improve source localization only up to some limit; beyond that, it could degrade accuracy because of increasing sensitivity to noise.
Several outstanding questions still require further research. What are the optimal number and locations of electrodes? How do they depend on other factors (e.g., noise, head model characteristics, choice of the solution space, and inverse algorithm)? Can nonuniform distributions benefit source localization? Will noise impose a limit? Can oversampling aid noise reduction? Does the position of the reference electrode matter, and how large is its effect on localization error and spreading? A direct and systematic evaluation of the optimal number of electrodes is needed and should be re-evaluated in conjunction with improvements in the other factors.

SOLUTION SPACE
Before the application of any inverse algorithm, we must choose the number and locations of the distributed dipoles. The inverse solution depends strongly on the choice of the solution space. Several choices and constraints are used, and the optimal choice still requires further research and investigation. These choices include the following: (1) uniformly distributed in 3D brain space; (2) restricted to the gray matter (GM); (3) uniformly distributed on the cortical surface, with directions restricted to the surface normal; and (4) restricted to vertices of a connectivity network obtained from tractography.
Further, the optimal resolution is still unclear and requires exploration. Resolutions of 7 and 5 mm are widely used, but are these resolutions sufficient, or are they merely the highest resolutions for which the computational complexity remains tractable? How does increasing the resolution affect source localization accuracy? The answers likely depend on the choices of the other factors and of the inverse algorithm. For instance, in a four-shell spherical model we have only the first choice, because the cortex and GM are not available. Likewise, the effect on source localization is likely different with 32 electrodes than with 256 electrodes.

ELECTROMAGNETICS FORWARD MODEL
Modeling the human head as a volume conductor defines the relationship between current source generators in the brain and the measured electrical potentials on the scalp. Given a volume conductor $\Omega$ with boundary $\partial\Omega$, current sources within the volume induce electric and magnetic fields that can be calculated on the surface. If the conductivities $\sigma$ and the current sources $S$ are known, the electric and magnetic fields inside the volume are fully described by the quasi-static approximation of Maxwell's equations, the Poisson equation

$$\nabla \cdot \left( \sigma(x,y,z)\, \nabla \Phi(x,y,z) \right) = S \qquad (9)$$

in $\Omega$, with no-flux Neumann boundary conditions on the scalp:

$$\sigma \left( \nabla \Phi \right) \cdot n = 0 \quad \text{on } \partial\Omega.$$

Here, $n$ is the normal to $\partial\Omega$, and $\sigma = \sigma_{ij}(x,y,z)$ is the conductivity tensor. The solution of Equation 9 depends on the volume conduction properties: geometry and conductivity. A complete realistic volume conductor model that captures all fine details is not expected; however, the specification of its configuration is a key factor in improving source localization accuracy. The aim is to identify and rank the factors that most influence source localization.

The geometry factor
The simplest head model consists of a single homogeneous sphere [44]. It is far from reality, given the significant difference between bone and fluid tissues. As a first improvement, three-shell models representing brain, skull, and scalp were introduced [45]; they show good qualitative agreement with general EEG observations [46]. For further improvement, four-shell and five-shell models that include cerebrospinal fluid (CSF) and GM [47,48], and analytic solutions that handle radial-to-tangential anisotropy [49,50], are available. These models capture the major tissue layers, and their simple geometry allows for analytic solutions [51]. However, they have obvious limitations: the head tissues do not have uniform thickness or conductivities [10,52], and the skull contains characteristics that are difficult to represent, such as sutures. Structural imaging such as MRI and computed tomography (CT) provides images of anatomical detail with resolution better than 1 mm³. These images can be segmented into a number of tissues, each assumed to have uniform electrical properties. The quality and accuracy of the geometric model is directly related to the imaging modality and the quality of the segmentation. MRI is sensitive to soft tissue, while CT is sensitive to bone. Forward models obtained from these images are more accurate than spherical models [53,54], but their computational complexity is significantly higher.
Several questions require answers and further exploration. How many tissues should be considered, and which ones? How do different geometric models affect source localization? How important is it to model geometric variations such as fissures and different bone types? What level of detail is required? The answers to these questions should be sought in the context of source localization: these factors impact various algorithms differently, and the importance of one factor cannot be understood in isolation from the others. Therefore, answering these and similar questions must be done via simulations that leverage HPC.

The conductivity factor
Once the tissues are identified from the imaging modalities, their conductivity model and values must be assigned. Unfortunately, the conductivities of the head tissues are poorly known, especially for the skull. In general, the conductivity of a biological tissue is related to the amount of fluid it contains [55]: tissues with higher fluid concentration are more conductive. Cell-free fluids such as CSF have a uniform and high conductivity [46,56,57], while compact bone has the lowest. The brain consists of GM and white matter. GM contains the neuron cell bodies and is generally accepted to be homogeneous and isotropic. White matter contains the nerve fibers that connect different parts of the cortex; its conductivity along the fibers is nine times greater than in the perpendicular direction (i.e., it is anisotropic).
The skull conductivity is poorly known, and the published data are not consistent. Skull bones can be classified according to the fluid concentration of their material: compact bone has low fluid concentration, and spongy bone has higher fluid concentration. Sutures are composed of material that is rich in fluids; therefore, sutures are highly conductive (as experiments confirm), and spongy bone is more conductive than compact bone [58-61]. The cranial bones have a layered structure consisting of a spongy middle layer sandwiched between two compact layers. Measurements show that the lower layer is more conductive than the upper layer, and the middle layer is much more conductive than the outer layers [59]. In addition to variations in bone type, structural variations within the skull, such as openings and thin regions, have a large impact on its effective conductivity. These holes and openings are filled with nerves and body fluids, which provide paths for electrical current to pass through the skull and consequently increase the effective conductivity. The effect of structural variations becomes especially important in infants and young children, whose skulls are not completely developed [62,63].
The electrical properties of the head tissues are thus inhomogeneous and anisotropic. The skull and the scalp have a multilayer structure, with each layer having different electrical properties. This structure can be modeled either as a multilayer structure with isotropic layers or as a single anisotropic tissue. How important is it to include these fine details in the model, and which features matter most? These questions require further analysis and investigation, so an efficient anisotropic solver is needed to enable a thorough study. Section 7 describes a new algorithm that accounts for anisotropic tissues.

Numerical methods
Realistic head models improve upon spherical models. The boundary element method solves surface integral equations instead of the volume partial differential equation. This reduces the dimensionality to 2D, which improves performance; however, it is restricted to homogeneous, isotropic, and closed tissue compartments. In contrast, the finite difference method (FDM) and finite element method (FEM) discretize the whole volume into small volumetric elements. Consequently, modeling properties such as inhomogeneity and anisotropy can be handled, at the cost of computational complexity: the stiffness matrix becomes large, and only iterative methods can be used [64,65]. These are relatively inefficient because they must be applied repeatedly for each source configuration. Reciprocity can help in this case [35,36].
FEM is computationally more efficient than FDM because of the freedom in the choice of the computational points, compared with FDM's regular fixed grid. However, constructing a FEM mesh from an MRI image is difficult and can be inaccurate because of the complex head geometry, whereas the regular cubic voxels of an FDM model map directly onto the computational grid. The FDM is also accurate, reliable, able to handle nonuniform conduction characteristics, and computationally efficient.
PARALLEL HEAD MODELING

Given the background on the source localization problem presented earlier, the challenge facing a general solution approach is how to understand, quantify, and ideally control (optimize) the different interacting factors (measurement, modeling, numerical, and resolution) that determine the accuracy, quality, and reliability of the results. Model space exploration, sensitivity analysis, and uncertainty quantification are inherent challenges that must be addressed, but they pose significant computational requirements. Formulating the evaluation methodology and simulation algorithms for HPC involves the fundamental tension between model complexity and computational resource availability. In this section, we describe our general approach to studying the source localization problem and identify two primary computational hurdles that will necessitate large-scale HPC solutions. We present two parallel FDM solvers that we have developed and show their performance on different computing platforms. Our objective is to demonstrate current steps toward an eventual future realization of the solution framework we are proposing.

Approach
Our approach to source localization is based on the idea of providing a generic lead field matrix (gLFM) that can serve as a generator of LFMs. A gLFM maps the amplitudes of generic dipolar sources to generic electrode potentials. The generic distributed dipoles are placed at every voxel in the GM, and the generic electrodes are placed at 1 mm³-voxel spacing on the scalp. Different gLFMs are constructed for different volume conduction characteristics. Once gLFMs are computed, many different LFMs can be sampled based on different constraints or resolutions imposed on the sources (e.g., dipoles constrained to the cortex), the number and locations of the electrodes, and the volume conduction characteristics. This can be achieved efficiently by sampling from the rows and columns of the gLFM appropriately. The idea is to factor out the common and computationally intensive part of the analysis from the application of different inverse algorithms. Because different gLFMs capture different volume conduction parameters, the influence of these factors, the number and distribution of electrodes, and different constraints on the dipoles can be analyzed in a unified way using, for example, sensitivity analysis procedures.
In the application of the inverse algorithm, a gLFM is selected based on volume conduction characteristics, as shown in Figure 3. LFMs are then sampled from the gLFM by choosing the rows corresponding to the electrode map. In the case of distributed dipole models, the columns corresponding to the constraints imposed on the sources are sampled as well, and different distributed dipole algorithms can then be applied. In the case of the parametric approach, the nonlinear search proceeds on the columns of the gLFM corresponding to the locations of the dipoles, and different orientations are considered by linear superposition of the lead fields corresponding to the three orthogonal directions, weighted by the dipole orientation.

If we could run 1000 forward calculations simultaneously, we could reduce the time needed to 1 day. Using the reciprocity principle, however, the number of forward calculations can be reduced to the number of scalp electrodes (5000 per gLFM in our example). While this is a more tractable computational job, performing global sensitivity analysis requires at least 1000 gLFMs, increasing the total number of forward calculations by three orders of magnitude (5,000,000 in our example). Again, all these calculations are independent and can be computed concurrently, but the computational resource burden is prodigious. In principle, assuming the availability of infinite resources, the time required to compute all these gLFMs equals the time required for a single forward solution. In practice, the available resources are limited, and consequently the performance of the forward solver becomes the key limiting factor. Figure 3 shows the gLFM computation factored out from the evaluation analysis.
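As an illustration of this sampling step, the sketch below, with stand-in array sizes and a hypothetical in-memory gLFM, extracts a concrete LFM by row selection (the electrode montage) and column selection (three orientation columns per retained dipole).

```python
# Sampling a concrete LFM from a precomputed gLFM by row/column selection.
import numpy as np

# Stand-in sizes; the real gLFM in the paper maps 5798 generic electrodes
# to 3 * 412,477 generic dipole orientation columns.
N_e_gen, N_d_gen = 580, 4124
glfm = np.zeros((N_e_gen, 3 * N_d_gen), dtype=np.float32)   # stand-in for the stored file

electrode_rows = np.linspace(0, N_e_gen - 1, 128, dtype=int)  # a 128-sensor montage
dipole_ids = np.arange(0, N_d_gen, 17)                        # thinned source space
cols = (3 * dipole_ids[:, None] + np.arange(3)).ravel()       # x,y,z columns per dipole

lfm = glfm[np.ix_(electrode_rows, cols)]                      # concrete LFM sample
print("sampled LFM shape:", lfm.shape)                        # (128, 3 * len(dipole_ids))
```

At real scale the gLFM would be memory-mapped from disk rather than held in RAM, but the row/column selection is the same.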

Conductivity inverse model.
The other crucial problem in individualized head modeling is the determination of a subject's unique internal head-tissue conductivities. One approach to finding these values is the bounded EIT (bEIT) method. In bEIT, low-frequency alternating currents are injected into the head through pairs of electrodes attached to the scalp, and the response is measured at the other electrodes. Once an appropriate objective function describing the difference between the measured scalp potentials, $\Phi_V$, and the predicted potentials, $\Phi_p$, is defined (e.g., a least-squares norm), a search for the global minimum is undertaken using nonlinear optimization algorithms (e.g., simulated annealing [66,67] or simplex search).
With either optimization method, the search for the optimal conductivities requires a large number of forward calculations, on the order of 3000 to 4000 for a single current-injection pair. Typically, we consider 60 pairs, which requires approximately 200,000 forward calculations in total. (This is a reasonable expectation for four head tissues; the complexity increases when more head tissues are modeled.) Because Poisson's equation is nonlinear with respect to the conductivities, the reciprocity principle cannot be applied in this case. However, three levels of parallelism can be exploited. At the highest level, current-injection pairs are independent and can be processed simultaneously, given available resources. Each injection-pair optimization can itself be parallelized, but is typically constrained to less than 10-way concurrency in the search. At the lowest level, the forward calculations themselves are parallelizable.
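The structure of this search can be sketched as follows. Here `predict_potentials` is a hypothetical stand-in for the expensive FDM forward solve (which dominates the cost in practice), the conductivity values and bounds are only illustrative, and SciPy's `dual_annealing` plays the role of the simulated annealing search.

```python
# Sketch of the bEIT conductivity search: global minimization of the misfit
# between measured and predicted electrode potentials over tissue conductivities.
import numpy as np
from scipy.optimize import dual_annealing

rng = np.random.default_rng(3)
A = rng.normal(size=(60, 4))                       # toy sensitivity map, one row per pair

def predict_potentials(sigma):
    """Stand-in for one forward solve per injection pair (nonlinear in sigma)."""
    return A @ (1.0 / np.asarray(sigma))

sigma_true = np.array([0.33, 0.018, 1.79, 0.44])   # brain, skull, CSF, scalp (S/m), assumed
v_measured = predict_potentials(sigma_true)

def misfit(sigma):
    return np.sum((v_measured - predict_potentials(sigma)) ** 2)

bounds = [(0.1, 1.0), (0.001, 0.1), (1.0, 2.0), (0.1, 1.0)]  # plausible search ranges
result = dual_annealing(misfit, bounds, seed=7, maxiter=200)
print("estimated conductivities:", np.round(result.x, 4))
```

With a real solver, each `misfit` evaluation costs a full forward solution, which is why the thousands of evaluations per injection pair dominate the total budget.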
Conductivity inverse modeling is necessary for human subjects. However, we can imagine enumerating the parameters across the expected ranges of tissue conductivities. The size of the parameter exploration space depends on the number of range intervals (e.g., 1000) and the number of tissues being modeled (e.g., 10). For each set of volume conductivity parameters, gLFMs would be generated, producing as an auxiliary result the impressed bEIT potentials on the scalp for any current-injection pair of electrodes.

Critical computational component
In both gLFM generation and conductivity optimization, a large number of forward solutions is required; the forward solver is thus the critical component in determining the computational viability of our approach, and it must be efficient, robust, and accurate. We have implemented two FDM algorithms to solve Poisson's equation (Equation 9) and present them below as example forward solvers optimized for parallel execution. The first is limited to isotropic tissue conductivities and is based on the alternating direction implicit (ADI) method. The second can handle anisotropic tissue properties. Parallel implementations of both algorithms on multicore and manycore computing architectures are described. Our objective is to show how current algorithms and computing technology can be applied to begin realizing the proposed methodological framework.

Alternating direction implicit algorithm. The ADI method finds the solution of Equation 9 as the steady state of the corresponding evolution problem,

$$\frac{\partial \Phi}{\partial t} + \nabla \cdot \left( \sigma(x,y,z)\, \nabla \Phi(x,y,z) \right) = S.$$

At steady state, the time derivative is zero, and the solution corresponds to the original problem. At every iteration step, the spatial operator is split into the sum of three 1D operators, which are evaluated alternately in each substep; the difference equations in the $x$ direction are given in [68], where $\tau$ is the time step and $\delta_{x,y,z}$ are the appropriate 1D second-order spatial difference operators. The finite-difference scheme is applied over the solution domain using a rectangular grid with spatial spacings $h_x$, $h_y$, $h_z$ in the $x$, $y$, $z$ directions and $\tau$ in time. Using the notation $x_i = i h_x$, $y_j = j h_y$, $z_k = k h_z$, and $t_n = n\tau$ for integer values of $i$, $j$, $k$, and $n$, the electrical potential at a grid point $(i,j,k)$ at time $t_n$ is written as $\Phi^n_{ijk} = \Phi(x_i, y_j, z_k; t_n)$; the notation $\Phi^n_q$ means the solution along direction $q$ at time step $n$. The scheme has second-order accuracy. In contrast with the classic ADI method, the multicomponent ADI does not require the operators to be commutative; in addition, it uses regularization (averaging) when evaluating the variable at the previous instant of time.
The ADI algorithm consists of a time-iteration loop in which each time step is split into three substeps. In each substep, tridiagonal systems of equations are solved along the $x$, $y$, and $z$ directions, respectively. For instance, in the first substep the spatial operator acts only in the $x$ direction, which exposes two-dimensional parallelism: all $N_y \times N_z$ equations along the $x$ direction are independent and can be solved concurrently. Similarly, in the second substep all $N_x \times N_z$ equations along the $y$ direction, and in the third substep all $N_x \times N_y$ equations along the $z$ direction, are independent and can be solved concurrently. However, all equations of a substep must be solved before proceeding to the next substep, which introduces a barrier between substeps.
The parallel algorithm pseudocode is shown in Algorithm 1. Implementing the parallel algorithm on a shared-memory architecture using open multiprocessing (OpenMP) [69] is straightforward: the time loop runs sequentially, and in each substep all OpenMP threads cooperate in solving the independent tridiagonal systems of equations concurrently. Similarly, on a GPU architecture using the compute unified device architecture (CUDA) platform [70], the time loop runs sequentially on the host, and a grid of thread blocks is launched in each substep to solve the tridiagonal systems on the device, with each thread solving one tridiagonal system. The performance of the GPU code is limited mainly by global memory access: thread accesses to global memory are coalesced when solving in the $y$ and $z$ directions but not when solving in the $x$ direction. We used shared memory and intermediate computations to improve performance in the $x$ direction. While the CUDA implementation uses single floating-point precision, the OpenMP implementation provides either single or double precision.
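To make the structure concrete, the following is a minimal NumPy sketch (not the paper's OpenMP/CUDA code) of one ADI substep: each $(y,z)$ grid line yields an independent tridiagonal system solved with the Thomas algorithm, and the batch dimension is exactly what the OpenMP threads and CUDA threads parallelize over. The coefficients model a uniform-conductivity toy medium, and the right-hand side omits the explicit $y$/$z$ operator terms and the source for brevity.

```python
import numpy as np

def thomas_batched(a, b, c, d):
    """Solve a batch of tridiagonal systems; a, b, c, d have shape (n, batch),
    holding sub-, main-, super-diagonals and right-hand sides column-wise."""
    n = b.shape[0]
    cp, dp = np.empty_like(c), np.empty_like(d)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                        # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty_like(d)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):               # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

nx, ny, nz = 64, 32, 32
tau, h = 0.1, 1.0                                # time step and grid spacing (toy values)
phi = np.random.default_rng(4).normal(size=(nx, ny, nz))

batch = ny * nz                                  # all (y,z) lines are independent
a = np.full((nx, batch), -1.0 / h**2); a[0] = 0.0
c = np.full((nx, batch), -1.0 / h**2); c[-1] = 0.0
b = np.full((nx, batch), 1.0 / tau + 2.0 / h**2)
rhs = phi.reshape(nx, batch) / tau               # y/z operator terms and S omitted here

phi_x = thomas_batched(a, b, c, rhs).reshape(nx, ny, nz)   # x-direction substep
print("x substep solved for", batch, "independent lines")
```

The barrier between substeps in the text corresponds to the fact that `phi_x` must be fully computed before the analogous $y$-direction batch can start.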

Vector additive implicit algorithm.
For the 3D anisotropic case, we use the vector additive implicit (VAI) algorithm introduced in [71]. In this algorithm, a 13-point stencil is used to approximate the differential operator and to order the variables; it includes two diagonally adjacent cells with one common symmetry point, as shown in Figure 4.
To order the variables, the computational domain is split into a set of rectangular cells ordered akin to a 3D checkerboard, and the algorithm proceeds by considering only cells of the same color at each iteration step. The update can be written in block form with an iterative parameter $\tau$ and permutation matrices $P_{12}$ and $P_{21}$ [71]. The matrices $A_1$ and $A_2$ are in block-diagonal form with $8 \times 8$ diagonal blocks; they are composed from the coefficients of the finite-difference scheme and are complementary parts of the finite-difference operator over the cells of the stencil. The structure of the VAI method is similar to the implicit block-Jacobi method with a preconditioner in the form of a block-diagonal matrix with $8 \times 8$ diagonal blocks. Because of the color-based iteration approach, each cell can be updated in place, avoiding the need for replicated data structures. In addition, because each block can be processed independently, the approach is highly parallelizable, as the pseudocode in Algorithm 2 shows. On a shared-memory architecture, the iterative loop runs sequentially, and at each iteration step the OpenMP threads cooperate in computing the $8 \times 8$ blocks until the termination condition is satisfied. Similarly, in our GPU implementation, the iterative loop runs on the host, and at each iteration step a grid of thread blocks is executed on the GPU, where each thread performs the computation of one $8 \times 8$ block.
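As a structural illustration only (not the 13-point-stencil block solve of [71]), the sketch below, under toy assumptions, splits a grid into 2×2×2 cells, colors them checkerboard-fashion, and updates all same-color cells independently; this independence is the property both the OpenMP and CUDA versions exploit.

```python
# Toy 3D checkerboard (red-black) cell ordering with in-place block updates.
import numpy as np

nx = ny = nz = 16                      # grid of 8**3 cells, each 2x2x2 voxels
phi = np.random.default_rng(5).normal(size=(nx, ny, nz))

cells = [(i, j, k) for i in range(nx // 2)
                   for j in range(ny // 2)
                   for k in range(nz // 2)]
for color in (0, 1):                   # two sweeps: "black" cells, then "white"
    for (i, j, k) in cells:
        if (i + j + k) % 2 != color:
            continue                   # cells of the other color wait for the next sweep
        sl = np.s_[2*i:2*i+2, 2*j:2*j+2, 2*k:2*k+2]
        # Toy update standing in for the real 8x8 block solve: relax the
        # 8 voxels of the cell toward their mean, in place.
        phi[sl] = 0.5 * phi[sl] + 0.5 * phi[sl].mean()
print("one colored iteration complete")
```

Because same-color cells never share voxels, the inner loop over cells of one color can be distributed across threads with no write conflicts, which is why in-place updates suffice.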

Normative brain database
The approach of addressing the complexity of the source analysis space through the systematic generation of generic LFMs, together with the head modeling algorithms presented earlier, provides a powerful theoretical and methodological basis for practical application. However, it is also important to have knowledge of population demographics regarding human head shapes and brain geometries, especially for establishing normative expectations and models. This will allow the source analysis approach to link its outcomes to head/brain types and associated neurological conditions. We have created a neuroimaging database of normal subjects as a resource for understanding both healthy brain function and neurological disorders. The primary goal is to improve the analysis of functional activity of the human brain, primarily dEEG, through improved anatomical constraints. For each of the over 100 subjects in the database, MRI, diffusion tensor imaging (DTI), and dEEG data have been collected; CT scans are also available for some subjects. An electrical head model is created for each subject, including cortical surface extraction and dipole tessellation, plus skull fitting from database CTs where necessary. Conductivity estimation is done using bEIT methods, with bone mineral density estimation included if a CT is available. Lead fields are generated for all subjects and used for source analysis. Tractography analysis [72] is performed on the DTI data. All subject MRIs and tractography are registered with brain atlases (based on the MNI 152 database), including thalamic, subthalamic, and striatal atlases. In particular, registration of the normative average of individual cortical surfaces is done using major gyral landmarks. The normative average is aligned with various cortical parcellations, including Brodmann areas.
The EEG data are collected for each subject for both resting and task protocols. The dEEG data are source-localized to the cortical surface, using sensor position information, also captured during the measurement. With these results, network analyses are done for both resting and task states, examining source coherence and phase measures with the major cortical and subcortical parcellation schemes. A key future work effort is to contrast the EEG (surface and source) measures of network function with the existing fMRI functional networks, such as described for 1000 fMRI resting datasets by Biswal et al. [73] and Yeo et al. [74].
The ultimate benefit of the normative neuroimaging database is in what can be learned from the source modeling and analysis in diagnostic support for clinical studies.

RESULTS
Evaluating the influence of the main factors that affect the accuracy of source localization, and extracting the conductivities of the head tissues through the bEIT technique, requires a forward solver that meets three main requirements: (i) accuracy, in that it solves the Poisson equation accurately for complex geometries; (ii) efficiency, in that it allows studies to be conducted in a practical time frame; and (iii) reliability.

(Figure 7 caption: Cross-verification of using the reciprocity principle in the computation of the lead field matrix with the alternating direction implicit (ADI) and vector additive implicit (VAI) isotropic solvers. The figure shows a random column from a lead field matrix computed using the forward model versus using the reciprocity principle for both ADI-isotropic and VAI-isotropic solvers.)
Verification was performed using software [77] developed at the University of Oregon for this purpose. We cross-verified ADI and VAI in the isotropic setting by computing an LFM for each resolution case with known conductivities and current sources. Although the numerical methods differ, we expect agreement in the isotropic case, as Figure 6 (left) verifies.
Using the computed LFMs, we compared the isotropic and anisotropic methods by taking one column of each LFM and plotting the two projections of the same activated dipole; Figure 6 (right) shows the potential differences at the sensor locations. Figure 7 shows a comparison of the lead fields computed by reciprocity and directly by dipole forward projection using the ADI and VAI isotropic solvers. Our future work will address how these potential differences affect source localization accuracy when considering the number of scalp electrodes, the solution space configuration, and the tissue conductivities.

Computational performance
The previous section shows that the ADI and VAI solvers produce accurate results, but computational performance is also important to meet our large-scale computational requirements. A Matlab version of both forward solvers takes several hours to compute a single solution for a 1 mm³ head model, which prohibits the computation of even a single LFM. Figure 8 and Table I give performance results for our OpenMP and CUDA versions of ADI and VAI for colin27 at 1 mm³ in terms of average iteration time. (Note that the CUDA and OpenMP codes were developed in-house at the University of Oregon and do not make use of other numerical libraries, such as CUBLAS or CUSPARSE.) Both ADI and VAI are iterative solvers and run a minimum of 400 iterations before convergence with a 1 mm³ head model; averages of 500 iterations for ADI and 1000 for VAI are common in our experience. Increasing the OpenMP core count up to eight continues to deliver performance improvements on the compute nodes we tested, but the GPUs clearly deliver the best performance. Nevertheless, many nodes do not have GPUs, so both the OpenMP and CUDA implementations are important. The memory footprint of each solver is about 800 MB for the colin27 1 mm³ model, making both appropriate for most available GPUs.
Forward solvers are the core computational components for the conductivity inverse and LFM calculations. The conductivity inverse problem must process the bEIT measurements for up to 64 current-injection pairs in the general case. Depending on the number of conductivity unknowns, each conductivity search for a single pair requires many thousands of forward solutions. Simulated annealing is currently used as the optimization strategy [67], and our parallel implementation will support up to 12 simultaneous forward solves. Clearly, the conductivity searches for the individual pairs can also run in parallel. The results from all pairs are then analyzed to determine the final tissue conductivity estimates. The total computational requirements are prodigious, requiring over 180,000 forward solutions.

(Figure 8 caption: Single-node performance for one iteration of the alternating direction implicit (left) and vector additive implicit (right) solvers on the ACISS and Mist clusters (see Figure 9). VAI completes a single iteration in 20% of the time of an ADI iteration for OpenMP and 60% for CUDA. Multinode MPI performance is not shown, but forward solutions are independent of each other, so scaling is good for both the gLFM and conductivity inverse problems when using either OpenMP or GPUs.)
Computing gLFMs for all current dipoles is computationally intensive. Because a gLFM must capture the scalp potentials corresponding to dipoles at any position in the GM and in any orientation, it is necessary to calculate the potentials for the three orthogonal $x$, $y$, and $z$ orientations at each dipole location; the potential for any orientation can then be constructed by superposition of the potentials for the three basis vectors. However, by using the reciprocity principle, we need only $N_e$ forward solutions to construct such a gLFM, where $N_e$ is the number of electrodes; each forward solution provides the potentials at one electrode corresponding to all dipoles in the GM. We created an isotropic gLFM and an anisotropic gLFM for colin27 based on 1925 generic recording electrodes. This required 1925 forward solutions per gLFM, computed by placing a current source at each electrode and the sink at a common reference electrode and calculating the potentials at that electrode corresponding to every dipole location. Thus, each gLFM is $2.1 \times 10^6 \times 1925$ in size.
From a computational viewpoint, gLFM generation is fully parallel, because the computation of every dipole forward solution (or, when using reciprocity, the computation of the potentials corresponding to all dipoles at each electrode) is independent, so the application is quite scalable. For instance, a run on the ACISS machine at the University of Oregon completed an ADI LFM calculation in approximately 11 min using 98 GPUs running simultaneously.

Reliability
Both the ADI and VAI solvers are reliable in the sense that the anatomical structure of the geometric head model can be captured directly, without any preprocessing or mesh generation from the structural MRI and/or CT images. Both solvers accurately handle fine geometric features, such as skull holes, at the available image resolution (currently 1 mm³), and once higher-resolution images become available, no modification of the solvers is required. This flexibility is important because it allows studying the influence of structural details on the source localization solution. Further, in both solvers the conductivity values can be assigned at the voxel level, which allows differentiating electrical properties at the 1 mm³ scale; this matters when considering fine-detail characteristics such as sutures. Also, placing a dipole anywhere inside the brain is simply a matter of placing a current source and sink separated by one voxel. In addition to this flexibility, the VAI algorithm allows assignment of the anisotropic conductivity tensor at the voxel level. Of course, the price of all this flexibility is computational performance. However, with access to sufficient computational resources, this scientific workflow could scale to a level that would enable source localization for a large number of individualized head models.

Source localization
In this subsection, we demonstrate the capability of using a gLFM for source localization with both the parametric and distributed approaches. A gLFM that maps generic distributed dipoles to generic electrodes on the scalp was generated using the ADI solver. It maps 412,477 generic dipoles, placed at every GM voxel, to 5798 generic sensor locations on the scalp, as shown in Figure 10. For each dipole location, we generated three lead fields corresponding to three orthogonal orientations. The size of the resulting gLFM is $N_e \times 3N_d$, where $N_e$ is the number of generic electrodes and $N_d$ is the number of generic dipoles; stored in a binary file, it occupies 60 GB.
Sampling sensors. Down-sampling $N_e$ electrodes from the generic electrodes is achieved by distributing $N > N_e$ points uniformly on the surface of a unit sphere placed at the center of the head, as shown in Figure 10. A ray is then cast from the center of the head through each point on the sphere to outside the head, and the generic sensor closest to the intersection of the ray with the scalp surface is selected. We adjust the number of points on the unit sphere up and down until the desired number of sampled sensors is obtained. Figure 10 shows the generic scalp sensors and a 128-electrode sampling.
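A sketch of this scheme, under the simplifying assumptions of a spherical "scalp" of generic sensors and a Fibonacci lattice for the uniformly distributed sphere points, might look like the following.

```python
# Montage down-sampling: uniform sphere points -> rays -> nearest generic sensor.
import numpy as np

def fibonacci_sphere(n):
    """n approximately uniform points on the unit sphere."""
    i = np.arange(n)
    z = 1.0 - 2.0 * (i + 0.5) / n
    theta = np.pi * (1.0 + 5**0.5) * i
    r = np.sqrt(1.0 - z**2)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta), z])

rng = np.random.default_rng(6)
generic = rng.normal(size=(5798, 3))
generic /= np.linalg.norm(generic, axis=1, keepdims=True)   # toy spherical "scalp"

def sample_montage(n_rays):
    rays = fibonacci_sphere(n_rays)
    # Maximal cosine similarity = generic sensor nearest to each ray direction.
    nearest = np.argmax(rays @ generic.T, axis=1)
    return np.unique(nearest)                               # drop duplicates

n = 128
while len(sample_montage(n)) < 128:        # adjust the point count as in the text
    n += 8
print("montage size:", len(sample_montage(n)), "using", n, "rays")
```

With a real head, the ray-scalp intersection replaces the cosine test, but the adjust-until-count-matches loop is the same.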
Parametric localization approach. In the parametric approach, the search proceeds to find the locations and orientations of one or a few equivalent dipoles that minimize the residual energy between the measured EEG data and the model-calculated lead field. Instead of computing the lead field for every trial point, which is computationally intensive, we used the precomputed gLFM. Once the optimizer generates a new trial point (three parameters for the position and three for the orientation), the corresponding model lead field $\Phi_{model}$ is constructed by (i) finding the closest GM voxel to the trial point position $r_{gm}$; (ii) extracting the triple of lead fields corresponding to that voxel's three orthogonal directions; and (iii) taking the dot product of the dipole orientation with those lead fields. Once the lead field is constructed, the residual energy is computed as

$$E = \left\| \Phi_{EEG} - \Phi_{model} \right\|^2 + \mathrm{Penalty}(\delta r),$$

where $\mathrm{Penalty}(\delta r)$ is a function of the distance $\delta r$ between the closest GM voxel and the trial point position. The goal of the penalty function is to force the search to remain in the GM. In this paper, we used the penalty function

$$\mathrm{Penalty}(\delta r) = 2 \log(\delta r + 1).$$

We then used a simulated annealing algorithm to search for eight preset dipole locations, as shown in Figure 11. We considered two orientations for each dipole, radial and tangential, and carried out the search using different numbers of scalp sensors: 32, 64, 128, and 256. The separation of the HPC part from the analysis through the gLFM allowed us to set the simulated annealing solver parameters so as to better explore the search landscape (e.g., the number of objective function evaluations).

(Figure 11 caption: Eight preset dipole locations in the occipital, frontal, temporal-left, and parietal-right regions. In each region, one test dipole is placed on the surface of the cortex and another is placed deeper, midway between the center of the head and the cortical surface.)
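The per-trial-point evaluation can be sketched as below, with hypothetical stand-in arrays for the GM voxel centers and the per-voxel lead-field triples; the millimeter conversion inside the penalty is an assumption about the distance units.

```python
# Evaluating one parametric trial point against the gLFM: snap to the nearest
# GM voxel, combine its x/y/z lead fields by the trial orientation, add penalty.
import numpy as np

rng = np.random.default_rng(7)
gm_voxels = rng.uniform(-0.1, 0.1, size=(5000, 3))     # stand-in GM voxel centers (m)
glfm_xyz = rng.normal(size=(5000, 3, 128))             # per-voxel x/y/z lead fields

def model_lead_field(r_trial, d_trial):
    dists = np.linalg.norm(gm_voxels - r_trial, axis=1)
    v = np.argmin(dists)                               # (i) closest GM voxel
    lf = d_trial @ glfm_xyz[v]                         # (ii)+(iii) orientation dot product
    return lf, dists[v]

def energy(phi_eeg, r_trial, d_trial):
    lf, dr = model_lead_field(r_trial, d_trial)
    # Penalty(dr) = 2*log(dr + 1), with dr converted to mm (assumed unit choice).
    return np.sum((phi_eeg - lf) ** 2) + 2.0 * np.log(dr * 1e3 + 1.0)

phi = rng.normal(size=128)
print("trial energy:", energy(phi, np.zeros(3), np.array([0.0, 0.0, 1.0])))
```

In the actual workflow, `energy` would be the objective handed to the simulated annealing optimizer, with the gLFM lookup replacing any forward solve.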

Distributed dipole localization
In the distributed model approach, in addition to sampling a number of electrodes from the generic electrodes, we sample the solution space from the generic dipoles to obtain an LFM. Sampling the solution space is accomplished by marching a cube whose side length equals the desired resolution: moving the center of the cube voxel by voxel over the GM, we sample a dipole at the center of the cube whenever the cube contains no previously sampled dipole; otherwise, we move the cube by one voxel. Figure 10 shows a distributed dipole sampling at a resolution of 7 mm.
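The sketch below implements an equivalent binning formulation of this marching-cube sampling (keeping one GM voxel per res³ bin), with stand-in voxel coordinates; the marching formulation in the text visits voxels sequentially but yields a similarly uniform thinning.

```python
# Thinning the GM voxel set to an approximately uniform dipole grid.
import numpy as np

rng = np.random.default_rng(8)
gm = rng.integers(0, 200, size=(20000, 3))     # stand-in GM voxel coordinates (1 mm units)
res = 7                                        # desired spacing in voxels (7 mm)

bins = gm // res                               # assign each voxel to a res^3 cube
_, keep = np.unique(bins, axis=0, return_index=True)   # first voxel per occupied cube
dipoles = gm[np.sort(keep)]
print(f"sampled {len(dipoles)} dipoles from {len(gm)} GM voxels")
```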
Once an LFM is sampled from the gLFM, we can apply any source localization algorithm and any constraints. In this paper, we used the sLORETA algorithm [78] to localize the preset dipole locations shown in Figure 11. For each dipole location, we considered 30 different orientations, selected by uniformly distributing 30 points on a unit sphere centered at the dipole location, as shown for the dipole in the frontal region in Figure 11. For each dipole location and orientation, we considered 30 different configurations of scalp sensor locations (each configuration is a sample from the ensemble of all possible configurations with $N_e$ sensors) for each sensor count of 32, 64, 128, and 256. We then used distributed dipole grid spacings of 6, 7, 8, 9, and 10 mm to localize each combination of dipole location, orientation, and sensor configuration. We considered two measures of the influence of the number of scalp sensors $N_e$ and the solution space resolution on source localization accuracy. The first is the localization error, the Euclidean distance between the actual dipole location and the center of gravity (COG) of the sLORETA source estimate scores [79],

$$\mathrm{Error} = \left\| r_{test} - \frac{\sum_i \hat{J}_i\, r_i}{\sum_i \hat{J}_i} \right\|,$$

and the second is the spatial spreading (blurring) of the solution,

$$\mathrm{Spread} = \sqrt{ \frac{\sum_i \hat{J}_i\, \left\| r_i - r_{COG} \right\|^2}{\sum_i \hat{J}_i} },$$

where $r_{test}$ is the actual test dipole location, $r_i$ is the location of the $i$th source, $\hat{J}_i$ is the estimate of the dipole strength at location $r_i$, and $r_{COG}$ is the COG of the estimate.

(Figure 13 caption: Visualization of the current density delivered to the primary motor cortex using a 7-electrode scheme in a dense-array EEG montage; one source (yellow) and six nearest-neighbor sinks (red).)
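Assuming the standard center-of-gravity definitions (the reconstruction above follows the form used in [79]), a small sketch computing both measures might look like this.

```python
# Localization error and spatial spread of a distributed source estimate.
import numpy as np

def cog_measures(r_test, r_sources, j_hat):
    w = np.abs(j_hat)
    cog = (w[:, None] * r_sources).sum(axis=0) / w.sum()
    loc_error = np.linalg.norm(r_test - cog)
    spread = np.sqrt((w * np.sum((r_sources - cog) ** 2, axis=1)).sum() / w.sum())
    return loc_error, spread

rng = np.random.default_rng(9)
r_src = rng.uniform(-80, 80, size=(2400, 3))          # stand-in source grid (mm)
# A blurred estimate peaking near a hypothetical test dipole at (10, 0, 30) mm.
scores = np.exp(-np.linalg.norm(r_src - [10, 0, 30], axis=1) / 15)
err, spr = cog_measures(np.array([10, 0, 30]), r_src, scores)
print(f"localization error {err:.2f} mm, spread {spr:.2f} mm")
```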
Figure 12 (left) shows the localization error when the test dipoles were placed off the distributed dipole positions. As the figure shows, source localization clearly improves as the number of scalp sensors increases. The improvement is significant when the number of sensors increases from 32 to 64, but less significant beyond 128 sensors (the error bars overlap). The results were consistent for all shallow test dipoles, but inconsistent for deep sources. Figure 12 (right) shows similar results for the spreading measure: as the number of scalp sensors increases, the spatial spreading (blurring) of the solution decreases and the focality of the inverse solution improves. Increasing the resolution of the solution space, on the other hand, only slightly improves the localization error and spreading. We note that the purpose of these results is to demonstrate the use of the gLFM and the separation of HPC from the source localization analysis; they are for the sLORETA inverse algorithm and do not include noise or other forward solver factors.

Neurostimulation
The transfer matrix $T$ used in solving the neurostimulation inverse problem is computed in the reciprocity mode of the generic LFM calculations as an intermediate product and is outputted as a file concurrently with the gLFM for source localization. In Figure 13, we show an example of such a transfer matrix computed for a realistic MRI/CT-based model from the Oregon Normals database [80], together with a visualization of a stimulating scalp current injection pattern targeting the primary motor area (M1).

CONCLUSION AND FUTURE DIRECTIONS
The challenge of achieving high temporal and high spatial resolution in future human brain neuroimaging will require the integration of HPC. The dEEG source localization problem is inherently ill-posed and depends critically on several factors, from image geometry to modeling assumptions, resulting in a large, multidimensional uncertainty space to explore for optimal solutions. Many head modeling experiments will need to be performed to quantify uncertainty effects, each requiring significant computational capability. Our research suggests that a systematic methodology to analyze the main factors and study the interdependent parameters that affect the accuracy of EEG source-mapping solutions can be evaluated in a unified way and studied effectively through the application of HPC tools and techniques.
This paper makes four main contributions to the neuroscience domain. First, we identified and classified the main factors influencing the accuracy of source localization solutions and indicated which of these factors require further research. Second, we provided an approach to studying the effect of these factors in a unified manner across different inverse approaches and algorithms; it is based on factoring out the common and computationally intensive part of the analysis, which allows the application of different inverse algorithms and different constraints. Third, we demonstrated that HPC enables the creation of such constructs (gLFMs) at high resolution and detail. Finally, we provided two accurate, efficient, and reliable FDM-based forward solvers parallelized using OpenMP on shared memory and CUDA on GPUs.
With the constant improvements in HPC technologies, we believe that our approach will prove to be valuable in next-generation research and clinical practice. The integration of the methodology with normative databases will provide a platform for knowledge creation and a basis for diagnostic assessment of source results with respect to neurologic conditions. Even more inspiring is the potential for neurostimulation in future brain health. Our activities in integrating HPC in advanced neuroimaging science and engineering are important steps toward this promising frontier.