;+ ; NAME: ; gpu_sphericalfield ; ; PURPOSE: ; Calculates the electric field in a light scattering ; pattern defined by a set of Lorenz-Mie scattering ; coefficients. Uses gpulib for hardware acceleration. ; ; CATEGORY: ; Light scattering, holographic microscopy ; ; CALLING SEQUENCE: ; field = gpu_sphericalfield(x, y, z, a, lambda, $ ; mpp = mpp, k = k) ; ; INPUTS: ; x: [npts] array of pixel coordinates [pixels] ; y: [npts] array of pixel coordinates [pixels] ; z: If field is required in a single plane, then ; z is the plane's distance from the sphere's center ; [pixels]. ; Otherwise, z is an [npts] array of coordinates. ; ; a: [2,nc] array of a and b scattering coefficients, where ; nc is the number of terms required for convergence. ; ; lambda: wavelength of light in medium [pixels] ; ; KEYWORD FLAGS: ; cartesian: If set, return field components in Cartesian ; coordinates. Default: Spherical polar coordinates ; ; OUTPUT KEYWORDS: ; k: scaled wavenumber of light in medium [inverse pixels] ; ; OUTPUTS: ; field: [3,npts] complex values of field at the positions r. ; [0,*]: r component ; [1,*]: theta component ; [2,*]: phi component ; ; If CARTESIAN is set: ; [0,*]: x component (incident polarization) ; [1,*]: y component (transverse component) ; [2,*]: z component (axial component, relative to ; incident beam). ; ; REFERENCES: ; 1. Adapted from Chapter 4 in ; C. F. Bohren and D. R. Huffman, ; Absorption and Scattering of Light by Small Particles, ; (New York, Wiley, 1983). ; ; 2. W. J. Wiscombe, "Improved Mie scattering algorithms," ; Appl. Opt. 19, 1505-1509 (1980). ; ; 3. W. J. Lentz, "Generating Bessel function in Mie scattering ; calculations using continued fractions," Appl. Opt. 15, ; 668-671 (1976). ; ; 4. S. H. Lee, Y. Roichman, G. R. Yi, S. H. Kim, S. M. Yang, ; A. van Blaaderen, P. van Oostrum and D. G. Grier, ; "Characterizing and tracking single colloidal particles with ; video holographic microscopy," Opt. Express 15, 18275-18282 ; (2007). ; ; 5. GPULIB: http://www.txcorp.com/products/GPULib/ ; ; 6. F. C. Cheong, B. Sun, R. Dreyfus, J. Amato-Grill, K. Xiao, ; L. Dixon and D. G. Grier, ; "Flow visualization and flow cytometry with holographic video ; microscopy," Opt. Express 17, 13071-13079 (2009). ; ; MODIFICATION HISTORY: ; Written by David G. Grier, New York University, 5/2007 ; 6/9/2007: DGG finally read Section 4.8 in Bohren and Huffman about ; numerical stability of the recursions used to compute the scattering ; coefficients. Feh. Result is a total rewrite. ; 6/20/2007: DGG Calculate \tau_n(\cos\theta) and \pi_n(\cos\theta) ; according to recurrence relations in ; W. J. Wiscombe, Appl. Opt. 19, 1505-1509 (1980). ; This is supposed to improve numerical accuracy. ; 2/8/2008: DGG. Replaced single [3,npts] array of input coordinates ; with two [npts] arrays for x and y, and a separate input for z. ; Eliminated double() call for coordinates. Z may have 1 element or ; npts elements. Small documentation fixes. ; 4/3/2008: Bo Sun (Sephiroth), NYU: Calculate Lorenz-Mie a and b ; coefficients using continued fractions rather than recursion. ; Osman Akcakir from Arryx pointed out that the results are ; more accurate in extreme cases. Method described in ; William J. Lentz, "Generating Bessel functions in Mie scattering ; calculations using continued fractions," Appl. Opt. 15, 668-671 ; (1976). ; 4/4/2008: DGG small code clean-ups and documentation. Added ; RECURSIVE keyword for backward compatibility in computing a and b ; coefficients. ; 4/11/2008: Sephiroth: Corrected small error in jump code for ; repeated fractions in Mie coefficients. ; 6/25/2008: DGG Don't clobber x input coordinates. ; 9/26/2008: Rewritten for use with CUDA via GPULIB by DGG and ; Jesse Amato-Grill. ; 10/9/2008: Bo Sun (Sephiroth) Rewritten to reduce all complex ; operation into real ones ; 10/11/2008: Bo Sun (Sephiroth) Debug and fixed VRAM leaking. ; 10/13/2008: DGG adapted from GPU_SPHEREFIELD to conform to updated ; DHM code layout. Eliminating temporary GPU variables increases ; speed by nearly factor of 2 :). ; Eliminated RECURSIVE keyword because scattering coefficients are ; now calculated elsewhere. ; 02/05/2009: DGG convert to low-level gpulib API calls for improved speed. ; Explicitly cast constants to single-precision. ; 02/07/2009: DGG eliminate extraneous calls to ; cudaThreadSynchronize(), which appear to be very costly. ; Report CUDA errors, which are reported by cudaThreadSynchronize(). ; 08/23/2009: DGG updated references. ; ; LICENSE ; ; This program is free software; you can redistribute it and/or ; modify it under the terms of the GNU General Public License as ; published by the Free Software Foundation; either version 2 of the ; License, or (at your option) any later version. ; ; This program is distributed in the hope that it will be useful, ; but WITHOUT ANY WARRANTY; without even the implied warranty of ; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ; General Public License for more details. ; ; You should have received a copy of the GNU General Public License ; along with this program; if not, write to the Free Software ; Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA ; 02111-1307 USA ; ; If the Internet and WWW are still functional when you are using ; this, you shold be able to access the GPL here: ; http://www.gnu.org/copyleft/gpl.html ;- function gpu_sphericalfield, x_, y_, z_, ab_, lambda, $ cartesian = cartesian ; project to cartesian coordinates ab = complex(ab_) ; Lorenz-Mie scattering coefficients nc = n_elements(ab[0,*])-1 ; number of terms required for convergence ci = complex(0,1) err = 0 ; gpulib error code ; convert to spherical coordinates centered on the sphere. ; (r, theta, phi) is the spherical coordinate of the pixel ; at (x,y) in the imaging plane at distance z from the ; center of the sphere. ; transfer coordinates to GPU x = gpuPutArr(float(x_), ERROR=e) & err OR= e y = gpuPutArr(float(y_), ERROR=e) & err OR= e npts = x.n_elements if n_elements(z_) eq 1 then begin z = gpuMake_Array(npts, VALUE=float(z_), /FLOAT, ERROR=e) & err OR= e endif else $ z = gpuPutArr(float(z_), ERROR=e) & err OR= e if y.n_elements ne npts or z.n_elements ne npts then $ message, "dimensions of x, y and z must agree." ; rho = sqrt(x^2 + y^2) rho = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e kr = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e e = gpuMultF(npts, x.handle, x.handle, rho.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, y.handle, y.handle, kr.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, rho.handle, kr.handle, rho.handle) err OR= cudaThreadSynchronize() ; r = sqrt(rho^2 + z^2) e = gpuMultF(npts, z.handle, z.handle, kr.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, kr.handle, rho.handle, kr.handle) err OR= cudaThreadSynchronize() e = gpuSqrtF(npts, rho.handle, rho.handle) ;e = cudaThreadSynchronize() ; extra e = gpuSqrtF(npts, kr.handle, kr.handle) err OR= cudaThreadSynchronize() ; polar angle sintheta = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e costheta = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e e = gpuDivF(npts, rho.handle, kr.handle, sintheta.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuDivF(npts, z.handle, kr.handle, costheta.handle) ;err OR= cudaThreadSynchronize() ; extra ; azimuthal angle cosphi = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e sinphi = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e e = gpuDivF(npts, x.handle, rho.handle, cosphi.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuDivF(npts, y.handle, rho.handle, sinphi.handle) ;err OR= cudaThreadSynchronize() ; extra ; scale distances by wavenumber e = gpuAddFAT(npts, 2.*!pi/float(lambda), kr.handle, 0., kr.handle, 0., kr.handle) err OR= cudaThreadSynchronize() ; starting points for recursive function evaluation ... ; ... Riccati-Bessel radial functions, page 478 ; sinkr = sin(kr) ; coskr = cos(kr) ; xi_nm2 = dcomplex(coskr, sinkr) ; \xi_{-1}(kr) ; xi_nm1 = dcomplex(sinkr,-coskr) ; \xi_0(kr) xi_nm2_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e xi_nm2_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e xi_nm1_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e xi_nm1_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e xi_n_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e xi_n_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e e = gpuCosF(npts, kr.handle, xi_nm2_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuSinF(npts, kr.handle, xi_nm2_i.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuSinF(npts, kr.handle, xi_nm1_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuCosFAT(npts, -1., 1., kr.handle, 0., 0., xi_nm1_i.handle) ;err OR= cudaThreadSynchronize() ; extra ; ... angular functions (4.47), page 95 ;pi_nm1 = 0.d ; \pi_0(\cos\theta) ;pi_n = 1.d ; \pi_1(\cos\theta) pi_nm1 = gpuMake_Array(npts, /FLOAT, ERROR=e) & err OR= e pi_n = gpuMake_Array(npts, VALUE=1., /FLOAT, ERROR=e) & err OR= e ; storage for vector spherical harmonics: [r,theta,phi] ;Mo1n = dcomplexarr(3,npts) ;Ne1n = dcomplexarr(3,npts) Mo1n2_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Mo1n2_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Mo1n3_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Mo1n3_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Ne1n1_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Ne1n1_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Ne1n2_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Ne1n2_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Ne1n3_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e Ne1n3_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e ; storage for Dermidjian's derivative dn_r = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e dn_i = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e ; Storage for accumulating scattered field, Es ;Es = dcomplexarr(3,npts) Es1_r = gpuMake_Array(npts, /FLOAT, ERROR=e) & err OR= e ; x/r component Es1_i = gpuMake_Array(npts, /FLOAT, ERROR=e) & err OR= e Es2_r = gpuMake_Array(npts, /FLOAT, ERROR=e) & err OR= e ; y/theta component Es2_i = gpuMake_Array(npts, /FLOAT, ERROR=e) & err OR= e Es3_r = gpuMake_Array(npts, /FLOAT, ERROR=e) & err OR= e ; z/phi component Es3_i = gpuMake_Array(npts, /FLOAT, ERROR=e) & err OR= e ; variables swisc = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e twisc = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e tau_n = gpuMake_Array(npts, /NOZERO, /FLOAT, ERROR=e) & err OR= e ; Compute field by summing multipole contributions for n = 1., nc do begin ; upward recurrences ... ; ... Legendre factor (4.47) ; Method described by Wiscombe (1980) ; swisc = pi_n * costheta ; twisc = swisc - pi_nm1 ; tau_n = n * twisc - pi_nm1 ; \tau_n(\cos\theta) e = gpuMultF(npts, pi_n.handle, costheta.handle, swisc.handle) err OR= cudaThreadSynchronize() e = gpuSubF(npts, swisc.handle, pi_nm1.handle, twisc.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuSubFAT(npts, 1., pi_nm1.handle, n, twisc.handle, 0., tau_n.handle) ;err OR= cudaThreadSynchronize() ; extra ; NOTE: tau_n appears as -tau_n: define accordingly. ; ... Riccati-Bessel function, page 478 ; xi_n = (2.d*n - 1.d) * xi_nm1 / kr - xi_nm2 ; \xi_n(kr) e = gpuDivF(npts, xi_nm1_r.handle, kr.handle, xi_n_r.handle) err OR= cudaThreadSynchronize() e = gpuAddFAT(npts, 2.*n-1., xi_n_r.handle, -1., xi_nm2_r.handle, 0., xi_n_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuDivF(npts, xi_nm1_i.handle, kr.handle, xi_n_i.handle) err OR= cudaThreadSynchronize() e = gpuAddFAT(npts, 2.*n-1., xi_n_i.handle, -1., xi_nm2_i.handle, 0., xi_n_i.handle) err OR= cudaThreadSynchronize() ; vector spherical harmonics (4.50) ; Mo1n[0,*] = 0.d ; no radial component (as initialized) ; Mo1n[1,*] = pi_n * xi_n ; ... divided by cosphi/kr e = gpuMultF(npts, pi_n.handle, xi_n_r.handle, Mo1n2_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, pi_n.handle, xi_n_i.handle, Mo1n2_i.handle) ;err OR= cudaThreadSynchronize() ; extra ; Mo1n[2,]* = -tau_n * xi_n ; ... divided by sinphi/kr e = gpuMultF(npts, tau_n.handle, xi_n_r.handle, Mo1n3_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, tau_n.handle, xi_n_i.handle, Mo1n3_i.handle) ;err OR= cudaThreadSynchronize() ; extra ; dn = (n * xi_n)/kr - xi_nm1 e = gpuDivF(npts, xi_n_r.handle, kr.handle, dn_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuDivF(npts, xi_n_i.handle, kr.handle, dn_i.handle) err OR= cudaThreadSynchronize() e = gpuSubFAT(npts, n, dn_r.handle, 1., xi_nm1_r.handle, 0., dn_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuSubFAT(npts, n, dn_i.handle, 1., xi_nm1_i.handle, 0., dn_i.handle) err OR= cudaThreadSynchronize() ; Ne1n[0,*] = n*(n + 1.d) * pi_n * xi_n ; ... divided by cosphi sintheta/kr^2 e = gpuMultFAT(npts, n*(n+1.), pi_n.handle, 1., xi_n_r.handle, 0, Ne1n1_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultFAT(npts, n*(n+1.), pi_n.handle, 1., xi_n_i.handle, 0, Ne1n1_i.handle) ;err OR= cudaThreadSynchronize() ; extra ; Ne1n[1,*] = -tau_n * dn ; ... divided by cosphi/kr e = gpuMultF(npts, tau_n.handle, dn_r.handle, Ne1n2_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, tau_n.handle, dn_i.handle, Ne1n2_i.handle) ;err OR= cudaThreadSynchronize() ; extra ; Ne1n[2,*] = pi_n * dn ; ... divided by sinphi/kr e = gpuMultF(npts, pi_n.handle, dn_r.handle, Ne1n3_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, pi_n.handle, dn_i.handle, Ne1n3_i.handle) err OR= cudaThreadSynchronize() ; prefactor, page 93 En = ci^n * (2.*n + 1.)/ n / (n + 1.) an = En * ci * ab[0,n] bn = En * ab[1,n] an_r = real_part(an) an_i = imaginary(an) bn_r = real_part(bn) bn_i = imaginary(bn) ; the scattered field in spherical coordinates (4.45) ; Es += En * (ci * ab[0,n] * Ne1n - ab[1,n] * Mo1n) ;Es1 ; Note that Mo1n1 = 0 e = gpuAddFAT(npts, an_r, Ne1n1_r.handle, -an_i, Ne1n1_i.handle, 0., x.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddFAT(npts, an_i, Ne1n1_r.handle, an_r, Ne1n1_i.handle, 0., y.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, Es1_r.handle, x.handle, Es1_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddF(npts, Es1_i.handle, y.handle, Es1_i.handle) ;err OR= cudaThreadSynchronize() ; extra ;Es2 e = gpuAddFAT(npts, an_r, Ne1n2_r.handle, -an_i, Ne1n2_i.handle, 0., x.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddFAT(npts, an_i, Ne1n2_r.handle, an_r, Ne1n2_i.handle, 0., y.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, Es2_r.handle, x.handle, Es2_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddF(npts, Es2_i.handle, y.handle, Es2_i.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddFAT(npts, -bn_r, Mo1n2_r.handle, bn_i, Mo1n2_i.handle, 0., x.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddFAT(npts, -bn_i, Mo1n2_r.handle, -bn_r, Mo1n2_i.handle, 0., y.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, Es2_r.handle, x.handle, Es2_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddF(npts, Es2_i.handle, y.handle, Es2_i.handle) ;err OR= cudaThreadSynchronize() ; extra ;Es3 e = gpuAddFAT(npts, an_r, Ne1n3_r.handle, -an_i, Ne1n3_i.handle, 0., x.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddFAT(npts, an_i, Ne1n3_r.handle, an_r, Ne1n3_i.handle, 0., y.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, Es3_r.handle, x.handle, Es3_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddF(npts, Es3_i.handle, y.handle, Es3_i.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddFAT(npts, -bn_r, Mo1n3_r.handle, bn_i, Mo1n3_i.handle, 0., x.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddFAT(npts, -bn_i, Mo1n3_r.handle, -bn_r, Mo1n3_i.handle, 0., y.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, Es3_r.handle, x.handle, Es3_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddF(npts, Es3_i.handle, y.handle, Es3_i.handle) ;err OR= cudaThreadSynchronize() ; extra ; upward recurrences ... ; ... angular functions (4.47) ; Method described by Wiscombe (1980) ; pi_nm1 = pi_n ; pi_n = swisc + (n + 1.d) * twisc / n temp = pi_nm1 pi_nm1 = pi_n pi_n = temp e = gpuAddFAT(npts, 1., swisc.handle, (n+1.)/n, twisc.handle, 0., pi_n.handle) err OR= cudaThreadSynchronize() ; ... Riccati-Bessel function ; xi_nm2 = xi_nm1 ; xi_nm1 = xi_n temp = xi_nm2_r xi_nm2_r = xi_nm1_r xi_nm1_r = xi_n_r xi_n_r = temp temp = xi_nm2_i xi_nm2_i = xi_nm1_i xi_nm1_i = xi_n_i xi_n_i = temp endfor ; geometric factors were divided out of the vector ; spherical harmonics for accuracy and efficiency ... ; ... put them back at the end. ;Es[0,*] *= cosphi * sintheta / kr^2 ;Es[1,*] *= cosphi / kr ;Es[2,*] *= sinphi / kr e = gpuDivF(npts, cosphi.handle, kr.handle, x.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, Es1_r.handle, x.handle, Es1_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es1_i.handle, x.handle, Es1_i.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es2_r.handle, x.handle, Es2_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es2_i.handle, x.handle, Es2_i.handle) err OR= cudaThreadSynchronize() e = gpuDivF(npts, sinphi.handle, kr.handle, x.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, Es1_r.handle, x.handle, Es1_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es1_i.handle, x.handle, Es1_i.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es3_r.handle, x.handle, Es3_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es3_i.handle, x.handle, Es3_i.handle) err OR= cudaThreadSynchronize() ; By default, the scattered wave is returned in spherical ; coordinates. Project components onto Cartesian coordinates. ; Assumes that the incident wave propagates along z and ; is linearly polarized along x if keyword_set(cartesian) then begin ; Ec = Es Ec_r = kr ; temporary storage Ec_i = rho ; rename for clarity ; Ec[0,*] = Es[0,*] * sintheta * cosphi ; Ec[0,*] += Es[1,*] * costheta * cosphi ; Ec[0,*] -= Es[2,*] * sinphi e = gpuMultF(npts, cosphi.handle, sintheta.handle, x.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, Es1_r.handle, x.handle, Ec_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es1_i.handle, x.handle, Ec_i.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, cosphi.handle, costheta.handle, x.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, Es2_r.handle, x.handle, y.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es2_i.handle, x.handle, z.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, Ec_r.handle, y.handle, Ec_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddF(npts, Ec_i.handle, z.handle, Ec_i.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, Es3_r.handle, sinphi.handle, y.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es3_i.handle, sinphi.handle, z.handle) err OR= cudaThreadSynchronize() e = gpuSubF(npts, Ec_r.handle, y.handle, Ec_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuSubF(npts, Ec_i.handle, z.handle, Ec_i.handle) err OR= cudaThreadSynchronize() Es1_cpu_r = gpuGetArr(Ec_r) Es1_cpu_i = gpuGetArr(Ec_i) ; Ec[1,*] = Es[0,*] * sintheta * sinphi ; Ec[1,*] += Es[1,*] * costheta * sinphi ; Ec[1,*] += Es[2,*] * cosphi e = gpuMultF(npts, sinphi.handle, sintheta.handle, x.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, Es1_r.handle, x.handle, Ec_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es1_i.handle, x.handle, Ec_i.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, sinphi.handle, costheta.handle, x.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, Es2_r.handle, x.handle, y.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es2_i.handle, x.handle, z.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, Ec_r.handle, y.handle, Ec_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddF(npts, Ec_i.handle, z.handle, Ec_i.handle) err OR= cudaThreadSynchronize() e = gpuMultF(npts, Es3_r.handle, cosphi.handle, y.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es3_i.handle, cosphi.handle, z.handle) err OR= cudaThreadSynchronize() e = gpuAddF(npts, Ec_r.handle, y.handle, Ec_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuAddF(npts, Ec_i.handle, z.handle, Ec_i.handle) err OR= cudaThreadSynchronize() Es2_cpu_r = gpuGetArr(Ec_r) Es2_cpu_i = gpuGetArr(Ec_i) ; Ec[2,*] = Es[0,*] * costheta - Es[1,*] * sintheta e = gpuMultF(npts, Es1_r.handle, costheta.handle, Ec_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es1_i.handle, costheta.handle, Ec_i.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es2_r.handle, sintheta.handle, x.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuMultF(npts, Es2_i.handle, sintheta.handle, y.handle) err OR= cudaThreadSynchronize() e = gpuSubF(npts, Ec_r.handle, x.handle, Ec_r.handle) ;err OR= cudaThreadSynchronize() ; extra e = gpuSubF(npts, Ec_i.handle, y.handle, Ec_i.handle) err OR= cudaThreadSynchronize() Es3_cpu_r = gpuGetArr(Ec_r, ERROR=e) & err OR= e Es3_cpu_i = gpuGetArr(Ec_i, ERROR=e) & err OR= e endif else begin Es1_cpu_r = gpuGetArr(Es1_r, ERROR=e) & err OR= e Es1_cpu_i = gpuGetArr(Es1_i, ERROR=e) & err OR= e Es2_cpu_r = gpuGetArr(Es2_r, ERROR=e) & err OR= e Es2_cpu_i = gpuGetArr(Es2_i, ERROR=e) & err OR= e Es3_cpu_r = gpuGetArr(Es3_r, ERROR=e) & err OR= e Es3_cpu_i = gpuGetArr(Es3_i, ERROR=e) & err OR= e endelse gpuFree, [x, y, z], ERROR=e & err OR= e gpuFree, [rho, kr, costheta, sintheta, cosphi, sinphi], ERROR=e & err OR= e gpuFree, [swisc, twisc, tau_n], ERROR=e & err OR= e gpuFree, [xi_n_r, xi_n_i, xi_nm1_r, xi_nm1_i], ERROR=e & err OR= e gpuFree, [xi_nm2_r, xi_nm2_i], ERROR=e & err OR= e gpuFree, [pi_n, pi_nm1], ERROR=e & err OR= e gpuFree, [Mo1n2_r, Mo1n2_i, Mo1n3_r, Mo1n3_i], ERROR=e & err OR= e gpuFree, [dn_r, dn_i], ERROR=e & err OR= e gpuFree, [Ne1n1_r, Ne1n1_i, Ne1n2_r, Ne1n2_i], ERROR=e & err OR= e gpuFree, [Ne1n3_r, Ne1n3_i], ERROR=e & err OR= e gpuFree, [Es1_r, Es1_i, Es2_r, Es2_i, Es3_r, Es3_i], ERROR=e & err OR= e if err ne 0 then $ message, "gpulib error:" + strtrim(err), /inf Es1 = complex(Es1_cpu_r, Es1_cpu_i) Es2 = complex(Es2_cpu_r, Es2_cpu_i) Es3 = complex(Es3_cpu_r, Es3_cpu_i) Es = [[Es1],[Es2],[Es3]] return, transpose(Es) end