I'm trying to compile a CUDA test program on Windows 7 from the Command Prompt,
using this command:
nvcc test.cu
But all I get is this error:
nvcc fatal : Cannot find compiler 'cl.exe' in PATH
What may be causing this error?
asked Nov 14, 2011 at 17:49
You need to add the folder containing cl.exe to your PATH environment variable. For example:
C:\Program Files\Microsoft Visual Studio 10.0\VC\bin
Edit: OK, go to My Computer -> Properties -> Advanced System Settings -> Environment Variables. Find "PATH" in the list and append the path above (or wherever your cl.exe is located). Open a new Command Prompt afterwards, because running prompts keep the old value.
answered Nov 14, 2011 at 18:26 by Tudor
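In newer Visual Studio versions, cl.exe is located in a path such as:
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\bin\Hostx64\x64
The MSVC version number in the path depends on your installation. The last folder selects the target architecture:
x64 is for 64-bit
x86 is for 32-bit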
trevorp
answered Oct 30, 2019 at 20:30
You can solve this problem by passing the -ccbin option to nvcc:
nvcc x.cu ... -ccbin "D:\Program Files\Microsoft Visual Studio 11.0\VC\bin"
For example, my compiler is VS2012, and cl.exe is in this directory.
answered Jul 29, 2014 at 20:48 by Prof. Hell
cl.exe is Microsoft's C/C++ compiler. So the problem is that you don't have it installed where the command line can find it.
answered Nov 14, 2011 at 17:54 by Chris Dodd
nvcc is only a front end for the CUDA-specific part of the program. It must invoke a full host compiler to finish the job. In this case it cannot find the Visual Studio compiler, cl.exe.
Check your PATH, the nvcc documentation, etc.
answered Nov 14, 2011 at 17:54 by Steve Fallows
Solve this problem by adding the path to cl.exe to your PATH environment variable. The exact path varies slightly depending on the version of Visual Studio installed on your system and on whether you are targeting a 32-bit or 64-bit system. For example:
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\bin\Hostx64\x64
answered Dec 24, 2022 at 7:59 by Arnab Das
I see that this is an old question but I recently got this error on my Visual Studio 2012 when I tried to build my CUDA project. Apparently I had changed my CUDA project to the Nov 2012 pack, changing it back to the v110 that it usually is by default fixed this error.
In Visual Studio, left click on the CUDA project, ->properties->Configuration Properties-> General -> Platform toolset, and choose: Visual Studio 2012 (v110).
I could probably get it to work with the Nov 2012 pack, but the CUDA code does not use any of the additional functions of that pack, so it is not necessary. (That pack contains the variadic templates for C++11.)
answered Mar 14, 2014 at 18:08 by Donna
How to Fix Errors in Simple CUDA Compilation
As a data scientist or software engineer, you may come across a situation where you need to use CUDA to compile code. CUDA is a parallel computing platform that allows you to use NVIDIA GPUs to accelerate the performance of your code. However, when compiling CUDA code, you may encounter errors that can be frustrating to debug. In this blog post, we will discuss common errors that occur during simple CUDA compilation and how to fix them.
What is CUDA?
CUDA stands for Compute Unified Device Architecture. It is a parallel computing platform that allows developers to use NVIDIA GPUs to accelerate the performance of their applications. CUDA provides a programming model and a set of tools that allow you to write parallel code that can be executed on the GPU. This can result in significant performance gains compared to running the same code on the CPU.
Simple CUDA Compilation
To get started with CUDA, you need to install the CUDA toolkit. The CUDA toolkit contains everything you need to write, compile and run CUDA applications. Once you have installed the toolkit, you can start writing and compiling CUDA code.
Let’s consider an example of a simple CUDA program that adds two arrays. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

#define N 1024

__global__ void add(int *a, int *b, int *c)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < N)
    {
        c[tid] = a[tid] + b[tid];
    }
}

int main()
{
    int *a, *b, *c;
    int *dev_a, *dev_b, *dev_c;

    // Allocate memory on the host
    a = (int*)malloc(N * sizeof(int));
    b = (int*)malloc(N * sizeof(int));
    c = (int*)malloc(N * sizeof(int));

    // Initialize the arrays
    for (int i = 0; i < N; i++)
    {
        a[i] = i;
        b[i] = i;
        c[i] = 0;
    }

    // Allocate memory on the device
    cudaMalloc((void**)&dev_a, N * sizeof(int));
    cudaMalloc((void**)&dev_b, N * sizeof(int));
    cudaMalloc((void**)&dev_c, N * sizeof(int));

    // Copy data from host to device
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    // Launch the kernel
    add<<<N/256, 256>>>(dev_a, dev_b, dev_c);

    // Copy the result from device to host
    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);

    // Print the result
    for (int i = 0; i < N; i++)
    {
        printf("%d\n", c[i]);
    }

    // Free memory
    free(a);
    free(b);
    free(c);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}
This program adds two arrays a and b and stores the result in array c. The add kernel is launched with N/256 blocks, each with 256 threads. The kernel adds the corresponding elements of a and b and stores the result in c.
Common Errors in Simple CUDA Compilation
When compiling a CUDA program, you may encounter errors that can be difficult to debug. Here are some common errors that you may encounter when compiling the simple CUDA program we just discussed:
Error 1: Undefined Reference to cudaMalloc
/tmp/tmpxft_00000b2c_00000000-1_simple.cudafe1.stub.o: In function `main':
simple.cu:(.text+0x3e): undefined reference to `cudaMalloc'
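This error occurs when the linker cannot find the definition of the cudaMalloc function. nvcc links the CUDA runtime library by default, so the error usually appears when the final link step is performed by another compiler driver such as gcc or g++. In that case, link your program with the CUDA runtime library using the -lcudart flag. The flag can also be passed to nvcc explicitly: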
nvcc simple.cu -o simple -lcudart
Error 2: Invalid Device Function
simple.cu(6): error: identifier "cuda" is undefined
simple.cu(8): error: expected a ")"
simple.cu(10): error: identifier "int" is undefined
...
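Despite the heading, these are compile-time errors, not the runtime "invalid device function" error: the compiler does not recognize the CUDA declarations in the file. This typically happens when the CUDA headers are not included properly or the file is not compiled as CUDA source. To fix it, include the CUDA headers with the #include <cuda.h> directive (or #include <cuda_runtime.h> for the runtime API), and compile the file with nvcc using the .cu extension.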
Error 3: Kernel Launch Failure
[1] 7462 segmentation fault (core dumped) ./simple
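A crash like this happens at run time, so the compiler cannot report it. In our example program, the kernel is launched with N/256 blocks, each with 256 threads. N/256 is an integer division, so if N is not a multiple of 256 the block count is rounded down and the last N % 256 elements are never processed; the launch itself still succeeds. To cover every element, round the block count up, for example (N + 255) / 256, and keep the if (tid < N) check in the kernel so that the extra threads do nothing. A segmentation fault on the host more often points to an invalid host pointer, for example a failed malloc that was not checked or a device pointer dereferenced on the host.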
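The rounding issue can be seen with plain integer arithmetic. The following host-side C++ sketch uses a hypothetical helper, blocksFor, that is not part of the CUDA API; it only shows the ceiling division commonly used to size a grid.

```cpp
// Hypothetical helper: smallest number of blocks such that
// blocks * threadsPerBlock >= n (integer ceiling division).
constexpr int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// With n = 1000 and 256 threads per block:
//   1000 / 256          -> 3 blocks, covering only 768 elements
//   blocksFor(1000, 256) -> 4 blocks, covering 1024 thread slots
static_assert(1000 / 256 == 3, "truncating division drops the tail");
static_assert(blocksFor(1000, 256) == 4, "ceiling division covers it");
```

In the example program, the launch would then read add<<<blocksFor(N, 256), 256>>>(dev_a, dev_b, dev_c). The existing if (tid < N) guard keeps the surplus threads in the last block from writing past the end of the arrays.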
Conclusion
In this blog post, we discussed common errors that occur during simple CUDA compilation and how to fix them. We covered errors such as undefined reference to cudaMalloc, invalid device function, and kernel launch failure. By following the solutions provided, you should be able to compile your CUDA code without encountering these errors.
CUDA is a powerful platform that can significantly accelerate the performance of your code. It is important to understand how to compile CUDA code and how to debug common errors that may occur during compilation. We hope this blog post has been helpful in providing you with the knowledge you need to get started with CUDA programming.
I wrote my first CUDA code as follows:
#include <cstdlib>
#include <iostream>

__global__ void kernel()
{
}

int main()
{
    kernel<<<1, 1>>>();
    std::cout << "hello world" << std::endl;
    system("pause");
    return 0;
}
And I set up Visual Studio 2008 following the instructions on these two pages:
- Easiest Way to Run CUDA on Visual Studio 2008
- How do I start a new CUDA project in Visual Studio 2008?
But after I compile it, it produces an error. I do not know what the problem is, or where I have gone wrong. Here is what the build output window contains when running on a 32-bit Windows 7 system:
1>------ Build started: Project: CUDA, Configuration: Debug Win32 ------
1>Compiling with CUDA Build Rule...
1>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --machine 32 -ccbin "d:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\\include" -maxrregcount=32 --compile -o "Debug/main.cu.obj" main.cu
1>main.cu
1>Catastrophic error: cannot open source file "C:/Users/露隆/AppData/Local/Temp/tmpxft_000011e4_00000000-8_main.compute_10.cpp1.ii"
1>1 catastrophic error detected in the compilation of "C:/Users/露隆/AppData/Local/Temp/tmpxft_000011e4_00000000-8_main.compute_10.cpp1.ii".
1>Compilation terminated.
1>Project : error PRJ0019: A tool returned an error code from "Compiling with CUDA Build Rule..."
1>Build log was saved at "file://c:\Users\丁\AppData\Local\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\src\CUDA\Debug\BuildLog.htm"
1>CUDA - 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Could you please help me resolve this problem? I have run some examples in the SDK src directory, and I can compile and run the deviceQuery example successfully, but when I try to compile bandwidthTest, I get the same error.
I wanted to start CUDA programming with C++ and installed toolkit v9.0 from the official Nvidia website. I ran deviceQuery.cpp in VS 2017 and everything worked fine:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 960M"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2048 MBytes (2147483648 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1176 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
Then I tried to run bandwidthTest.cu and got some compilation errors:
Severity Code Description File Line Category Suppression State
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\type_traits 504
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\type_traits 505
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\type_traits 506
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\type_traits 538
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\type_traits 1043
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\type_traits 1558
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\type_traits 2371
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\type_traits 2371
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xutility 543
Error class "std::enable_if<<error-constant>, int>" has no member "type" C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xtr1common 58
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xutility 3135
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xutility 3662
Error class "std::enable_if<<error-constant>, void>" has no member "type" C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xtr1common 58
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xmemory0 390
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xmemory0 1002
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xmemory0 1322
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xstring 1718
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xutility 298
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\vector 495
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\algorithm 278
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\memory 1540
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\memory 1547
Error constant value is not known C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\memory 2482
Error expression must have a constant value C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\memory 2582
Error more than one instance of overloaded function "std::_Deallocate_plain" matches the argument list: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xstring 1780
Error more than one instance of overloaded function "std::_Deallocate_plain" matches the argument list: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xstring 1780
Error more than one instance of overloaded function "std::_Deallocate_plain" matches the argument list: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xstring 1780
Error more than one instance of overloaded function "std::_Deallocate_plain" matches the argument list: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\include\xstring 1780
Error MSB3721 The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin\nvcc.exe" -gencode=arch=compute_30,code=\"sm_30,compute_30\" -gencode=arch=compute_35,code=\"sm_35,compute_35\" -gencode=arch=compute_37,code=\"sm_37,compute_37\" -gencode=arch=compute_50,code=\"sm_50,compute_50\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_60,code=\"sm_60,compute_60\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" --use-local-env --cl-version 2017 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.12.25827\bin\HostX86\x64" -x cu -I./ -I../../common/inc -I./ -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\/include" -I../../common/inc -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -Xcompiler "/wd 4819" -g -DWIN32 -DWIN32 -D_MBCS -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MTd " -o x64/Debug/bandwidthTest.cu.obj "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\1_Utilities\bandwidthTest\bandwidthTest.cu"" exited with code 1. C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\BuildCustomizations\CUDA 9.0.targets 707
I am simply compiling the existing bandwidthTest.cu in 1_Utilities after installing the toolkit, using VS 2017. I have searched the Internet for a solution for a long time but cannot seem to find anything. Any help is appreciated.
EDIT: I installed the side-by-side MSVC toolset: https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017/
Most of my errors are gone now, but the last (long) one remains. It seems to be a different kind of error.
EDIT 2: It seems that the -Bv option in the command line was causing the error. I removed it, and now all my projects compile successfully. This looks like a bug in the new version of VC++ and will probably be fixed soon.
Solution
To summarize what I did:
In the Visual Studio 2017 installer, select Modify, then click Individual components at the top, scroll down to Compilers, build tools, and runtimes, and check VC++ 2017 version 15.4 v14.11 toolset (this also checks Visual C++ 2017 Redistributable Update), then apply the modification. After that, go to the toolset's build folder. For me it was here:
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\14.11
and copy the file with the .props extension into your solution folder. In VS, right-click your project, select Unload Project, then right-click it again and select Edit <project name>. There, look for the line that says
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
and above that line add
<Import Project="$(SolutionDir)\Microsoft.VCToolsVersion.14.11.props" />
Save the file and reload the project, and everything should work.
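The project-file edit described above amounts to inserting one line before the first line that matches a marker. A minimal C++ sketch of that text operation follows; the function name insertBefore is hypothetical, and reading and writing the .vcxproj file is left out.

```cpp
#include <string>
#include <vector>

// Inserts `newLine` immediately before the first line that contains `marker`.
// Returns true if the marker was found; leaves `lines` unchanged otherwise.
bool insertBefore(std::vector<std::string>& lines,
                  const std::string& marker,
                  const std::string& newLine) {
    for (auto it = lines.begin(); it != lines.end(); ++it) {
        if (it->find(marker) != std::string::npos) {
            lines.insert(it, newLine);  // iterator is not reused after this
            return true;
        }
    }
    return false;
}
```

Here the marker would be Microsoft.Cpp.Default.props and the new line would be the Microsoft.VCToolsVersion.14.11.props import shown above. Returning false when the marker is missing makes it easy to skip project files that do not have the expected layout.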
Other solutions
I managed to write a PowerShell script that implements the solution from Wido Seidel. Back up the folder before running the script, and adjust the paths to match your setup.
# Path where the CUDA samples are located
$files = Get-ChildItem "c:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\" -Recurse -Include "*_vs2017.vcxproj"
# String to find
$stringToFind = '<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />'
# Replacement text: the new import line followed by the original line
$stringToAdd = '  <Import Project="C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\14.11\Microsoft.VCToolsVersion.14.11.props" />' + "`r`n" +
    '  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />'
# Use -lt (not -le) so the loops stop at the last valid index
for ($i = 0; $i -lt $files.Count; $i++) {
    Write-Host $files[$i].FullName
    # Path of the file currently being processed
    $filePath = $files[$i].FullName
    # Content of the file, one array element per line
    $fileToChange = Get-Content $filePath
    for ($j = 0; $j -lt $fileToChange.Count; $j++) {
        if ($fileToChange[$j].Contains($stringToFind)) {
            # Replace the first matching line with the two-line text and stop,
            # since only one line has to be added above it
            $fileToChange[$j] = $stringToAdd
            break
        }
    }
    # Write the changes back to the file
    $fileToChange | Set-Content $filePath
}
Let me know if it works.
#c #cuda #cpp
Good day! I need some help. My program fails to compile because of errors like "identifier is undefined in device code". Some background: there is an AES implementation by Brian Gladman (Worcester, UK) that I want to use in my CUDA program. The error that blocks compilation occurs where a macro from Brian's code is used. For example, in the line:
ke8(cx->ks, 0); ke8(cx->ks, 1);
ke8 is a macro, defined as:
#define ke8(k,i) \
{ kef8(k,i); \
k[8*(i)+12] = ss[4] ^= ls_box(ss[3],0); \
k[8*(i)+13] = ss[5] ^= ss[4]; \
k[8*(i)+14] = ss[6] ^= ss[5]; \
k[8*(i)+15] = ss[7] ^= ss[6]; \
}
As far as I understand, the error is related to macros, and something in the macro is not defined for the CUDA device at compile time. However, just two lines above, "ke8(cx->ks, i);" works without complaint, even though it uses the very same macro. Googling for a solution turned up nothing. What can cause the error "identifier is undefined in device code", and how do I fix it?
Answers
Answer 1
The error identifier is undefined in device code occurs when an identifier is not defined. Since it is hard to tell from the macro alone what it expands to, it is worth looking at the file the preprocessor produces from the source. To do this, under C/C++ / Preprocessor, set Generate Preprocessed File to Yes. After compiling, look at the code the macro expanded to and note which variables are treated as undeclared. If they really are missing, something has not been included (a library or a header).
Answer 2
You need to add the GPU preprocessor directive; otherwise the fixed lookup tables are defined in host memory rather than device memory, and CUDA cannot find them.
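One way to read Answer 2 is that the lookup tables and the functions using them need CUDA qualifiers when the code is built for the GPU. The sketch below shows that general pattern under stated assumptions: the macro names TABLE_QUALIFIER and DEVICE_FN, the table kLookup, and the function lookup are all hypothetical, and the table values are placeholders rather than a real AES table. It is not taken from Brian Gladman's code.

```cpp
#include <cstdint>

// nvcc defines __CUDACC__; a plain host compiler does not. The macros expand
// to CUDA qualifiers only when the file is compiled as CUDA source.
#ifdef __CUDACC__
#define TABLE_QUALIFIER __constant__  // table is placed in device constant memory
#define DEVICE_FN __device__          // function is compiled for the GPU
#else
#define TABLE_QUALIFIER               // ordinary host global
#define DEVICE_FN                     // ordinary host function
#endif

// Placeholder values, NOT a real AES lookup table.
TABLE_QUALIFIER std::uint8_t kLookup[4] = {0x10, 0x20, 0x30, 0x40};

// Hypothetical stand-in for a table-driven helper such as ls_box.
DEVICE_FN inline std::uint8_t lookup(unsigned i) {
    return kLookup[i & 3u];  // mask keeps the index inside the table
}
```

Without such qualifiers, a table declared as an ordinary global exists only in host memory, and a helper without __device__ is not compiled for the GPU, which is one common source of "identifier is undefined in device code".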