Error code 127

On a Linux system I am trying to call a program at runtime with the system() call.
The system() call returns a non-zero exit status.

Calling WEXITSTATUS on the error code gives «127».

According to the man page of system this code indicates that /bin/sh could not be called:

In case /bin/sh could not be executed,
the exit status will be that of a command that does exit(127).
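The 127 convention can be observed outside the C++ program. The sketch below is only an illustration in Python (chosen for brevity): on Linux, `os.system()` returns the same kind of wait status as the C `system()` call, and `os.WEXITSTATUS` decodes it the way the WEXITSTATUS macro does. The function name `exit_status_of` and the command `definitely_not_a_command_xyz` are made up for this example; the command is assumed not to exist on the machine.

```python
import os

def exit_status_of(command):
    """Run `command` through /bin/sh via os.system() and decode the wait status.

    Returns the exit status if the child exited normally, otherwise None.
    """
    status = os.system(command)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None

if __name__ == "__main__":
    # Assumed-nonexistent command: the shell starts fine but cannot find it.
    print(exit_status_of("definitely_not_a_command_xyz 2>/dev/null"))
    # A command that exists and succeeds.
    print(exit_status_of("true"))
```

Note that the shell also exits with 127 when it starts correctly but cannot find the command, so the status alone does not distinguish "/bin/sh could not be executed" from "the command was not found".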

I checked: /bin/sh is a link to bash. bash is there. I can execute it from the shell.

Now, how can I find out why /bin/sh could not be called?
Is there a kernel log or something similar I can check?

Edit:

After the very helpful tip (see below) I attached strace to the process (strace -f -p <PID>). This is what I get during the system() call:

Process 16080 detached
[pid 11779] <... select resumed> )      = ? ERESTARTNOHAND (To be restarted)
[pid 11774] <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 127}], 0, NULL) = 16080
[pid 11779] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 11779] rt_sigaction(SIGCHLD, {0x2ae0ff898ae2, [CHLD], SA_RESTORER|SA_RESTART, 0x32dd2302d0},  <unfinished ...>
[pid 11774] rt_sigaction(SIGINT, {0x2ae1042070f0, [], SA_RESTORER|SA_SIGINFO, 0x32dd2302d0},  <unfinished ...>
[pid 11779] <... rt_sigaction resumed> {0x2ae0ff898ae2, [CHLD], SA_RESTORER|SA_RESTART, 0x32dd2302d0}, 8) = 0
[pid 11779] sendto(5, "a", 1, 0, NULL, 0 <unfinished ...>
[pid 11774] <... rt_sigaction resumed> NULL, 8) = 0
[pid 11779] <... sendto resumed> )      = 1
[pid 11779] rt_sigreturn(0x2 <unfinished ...>
[pid 11774] rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x32dd2302d0},  <unfinished ...>
[pid 11779] <... rt_sigreturn resumed> ) = -1 EINTR (Interrupted system call)
[pid 11779] select(16, [9 15], [], NULL, NULL <unfinished ...>
[pid 11774] <... rt_sigaction resumed> NULL, 8) = 0
[pid 11774] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 11774] write(1, "Problems calling nvcc jitter: ex"..., 49) = 49
[pid 11774] rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x32dd2302d0}, {0x2ae1042070f0, [], SA_RESTORER|SA_SIGINFO, 0x32dd2302d0}, 8) = 0
[pid 11774] rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x32dd2302d0}, {SIG_DFL, [], SA_RESTORER, 0x32dd2302d0}, 8) = 0
[pid 11774] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 11774] clone(Process 16081 attached (waiting for parent)
Process 16081 resumed (parent 11774 ready)
child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff0177ab68) = 16081
[pid 16081] rt_sigaction(SIGINT, {0x2ae1042070f0, [], SA_RESTORER|SA_SIGINFO, 0x32dd2302d0},  <unfinished ...>
[pid 11774] wait4(16081, Process 11774 suspended
 <unfinished ...>
[pid 16081] <... rt_sigaction resumed> NULL, 8) = 0
[pid 16081] rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x32dd2302d0}, NULL, 8) = 0
[pid 16081] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 16081] execve("/bin/sh", ["sh", "-c", 0xdda1d98], [/* 58 vars */]) = -1 EFAULT (Bad address)
[pid 16081] exit_group(127)             = ?
Process 11774 resumed

When it comes to the call to /bin/sh, it says «Bad address». Why is that?
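The tail of the trace (clone, a failed execve, then exit_group(127)) is the usual pattern of a system()-style helper: the child exits with 127 whenever the exec of the shell fails, whatever the errno was. The following Python sketch imitates that pattern under simplifying assumptions (no signal handling, unlike the real libc implementation); `run_like_system` is a hypothetical name and `/nonexistent/sh` is a deliberately invalid path. It does not reproduce the EFAULT itself, only the way a failed exec turns into status 127.

```python
import os

def run_like_system(shell_path, command):
    """Fork, exec `shell_path -c command`, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: try to replace this process image with the shell.
        try:
            os.execve(shell_path, ["sh", "-c", command], dict(os.environ))
        except OSError:
            pass
        # Only reached if execve failed: same convention as system().
        os._exit(127)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)

if __name__ == "__main__":
    print(run_like_system("/bin/sh", "exit 0"))
    print(run_like_system("/nonexistent/sh", "exit 0"))
```

In the parent, a failed exec is indistinguishable from a shell that ran and returned 127, which is why strace on the child was needed to see the real error.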

Edit:

Here is the whole part that involves the failing system() call (the safe copy to a buffer is already in place):

  std::ostringstream jit_command;

  jit_command << string(CUDA_DIR) << "/bin/nvcc -v --ptxas-options=-v ";
  jit_command << "-arch=" << string(GPUARCH);
  jit_command << " -m64 --compiler-options -fPIC,-shared -link ";
  jit_command << fname_src << " -I$LIB_PATH/include -o " << fname_dest;

  string gen = jit_command.str();
  cout << gen << endl;

  char* cmd = new(nothrow) char[gen.size()+1];
  if (!cmd) __error_exit("no memory for jitter command");
  strcpy(cmd,gen.c_str());

  int ret;

  if (ret=system(cmd)) {

    cout << "Problems calling nvcc jitter: ";

    if (WIFEXITED(ret)) {
      printf("exited, status=%d\n", WEXITSTATUS(ret));
    } else if (WIFSIGNALED(ret)) {
      printf("killed by signal %d\n", WTERMSIG(ret));
    } else if (WIFSTOPPED(ret)) {
      printf("stopped by signal %d\n", WSTOPSIG(ret));
    } else if (WIFCONTINUED(ret)) {
      printf("continued\n");
    } else {
      printf("not recognized\n");
    }

    cout << "Checking shell.. ";
    if(system(NULL))
      cout << "ok!\n";
    else
      cout << "nope!\n";

    __error_exit("Nvcc error\n");

  }
  delete[] cmd;
  return true;

Output:

/usr/local/cuda/bin/nvcc -v --ptxas-options=-v -arch=sm_20 -m64 --compiler-options -fPIC,-shared -link bench_cudp_Oku2fm.cu -I$LIB_PATH/include -o bench_cudp_Oku2fm.o
Problems calling nvcc jitter: exited, status=127
Checking shell.. ok!

Edit (first version of the code):

string gen = jit_command.str();
cout << gen << endl;
int ret;
if (ret=system(gen.c_str())) {
  ....

The complexity of the string creation is not the problem here. As strace shows, a «bad address» is the problem. It’s a legal string, so a «bad address» should not occur.

As far as I know, std::string::c_str() returns a const char * that might point to scratch space in the C++ standard library where a read-only copy of the string is kept.

Unfortunately the error is not really reproducible. The call to system() succeeds several times before it fails.

I don’t want to be hasty, but it smells like a bug in either the kernel, libc, or the hardware.

Edit:

I produced a more verbose strace output (strace -f -v -s 2048 -e trace=process -p $!) of the failing execve system call:

First a succeeding call:

[pid  2506] execve("/bin/sh", ["sh", "-c", "/usr/local/cuda/bin/nvcc -v --ptxas-options=-v -arch=sm_20 -m64 --compiler-options -fPIC,-shared -link /home/user/toolchain/kernels-empty/bench_cudp_U11PSy.cu -I$LIB_PATH/include -o /home/user/toolchain/kernels-empty/bench_cudp_U11PSy.o"], ["MODULE_VERSION_STACK=3.2.8", ... ]) = 0

Now the failing one:

[pid 17398] execve("/bin/sh", ["sh", "-c", 0x14595af0], <list of vars>) = -1 EFAULT (Bad address)

Here <list of vars> is identical. It seems it’s not the list of environment variables that causes the bad address.
As Chris Dodd mentioned, the third element of the argv array passed to execve is the raw pointer 0x14595af0, which strace thinks (and the kernel agrees) is invalid. strace does not recognize it as a string, so it prints the hex value instead of the string.

Edit:

I inserted a printout of the pointer cmd to see its value in the parent process:

  string gen = jit_command.str();
  cout << gen << endl;
  char* cmd = new(nothrow) char[gen.size()+1];
  if (!cmd) __error_exit("no memory for jitter command");
  strcpy(cmd,gen.c_str());
  cout << "cmd = " << (void*)cmd << endl;
  int ret;
  if (ret=system(cmd)) {
    cout << "failed cmd = " << (void*)cmd << endl;
    cout << "Problems calling nvcc jitter: ";

Output (for the failing call):

cmd = 0x14595af0
failed cmd = 0x14595af0
Problems calling nvcc jitter: exited, status=127
Checking shell.. ok!

It’s the same pointer value as the 3rd argument from strace. (I updated the strace output above.)

Regarding the 32-bit look of the cmd pointer: I checked the value of the cmd pointer for a succeeding call and can’t see any difference in structure. This is one of the values of cmd when the system() call succeeds:

cmd = 0x145d4f20

So, before the system() call the pointer is valid. As the strace output above suggests, the child process (after the fork) receives the correct pointer value. But, for some reason, the pointer is invalid in the child process.

Right now we think it’s either:

  • libc/kernel bug
  • hardware problem

Edit:

Meanwhile, let me post a workaround. It’s so silly to be forced to implement something like this… but it works. The following code block gets executed in case the system() call fails. It allocates new command strings and retries until it succeeds (well, not indefinitely).

    list<char*> listPtr;
    int maxtry=1000;
    do{
      char* tmp = new(nothrow) char[gen.size()+1];
      if (!tmp) __error_exit("no memory for jitter command");
      strcpy(tmp,gen.c_str());
      listPtr.push_back( tmp );
    } while ((ret=system(listPtr.back())) && (--maxtry>0));

    while(listPtr.size()) {
      delete[] listPtr.back();
      listPtr.pop_back();
    }

Edit:

I just saw that this workaround did not work in one particular run. It went the whole way, 1000 attempts, all with newly allocated cmd command strings. All 1000 failed.
Not only that: I also tried it on a different Linux host (with the same Linux/software configuration, though).

Taking this into account, one could probably exclude a hardware problem (it would have to occur on 2 physically different hosts). That leaves a kernel bug?

Edit:

torek, I will try to install a modified system call. Give me some time for that.

What Is Exit Code 127? 

Exit Code 127 is a standard error message that originates from Unix or Linux-based systems, and is commonly seen in Kubernetes environments. This exit code is a part of the system’s way of communicating that a particular command it tried to execute could not be found. It’s essentially the system’s way of saying, “I tried to run this, but I couldn’t find what you were asking me to run.”

Exit codes are a typical way for computer systems to convey the status of a process or command, with zero generally representing success and any non-zero value indicating some sort of error or issue. Exit Code 127, specifically, is part of a range of codes (126 to 165) used to signal specific runtime errors in Linux/Unix-based systems.
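As a rough illustration of these conventions, the Python sketch below maps an exit code to the meaning most POSIX shells attach to it: 126 for "found but not executable", 127 for "not found", and values above 128 for "terminated by signal (code minus 128)". The function name `describe_exit_code` is made up for this example, and applications are free to use these numbers differently, so treat the result as a hint rather than a diagnosis.

```python
import signal

def describe_exit_code(code):
    """Return a short, conventional interpretation of a process exit code."""
    if code == 0:
        return "success"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        # Shell convention: 128 + signal number.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "application-defined error"

if __name__ == "__main__":
    for code in (0, 1, 126, 127, 137):
        print(code, describe_exit_code(code))
```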

Common Causes of Exit Code 127 in Kubernetes 

While the generic explanation for Exit Code 127 is that a command could not be found, in a complex environment like Kubernetes, there could be several underlying reasons for this issue:

Incorrect Entrypoint or Command in Dockerfile or Pod Specification

One of the most common reasons for Exit Code 127 in Kubernetes is an incorrect entrypoint or command in the Dockerfile or pod specification. This error occurs when you specify a command or entrypoint that doesn’t exist or is not executable within the container.

For instance, you may have specified a shell script as your entrypoint, but the script doesn’t exist at the specified location within the container; this leads to Exit Code 127. Alternatively, the script might exist but not be marked as executable; that case is normally reported as Exit Code 126 (found but not executable) rather than 127.
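A quick way to tell these cases apart is to check the entrypoint path from inside the container (or in an unpacked copy of the image). The following Python sketch is one possible check; `check_entrypoint` is a hypothetical helper, and the path in the usage section is only a placeholder.

```python
import os

def check_entrypoint(path):
    """Classify an entrypoint path as 'missing', 'not executable', or 'ok'."""
    if not os.path.exists(path):
        # A shell would typically report this as exit code 127.
        return "missing"
    if os.path.isdir(path) or not os.access(path, os.X_OK):
        # A shell would typically report this as exit code 126.
        return "not executable"
    return "ok"

if __name__ == "__main__":
    # Placeholder path; substitute the entrypoint from your own image.
    print(check_entrypoint("/app/bin/app"))
```

The check only looks at the file itself; a script whose interpreter line points at a missing binary can still fail to start even when this reports "ok".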

Missing Dependencies in the Container

Another frequent cause of Exit Code 127 is missing dependencies within the container. This issue occurs when your application or script depends on certain libraries or software that are not available in the container.

For example, your entrypoint script might call the Python interpreter or another binary. If that executable is not installed in the container, the shell is unable to find it, leading to Exit Code 127. (A missing Python package, by contrast, normally surfaces as an ImportError with exit code 1.)
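When chasing a dependency problem, it helps to separate a missing executable from a missing library inside an interpreter, because the two usually produce different exit codes. The Python sketch below compares both on a typical Linux system; the module name `module_that_does_not_exist_xyz` and the command `binary_that_does_not_exist_xyz` are invented placeholders assumed to be absent.

```python
import subprocess
import sys

def exit_code_missing_module():
    """Exit code of a Python child that imports a module that does not exist."""
    result = subprocess.run(
        [sys.executable, "-c", "import module_that_does_not_exist_xyz"],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode

def exit_code_missing_binary():
    """Exit code of a shell asked to run a binary that does not exist."""
    result = subprocess.run(
        "binary_that_does_not_exist_xyz",
        shell=True,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode

if __name__ == "__main__":
    print("missing module:", exit_code_missing_module())
    print("missing binary:", exit_code_missing_binary())
```

If a container reports 127, the missing piece is therefore more likely an executable on the command line than a package imported by the application.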

Wrong Image Tag or Corrupt Image

It’s also possible to encounter Exit Code 127 if you’re using a wrong image tag or a corrupt image. Kubernetes uses Docker images to create containers for your pods. If the image specified in the pod specification is incorrect or corrupt, Kubernetes would be unable to create the container and execute the commands, resulting in Exit Code 127.

Issues with Volume Mounts

Lastly, issues with volume mounts can also lead to Exit Code 127. In Kubernetes, you can mount volumes to containers in your pods to provide persistent storage or to share data between containers. If there’s a problem with these mounts, such as a wrong mount path or permissions issue, it could prevent your application from accessing required files or directories, leading to Exit Code 127.

How to Diagnose Exit Code 127 in Kubernetes 

Now that we’ve explored the common causes of Exit Code 127, let’s look at how to diagnose this error when it occurs.

Check the Kubernetes Pod Logs

One of the first steps in diagnosing Exit Code 127 is checking the logs for the affected pod. This can be done using the kubectl logs command, which displays the logs for a specific pod. These logs often contain valuable information about what went wrong, including error messages from your application or script.

Inspect the Pod Description

Another useful step in diagnosing Exit Code 127 is inspecting the description of the affected pod. This can be done using the kubectl describe pod command, which provides detailed information about the pod, including its current state, recent events, and any error messages.
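The same information can be pulled out programmatically from the JSON form of the pod (for example from `kubectl get pod <pod-name> -o json`). The Python sketch below scans the `status.containerStatuses` entries for a terminated state with a given exit code. The helper name `containers_with_exit_code` and the sample document are made up for illustration; the sample contains only the handful of fields the function reads, not a complete pod object.

```python
import json

def containers_with_exit_code(pod, wanted=127):
    """Return (container name, reason) pairs whose current or last state
    is 'terminated' with the given exit code."""
    hits = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated")
            if terminated and terminated.get("exitCode") == wanted:
                hits.append((cs.get("name"), terminated.get("reason")))
                break
    return hits

# Hand-written sample, trimmed to the fields used above.
SAMPLE = """
{"status": {"containerStatuses": [
  {"name": "my-app",
   "state": {"waiting": {"reason": "CrashLoopBackOff"}},
   "lastState": {"terminated": {"exitCode": 127, "reason": "Error"}}},
  {"name": "sidecar",
   "state": {"running": {}},
   "lastState": {}}
]}}
"""

if __name__ == "__main__":
    print(containers_with_exit_code(json.loads(SAMPLE)))
```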

Verify the Dockerfile and Build Process

It’s crucial to verify your Dockerfile and build process when diagnosing Exit Code 127. This involves checking your Dockerfile for any mistakes or omissions, and ensuring that your build process correctly builds the Docker image and pushes it to the image registry.

Check the Pod’s Configuration

It is important to examine the pod’s configuration. This involves a thorough check on the pod’s specifications, such as the command line arguments and environment variables, to ensure they are correctly defined. The configuration could be wrongly set, thus leading to Exit Code 127. A detailed examination of the pod’s logs could shed more light on why the application inside the pod is not executing as expected.

Test the Docker Image Locally

The next step is to test the Docker image locally. This is a crucial step as it helps you verify if the problem lies in the image itself. By running the image on your local machine, you can determine whether the application starts as expected. If it doesn’t, then there’s a good chance that the image is the problem.

Check Volumes and ConfigMaps

Another essential check involves examining volumes and ConfigMaps. These are critical components of your container runtime environment and can cause Exit Code 127 if not correctly configured. This check involves ensuring that the volumes are correctly mounted and that the ConfigMaps are properly defined and accessible.


How to Fix Exit Code 127 

Once you’ve diagnosed the root cause of Exit Code 127, the next step is to fix it. This section details practical steps on how to address this error.

Correct the Entrypoint or Command in Dockerfile or Pod Specification

The first fix involves correcting the entrypoint or command in your Dockerfile or pod specification. This step involves ensuring that the application’s binary path is correctly specified and that the commands are executable.

For example, if you are using a Dockerfile and your ENTRYPOINT instruction looks like this:

ENTRYPOINT ["./app"]

If ./app isn’t a valid command within your container, you will have to correct it to the right path. If your application binary is in the /app/bin directory, the corrected ENTRYPOINT would look like:

ENTRYPOINT ["/app/bin/app"]

Add Missing Dependencies in the Container

Exit Code 127 could also be a result of missing dependencies in the container. If this is the case, the solution is to add the missing dependencies in the container image or pod specification. 

Suppose your Python application requires the requests library, you would ensure it is installed during image build by adding this in your Dockerfile:

RUN pip install requests

Correct the Docker Image Tag or Build a New Image

Another solution could be correcting the Docker image tag or building a new image. If the image is the problem, correcting the image tag could resolve the issue. Alternatively, building a new image often helps, especially if the initial image was faulty.

To correct the Docker image tag in the Kubernetes pod specification, you might update from:

spec:
  containers:
  - name: my-app
    image: my-app:v0.1

To:

spec:
  containers:
  - name: my-app
    image: my-app:v0.2

To build a new Docker image, you can use this command:

docker build -t my-app:v0.3 .

Resolve Volume Mount Issues

If the problem lies in the volumes, resolving mount issues could fix Exit Code 127. This involves ensuring that the volumes are correctly mounted and that they are accessible by the container. This could involve changing the mount path or adjusting the permissions.

For example, you could change the volume mount path from:

volumes:
- name: app-volume
  hostPath:
    path: /data/my-app
containers:
- name: my-app
  image: my-app:v0.2
  volumeMounts:
  - mountPath: /app
    name: app-volume

To:

volumes:
- name: app-volume
  hostPath:
    path: /data/my-app
containers:
- name: my-app
  image: my-app:v0.2
  volumeMounts:
  - mountPath: /data
    name: app-volume

Use an Init Container

Lastly, using an init container could also resolve Exit Code 127. Init containers are specialized containers that run before application containers and can be used to set up the environment for your application container. This could involve installing necessary software, setting up configuration files, or doing anything else that’s necessary to prepare the environment for your application.

Here’s an example of an init container that creates a necessary directory:

spec:
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'mkdir -p /app/data']
  containers:
  - name: my-app
    image: my-app:v0.2

Note: initContainers was introduced in Kubernetes version 1.8. If you are running an earlier version, you might receive a strict decoding error. To resolve this, update your Kubernetes version.



As a data scientist or software engineer, you may have encountered the “Subprocess Failed with Code 127” error while working with Python and Hadoop. This error can be frustrating as it can halt your progress in executing scripts or running Hadoop jobs. In this blog post, we will explain what this error means and provide a step-by-step guide on how to fix it.

Understanding the Error

The “Subprocess Failed with Code 127” error is a common error that occurs when a subprocess, which is a child process launched by a parent process, fails to execute. This error can occur for several reasons, such as incorrect command-line arguments, missing dependencies, or problems with environment variables.

In the context of Python and Hadoop, this error usually occurs when a subprocess is launched to execute a Hadoop job or run a Python script that calls a Hadoop command. If the subprocess fails to launch, it results in the “Subprocess Failed with Code 127” error.

Step-by-Step Guide to Fix the Error

To fix the “Subprocess Failed with Code 127” error, follow these steps:

Step 1: Check Command-Line Arguments

The first thing you should check is the command-line arguments you used to launch the subprocess. Make sure that the command-line arguments are correctly formatted and that all required arguments are included. For example, if you are trying to run a Hadoop job, ensure that you have included the correct input and output paths, as well as the correct mapper and reducer classes.

Step 2: Check for Missing Dependencies

The next step is to check for missing dependencies. Make sure that all required dependencies are installed, and that they are accessible from the location where the subprocess is launched. For example, if you are running a Python script that calls a Hadoop command, ensure that all required Hadoop libraries are installed and accessible.

Step 3: Check Environment Variables

The third step is to check the environment variables. Make sure that all required environment variables are set correctly. For example, if you are running a Hadoop job, ensure that the HADOOP_HOME and HADOOP_CONF_DIR environment variables are set correctly.

Step 4: Check File Permissions

The fourth step is to check file permissions. Ensure that the user running the subprocess has the necessary permissions to access all required files and directories. For example, if you are running a Hadoop job, ensure that the user running the job has read and write permissions for the input and output directories.
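Steps 1 to 4 can be partly automated before the subprocess is launched. The sketch below is one possible pre-flight check in Python, not part of any Hadoop tooling: `preflight` is a hypothetical helper, and the command name, variable names, and paths passed to it in the usage section are placeholders to replace with your own.

```python
import os
import shutil

def preflight(command, required_env, paths, env=None):
    """Return a list of human-readable problems found before launching `command`."""
    env = os.environ if env is None else env
    problems = []
    # A command that is not on PATH is the classic cause of exit code 127.
    if shutil.which(command, path=env.get("PATH", "")) is None:
        problems.append(f"command not found on PATH: {command}")
    # Required environment variables must be set and non-empty.
    for name in required_env:
        if not env.get(name):
            problems.append(f"environment variable not set: {name}")
    # Input files and directories must be readable by the current user.
    for p in paths:
        if not os.access(p, os.R_OK):
            problems.append(f"path not readable: {p}")
    return problems

if __name__ == "__main__":
    # Placeholder values for illustration only.
    for problem in preflight("hadoop", ["HADOOP_HOME", "HADOOP_CONF_DIR"], ["/data/input"]):
        print(problem)
```

An empty list does not guarantee the job will run; it only rules out the most common launch failures.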

Step 5: Check for Syntax Errors

The fifth step is to check for syntax errors in your code. Make sure that your code is correctly formatted and that there are no syntax errors. Syntax errors can cause the subprocess to fail and result in the “Subprocess Failed with Code 127” error.

Step 6: Check for Resource Constraints

The last step is to check for resource constraints. If your subprocess requires a lot of resources, such as memory or CPU, make sure that your system has enough resources available. If your system is running low on resources, it can cause the subprocess to fail and result in the “Subprocess Failed with Code 127” error.

Conclusion

In conclusion, the “Subprocess Failed with Code 127” error is a common error that occurs when a subprocess fails to execute. To fix this error, you should check command-line arguments, missing dependencies, environment variables, file permissions, syntax errors, and resource constraints. By following the step-by-step guide provided in this blog post, you should be able to fix the error and continue with your work in Python and Hadoop.



Terminals are very powerful, regardless of what operating system you’re using. This means that while you can perform just about any OS task with a few commands, you need to know these commands to take advantage of them. Of course, different operating systems use different languages in their terminals as well, meaning you need to decide to have mastery over one or use multiple.

In this article, we’re talking about Make error 127, its causes and what you can do to fix the problem.


What causes Make error 127?

Make error 127 generally occurs when you’re working with a Go operator-sdk project cloned from GitHub, because the initial command doesn’t generate the bin directory with the files required to run the Make installation.

As for code 127, it simply indicates a “command not found” error. Since the initial command hasn’t generated the files required, the Make installation fails as the system can’t find the source files to run or read the command.



Depending on what’s causing your error, here are two fixes you can try out.

Reinstall a different version of the Operator-SDK

First up, if the problem is being caused by operator-sdk, we can try either rolling back to version 18.0.0 or moving forward to the latest version available (assuming the latest version has fixed the bug).


All you have to do is uninstall operator-sdk using brew or apt-get or whatever package manager you’re using. Once that’s done, we reinstall a different version (either version 18.0.0 or the latest one) as well as the latest version of Go. Keep in mind that if you’re using version 18.0.0 of operator-sdk, we recommend installing version 1.17.6 of Go. 

Finally, check if the bin folder has now been created and if it has, you can go ahead and run your Make installation without any errors. 


Check your PATH variables

Since error 127 also indicates that a command, or a file required to run a command, is missing, try checking your PATH variable to see whether the directory containing the command is listed there. Alternatively, a simpler approach is to open a terminal in the directory that contains the command and run the problematic command from there.


Error 127 in the Bourne shell means that a command does not exist: most likely you do not have Python installed, or it’s not in PATH.

Also, you have a space in print_menu’s definition.

Re-install python, fix the error and try again.

You can reinstall by downloading the installer from Python’s website (for Windows or in general); on Linux (Ubuntu/Debian) you can run:

$ sudo apt-get install --reinstall python

Remember to also make sure that Python is added to PATH. After installing and rebooting, run:

$ python --version

If it shows the version, it should work; otherwise Python was not installed or not added to PATH.

Note: the first code shown cannot be run because its indentation is incorrect, and even if that were fixed it would recurse infinitely. The script you most likely want is the last one shown, with the underscore correctly placed.
