YARN container cleanup kills other processes and brings down the NodeManager


I. Symptoms

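During a recent upgrade I kept finding NodeManagers going down for no apparent reason, with nothing in the logs beforehand and nothing abnormal in dmesg. A failure like this is very hard to diagnose. After an investigation by a colleague, we finally determined that it was a bug caused by YARN container cleanup.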


II. Cause and fix

The JIRA issue for this problem is YARN-3678. When a container finishes, the state machine triggers a cleanup operation, implemented in ContainersLauncher.java. The cleanup logic is as follows:
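1. First, kill SIGTERM pid, so that the container can exit gracefully.
2. Then, after a 250 ms delay, kill SIGKILL pid, i.e. a plain kill -9.
3. This is where things can go wrong. If the container has already exited within those 250 ms and its pid has been reassigned to another thread, the SIGKILL hits the newly started thread. If that thread was started by the same user, the signal can take down the whole process the thread belongs to.
In the extreme case, the pid now belongs to a thread newly started by the NodeManager itself, and the NodeManager goes down. That is the cause of the problem.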
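The problem only appears under certain conditions, though. With the Linux Container Executor, if containers are launched as different users, the kill sent to a reused pid will not kill a process owned by someone else. With the Default Container Executor, the problem does occur.
The reason we do not launch containers as different users is that it requires every user's account to be synchronized to every machine in the cluster, which is not realistic for us.
So we had to modify the code. The change is simple: before the kill -9, ps the pid and check whether it still belongs to the containerId that was launched earlier. The actual code is on GitHub.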
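The check described above can be sketched in shell. This is only an illustration of the idea, not the actual patch: the function names are hypothetical, it reads /proc/<pid>/cmdline (the same command line that ps would show), and it assumes the container ID appears somewhere in the command line of the container's process.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the process that currently holds
# pid $1 still has the container ID $2 in its command line.
pid_matches_container() {
    pid=$1
    container_id=$2
    # An empty ID would match every process, so refuse it.
    [ -n "$container_id" ] || return 1
    # No readable /proc entry means the process is already gone.
    [ -r "/proc/$pid/cmdline" ] || return 1
    # /proc/<pid>/cmdline is NUL-separated; turn the NULs into spaces.
    cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline") || return 1
    case $cmdline in
        *"$container_id"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: send SIGKILL only if the pid still looks like
# the container we launched; otherwise leave the process alone.
kill_container_if_still_ours() {
    if pid_matches_container "$1" "$2"; then
        kill -9 "$1"
    else
        echo "pid $1 no longer belongs to $2, skipping SIGKILL" >&2
    fi
}
```

A small window remains between the check and the kill, so this narrows the race rather than removing it.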

Using ssmtp to send Gmail on a Linux server


Sometimes we want to send email from a Linux server to alert on some event. We could fake the sender's email address and send that way, but unfortunately most mail servers will treat such messages as spam, which is not very convenient. So instead we want to send through Gmail with our own username and password. Here are the steps to configure and use ssmtp to do that.


My Linux distribution is CentOS. Ubuntu works too; just use apt instead of yum.

1. #yum install ssmtp
2. #vi /etc/ssmtp/ssmtp.conf     //edit configuration
 
Here are the settings you should add:
AuthUser=YOURNAME@gmail.com
AuthPass=YOURPASSWORD
FromLineOverride=YES
mailhub=smtp.gmail.com:587
UseSTARTTLS=YES
Hostname=gmail.com
TLS_CA_File=/etc/pki/tls/certs/ca-bundle.crt

Be careful to include TLS_CA_File in the settings. Without it, you will get a "Cannot open smtp.gmail.com:587" error.
After that, you can test your settings:
echo "test" | ssmtp -vvv TESTEMAIL@ADDRESS
If everything goes well, congratulations, it works. If not, check /var/log/maillog. I think the most common error is “Authorization failed (534 5.7.14 https://support.google.com/mail/answer/78754 uy4sm4234351pbc.69 – gsmtp)”.
This problem is caused by Google's security policy. You can resolve it as follows:
1. Google will send you an email to remind you of a "Sign-in attempt prevented" event. Log in to your Google account and permit the login from your server.
2. Then go to https://www.google.com/settings/security/lesssecureapps and set “Access for less secure apps” to ON.
Test again with the command mentioned above. If you still cannot send email, check /var/log/maillog and search for the error message.
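Once the test mail goes through, an alert script will usually want a subject line as well. ssmtp reads the whole message, headers included, from standard input, so a minimal sketch could look like the following. The function names build_alert_message and send_alert are hypothetical, and the wrapper assumes the /etc/ssmtp/ssmtp.conf settings shown earlier are already in place.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal mail message
# (headers, a blank line, then the body).
build_alert_message() {
    to=$1
    subject=$2
    body=$3
    printf 'To: %s\nSubject: %s\n\n%s\n' "$to" "$subject" "$body"
}

# Hypothetical wrapper: pipe the message to ssmtp, which picks up the
# account and server settings from /etc/ssmtp/ssmtp.conf.
send_alert() {
    build_alert_message "$1" "$2" "$3" | ssmtp "$1"
}

# Example call (TESTEMAIL@ADDRESS is the same placeholder as above):
# send_alert TESTEMAIL@ADDRESS "disk almost full" "/data is at 95%"
```

Keeping the message construction separate from the sending step makes it easy to inspect the message on the terminal before any mail actually leaves the server.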