An ARP cache problem triggered by a cluster expansion

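When an EMR customer scaled out their cluster and the machine count approached one thousand, network communication broke down; some machines could not even ping themselves. The root cause was that the default CentOS ARP cache settings are not suitable for large clusters, so the related parameters need to be tuned.
The symptom: while we were expanding the cluster for this large customer, as it approached a thousand machines, the active NameNode's communication suddenly slowed down and requests piled up. Eventually zkfc failed the NameNode over, and the other NameNode also failed over intermittently.
dmesg showed a large number of abnormal log entries: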

[76391312.109413] net_ratelimit: 97 callbacks suppressed
[76391319.885189] net_ratelimit: 37 callbacks suppressed
[76391325.104167] net_ratelimit: 62 callbacks suppressed
[76391330.508496] net_ratelimit: 60 callbacks suppressed
[76391335.694525] net_ratelimit: 50 callbacks suppressed
[76391343.815606] net_ratelimit: 108 callbacks suppressed

dmesg errors

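In addition, gmond metrics collection on the NameNode kept failing and could not send metrics. Sending gmond metrics puts very little load on the network, so if even that fails, the network has a fairly serious problem.
Working with other departments and sibling teams, we found that the cause was an ARP cache overflow, which severely degraded network performance.
The ARP cache (ARP table) stores the mapping between IP addresses and MAC addresses. It is governed by the following parameters:
net.ipv4.neigh.default.gc_thresh1 No garbage collection is done while the ARP table holds fewer entries than this value
net.ipv4.neigh.default.gc_thresh2 Soft limit: once the ARP table exceeds this value, garbage collection runs within 5 seconds
net.ipv4.neigh.default.gc_thresh3 Hard limit on the number of ARP table entries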

Checking the defaults on our systems gave:
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

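The defaults are too small, so once the cluster grew past 1000 machines the network started dropping packets and became unstable. We changed the settings as follows.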

Append to /etc/sysctl.conf:
net.ipv4.neigh.default.gc_thresh1 = 512
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 10240
net.nf_conntrack_max = 524288

Run sysctl -p to apply the configuration.
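To notice this kind of overflow before it bites, the number of ARP entries can be compared against gc_thresh3. The following is a minimal sketch, not part of the original fix: the function names arp_entry_count and arp_usage_status and the 80% warning level are made up for illustration, and /proc/net/arp only lists IPv4 entries, so the count is approximate.

```shell
#!/bin/sh
# Count ARP entries in a file laid out like /proc/net/arp.
# The first line is a column header, so it is skipped.
arp_entry_count() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "${1:-/proc/net/arp}"
}

# Print "warn" if COUNT has reached PCT percent of LIMIT, else "ok".
# PCT defaults to 80, an arbitrary value chosen for this sketch.
arp_usage_status() {
    count=$1
    limit=$2
    pct=${3:-80}
    if [ $((count * 100)) -ge $((limit * pct)) ]; then
        echo "warn"
    else
        echo "ok"
    fi
}

# Example use on a live host:
#   limit=$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh3)
#   arp_usage_status "$(arp_entry_count)" "$limit"
```

With the default gc_thresh3 of 1024, a host that already holds 900 entries would be reported as "warn", which matches the point at which this cluster started to misbehave.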

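gperftools installation and usage

Installation
1. Download the source code from GitHub to the server.
2. Run ./autogen.sh
This step requires autoconf, libtool, gcc-c++ and libunwind to be installed beforehand. Install libunwind first by downloading its source package and building it:

./configure && make && make install

Then install the remaining build dependencies:

yum install -y autoconf libtool gcc-c++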

Usage

export LD_PRELOAD=/usr/lib/libtcmalloc.so:/usr/lib/libprofiler.so
CPUPROFILE=/tmp/cpu java ****
HEAPPROFILE=/tmp/heap java ****

View the results:
pprof --text /bin/java /tmp/cpu
pprof --text /bin/java /tmp/heap

Download-only mode for various Linux package managers

1. pip download

pip download <package>

2. yum downloadonly

First install the downloadonly plugin package:

yum install yum-plugin-downloadonly

Usage:

yum install --downloadonly --downloaddir=<directory> <package>

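A NIC configuration scare

This is a cluster problem I ran into recently, where a hardware configuration issue caused problems at the software layer. We first noticed it because some machines in our YARN-based real-time streaming system had very low TPS. At first we did not suspect hardware, so we kept investigating the streaming framework. Only when we found that even ping latency was poor did we realize it was related to OS configuration.
Here is the ping latency on an affected machine: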

64 bytes from 10.39.****: icmp_seq=34 ttl=63 time=27.4 ms
64 bytes from 10.39.****: icmp_seq=35 ttl=63 time=21.2 ms
64 bytes from 10.39.****: icmp_seq=36 ttl=63 time=5.95 ms
64 bytes from 10.39.****: icmp_seq=37 ttl=63 time=16.5 ms
64 bytes from 10.39.****: icmp_seq=38 ttl=63 time=12.3 ms

For comparison, normal ping latency looks like this:

64 bytes from 10.39.****: icmp_seq=13 ttl=63 time=0.230 ms
64 bytes from 10.39.****: icmp_seq=14 ttl=63 time=0.203 ms
64 bytes from 10.39.****: icmp_seq=15 ttl=63 time=0.255 ms
64 bytes from 10.39.****: icmp_seq=16 ttl=63 time=0.229 ms

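We suspected the NICs. We bond two gigabit NICs, and /proc/interrupts showed that each NIC's interrupts were bound to a single core, which was constantly saturated and dropping packets. A properly configured NIC uses multiple queues and sets the affinity of different queues to different cores, which improves TPS. For background, see the article 网卡多队列简介 (an introduction to NIC multi-queue).
Because the real-time system mostly sends small packets, the impact was especially large. The fix is rather painful: after configuring multi-queue, the server has to be rebooted.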
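To check whether a NIC's interrupts are concentrated on one core, the per-CPU counters in /proc/interrupts can be summed for that device. This is a minimal sketch under assumptions: the function name irq_cpu_totals is made up, and it matches rows by a plain substring of the device name (for example eth0), which assumes the driver labels its IRQ rows with the interface name.

```shell
#!/bin/sh
# Sum interrupt counts per CPU for all /proc/interrupts rows that mention DEV.
# Note: matching is by substring, so "eth1" would also match "eth10".
irq_cpu_totals() {
    dev=$1
    file=${2:-/proc/interrupts}
    awk -v dev="$dev" '
        # Header row: one column per CPU.
        NR == 1 { ncpu = NF; next }
        # Data row: field 1 is the IRQ number, fields 2..ncpu+1 are the counts.
        index($0, dev) {
            for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, total[i]
        }
    ' "$file"
}

# Example use on a live host:
#   irq_cpu_totals eth0
```

If nearly all of the count lands on a single CPU line while the others stay near zero, the interrupts are not being spread across cores, which is the situation described above.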

Heap dump for a nologin user on Linux

Sometimes a Java process is started by a nologin user. To take a heap dump or perform similar operations on it, use the following command:

sudo su - user -c "jmap -dump:format=b,file=dump.bin pid" -s /bin/bash

Using ssmtp to send gmail on linux server



Sometimes we want a Linux server to send email alerts for certain events. We could fake the sender's address, but most mail servers will treat such messages as spam, which is inconvenient. Instead, we can send mail through Gmail using our own username and password. Here are the steps to configure and use ssmtp for this.


My distribution is CentOS; on Ubuntu the same steps work, just use apt instead of yum.

1. #yum install ssmtp
2. #vi /etc/ssmtp/ssmtp.conf     //edit configuration

Add the following settings:
AuthUser=YOURNAME@gmail.com
AuthPass=YOURPASSWORD
FromLineOverride=YES
mailhub=smtp.gmail.com:587
UseSTARTTLS=YES
Hostname=gmail.com
TLS_CA_File=/etc/pki/tls/certs/ca-bundle.crt

Note that TLS_CA_File must be set; without it you will get a "Cannot open smtp.gmail.com:587" error.
After that, test the configuration:
echo "test" | ssmtp -vvv TESTEMAIL@ADDRESS
If everything goes well, congratulations, it works. If not, check /var/log/maillog. In my experience the most common error is "Authorization failed (534 5.7.14 https://support.google.com/mail/answer/78754 uy4sm4234351pbc.69 – gsmtp)".
This is caused by Google's security policy. You can resolve it as follows:
1. Google will send you an email about a "Sign-in attempt prevented" event. Log in to your Google account and permit the login from your server.
2. Then go to https://www.google.com/settings/security/lesssecureapps and set "Access for less secure apps" to ON.
Then test again with the command above. If you still cannot send email, check /var/log/maillog and search for the specific error.

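kipmi0 causing 100% CPU usage

While running tests recently, I noticed in ganglia that several machines had a fairly high load even with no jobs running. Running top on those machines showed the kipmi0 process constantly saturating one core, which was very annoying. After some searching, here is a description of kipmi0: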

The kipmi0 process may show increased CPU utilization in Linux. The utilization may increase up to 100% when the IPMI (Intelligent Platform Management Interface) device, such as a BMC (Baseboard Management Controller) or IMM (Integrated Management Controller) is busy or non-responsive.

Fix

No fix required. You should ignore increased CPU utilization as it has no impact on actual system performance.

If it bothers you, the simple workaround is:

echo 100 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us
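Update:
I asked our company's ops team. Their solution is below; please evaluate it for yourself:

We have run into this recently as well. On newer redhat/centos releases (such as 6.4/6.5), and as Red Hat's website also describes, a problem in the interaction between the OS driver and the BMC causes the kipmi0 process to use 100% CPU and makes ipmitool operations unresponsive.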

In this case, there is a problem in the interaction between the driver and the hardware/firmware which leads the driver to believe that an operation is still in progress, causing the high CPU load to continue until the system is rebooted.

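The kipmi0 process runs at very low priority; when applications need CPU, kipmi0 yields it.

As a temporary measure for the current situation, you can try hot-plugging the driver to recover:

echo "remove," > /sys/module/ipmi_si/parameters/hotmod

echo "add," > /sys/module/ipmi_si/parameters/hotmod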

 

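Final solution: the firmware provided this time includes a fix for this issue, and upgrading resolves it. PS: the driver must be recovered first before upgrading. https://access.redhat.com/solutions/21322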

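A logrotate pitfall

Today I set up an anti-attack policy for my VPS: scan all login records and deny access to anyone who logs in with a password more than 5 times in one day. I ended up using the logrotate tool; see man logrotate for details. The problem I hit: after configuring the logs to rotate, I ran logrotate -f -d -v …..conf and the output all looked correct, but nothing was actually rotated. After almost half an hour I realized that -d is debug mode, in which no changes are made to the logs; dropping -d fixed it.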
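The scanning rule described above can be sketched as a small shell function. This is only an illustration under assumptions the original does not state: it assumes sshd-style log lines of the form "... password for <user> from <ip> ...", the default log path /var/log/secure is a guess, and the function name password_login_offenders is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list source IPs with more than LIMIT password logins
# in a log file. Assumes sshd-style lines such as
#   "... sshd[123]: Failed password for root from 1.2.3.4 port 22 ssh2"
password_login_offenders() {
    limit=${1:-5}
    file=${2:-/var/log/secure}   # assumed path; some systems use /var/log/auth.log
    awk -v limit="$limit" '
        / password for / {
            ip = ""
            # The address is the field that follows the last "from".
            for (i = 1; i < NF; i++) if ($i == "from") ip = $(i + 1)
            if (ip != "") count[ip]++
        }
        END { for (ip in count) if (count[ip] > limit) print ip, count[ip] }
    ' "$file" | sort
}

# Example use:
#   password_login_offenders 5 /var/log/secure
```

If the log is rotated daily, each file covers roughly one day, so running this against the current file approximates the "more than 5 times in one day" rule. What to do with the resulting addresses (for example, adding them to a deny list) is left out here.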

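Using GitHub over SSH

My company's network is frustrating: all HTTPS traffic is intercepted, so I could not connect to GitHub or push anything. It turns out SSH works, so here are my notes.

First, follow GitHub's instructions to add an SSH key and everything else that is required: https://help.github.com/articles/using-ssh-over-the-https-port/ .

Second, if all goes well, operate over SSH: find the SSH address of your repository and use it, for example git push git@github.com:jiangyu/scalaLearn.git master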

Configuring a VIP

Obtain a VIP address from the system administrator, then add a virtual interface with: ifconfig eth0:0 10.210.225.31 up

To bring the virtual interface down: ifconfig eth0:0 down